Initial SEO-INTEL documentation: architecture, scoring, code structure
Add comprehensive documentation for the dual-engine performance evaluation system:

- System architecture and data flow
- Score calculation methodology (0-100 approximation from CWV thresholds)
- Detailed metrics reference (LCP, FCP, CLS, TBT, TTFB)
- Testing engines comparison (Sitespeed vs PSI)
- Complete code structure map (file-by-file breakdown)
- Case study: rds.ink 77 score with actionable fixes
- Quick reference guides for interpreting results

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
373
code-refs/file-structure.md
Normal file
@@ -0,0 +1,373 @@
# Code Structure Map

Complete file-by-file breakdown of the seo-intel repository.

## Directory Layout

```
/home/help4bis/seo-intel/
├── README.md                      # Project overview (v1.1.0)
├── pyproject.toml                 # Python project config (dependencies, build)
├── requirements.txt               # Python package list
├── run.sh                         # Launch script (runs main.py)
├── .env                           # Secrets: PSI_API_KEY, DB path, etc.
│
├── src/                           # Python package
│   ├── __init__.py
│   ├── main.py                    # FastAPI app entry point
│   ├── config.py                  # Settings, site list (SITES config)
│   ├── db.py                      # SQLAlchemy setup, migrations, session factory
│   │
│   ├── models/
│   │   ├── __init__.py
│   │   ├── perf.py                # ORM models: PerfRun, PerfAudit, PerfOpportunity, PerfResource
│   │   ├── site.py                # Site model (name, domain, priority)
│   │   ├── ranking.py             # Ranking snapshot model (SEO keyword rankings)
│   │   └── ...                    # Other models (not perf-related)
│   │
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── performance.py         # GET /performance/, /performance/<site_id>, POST /api/perf/test, /api/perf/sweep
│   │   ├── dashboard.py           # GET / (main dashboard)
│   │   ├── keywords.py            # Keyword ranking pages
│   │   └── ...                    # Other routers (not perf-related)
│   │
│   ├── perf/                      # Performance testing engines
│   │   ├── __init__.py
│   │   ├── runner.py              # Orchestrator: run_full_test() — runs engines × devices
│   │   ├── sitespeed.py           # Sitespeed.io Docker wrapper + HAR parser
│   │   ├── psi.py                 # Google PageSpeed Insights API client
│   │   └── batch.py               # Weekly sweep logic
│   │
│   ├── playbook/                  # SEO playbook generation (not perf-related)
│   │   ├── __init__.py
│   │   ├── rules.py
│   │   └── llm.py
│   │
│   └── ...                        # Other modules (keyword analysis, etc.)
│
├── templates/                     # Jinja2 HTML templates
│   ├── base.html                  # Base template (nav, styling)
│   ├── performance.html           # Portfolio scorecard
│   ├── performance_site.html      # Per-site detail dashboard
│   ├── dashboard.html             # Main dashboard
│   └── ...                        # Other templates
│
├── data/
│   └── seo-intel.db               # SQLite database (perf_runs, perf_audits, etc.)
│
├── docs/                          # Documentation (this repo)
│
└── ops/                           # Operations scripts
    ├── schema.sql                 # Database schema
    └── ...
```

---

## Performance System Files (Perf Tier)

### src/routers/performance.py

**Purpose:** FastAPI routes for the performance dashboard

**Key functions:**

- `performance_home(request, db)` — `GET /performance/` → portfolio scorecard
- `performance_site(site_id, request, db)` — `GET /performance/<site_id>` → per-site detail
- `api_perf_test(body, background_tasks, db)` — `POST /api/perf/test` → trigger single URL test
- `api_perf_sweep(background_tasks)` — `POST /api/perf/sweep` → trigger portfolio sweep
- `_portfolio_rows(db)` — SQL: latest scores per site × device
- `_site_url_rows(db, site_id)` — SQL: latest score per URL
- `_site_latest_audit(db, site_id, device)` — SQL: full metrics for latest run
- `_site_trend(db, site_id, weeks)` — SQL: weekly AVG scores (12 weeks)
- `_site_opportunities(db, site_id, device)` — SQL: top PSI opportunities
- `_site_slow_resources(db, site_id)` — SQL: top 10 slowest resources

**Key imports:**

```python
from fastapi import APIRouter, BackgroundTasks, Depends
from sqlalchemy import text
from fastapi.templating import Jinja2Templates
# Two dots: this module lives in src/routers/, the engines in the sibling package src/perf/
from ..perf.runner import run_full_test
from ..perf.batch import run_weekly_perf_sweep
```

**Size:** ~545 lines
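
As a rough illustration of the weekly-average query behind `_site_trend`, the sketch below runs against an in-memory SQLite database. The table and column names follow the models listed in this document, but the schema is pared down and the SQL itself is an assumption, not the query in `performance.py`; `site_trend` and its `today` parameter are hypothetical names.

```python
import sqlite3

# Hypothetical, pared-down versions of perf_runs / perf_audits for illustration.
SCHEMA = """
CREATE TABLE perf_runs (
    id INTEGER PRIMARY KEY,
    site_id INTEGER, device TEXT, started_at TEXT, success INTEGER
);
CREATE TABLE perf_audits (
    id INTEGER PRIMARY KEY,
    perf_run_id INTEGER REFERENCES perf_runs(id),
    performance_score INTEGER
);
"""

# Assumed query shape: average score per ISO-like week and device.
TREND_SQL = """
SELECT strftime('%Y-%W', r.started_at) AS week,
       r.device,
       ROUND(AVG(a.performance_score), 1) AS avg_score
FROM perf_runs r
JOIN perf_audits a ON a.perf_run_id = r.id
WHERE r.site_id = :site_id
  AND r.success = 1
  AND r.started_at >= date(:today, :window)
GROUP BY week, r.device
ORDER BY week, r.device
"""


def site_trend(conn, site_id, today, weeks=12):
    """Return (week, device, avg_score) rows for the trailing window."""
    params = {"site_id": site_id, "today": today, "window": f"-{weeks * 7} days"}
    return conn.execute(TREND_SQL, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO perf_runs VALUES (?, ?, ?, ?, ?)",
    [
        (1, 3, "mobile", "2025-01-06 09:00:00", 1),
        (2, 3, "mobile", "2025-01-07 09:00:00", 1),
        (3, 3, "desktop", "2025-01-07 09:05:00", 1),
        (4, 3, "mobile", "2024-01-01 09:00:00", 1),  # older than 12 weeks, excluded
    ],
)
conn.executemany(
    "INSERT INTO perf_audits VALUES (?, ?, ?)",
    [(1, 1, 70), (2, 2, 80), (3, 3, 95), (4, 4, 10)],
)

trend = site_trend(conn, site_id=3, today="2025-01-10")
for week, device, avg_score in trend:
    print(week, device, avg_score)
```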

---

### src/perf/runner.py

**Purpose:** Orchestrates test runs across engines and devices

**Key functions:**

- `run_full_test(site_id, url, db, engines, devices)` — Main orchestrator
  - Loops over every engine × device pair (`for engine in engines: for device in devices:`)
  - Calls the appropriate engine (sitespeed or psi)
  - Persists each result via `_persist_run()`
  - Returns summary dict
- `_persist_run(db, site_id, url, engine, result)` — Writes one test result to database
  - Inserts: perf_runs (1), perf_audits (1), perf_opportunities (0+), perf_resources (0+)
  - Commits transaction

**Key imports:**

```python
from sqlalchemy.orm import Session
# Two dots: models/ is a sibling package of perf/ under src/
from ..models.perf import PerfRun, PerfAudit, PerfOpportunity, PerfResource
from .sitespeed import run_sitespeed_test
from .psi import run_psi_test
```

**Size:** ~200 lines
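
The engines × devices loop described above can be sketched as follows. This is a minimal stand-alone illustration, not the code in `runner.py`: `fake_sitespeed`, `fake_psi`, `ENGINES`, `run_full_test_sketch`, the `persist` callback and the shape of the summary dict are all assumptions, and the stand-in engines return canned results instead of launching Docker or calling the PSI API.

```python
from typing import Callable


# Stand-ins for run_sitespeed_test / run_psi_test (hypothetical canned results).
def fake_sitespeed(url: str, device: str) -> dict:
    return {"success": True, "performance_score": 77}


def fake_psi(url: str, device: str) -> dict:
    return {"success": False, "error_message": "quota exceeded"}


ENGINES: dict[str, Callable[[str, str], dict]] = {
    "sitespeed": fake_sitespeed,
    "psi": fake_psi,
}


def run_full_test_sketch(url, engines, devices, persist):
    """Run every engine on every device and persist each result as it arrives."""
    summary = {"url": url, "runs": 0, "failed": 0}
    for engine in engines:
        for device in devices:
            try:
                result = ENGINES[engine](url, device)
            except Exception as exc:  # one failing engine should not abort the rest
                result = {"success": False, "error_message": str(exc)}
            persist(engine, device, result)
            summary["runs"] += 1
            if not result.get("success"):
                summary["failed"] += 1
    return summary


stored = []
summary = run_full_test_sketch(
    "https://rds.ink/endangered",
    engines=["sitespeed", "psi"],
    devices=["mobile", "desktop"],
    persist=lambda engine, device, result: stored.append(
        (engine, device, result["success"])
    ),
)
print(summary)
```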

---

### src/perf/sitespeed.py

**Purpose:** Wraps sitespeed.io Docker container, parses HAR output

**Key functions:**

- `run_sitespeed_test(url, device)` — Execute sitespeed in Docker
  - Builds Docker command with device-specific args (--mobile vs desktop UA)
  - Runs `docker run sitespeedio/sitespeed.io:40.4.0 {url} --n 3 ...`
  - Waits for output (60s)
  - Calls `_parse_har()` to extract metrics
  - Calls `_approx_score()` to calculate performance score
  - Returns: success, performance_score, metrics, resources
- `_parse_har(har_path)` — Parse `/tmp/sitespeed-output/{run_id}/.../browsertime.har`
  - Extracts `_googleWebVitals` from `pages[]` (LCP, FCP, CLS, TTFB)
  - Extracts `_cpu.longTasks.totalBlockingTime` from `pages[]` (TBT)
  - Sums resource sizes by type (image, script, stylesheet, font)
  - Returns: metrics dict, resources list
- `_approx_score(lcp_ms, fcp_ms, cls, tbt_ms, ttfb_ms)` — Calculate 0-100 score
  - Uses `_THRESHOLDS` (lines 53–60)
  - Linear interpolation between good/poor for each metric
  - Returns: int(mean(all_metric_scores))
- `_guess_resource_type(url, content_type)` — Classify resource (script, image, etc.)

**Key constants:**

- `SITESPEED_IMAGE = "sitespeedio/sitespeed.io:40.4.0"` (pinned version)
- `OUTPUT_BASE = Path("/tmp/sitespeed-output")` (Docker output mount point)
- `_THRESHOLDS` dict (lines 53–60): (good, poor) for LCP, FCP, CLS, TBT, TTFB

**Size:** ~450 lines
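
To make the interpolation in `_approx_score` concrete, here is a minimal sketch. The threshold values are Google's published good/poor boundaries and are assumed here, since the actual `_THRESHOLDS` contents are not reproduced in this document. Mapping "good" to 100 and "poor" to 0, and skipping metrics that are `None`, are likewise assumptions; `metric_score` and `approx_score` are hypothetical names.

```python
from statistics import mean

# Assumed (good, poor) pairs; not copied from sitespeed.py.
THRESHOLDS = {
    "lcp_ms": (2500, 4000),
    "fcp_ms": (1800, 3000),
    "cls": (0.1, 0.25),
    "tbt_ms": (200, 600),
    "ttfb_ms": (800, 1800),
}


def metric_score(value, good, poor):
    """100 at or below `good`, 0 at or above `poor`, linear in between."""
    if value <= good:
        return 100.0
    if value >= poor:
        return 0.0
    return 100.0 * (poor - value) / (poor - good)


def approx_score(**metrics):
    """Average the per-metric scores, ignoring metrics that were not measured."""
    scores = [
        metric_score(value, *THRESHOLDS[name])
        for name, value in metrics.items()
        if value is not None
    ]
    return int(mean(scores)) if scores else None


# LCP and TBT sit halfway between good and poor (50 each); the rest are good (100).
score = approx_score(lcp_ms=3250, fcp_ms=1800, cls=0.1, tbt_ms=400, ttfb_ms=800)
print(score)  # 80
```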

---

### src/perf/psi.py

**Purpose:** Calls Google PageSpeed Insights API, parses Lighthouse results

**Key functions:**

- `run_psi_test(url, device)` — Call PageSpeed Insights API
  - GET `https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=...&strategy={device}`
  - Parses response.lighthouseResult
  - Calls `_parse_lighthouse_audits()` (shared with sitespeed)
  - Returns: success, performance_score (official), metrics, opportunities
- `_parse_lighthouse_audits(audits)` — Extract metrics + opportunities from Lighthouse JSON
  - Maps audit keys (largest-contentful-paint, etc.) to metric values
  - Extracts opportunities (audit.details.type == "opportunity")
  - Calculates savings_ms and savings_bytes for each opportunity
  - Returns: metrics dict, opportunities list

**Key constants:**

- `PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"`
- `PSI_TIMEOUT = 90` (Google's API can be slow)

**Size:** ~150 lines
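
The audit parsing described above might look roughly like this. It is a sketch under assumptions: `METRIC_AUDITS`, `parse_lighthouse_audits` and the sample payload are made up for illustration, the use of `server-response-time` as the TTFB source is a guess, and only the `numericValue`, `details.type`, `overallSavingsMs` and `overallSavingsBytes` fields of the Lighthouse result are relied on.

```python
# Assumed mapping from Lighthouse audit keys to PerfAudit field names.
METRIC_AUDITS = {
    "largest-contentful-paint": "lcp_ms",
    "first-contentful-paint": "fcp_ms",
    "cumulative-layout-shift": "cls",
    "total-blocking-time": "tbt_ms",
    "server-response-time": "ttfb_ms",
}


def parse_lighthouse_audits(audits):
    """Split a Lighthouse `audits` dict into metrics and ranked opportunities."""
    metrics = {
        field: audits[key].get("numericValue")
        for key, field in METRIC_AUDITS.items()
        if key in audits
    }
    opportunities = []
    for key, audit in audits.items():
        details = audit.get("details") or {}
        if details.get("type") != "opportunity":
            continue
        opportunities.append(
            {
                "opportunity_key": key,
                "display_label": audit.get("title", key),
                "savings_ms": details.get("overallSavingsMs", 0),
                "savings_bytes": details.get("overallSavingsBytes", 0),
            }
        )
    # Largest estimated time saving first
    opportunities.sort(key=lambda o: o["savings_ms"], reverse=True)
    return metrics, opportunities


# Trimmed, made-up sample in the shape of lighthouseResult.audits
sample_audits = {
    "largest-contentful-paint": {"numericValue": 3100.5},
    "cumulative-layout-shift": {"numericValue": 0.02},
    "render-blocking-resources": {
        "title": "Eliminate render-blocking resources",
        "details": {"type": "opportunity", "overallSavingsMs": 450},
    },
    "unused-javascript": {
        "title": "Reduce unused JavaScript",
        "details": {
            "type": "opportunity",
            "overallSavingsMs": 900,
            "overallSavingsBytes": 120000,
        },
    },
    "dom-size": {"details": {"type": "table"}},
}

metrics, opportunities = parse_lighthouse_audits(sample_audits)
print(metrics)
print([o["opportunity_key"] for o in opportunities])
```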

---

### src/perf/batch.py

**Purpose:** Weekly portfolio performance sweep

**Key functions:**

- `run_weekly_perf_sweep(db)` — Main sweep orchestrator
  - Loops: for each site in SITES:
  - Calls `resolve_url_list()` to get top 6 URLs
  - For each URL: calls `run_full_test()` (sitespeed + psi, mobile + desktop)
  - Logs completion summary
- `resolve_url_list(db, domain)` — Get URLs for a site
  - Always: homepage
  - Plus: top 5 URLs from ranking_snapshots (last 30 days, sorted by impressions)
  - Returns: list of 6 URLs max
- `_get_top_urls(db, site_id, limit)` — Query ranking_snapshots for impressions

**Size:** ~150 lines
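
The "homepage plus top 5" rule can be sketched without a database by passing the ranked URLs in directly. `resolve_url_list_sketch` is a hypothetical helper; the `https://` scheme, the trailing-slash handling and the de-duplication of the homepage are assumptions that this document does not confirm.

```python
def resolve_url_list_sketch(domain, ranked_urls, max_urls=6):
    """Homepage first, then ranked URLs (already sorted by impressions), capped at max_urls."""
    homepage = f"https://{domain}/"
    urls = [homepage]
    seen = {homepage.rstrip("/")}
    for url in ranked_urls:
        key = url.rstrip("/")
        if key in seen:  # skip the homepage or a repeated URL
            continue
        seen.add(key)
        urls.append(url)
        if len(urls) == max_urls:
            break
    return urls


ranked = [
    "https://rds.ink",  # same page as the homepage, dropped
    "https://rds.ink/endangered",
    "https://rds.ink/a",
    "https://rds.ink/b",
    "https://rds.ink/c",
    "https://rds.ink/d",
    "https://rds.ink/e",  # would be the 7th URL, cut off by the cap
]
urls = resolve_url_list_sketch("rds.ink", ranked)
print(urls)
```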

---

### src/models/perf.py

**Purpose:** SQLAlchemy ORM models for performance data

**Models:**

- `PerfRun` — Test execution record
  - Fields: id, site_id, url, engine, device, started_at, completed_at, success, error_message
  - Relations: audits (1-to-many), opportunities (1-to-many), resources (1-to-many)
- `PerfAudit` — Core Web Vitals metrics for one run
  - Fields: id, perf_run_id, performance_score, lcp_ms, cls, inp_ms, tbt_ms, fcp_ms, ttfb_ms, total_byte_weight, image_bytes, js_bytes, css_bytes, font_bytes, requests_count, dom_size
  - Relations: run (many-to-1)
- `PerfOpportunity` — Lighthouse audit opportunity
  - Fields: id, perf_run_id, opportunity_key, display_label, savings_ms, savings_bytes, details_json
  - Relations: run (many-to-1)
- `PerfResource` — HAR resource entry
  - Fields: id, perf_run_id, resource_url, resource_type, size_bytes, transfer_size_bytes, start_time_ms, end_time_ms, is_render_blocking
  - Relations: run (many-to-1)

**Size:** ~100 lines

---

## Templates

### templates/performance.html

**Purpose:** Portfolio performance scorecard

**Features:**

- Table of all sites (13 rows)
- Columns: domain, score_mobile, score_desktop, lcp_ms, cls, slowest_url, last_tested
- Colour-coded scores (green ≥90, amber ≥50, red <50)
- "Run portfolio sweep now" button (HTMX POST to /api/perf/sweep)
- Sweep status display (idle | running | ok | error)

**Size:** ~200 lines
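
The colour bands listed above reduce to a small lookup. The helper below is hypothetical (the template may well do this inline in Jinja2), and the `"untested"` label for sites without a score is an assumption.

```python
def score_band(score):
    """Map a 0-100 performance score to the scorecard colour band."""
    if score is None:
        return "untested"  # assumed label for sites with no run yet
    if score >= 90:
        return "green"
    if score >= 50:
        return "amber"
    return "red"


for value in (95, 77, 42, None):
    print(value, score_band(value))
```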

---

### templates/performance_site.html

**Purpose:** Per-site performance detail dashboard

**Features:**

- Latest CWV metrics (mobile + desktop side-by-side)
- 12-week trend sparkline chart (mobile + desktop bars per week)
- Top 5 optimisation opportunities (PSI)
- Top 10 slowest resources (sitespeed HAR)
- Per-URL breakdown table with test buttons
  - Columns: URL, score, LCP, CLS, requests, tested_at, test_now_buttons
  - Test buttons: Both (mobile+desktop), Mob, Dsk

**Interactive elements:**

- HTMX buttons that queue tests
- Coloured metric badges (green/amber/red)
- Tooltips for long URLs

**Size:** ~390 lines

---

## Supporting Files

### src/config.py

**What it contains:**

- `Settings` class (Pydantic)
- `SITES` — list of 13 sites to monitor
  - Each site: domain, priority (sorting order)

**Size:** ~50 lines

---

### src/db.py

**What it contains:**

- SQLAlchemy engine + session factory
- `Base` (declarative base for all models)
- Database URI from .env
- Migration logic (auto-create tables on startup)

**Size:** ~60 lines

---

### requirements.txt

Key dependencies for performance testing:

- fastapi, uvicorn (web framework)
- sqlalchemy (ORM)
- httpx (for PSI API calls)
- docker (for sitespeed execution)
- jinja2 (templates)

---

## File Interaction Map

```
FastAPI Request
    ↓
performance.py (routers)
    ↓
[Query] perf_audits table via SQL
    ├─→ db.py (SQLAlchemy session)
    │
[Render] templates (Jinja2)
    ├─→ performance_site.html
    └─→ performance.html

[Background Task] api_perf_test()
    ↓
runner.py:run_full_test()
    ├─ For each engine:
    │   ├─ sitespeed.py:run_sitespeed_test() → Docker
    │   │   ├─ subprocess.run("docker run sitespeedio/...")
    │   │   ├─ _parse_har(browsertime.har)
    │   │   └─ _approx_score(metrics) → 0-100
    │   │
    │   └─ psi.py:run_psi_test() → Google API
    │       ├─ httpx.get(googleapis.com/...)
    │       ├─ _parse_lighthouse_audits(audits)
    │       └─ opportunities + official_score
    │
    ├─ runner.py:_persist_run() for each result
    │   ├─ INSERT perf_runs
    │   ├─ INSERT perf_audits
    │   ├─ INSERT perf_opportunities
    │   └─ INSERT perf_resources
    │
    └─ models/perf.py (ORM objects)
        └─ db.py (commit via the SQLAlchemy session)
```

---

## Deployment

All files live in `/home/help4bis/seo-intel/` on the host george (192.168.0.117).

**To start the service:**

```bash
cd /home/help4bis/seo-intel
./run.sh
# or
uvicorn src.main:app --host 0.0.0.0 --port 8765 --reload
```

**To run tests manually:**

```bash
cd /home/help4bis/seo-intel
python -c "
from src.perf.runner import run_full_test
from src.db import SessionLocal

db = SessionLocal()
result = run_full_test(
    site_id=3,
    url='https://rds.ink/endangered',
    db=db,
    engines=['sitespeed', 'psi'],
    devices=['mobile', 'desktop']
)
print(result)
db.close()  # release the session once the run has been printed
"
```

---

See also:

- [Database Schema](database-schema.md) — All tables and fields
- [API Endpoints](api-endpoints.md) — HTTP routes and payloads