# System Architecture

## High-Level Overview

SEO-INTEL is a performance measurement system built from four layers:

```
┌─────────────────────────────────────────────────────────┐
│  USER LAYER                                             │
│  Web dashboard (HTMX-driven) on port 8765               │
│  - Portfolio scorecard                                  │
│  - Per-site detail (CWV, trend, opportunities)          │
│  - On-demand test buttons                               │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│  API LAYER (FastAPI)                                    │
│  - GET /performance/ (portfolio view)                   │
│  - GET /performance/ (per-site view)                    │
│  - POST /performance/api/perf/test (trigger test)       │
│  - POST /performance/api/perf/sweep (portfolio sweep)   │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│  TESTING LAYER (Dual Engines)                           │
│  ┌──────────────────────────────────────────────────┐   │
│  │ Sitespeed.io (Docker)                            │   │
│  │ - Real browser via headless Chrome               │   │
│  │ - 3 runs per test, median metrics                │   │
│  │ - HAR export (resource waterfall)                │   │
│  │ - CWV: LCP, FCP, CLS, TBT, TTFB                  │   │
│  │ - Duration: ~60s per device                      │   │
│  └──────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────┐   │
│  │ Google PageSpeed Insights (API)                  │   │
│  │ - Official Lighthouse audit                      │   │
│  │ - Opportunities (what to fix)                    │   │
│  │ - Official performance score (0-100)             │   │
│  │ - Duration: ~30s per device                      │   │
│  └──────────────────────────────────────────────────┘   │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│  PERSISTENCE LAYER (SQLite)                             │
│  - perf_runs (test execution records)                   │
│  - perf_audits (Core Web Vitals metrics)                │
│  - perf_opportunities (Lighthouse opportunities)        │
│  - perf_resources (HAR resource list)                   │
└─────────────────────────────────────────────────────────┘
```

## Data Flow: User Clicks "Test Now"

```
1. User clicks the "Test Now" button
   ↓
2. HTMX POST to /performance/api/perf/test
   Body: {
     "site_id": 3,
     "url": "https://...",
     "engines": ["sitespeed", "psi"],
     "devices": ["mobile", "desktop"]
   }
   ↓
3. FastAPI endpoint (performance.py:api_perf_test)
   ├─ Validate inputs
   ├─ Spawn background task (ThreadPool)
   └─ Return 202 (Accepted) immediately
   ↓
4. Background task runs src/perf/runner.py:run_full_test()
   ├─ For each engine in engines:
   │  └─ For each device in devices:
   │     ├─ If sitespeed:
   │     │  └─ Call src/perf/sitespeed.py:run_sitespeed_test()
   │     │     ├─ Docker: sitespeedio/sitespeed.io:40.4.0
   │     │     ├─ 3 runs (N=3), median metrics
   │     │     ├─ Parse HAR: /tmp/sitespeed-output/{run_id}/.../browsertime.har
   │     │     ├─ Extract: lcp_ms, fcp_ms, cls, tbt_ms, ttfb_ms, page weight
   │     │     └─ Approximate score from CWV thresholds
   │     │
   │     └─ If psi:
   │        └─ Call src/perf/psi.py:run_psi_test()
   │           ├─ HTTP GET to googleapis.com/pagespeedonline/v5/runPagespeed
   │           ├─ Parse Lighthouse audits from response
   │           ├─ Extract: opportunities (what to fix + savings)
   │           └─ Return official performance_score
   │
   ├─ For each result: _persist_run() writes to the database
   │  ├─ perf_runs (engine, device, success, error_message)
   │  ├─ perf_audits (performance_score, all CWV metrics)
   │  ├─ perf_opportunities (opportunity_key, savings_ms, savings_bytes)
   │  └─ perf_resources (url, type, size, load time)
   │
   └─ Log completion summary
   ↓
5. User refreshes the dashboard once the run finishes
   (~3 minutes for both engines on both devices: 2 × ~60s Sitespeed
   + 2 × ~30s PSI, run one after another)
   ↓
6. FastAPI queries the database
   ├─ _portfolio_rows()      → SELECT latest score per site
   ├─ _site_url_rows()       → SELECT latest score per URL
   ├─ _site_latest_audit()   → SELECT full metrics for latest run
   ├─ _site_trend()          → SELECT weekly AVG scores (12 weeks)
   ├─ _site_opportunities()  → SELECT top PSI opportunities
   └─ _site_slow_resources() → SELECT top 10 slowest resources
   ↓
7. Jinja2 templates render HTML with the results
   ├─ performance.html       (portfolio scorecard)
   └─ performance_site.html  (per-site detail with CWV, trend, opportunities)
   ↓
8. User sees updated scores, metrics, trend chart and opportunities
```

## Component Breakdown

### 1. Sitespeed.io Testing (src/perf/sitespeed.py)

**Purpose:** Capture real-browser performance metrics via headless Chrome

**Process:**

```
run_sitespeed_test(url="https://rds.ink/endangered", device="mobile")
├─ Generate unique run_id (UUID)
├─ Create output dir: /tmp/sitespeed-output/{run_id}/
├─ Build Docker command:
│    docker run --rm \
│      -v /tmp/sitespeed-output:/sitespeed.io \
│      sitespeedio/sitespeed.io:40.4.0 \
│      {url} \
│      --mobile --connectivity 4g \      (only if device == "mobile")
│      --n 3 \                           (3 runs, median taken)
│      --outputFolder /sitespeed.io/{run_id} \
│      --summary --summary-detail
│
├─ Wait for the Docker container to complete (~60s)
├─ Parse HAR: /sitespeed.io/{run_id}/.../browsertime.har
│  ├─ Extract pages[]._googleWebVitals (LCP, FCP, CLS, TTFB)
│  ├─ Extract pages[]._cpu.longTasks.totalBlockingTime (TBT)
│  ├─ Compute medians across N=3 runs
│  ├─ Extract resource list (URL, type, size, timing)
│  └─ Categorise resources (script, stylesheet, image, font, xhr, other)
│
├─ Calculate page weight breakdown:
│  ├─ total_bytes = sum of all response bodySize
│  ├─ image_bytes = sum where Content-Type contains "image"
│  ├─ js_bytes    = sum where Content-Type contains "javascript"
│  ├─ css_bytes   = sum where Content-Type contains "css"
│  └─ font_bytes  = sum where Content-Type contains "font"
│
├─ Approximate performance score:
│  └─ _approx_score(lcp_ms, fcp_ms, cls, tbt_ms, ttfb_ms)
│     (see Score Calculation, 02-score-calculation.md)
│
└─ Return dict:
   {
     "success": true,
     "device": "mobile",
     "performance_score": 77,      ← approximated, not Lighthouse
     "metrics": {
       "lcp_ms": null,
       "fcp_ms": 2116,
       "cls": 0.0,
       "tbt_ms": 1807,
       "ttfb_ms": 144,
       ...
     },
     "resources": [
       {"resource_url": "...", "size_bytes": 12345, ...},
       ...
     ]
   }
```

**Key note:** Sitespeed v40 does NOT run Lighthouse. Its performance score is approximated from CWV thresholds; for an official Lighthouse score, use PSI.

### 2. PageSpeed Insights Testing (src/perf/psi.py)

**Purpose:** Get an official Google Lighthouse audit plus opportunities

**Process:**

```
run_psi_test(url="https://rds.ink/endangered", device="mobile")
├─ Build API request:
│    GET https://www.googleapis.com/pagespeedonline/v5/runPagespeed
│        ?url={url}&strategy=mobile&category=performance&key={api_key}
│
├─ Wait for Google to run Lighthouse (~30s)
├─ Parse response.lighthouseResult:
│  ├─ Extract categories.performance.score (0-1) → multiply by 100
│  ├─ Extract audits[]:
│  │  ├─ "largest-contentful-paint"  → lcp_ms
│  │  ├─ "first-contentful-paint"    → fcp_ms
│  │  ├─ "cumulative-layout-shift"   → cls
│  │  ├─ "total-blocking-time"       → tbt_ms
│  │  ├─ "interaction-to-next-paint" → inp_ms
│  │  └─ "server-response-time"      → ttfb_ms
│  │
│  └─ For each audit with details.type == "opportunity":
│     ├─ Extract display title
│     ├─ Extract overallSavingsMs (potential speed gain)
│     ├─ Extract overallSavingsBytes (potential size reduction)
│     └─ Store for recommendations
│
└─ Return dict:
   {
     "success": true,
     "device": "mobile",
     "performance_score": 95,      ← official Lighthouse
     "metrics": { ... },           ← same structure as sitespeed
     "opportunities": [
       {
         "opportunity_key": "unused-javascript",
         "display_label": "Reduce unused JavaScript",
         "savings_ms": 400,        ← potential gain
         "savings_bytes": 150000
       },
       ...
     ]
   }
```

### 3. Test Orchestration (src/perf/runner.py)

**Purpose:** Run every engine × device combination and persist the results

```
run_full_test(
    site_id=3,
    url="https://rds.ink/endangered",
    engines=["sitespeed", "psi"],
    devices=["mobile", "desktop"]
)
├─ For engine in ["sitespeed", "psi"]:
│  └─ For device in ["mobile", "desktop"]:
│     ├─ Run the appropriate test (sitespeed or psi)
│     ├─ Call _persist_run() to write results:
│     │  ├─ INSERT perf_runs (site_id, url, engine, device, ...)
│     │  ├─ INSERT perf_audits (performance_score, all metrics)
│     │  ├─ INSERT perf_opportunities (for each opportunity)
│     │  └─ INSERT perf_resources (for each resource)
│     │
│     └─ Log result (success or error)
│
└─ Return summary:
   {
     "url": "https://...",
     "results": {
       "sitespeed_mobile":  { "run_id": 1, "score": 77, "success": true },
       "sitespeed_desktop": { "run_id": 2, "score": 82, "success": true },
       "psi_mobile":        { "run_id": 3, "score": 95, "success": true },
       "psi_desktop":       { "run_id": 4, "score": 93, "success": true }
     }
   }
```

### 4. Portfolio Sweep (src/perf/batch.py)

**Purpose:** Weekly automated test of all sites × top URLs

**Scheduled:** Monday 04:00 AEST (hard-coded in template)

```
run_weekly_perf_sweep(db)
├─ For each site in SITES (13 sites):
│  ├─ resolve_url_list(domain):
│  │  ├─ Get homepage: https://{domain}/
│  │  ├─ Query ranking_snapshots for the last 30 days
│  │  ├─ Get top 5 URLs by impressions
│  │  └─ Return: [homepage, url1, url2, url3, url4, url5] (6 URLs max)
│  │
│  ├─ For each URL:
│  │  ├─ Call run_full_test(site_id, url, engines=["sitespeed", "psi"], devices=["mobile", "desktop"])
│  │  └─ Wait 5 seconds (inter-URL delay to avoid rate limits)
│  │
│  └─ Log site completion (e.g., "dayboro.au: 6 URLs × 4 runs = 24 tests complete")
│
└─ Total: up to 13 sites × 6 URLs × 4 runs = 312 tests, ~5 hours
```

### 5. Database Persistence (src/models/perf.py)

Four tables, one per concept:

| Table | Purpose | Key Fields |
|-------|---------|------------|
| `perf_runs` | Test execution records | site_id, url, engine, device, completed_at, success |
| `perf_audits` | Core Web Vitals metrics | perf_run_id, performance_score, lcp_ms, cls, tbt_ms, etc. |
| `perf_opportunities` | Lighthouse audit opportunities | perf_run_id, opportunity_key, savings_ms, savings_bytes |
| `perf_resources` | HAR resource list | perf_run_id, resource_url, type, size_bytes, duration_ms |

Each `perf_run` can have:
- 1 `perf_audit` (metrics)
- 0+ `perf_opportunities` (if PSI ran)
- 0+ `perf_resources` (if a HAR was captured)

### 6. Web Interface (templates/performance.html, performance_site.html)

**Portfolio view** (performance.html):
- Table of all sites
- Latest mobile + desktop scores per site
- Slowest URL per site
- Last-tested timestamp
- "Run portfolio sweep now" button (HTMX trigger)

**Per-site view** (performance_site.html):
- CWV metrics for the latest run (mobile + desktop side by side)
- 12-week trend sparkline chart (two bars per week)
- Top 5 opportunities from PSI
- Top 10 slowest resources from Sitespeed
- Per-URL breakdown table with test buttons

## Why Two Engines?

| Aspect | Sitespeed | PSI |
|--------|-----------|-----|
| **What it measures** | Real browser (Browsertime) + HAR waterfall | Official Lighthouse audit |
| **Speed** | ~60s per device | ~30s per device |
| **Score source** | Approximated from CWV thresholds | Official Google Lighthouse |
| **Opportunities** | None (no Lighthouse) | Yes (full audit) |
| **Resource list** | Yes (full HAR) | No (limited) |
| **Use case** | Trend tracking, resource diagnosis | Official benchmarking, opportunities |

**Strategy:** Run both engines for every URL. Sitespeed provides the resource waterfall and trend data; PSI provides the official score and the list of what to fix.

---

See also:
- [Score Calculation](02-score-calculation.md): how the 0-100 score is derived
- [Testing Engines](04-testing-engines.md): deep dive into each engine
- [Database Schema](../code-refs/database-schema.md): all fields, all relationships
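The page-weight breakdown listed under the Sitespeed.io component (section 1) can be sketched as follows. This is a minimal sketch, not the code in `src/perf/sitespeed.py`. The helper name `page_weight` is hypothetical, and it assumes a standard HAR 1.2 layout in which each `log.entries[].response` carries `bodySize` and `content.mimeType`; the MIME type stands in for the Content-Type header.

```python
from typing import Any

# Hypothetical category map: output key -> substring looked for in the MIME type.
CATEGORIES = {
    "image_bytes": "image",
    "js_bytes": "javascript",
    "css_bytes": "css",
    "font_bytes": "font",
}


def page_weight(har: dict[str, Any]) -> dict[str, int]:
    """Sum response body sizes overall and per content category (sketch)."""
    totals = {"total_bytes": 0, **{key: 0 for key in CATEGORIES}}
    for entry in har.get("log", {}).get("entries", []):
        response = entry.get("response", {})
        # HAR uses -1 when the body size is unknown; count it as 0.
        size = max(response.get("bodySize", 0), 0)
        mime = response.get("content", {}).get("mimeType", "").lower()
        totals["total_bytes"] += size
        for key, needle in CATEGORIES.items():
            if needle in mime:
                totals[key] += size
    return totals
```

Clamping `-1` to zero keeps unknown sizes from reducing the totals; a resource whose MIME type matches no category still counts toward `total_bytes`.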
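Section 1 says Sitespeed takes the median of each metric across N=3 runs. One way to do that, tolerating a metric that some runs fail to report (like the `"lcp_ms": null` in the example output), is sketched below. The function name `median_metrics` and the per-run dict shape are assumptions for illustration.

```python
from statistics import median
from typing import Optional

METRICS = ("lcp_ms", "fcp_ms", "cls", "tbt_ms", "ttfb_ms")


def median_metrics(runs: list[dict[str, Optional[float]]]) -> dict[str, Optional[float]]:
    """Per-metric median over the runs that reported a value (hypothetical helper)."""
    result: dict[str, Optional[float]] = {}
    for name in METRICS:
        values = [run[name] for run in runs if run.get(name) is not None]
        # A metric missing from every run (e.g. LCP not reported) stays None.
        result[name] = median(values) if values else None
    return result
```

Skipping missing values means a metric seen in only two of three runs gets the mean of those two, which is what `statistics.median` returns for an even-length input.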
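The PSI parsing steps in section 2 map onto the `lighthouseResult` JSON roughly as below. This is a sketch, not `src/perf/psi.py`. The names `psi_request_url` and `parse_psi` are hypothetical. The sketch assumes each metric audit exposes its value as `numericValue` and that opportunity audits carry `overallSavingsMs`/`overallSavingsBytes` under `details`. An audit absent from the response yields `None` rather than an error.

```python
from typing import Any, Optional
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

# Lighthouse audit id -> metric field used in this document.
AUDIT_METRICS = {
    "largest-contentful-paint": "lcp_ms",
    "first-contentful-paint": "fcp_ms",
    "cumulative-layout-shift": "cls",
    "total-blocking-time": "tbt_ms",
    "interaction-to-next-paint": "inp_ms",
    "server-response-time": "ttfb_ms",
}


def psi_request_url(url: str, device: str, api_key: str) -> str:
    """Build the runPagespeed GET URL with the query parameters shown above."""
    params = {"url": url, "strategy": device, "category": "performance", "key": api_key}
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


def parse_psi(response: dict[str, Any]) -> dict[str, Any]:
    """Extract score, metrics and opportunities from a PSI v5 response (sketch)."""
    lighthouse = response["lighthouseResult"]
    score = lighthouse["categories"]["performance"]["score"]  # 0-1
    audits = lighthouse.get("audits", {})
    metrics: dict[str, Optional[float]] = {
        field: audits.get(audit_id, {}).get("numericValue")
        for audit_id, field in AUDIT_METRICS.items()
    }
    opportunities = []
    for key, audit in audits.items():
        details = audit.get("details") or {}
        if details.get("type") == "opportunity":
            opportunities.append({
                "opportunity_key": key,
                "display_label": audit.get("title"),
                "savings_ms": details.get("overallSavingsMs", 0),
                "savings_bytes": details.get("overallSavingsBytes", 0),
            })
    return {
        "performance_score": round(score * 100) if score is not None else None,
        "metrics": metrics,
        "opportunities": opportunities,
    }
```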
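The engine × device loop in section 3 (Test Orchestration) can be sketched with the engines and the persistence step passed in as functions, which keeps the loop testable without Docker or network access. This dependency-injected shape, the name `run_full_test_sketch`, and the `engine_fns`/`persist` parameters are assumptions; the real `run_full_test()` takes only `site_id`, `url`, `engines` and `devices`. The summary keys follow the `"{engine}_{device}"` pattern shown in the return summary.

```python
from typing import Any, Callable

# Hypothetical signatures: an engine takes (url, device) and returns a result
# dict; persist takes (site_id, url, engine, result) and returns a run id.
EngineFn = Callable[[str, str], dict[str, Any]]
PersistFn = Callable[[int, str, str, dict[str, Any]], int]


def run_full_test_sketch(
    site_id: int,
    url: str,
    engines: list[str],
    devices: list[str],
    engine_fns: dict[str, EngineFn],
    persist: PersistFn,
) -> dict[str, Any]:
    results: dict[str, dict[str, Any]] = {}
    for engine in engines:
        for device in devices:
            try:
                result = engine_fns[engine](url, device)
            except Exception as exc:
                # Record the failure and carry on with the remaining combinations.
                result = {"device": device, "success": False, "error": str(exc)}
            run_id = persist(site_id, url, engine, result)
            results[f"{engine}_{device}"] = {
                "run_id": run_id,
                "score": result.get("performance_score"),
                "success": result["success"],
            }
    return {"url": url, "results": results}
```

Persisting failed runs too means every combination gets a `perf_runs` row, so the dashboard can show errors alongside scores.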
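The four tables in section 5 and the `_persist_run()` step can be illustrated with the standard-library `sqlite3` module. Column names come from the schema table above. The column types, the `id` keys, the `error` key in the result dict, and the function name `persist_run` are assumptions, not the real `src/models/perf.py`.

```python
import sqlite3
from typing import Any

# Hypothetical DDL: names follow the schema table; types and keys are assumed.
SCHEMA = """
CREATE TABLE perf_runs (
    id INTEGER PRIMARY KEY, site_id INTEGER, url TEXT, engine TEXT, device TEXT,
    completed_at TEXT DEFAULT CURRENT_TIMESTAMP, success INTEGER, error_message TEXT
);
CREATE TABLE perf_audits (
    id INTEGER PRIMARY KEY, perf_run_id INTEGER REFERENCES perf_runs(id),
    performance_score INTEGER, lcp_ms REAL, fcp_ms REAL, cls REAL, tbt_ms REAL, ttfb_ms REAL
);
CREATE TABLE perf_opportunities (
    id INTEGER PRIMARY KEY, perf_run_id INTEGER REFERENCES perf_runs(id),
    opportunity_key TEXT, savings_ms REAL, savings_bytes INTEGER
);
CREATE TABLE perf_resources (
    id INTEGER PRIMARY KEY, perf_run_id INTEGER REFERENCES perf_runs(id),
    resource_url TEXT, type TEXT, size_bytes INTEGER, duration_ms REAL
);
"""


def persist_run(conn: sqlite3.Connection, site_id: int, url: str, engine: str,
                result: dict[str, Any]) -> int:
    """Write one engine/device result across the four tables (sketch)."""
    with conn:  # single transaction: commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO perf_runs (site_id, url, engine, device, success, error_message)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (site_id, url, engine, result["device"], int(result["success"]), result.get("error")),
        )
        run_id = cur.lastrowid
        if result["success"]:
            m = result.get("metrics", {})
            conn.execute(
                "INSERT INTO perf_audits (perf_run_id, performance_score, lcp_ms, fcp_ms,"
                " cls, tbt_ms, ttfb_ms) VALUES (?, ?, ?, ?, ?, ?, ?)",
                (run_id, result.get("performance_score"), m.get("lcp_ms"), m.get("fcp_ms"),
                 m.get("cls"), m.get("tbt_ms"), m.get("ttfb_ms")),
            )
            conn.executemany(
                "INSERT INTO perf_opportunities (perf_run_id, opportunity_key, savings_ms,"
                " savings_bytes) VALUES (?, ?, ?, ?)",
                [(run_id, o["opportunity_key"], o.get("savings_ms"), o.get("savings_bytes"))
                 for o in result.get("opportunities", [])],
            )
            conn.executemany(
                "INSERT INTO perf_resources (perf_run_id, resource_url, type, size_bytes,"
                " duration_ms) VALUES (?, ?, ?, ?, ?)",
                [(run_id, r["resource_url"], r.get("type"), r.get("size_bytes"), r.get("duration_ms"))
                 for r in result.get("resources", [])],
            )
    return run_id
```

A single transaction per run means a crash mid-write never leaves a `perf_runs` row with half of its child rows.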
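The `_site_trend()` query in the data flow returns weekly average scores over 12 weeks, drawn as two bars per week. A plausible SQLite form is sketched below. The join follows the schema table. The 84-day window, the engine filter (defaulting to `psi`), the `%Y-%W` week bucketing, and the name `site_trend` are assumptions.

```python
import sqlite3

# Hypothetical trend query: one row per (week, device) with the average score.
TREND_SQL = """
SELECT strftime('%Y-%W', r.completed_at) AS week,
       r.device,
       ROUND(AVG(a.performance_score), 1) AS avg_score
FROM perf_runs AS r
JOIN perf_audits AS a ON a.perf_run_id = r.id
WHERE r.site_id = ?
  AND r.engine = ?
  AND r.success = 1
  AND r.completed_at >= datetime('now', '-84 days')  -- 12 weeks
GROUP BY week, r.device
ORDER BY week, r.device
"""


def site_trend(conn: sqlite3.Connection, site_id: int,
               engine: str = "psi") -> list[tuple[str, str, float]]:
    return conn.execute(TREND_SQL, (site_id, engine)).fetchall()
```

SQLite's `'now'` is UTC, so week boundaries fall on UTC Mondays rather than AEST ones; a real implementation may want to shift timestamps before bucketing.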
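Section 4's `resolve_url_list(domain)` picks the homepage plus the top 5 URLs by impressions over the last 30 days. A sketch against SQLite follows. The `ranking_snapshots` columns (`domain`, `url`, `impressions`, `snapshot_date` stored as `YYYY-MM-DD`), the `conn` and `limit` parameters, and summing impressions per URL are all assumptions, since the document only names the table.

```python
import sqlite3


def resolve_url_list(conn: sqlite3.Connection, domain: str, limit: int = 5) -> list[str]:
    """Homepage first, then up to `limit` other URLs by 30-day impressions (sketch)."""
    homepage = f"https://{domain}/"
    rows = conn.execute(
        """
        SELECT url, SUM(impressions) AS total
        FROM ranking_snapshots
        WHERE domain = ? AND snapshot_date >= date('now', '-30 days')
        GROUP BY url
        ORDER BY total DESC
        LIMIT ?
        """,
        (domain, limit + 1),  # one spare row in case the homepage ranks in the top
    ).fetchall()
    urls = [homepage]
    for url, _total in rows:
        if url != homepage and len(urls) < limit + 1:
            urls.append(url)
    return urls
```

Fetching one spare row keeps the result at six URLs even when the homepage itself is among the top performers, matching the "6 URLs max" in the sweep outline.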