seo-intel-docs/docs/01-architecture.md
help4bis 335d9a76e1 Initial SEO-INTEL documentation: architecture, scoring, code structure
Add comprehensive documentation for the dual-engine performance evaluation system:
- System architecture and data flow
- Score calculation methodology (0-100 approximation from CWV thresholds)
- Detailed metrics reference (LCP, FCP, CLS, TBT, TTFB)
- Testing engines comparison (Sitespeed vs PSI)
- Complete code structure map (file-by-file breakdown)
- Case study: rds.ink 77 score with actionable fixes
- Quick reference guides for interpreting results

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-14 05:56:49 +10:00


System Architecture

High-Level Overview

SEO-INTEL is a performance measurement system with four main layers:

┌─────────────────────────────────────────────────────────┐
│ USER LAYER                                               │
│ Web dashboard (HTMX-driven) on port 8765                │
│ - Portfolio scorecard                                    │
│ - Per-site detail (CWV, trend, opportunities)           │
│ - On-demand test buttons                                │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│ API LAYER (FastAPI)                                     │
│ - GET /performance/              (portfolio view)       │
│ - GET /performance/<site_id>     (per-site view)        │
│ - POST /performance/api/perf/test    (trigger test)     │
│ - POST /performance/api/perf/sweep   (portfolio sweep)  │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│ TESTING LAYER (Dual Engines)                            │
│ ┌──────────────────────────────────────────────────┐   │
│ │ Sitespeed.io (Docker)                            │   │
│ │ - Real browser via headless Chrome               │   │
│ │ - 3 runs per test, median metrics                │   │
│ │ - HAR export (resource waterfall)                │   │
│ │ - CWV: LCP, FCP, CLS, TBT, TTFB                │   │
│ │ - Duration: ~60s per device                      │   │
│ └──────────────────────────────────────────────────┘   │
│ ┌──────────────────────────────────────────────────┐   │
│ │ Google PageSpeed Insights (API)                  │   │
│ │ - Official Lighthouse audit                      │   │
│ │ - Opportunities (what to fix)                    │   │
│ │ - Official performance score (0-100)             │   │
│ │ - Duration: ~30s per device                      │   │
│ └──────────────────────────────────────────────────┘   │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│ PERSISTENCE LAYER (SQLite)                              │
│ - perf_runs (test execution records)                    │
│ - perf_audits (Core Web Vitals metrics)                 │
│ - perf_opportunities (Lighthouse opportunities)         │
│ - perf_resources (HAR resource list)                    │
└─────────────────────────────────────────────────────────┘

Data Flow: User Clicks "Test Now"

1. User clicks "Test Now" button
   ↓
2. HTMX POST to /performance/api/perf/test
   Body: { site_id: 3, url: "https://...", 
           engines: ["sitespeed", "psi"], 
           devices: ["mobile", "desktop"] }
   ↓
3. FastAPI endpoint (performance.py:api_perf_test)
   ├─ Validate inputs
   ├─ Spawn background task (ThreadPool)
   ├─ Return 202 (Accepted) immediately
   ↓
4. Background task runs src/perf/runner.py:run_full_test()
   ├─ For each engine in engines:
   │  └─ For each device in devices:
   │     ├─ If sitespeed:
   │     │  └─ Call src/perf/sitespeed.py:run_sitespeed_test()
   │     │     ├─ Docker: sitespeedio/sitespeed.io:40.4.0
   │     │     ├─ 3 runs (N=3), median metrics
   │     │     ├─ Parse HAR: /tmp/sitespeed-output/{run_id}/.../browsertime.har
   │     │     ├─ Extract: lcp_ms, fcp_ms, cls, tbt_ms, ttfb_ms, page weight
   │     │     └─ Approximate score from CWV thresholds
   │     │
   │     └─ If psi:
   │        └─ Call src/perf/psi.py:run_psi_test()
   │           ├─ HTTP GET to googleapis.com/pagespeedonline/v5/runPagespeed
   │           ├─ Parse Lighthouse audits from response
   │           ├─ Extract: opportunities (what to fix + savings)
   │           └─ Return official performance_score
   │
   ├─ For each result: _persist_run() writes to database
   │  ├─ perf_runs (engine, device, success, error_message)
   │  ├─ perf_audits (performance_score, all CWV metrics)
   │  ├─ perf_opportunities (opportunity_key, savings_ms, savings_bytes)
   │  └─ perf_resources (url, type, size, load time)
   │
   └─ Log completion summary
   
   ↓
5. User refreshes dashboard after ~90s
   ↓
6. FastAPI queries database
   ├─ _portfolio_rows() — SELECT latest score per site
   ├─ _site_url_rows() — SELECT latest score per URL
   ├─ _site_latest_audit() — SELECT full metrics for latest run
   ├─ _site_trend() — SELECT weekly AVG scores (12 weeks)
   ├─ _site_opportunities() — SELECT top PSI opportunities
   └─ _site_slow_resources() — SELECT top 10 slowest resources
   
   ↓
7. Jinja2 templates render HTML with results
   ├─ performance.html (portfolio scorecard)
   └─ performance_site.html (per-site detail with CWV, trend, opps)
   
   ↓
8. User sees updated scores, metrics, trend chart, opportunities
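
Step 3 of the flow above (validate, hand off to a thread pool, answer with 202) can be sketched without any web framework. This is a minimal illustration, not the code in performance.py: `handle_test_request`, the validation rules, the pool size, and the stand-in `run_full_test` are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

ALLOWED_ENGINES = {"sitespeed", "psi"}
ALLOWED_DEVICES = {"mobile", "desktop"}

# Shared worker pool. The real pool size is not stated in this document.
_executor = ThreadPoolExecutor(max_workers=2)


def run_full_test(site_id, url, engines, devices):
    """Stand-in for src/perf/runner.py:run_full_test() (hypothetical body)."""
    return {"url": url, "results": {}}


def handle_test_request(payload, runner=run_full_test):
    """Validate the body, queue the test, and answer without waiting for it.

    Returns (status_code, body, future); future is None when validation fails.
    """
    site_id = payload.get("site_id")
    url = payload.get("url")
    engines = payload.get("engines") or []
    devices = payload.get("devices") or []

    if not isinstance(site_id, int):
        return 400, {"error": "site_id must be an integer"}, None
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        return 400, {"error": "url must be an absolute http(s) URL"}, None
    if not engines or not set(engines) <= ALLOWED_ENGINES:
        return 400, {"error": "unknown engine"}, None
    if not devices or not set(devices) <= ALLOWED_DEVICES:
        return 400, {"error": "unknown device"}, None

    # The test takes on the order of a minute or more, so it runs in the background.
    future = _executor.submit(runner, site_id, url, engines, devices)
    return 202, {"status": "accepted"}, future
```

Returning 202 rather than 200 signals that the work was accepted but has not finished, which matches the "refresh after ~90s" step that follows.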

Component Breakdown

1. Sitespeed.io Testing (src/perf/sitespeed.py)

Purpose: Capture real browser performance metrics via headless Chrome

Process:

run_sitespeed_test(url="https://rds.ink/endangered", device="mobile")
├─ Generate unique run_id (UUID)
├─ Create output dir: /tmp/sitespeed-output/{run_id}/
├─ Build Docker command:
  docker run --rm \
  -v /tmp/sitespeed-output:/sitespeed.io \
  sitespeedio/sitespeed.io:40.4.0 \
  {url} \
  --mobile --connectivity 4g \      (if device=="mobile")
  --n 3 \                           (3 runs, median taken)
  --outputFolder /sitespeed.io/{run_id} \
  --summary --summary-detail

├─ Wait for Docker container to complete (~60s)
├─ Parse HAR: /sitespeed.io/{run_id}/.../browsertime.har
  ├─ Extract pages[]._googleWebVitals (LCP, FCP, CLS, TTFB)
  ├─ Extract pages[]._cpu.longTasks.totalBlockingTime (TBT)
  ├─ Compute medians across N=3 runs
  ├─ Extract resource list (URL, type, size, timing)
  └─ Categorise resources (script, stylesheet, image, font, xhr, other)

├─ Calculate page weight breakdown:
  ├─ total_bytes = sum of all response bodySize
  ├─ image_bytes = sum where Content-Type contains "image"
  ├─ js_bytes = sum where Content-Type contains "javascript"
  ├─ css_bytes = sum where Content-Type contains "css"
  └─ font_bytes = sum where Content-Type contains "font"

├─ Approximate performance score:
  └─ _approx_score(lcp_ms, fcp_ms, cls, tbt_ms, ttfb_ms)
     (See Section 2: Score Calculation)

└─ Return dict:
   {
     "success": true,
     "device": "mobile",
     "performance_score": 77,     ← Approximated, not Lighthouse
     "metrics": {
       "lcp_ms": null,
       "fcp_ms": 2116,
       "cls": 0.0,
       "tbt_ms": 1807,
       "ttfb_ms": 144,
       ...
     },
     "resources": [
       {"resource_url": "...", "size_bytes": 12345, ...},
       ...
     ]
   }
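
The HAR parsing and page weight steps above could look roughly like the following. It is a sketch under assumptions: the key names inside `_googleWebVitals` (`largestContentfulPaint`, `firstContentfulPaint`, `cumulativeLayoutShift`, `ttfb`) should be checked against a real browsertime.har, the content type is read from the HAR field `response.content.mimeType`, and the function names are hypothetical.

```python
from statistics import median

# Assumed key names inside pages[]._googleWebVitals; verify against a real HAR.
VITAL_KEYS = {
    "lcp_ms": "largestContentfulPaint",
    "fcp_ms": "firstContentfulPaint",
    "cls": "cumulativeLayoutShift",
    "ttfb_ms": "ttfb",
}

# Substrings matched against the response content type, as listed above.
TYPE_NEEDLES = {
    "image_bytes": "image",
    "js_bytes": "javascript",
    "css_bytes": "css",
    "font_bytes": "font",
}


def median_or_none(values):
    """Median of the values that are present, or None if every run lacks the metric."""
    present = [v for v in values if v is not None]
    return median(present) if present else None


def extract_metrics(har):
    """Take the median of each metric across the pages (one page per run)."""
    pages = har["log"]["pages"]
    metrics = {
        name: median_or_none([p.get("_googleWebVitals", {}).get(key) for p in pages])
        for name, key in VITAL_KEYS.items()
    }
    metrics["tbt_ms"] = median_or_none(
        [p.get("_cpu", {}).get("longTasks", {}).get("totalBlockingTime") for p in pages]
    )
    return metrics


def page_weight(entries):
    """Sum response body sizes overall and per content type."""
    weight = {"total_bytes": 0, **{field: 0 for field in TYPE_NEEDLES}}
    for entry in entries:
        response = entry.get("response", {})
        # HAR records -1 when the size is unknown; count that as zero.
        size = max(response.get("bodySize") or 0, 0)
        mime = (response.get("content", {}).get("mimeType") or "").lower()
        weight["total_bytes"] += size
        for field, needle in TYPE_NEEDLES.items():
            if needle in mime:
                weight[field] += size
                break
    return weight
```

A metric that no run reports comes back as `None`, which is consistent with the `"lcp_ms": null` in the example result. Because one HAR holds the entries of every run, a caller would probably pass `page_weight` only the entries of a single page; that filtering is left out here.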

Key note: Sitespeed v40 does not run Lighthouse in this setup, so its performance score is approximated from CWV thresholds. For an official Lighthouse score, use PSI.
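
The Docker invocation from the process above can be assembled as an argument list. This sketch reuses only the image tag, mount, and flags shown in this document; the function name `build_sitespeed_command` and its parameters are hypothetical.

```python
import uuid

SITESPEED_IMAGE = "sitespeedio/sitespeed.io:40.4.0"
HOST_OUTPUT_DIR = "/tmp/sitespeed-output"


def build_sitespeed_command(url, device, run_id=None, runs=3):
    """Return (run_id, argv) for one Sitespeed run of the given URL."""
    run_id = run_id or str(uuid.uuid4())
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{HOST_OUTPUT_DIR}:/sitespeed.io",
        SITESPEED_IMAGE,
        url,
    ]
    if device == "mobile":
        # Mobile emulation and throttling are added only for mobile runs.
        cmd += ["--mobile", "--connectivity", "4g"]
    cmd += [
        "--n", str(runs),
        "--outputFolder", f"/sitespeed.io/{run_id}",
        "--summary", "--summary-detail",
    ]
    return run_id, cmd
```

The list can be passed to `subprocess.run(cmd, check=True, timeout=...)`. Using a list instead of a shell string means the URL is never interpreted by a shell, so query strings containing `&` or `;` are passed through unchanged.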

2. PageSpeed Insights Testing (src/perf/psi.py)

Purpose: Get official Google Lighthouse audit + opportunities

Process:

run_psi_test(url="https://rds.ink/endangered", device="mobile")
├─ Build API request:
  GET https://www.googleapis.com/pagespeedonline/v5/runPagespeed
  ?url={url}&strategy=mobile&category=performance&key={api_key}

├─ Wait for Google to run Lighthouse (~30s)
├─ Parse response.lighthouseResult:
  ├─ Extract categories.performance.score (0-1) → multiply by 100
  ├─ Extract audits[]:
    ├─ "largest-contentful-paint" → lcp_ms
    ├─ "first-contentful-paint" → fcp_ms
    ├─ "cumulative-layout-shift" → cls
    ├─ "total-blocking-time" → tbt_ms
    ├─ "interaction-to-next-paint" → inp_ms
    └─ "server-response-time" → ttfb_ms
  
  └─ For each audit with details.type == "opportunity":
     ├─ Extract display title
     ├─ Extract overallSavingsMs (potential speed gain)
     ├─ Extract overallSavingsBytes (potential size reduction)
     └─ Store for recommendations

└─ Return dict:
   {
     "success": true,
     "device": "mobile",
     "performance_score": 95,     ← Official Lighthouse
     "metrics": { ... },          ← Same structure as sitespeed
     "opportunities": [
       {
         "opportunity_key": "unused-javascript",
         "display_label": "Reduce unused JavaScript",
         "savings_ms": 400,       ← potential gain
         "savings_bytes": 150000
       },
       ...
     ]
   }
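
The request building and response parsing above might be sketched as follows. The network call itself is omitted so the example stays offline. The function names are hypothetical, and reading each metric from the audit's `numericValue` field is an assumption about how psi.py extracts values.

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

# Lighthouse audit id to metric field, as listed in the process above.
AUDIT_TO_METRIC = {
    "largest-contentful-paint": "lcp_ms",
    "first-contentful-paint": "fcp_ms",
    "cumulative-layout-shift": "cls",
    "total-blocking-time": "tbt_ms",
    "interaction-to-next-paint": "inp_ms",
    "server-response-time": "ttfb_ms",
}


def build_psi_url(url, device, api_key):
    """Build the runPagespeed request URL with a properly encoded query string."""
    query = urlencode(
        {"url": url, "strategy": device, "category": "performance", "key": api_key}
    )
    return f"{PSI_ENDPOINT}?{query}"


def parse_psi_response(data, device):
    """Turn a decoded PSI JSON response into the result dict shown above."""
    lighthouse = data["lighthouseResult"]
    audits = lighthouse.get("audits", {})
    raw_score = lighthouse["categories"]["performance"]["score"]  # 0-1 or None

    metrics = {
        metric: audits.get(audit_id, {}).get("numericValue")
        for audit_id, metric in AUDIT_TO_METRIC.items()
    }

    opportunities = []
    for key, audit in audits.items():
        details = audit.get("details") or {}
        if details.get("type") != "opportunity":
            continue
        opportunities.append({
            "opportunity_key": key,
            "display_label": audit.get("title"),
            "savings_ms": details.get("overallSavingsMs"),
            "savings_bytes": details.get("overallSavingsBytes"),
        })

    return {
        "success": True,
        "device": device,
        "performance_score": None if raw_score is None else round(raw_score * 100),
        "metrics": metrics,
        "opportunities": opportunities,
    }
```

Audits missing from the response simply produce `None` for that metric, so the caller gets the same keys regardless of which audits Lighthouse returned.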

3. Test Orchestration (src/perf/runner.py)

Purpose: Run all engines × devices combinations and persist results

run_full_test(
  site_id=3,
  url="https://rds.ink/endangered",
  engines=["sitespeed", "psi"],
  devices=["mobile", "desktop"]
)
├─ For engine in ["sitespeed", "psi"]:
  └─ For device in ["mobile", "desktop"]:
     ├─ Run the appropriate test (sitespeed or psi)
     ├─ Call _persist_run() to write results:
       ├─ INSERT perf_runs (site_id, url, engine, device, ...)
       ├─ INSERT perf_audits (performance_score, all metrics)
       ├─ INSERT perf_opportunities (for each opportunity)
       └─ INSERT perf_resources (for each resource)
     
     └─ Log result (success or error)

└─ Return summary:
   {
     "url": "https://...",
     "results": {
       "sitespeed_mobile":  { "run_id": 1, "score": 77, "success": true },
       "sitespeed_desktop": { "run_id": 2, "score": 82, "success": true },
       "psi_mobile":        { "run_id": 3, "score": 95, "success": true },
       "psi_desktop":       { "run_id": 4, "score": 93, "success": true }
     }
   }
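
The engine × device loop can be sketched with the engines and the persistence step passed in as callables, which keeps the example runnable without Docker or an API key. The `engine_funcs` and `persist` parameters are hypothetical, and continuing after a failed combination is an assumption rather than documented behaviour.

```python
def run_full_test(site_id, url, engines, devices, engine_funcs, persist):
    """Run every engine/device combination and summarise the outcomes.

    engine_funcs maps an engine name to a callable(url=..., device=...) -> dict.
    persist(site_id, url, engine, device, result) stores a run and returns its id.
    """
    results = {}
    for engine in engines:
        for device in devices:
            try:
                result = engine_funcs[engine](url=url, device=device)
            except Exception as exc:
                # Assumed behaviour: record the failure and keep going.
                result = {"success": False, "device": device, "error_message": str(exc)}
            run_id = persist(site_id, url, engine, device, result)
            results[f"{engine}_{device}"] = {
                "run_id": run_id,
                "score": result.get("performance_score"),
                "success": bool(result.get("success")),
            }
    return {"url": url, "results": results}
```

Failed runs are still persisted in this sketch, which is one way the `success` and `error_message` columns of perf_runs could end up populated.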

4. Portfolio Sweep (src/perf/batch.py)

Purpose: Weekly automated test of all sites × top URLs

Scheduled: Monday 04:00 AEST (hard-coded in template)

run_weekly_perf_sweep(db)
├─ For each site in SITES (13 sites):
  ├─ resolve_url_list(domain):
    ├─ Get homepage: https://{domain}/
    ├─ Query ranking_snapshots last 30 days
    ├─ Get top 5 URLs by impressions
    └─ Return: [homepage, url1, url2, url3, url4, url5]  (6 URLs max)
  
  ├─ For each URL:
    ├─ Call run_full_test(site_id, url, engines=["sitespeed", "psi"], devices=["mobile", "desktop"])
    └─ Wait 5 seconds (inter-URL delay to avoid rate limits)
  
  └─ Log site completion (e.g., "dayboro.au: 6 URLs × 4 runs = 24 tests complete")

└─ Total: ~13 sites × 6 URLs × 4 runs = ~312 tests, ~5 hours
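
The URL selection and the size of the sweep can be illustrated as below. This is a sketch: `ranking_rows` stands in for the ranking_snapshots query (already limited to the last 30 days), and skipping the homepage when it also appears among the ranked URLs is an assumption.

```python
def resolve_url_list(domain, ranking_rows, top_n=5):
    """Return the homepage followed by the top_n URLs by total impressions.

    ranking_rows is an iterable of (url, impressions) pairs.
    """
    homepage = f"https://{domain}/"
    totals = {}
    for url, impressions in ranking_rows:
        totals[url] = totals.get(url, 0) + impressions

    ranked = sorted(totals, key=lambda u: totals[u], reverse=True)
    urls = [homepage]
    for url in ranked:
        if url != homepage and len(urls) < top_n + 1:
            urls.append(url)
    return urls


def sweep_size(sites=13, urls_per_site=6, engines=2, devices=2):
    """Upper bound on the number of test runs in one portfolio sweep."""
    return sites * urls_per_site * engines * devices
```

With the defaults, `sweep_size()` gives 13 × 6 × 2 × 2 = 312, matching the total above. Sites with fewer than five ranked URLs produce fewer runs, so 312 is a ceiling.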

5. Database Persistence (src/models/perf.py)

Four tables, one per concept:

| Table | Purpose | Key Fields |
| --- | --- | --- |
| perf_runs | Test execution records | site_id, url, engine, device, completed_at, success |
| perf_audits | Core Web Vitals metrics | perf_run_id, performance_score, lcp_ms, cls, tbt_ms, etc. |
| perf_opportunities | Lighthouse audit opportunities | perf_run_id, opportunity_key, savings_ms, savings_bytes |
| perf_resources | HAR resource list | perf_run_id, resource_url, type, size_bytes, duration_ms |

Each perf_run can have:

  • 1 perf_audit (metrics)
  • 0+ perf_opportunities (if PSI ran)
  • 0+ perf_resources (if HAR captured)
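
The relationships above can be expressed as a plain SQLite schema. This is an illustrative sketch, not the model code in src/models/perf.py: the column types, the `id` primary keys, and the `UNIQUE` constraint that enforces one audit per run are assumptions; the column names come from the table above and the data flow.

```python
import sqlite3

SCHEMA = """
CREATE TABLE perf_runs (
    id            INTEGER PRIMARY KEY,
    site_id       INTEGER NOT NULL,
    url           TEXT NOT NULL,
    engine        TEXT NOT NULL,
    device        TEXT NOT NULL,
    completed_at  TEXT,
    success       INTEGER NOT NULL,
    error_message TEXT
);
CREATE TABLE perf_audits (
    id                INTEGER PRIMARY KEY,
    -- UNIQUE gives the "1 perf_audit per perf_run" rule.
    perf_run_id       INTEGER NOT NULL UNIQUE REFERENCES perf_runs(id),
    performance_score INTEGER,
    lcp_ms REAL, fcp_ms REAL, cls REAL, tbt_ms REAL, ttfb_ms REAL
);
CREATE TABLE perf_opportunities (
    id              INTEGER PRIMARY KEY,
    perf_run_id     INTEGER NOT NULL REFERENCES perf_runs(id),
    opportunity_key TEXT NOT NULL,
    savings_ms      REAL,
    savings_bytes   INTEGER
);
CREATE TABLE perf_resources (
    id           INTEGER PRIMARY KEY,
    perf_run_id  INTEGER NOT NULL REFERENCES perf_runs(id),
    resource_url TEXT NOT NULL,
    type         TEXT,
    size_bytes   INTEGER,
    duration_ms  REAL
);
"""


def open_db(path=":memory:"):
    """Open a database and create the four tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn
```

Opportunities and resources have no uniqueness constraint, so a run can carry any number of them, including none.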

6. Web Interface (templates/performance.html, performance_site.html)

Portfolio view (performance.html):

  • Table of all sites
  • Latest mobile + desktop scores per site
  • Slowest URL per site
  • Last tested timestamp
  • "Run portfolio sweep now" button (HTMX trigger)

Per-site view (performance_site.html):

  • CWV metrics for latest run (mobile + desktop side-by-side)
  • 12-week trend sparkline chart (two bars per week)
  • Top 5 opportunities from PSI
  • Top 10 slowest resources from sitespeed
  • Per-URL breakdown table with test buttons
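
The 12-week trend behind the sparkline could be produced by a query along these lines. It is a sketch, not the actual `_site_trend()` implementation: it assumes `completed_at` is stored as ISO-8601 text, that the "two bars per week" are mobile and desktop, and that the reference date is passed in by the caller.

```python
import sqlite3

TREND_SQL = """
SELECT strftime('%Y-%W', r.completed_at) AS week,
       r.device,
       AVG(a.performance_score)          AS avg_score
FROM perf_runs r
JOIN perf_audits a ON a.perf_run_id = r.id
WHERE r.site_id = ?
  AND r.success = 1
  AND r.completed_at >= date(?, '-84 days')
GROUP BY week, r.device
ORDER BY week, r.device
"""


def site_trend(conn: sqlite3.Connection, site_id: int, today: str):
    """Return (week, device, avg_score) rows for the 12 weeks before `today`.

    `today` is an ISO date such as '2026-05-14'; 84 days equals 12 weeks.
    """
    return conn.execute(TREND_SQL, (site_id, today)).fetchall()
```

Grouping by both week and device yields up to two rows per week, one for each bar in the chart.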

Why Two Engines?

| Aspect | Sitespeed | PSI |
| --- | --- | --- |
| What it measures | Real browser (Browsertime) + HAR waterfall | Official Lighthouse audit |
| Speed | ~60s per device | ~30s per device |
| Score source | Approximated from CWV thresholds | Official Google Lighthouse |
| Opportunities | None (no Lighthouse) | Yes (full audit) |
| Resource list | Yes (full HAR) | No (limited) |
| Use case | Trend tracking, resource diagnosis | Official benchmarking, opportunities |

Strategy: Run both engines for every test. Sitespeed provides the resource waterfall and the trend data; PSI provides the official score and the list of what to fix.


See also: