seo-intel-docs/docs/01-architecture.md
help4bis 335d9a76e1 Initial SEO-INTEL documentation: architecture, scoring, code structure
Add comprehensive documentation for the dual-engine performance evaluation system:
- System architecture and data flow
- Score calculation methodology (0-100 approximation from CWV thresholds)
- Detailed metrics reference (LCP, FCP, CLS, TBT, TTFB)
- Testing engines comparison (Sitespeed vs PSI)
- Complete code structure map (file-by-file breakdown)
- Case study: rds.ink 77 score with actionable fixes
- Quick reference guides for interpreting results

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-14 05:56:49 +10:00

# System Architecture
## High-Level Overview
SEO-INTEL is a performance measurement system with four main layers:
```
┌─────────────────────────────────────────────────────────┐
│  USER LAYER                                             │
│  Web dashboard (HTMX-driven) on port 8765               │
│  - Portfolio scorecard                                  │
│  - Per-site detail (CWV, trend, opportunities)          │
│  - On-demand test buttons                               │
└────────────────────┬────────────────────────────────────┘
┌────────────────────▼────────────────────────────────────┐
│  API LAYER (FastAPI)                                    │
│  - GET /performance/ (portfolio view)                   │
│  - GET /performance/<site_id> (per-site view)           │
│  - POST /performance/api/perf/test (trigger test)       │
│  - POST /performance/api/perf/sweep (portfolio sweep)   │
└────────────────────┬────────────────────────────────────┘
┌────────────────────▼────────────────────────────────────┐
│  TESTING LAYER (Dual Engines)                           │
│  ┌──────────────────────────────────────────────────┐   │
│  │ Sitespeed.io (Docker)                            │   │
│  │ - Real browser via headless Chrome               │   │
│  │ - 3 runs per test, median metrics                │   │
│  │ - HAR export (resource waterfall)                │   │
│  │ - CWV: LCP, FCP, CLS, TBT, TTFB                  │   │
│  │ - Duration: ~60s per device                      │   │
│  └──────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────┐   │
│  │ Google PageSpeed Insights (API)                  │   │
│  │ - Official Lighthouse audit                      │   │
│  │ - Opportunities (what to fix)                    │   │
│  │ - Official performance score (0-100)             │   │
│  │ - Duration: ~30s per device                      │   │
│  └──────────────────────────────────────────────────┘   │
└────────────────────┬────────────────────────────────────┘
┌────────────────────▼────────────────────────────────────┐
│  PERSISTENCE LAYER (SQLite)                             │
│  - perf_runs (test execution records)                   │
│  - perf_audits (Core Web Vitals metrics)                │
│  - perf_opportunities (Lighthouse opportunities)        │
│  - perf_resources (HAR resource list)                   │
└─────────────────────────────────────────────────────────┘
```
## Data Flow: User Clicks "Test Now"
```
1. User clicks "Test Now" button
2. HTMX POST to /performance/api/perf/test
   Body: { site_id: 3, url: "https://...",
           engines: ["sitespeed", "psi"],
           devices: ["mobile", "desktop"] }
3. FastAPI endpoint (performance.py:api_perf_test)
   ├─ Validate inputs
   ├─ Spawn background task (ThreadPool)
   └─ Return 202 (Accepted) immediately
4. Background task runs src/perf/runner.py:run_full_test()
   ├─ For each engine in engines:
   │  └─ For each device in devices:
   │     ├─ If sitespeed:
   │     │  └─ Call src/perf/sitespeed.py:run_sitespeed_test()
   │     │     ├─ Docker: sitespeedio/sitespeed.io:40.4.0
   │     │     ├─ 3 runs (N=3), median metrics
   │     │     ├─ Parse HAR: /tmp/sitespeed-output/{run_id}/.../browsertime.har
   │     │     ├─ Extract: lcp_ms, fcp_ms, cls, tbt_ms, ttfb_ms, page weight
   │     │     └─ Approximate score from CWV thresholds
   │     │
   │     └─ If psi:
   │        └─ Call src/perf/psi.py:run_psi_test()
   │           ├─ HTTP GET to googleapis.com/pagespeedonline/v5/runPagespeed
   │           ├─ Parse Lighthouse audits from response
   │           ├─ Extract: opportunities (what to fix + savings)
   │           └─ Return official performance_score
   ├─ For each result: _persist_run() writes to database
   │  ├─ perf_runs (engine, device, success, error_message)
   │  ├─ perf_audits (performance_score, all CWV metrics)
   │  ├─ perf_opportunities (opportunity_key, savings_ms, savings_bytes)
   │  └─ perf_resources (url, type, size, load time)
   └─ Log completion summary
5. User refreshes dashboard after ~90s
6. FastAPI queries database
   ├─ _portfolio_rows(): SELECT latest score per site
   ├─ _site_url_rows(): SELECT latest score per URL
   ├─ _site_latest_audit(): SELECT full metrics for latest run
   ├─ _site_trend(): SELECT weekly AVG scores (12 weeks)
   ├─ _site_opportunities(): SELECT top PSI opportunities
   └─ _site_slow_resources(): SELECT top 10 slowest resources
7. Jinja2 templates render HTML with results
   ├─ performance.html (portfolio scorecard)
   └─ performance_site.html (per-site detail with CWV, trend, opps)
8. User sees updated scores, metrics, trend chart, opportunities
```
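Step 3 of the flow (validate, hand off to a thread pool, answer 202 without waiting) can be sketched without FastAPI. This is a minimal illustration only: `handle_test_request`, `validate_test_request`, the pool size, the 422 status for invalid input and the exact validation rules are assumptions, not the code in `performance.py`.

```python
from concurrent.futures import ThreadPoolExecutor

ALLOWED_ENGINES = {"sitespeed", "psi"}
ALLOWED_DEVICES = {"mobile", "desktop"}

# Shared worker pool; the size is an assumption for this sketch.
_executor = ThreadPoolExecutor(max_workers=2)


def validate_test_request(body: dict) -> list[str]:
    """Return a list of validation errors (empty when the body is valid)."""
    errors = []
    if not isinstance(body.get("site_id"), int):
        errors.append("site_id must be an integer")
    url = body.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        errors.append("url must be an http(s) URL")
    engines = body.get("engines") or []
    devices = body.get("devices") or []
    if not engines or not set(engines) <= ALLOWED_ENGINES:
        errors.append("engines must be a non-empty subset of sitespeed/psi")
    if not devices or not set(devices) <= ALLOWED_DEVICES:
        errors.append("devices must be a non-empty subset of mobile/desktop")
    return errors


def handle_test_request(body: dict, run_full_test) -> tuple[int, dict]:
    """Validate, queue the test on the pool, and answer without waiting."""
    errors = validate_test_request(body)
    if errors:
        return 422, {"errors": errors}
    # submit() returns immediately; the test keeps running in a worker thread.
    _executor.submit(
        run_full_test, body["site_id"], body["url"], body["engines"], body["devices"]
    )
    return 202, {"status": "accepted"}
```

Because the handler returns before the test finishes, the dashboard only shows new results after a later refresh (step 5).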
## Component Breakdown
### 1. Sitespeed.io Testing (src/perf/sitespeed.py)
**Purpose:** Capture real browser performance metrics via headless Chrome
**Process:**
```python
run_sitespeed_test(url="https://rds.ink/endangered", device="mobile")

# 1. Generate a unique run_id (UUID)
# 2. Create the output dir: /tmp/sitespeed-output/{run_id}/
# 3. Build the Docker command:
#      docker run --rm \
#        -v /tmp/sitespeed-output:/sitespeed.io \
#        sitespeedio/sitespeed.io:40.4.0 \
#        {url} \
#        --mobile --connectivity 4g \            (if device == "mobile")
#        --n 3 \                                 (3 runs, median taken)
#        --outputFolder /sitespeed.io/{run_id} \
#        --summary --summary-detail
# 4. Wait for the Docker container to complete (~60s)
# 5. Parse HAR: /sitespeed.io/{run_id}/.../browsertime.har
#      - Extract pages[]._googleWebVitals (LCP, FCP, CLS, TTFB)
#      - Extract pages[]._cpu.longTasks.totalBlockingTime (TBT)
#      - Compute medians across the N=3 runs
# 6. Extract the resource list (URL, type, size, timing)
#      - Categorise resources (script, stylesheet, image, font, xhr, other)
# 7. Calculate the page weight breakdown:
#      total_bytes = sum of all response bodySize
#      image_bytes = sum where Content-Type contains "image"
#      js_bytes    = sum where Content-Type contains "javascript"
#      css_bytes   = sum where Content-Type contains "css"
#      font_bytes  = sum where Content-Type contains "font"
# 8. Approximate the performance score:
#      _approx_score(lcp_ms, fcp_ms, cls, tbt_ms, ttfb_ms)
#      (see 02-score-calculation.md)
# 9. Return a dict:
{
    "success": True,
    "device": "mobile",
    "performance_score": 77,  # approximated, not Lighthouse
    "metrics": {
        "lcp_ms": None,
        "fcp_ms": 2116,
        "cls": 0.0,
        "tbt_ms": 1807,
        "ttfb_ms": 144,
        # ...
    },
    "resources": [
        {"resource_url": "...", "size_bytes": 12345},  # plus further fields
        # ...
    ],
}
```
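The command-building step could look roughly like the sketch below. The flags are copied verbatim from the process description above and have not been checked against the sitespeed.io CLI reference; `build_sitespeed_command` and its parameters are hypothetical names.

```python
import uuid


def build_sitespeed_command(url: str, device: str, run_id: str,
                            host_dir: str = "/tmp/sitespeed-output") -> list[str]:
    """Assemble the docker argv described in this section (hypothetical helper)."""
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/sitespeed.io",
        "sitespeedio/sitespeed.io:40.4.0",
        url,
    ]
    if device == "mobile":
        # Mobile emulation plus throttled connectivity, as listed in this document.
        cmd += ["--mobile", "--connectivity", "4g"]
    cmd += [
        "--n", "3",
        "--outputFolder", f"/sitespeed.io/{run_id}",
        "--summary", "--summary-detail",
    ]
    return cmd


run_id = str(uuid.uuid4())
command = build_sitespeed_command("https://rds.ink/endangered", "mobile", run_id)
```

Passing the list to `subprocess.run(command, check=True)` avoids shell quoting problems with URLs that contain `&` or `?`.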
**Key note:** Sitespeed v40 does NOT run Lighthouse. The performance score is approximated from CWV thresholds; for an official Lighthouse score, use PSI.
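The HAR parsing steps (medians across runs, page weight by content type) can be sketched as follows. The key names inside `_googleWebVitals` are assumptions about the Browsertime HAR, the function names are hypothetical, and the sketch reads `response.content.mimeType` as the content type. A metric that no run reports comes back as `None`, which would explain a `null` LCP like the one in the example above.

```python
from statistics import median

# HAR key -> output name. The HAR key names are assumptions for this sketch.
VITAL_KEYS = {
    "largestContentfulPaint": "lcp_ms",
    "firstContentfulPaint": "fcp_ms",
    "cumulativeLayoutShift": "cls",
    "ttfb": "ttfb_ms",
}

WEIGHT_BUCKETS = (
    ("image", "image_bytes"),
    ("javascript", "js_bytes"),
    ("css", "css_bytes"),
    ("font", "font_bytes"),
)


def _median_or_none(values):
    present = [v for v in values if v is not None]
    return median(present) if present else None


def median_vitals(har: dict) -> dict:
    """Median of each metric across the HAR pages (one page per run)."""
    pages = har["log"]["pages"]
    out = {
        name: _median_or_none(p.get("_googleWebVitals", {}).get(key) for p in pages)
        for key, name in VITAL_KEYS.items()
    }
    out["tbt_ms"] = _median_or_none(
        p.get("_cpu", {}).get("longTasks", {}).get("totalBlockingTime") for p in pages
    )
    return out


def page_weight(entries: list[dict]) -> dict:
    """Sum response body sizes overall and per content-type bucket."""
    totals = {"total_bytes": 0, "image_bytes": 0, "js_bytes": 0,
              "css_bytes": 0, "font_bytes": 0}
    for entry in entries:
        response = entry.get("response", {})
        size = max(response.get("bodySize", 0), 0)  # HAR uses -1 for "unknown"
        mime = response.get("content", {}).get("mimeType", "").lower()
        totals["total_bytes"] += size
        for needle, bucket in WEIGHT_BUCKETS:
            if needle in mime:
                totals[bucket] += size
                break
    return totals
```

If the HAR holds the entries of all three runs, filter them to a single page (for example by `pageref`) before calling `page_weight`, otherwise the weight is counted three times.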
### 2. PageSpeed Insights Testing (src/perf/psi.py)
**Purpose:** Get official Google Lighthouse audit + opportunities
**Process:**
```python
run_psi_test(url="https://rds.ink/endangered", device="mobile")

# 1. Build the API request:
#      GET https://www.googleapis.com/pagespeedonline/v5/runPagespeed
#          ?url={url}&strategy=mobile&category=performance&key={api_key}
# 2. Wait for Google to run Lighthouse (~30s)
# 3. Parse response.lighthouseResult:
#      - categories.performance.score (0-1) -> multiply by 100
#      - audits[]:
#          "largest-contentful-paint"   -> lcp_ms
#          "first-contentful-paint"     -> fcp_ms
#          "cumulative-layout-shift"    -> cls
#          "total-blocking-time"        -> tbt_ms
#          "interaction-to-next-paint"  -> inp_ms
#          "server-response-time"       -> ttfb_ms
#      - For each audit with details.type == "opportunity":
#          extract the display title
#          extract overallSavingsMs (potential speed gain)
#          extract overallSavingsBytes (potential size reduction)
#          store for recommendations
# 4. Return a dict:
{
    "success": True,
    "device": "mobile",
    "performance_score": 95,  # official Lighthouse score
    "metrics": {...},         # same structure as sitespeed
    "opportunities": [
        {
            "opportunity_key": "unused-javascript",
            "display_label": "Reduce unused JavaScript",
            "savings_ms": 400,  # potential gain
            "savings_bytes": 150000,
        },
        # ...
    ],
}
```
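As a rough sketch of the request and parsing steps above: the endpoint, query parameters and audit ids come from this document, while `build_psi_url`, `parse_psi_response` and the choice of `numericValue` as the metric source are assumptions rather than the actual `psi.py` code.

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

# Lighthouse audit id -> metric name used in this document.
AUDIT_TO_METRIC = {
    "largest-contentful-paint": "lcp_ms",
    "first-contentful-paint": "fcp_ms",
    "cumulative-layout-shift": "cls",
    "total-blocking-time": "tbt_ms",
    "interaction-to-next-paint": "inp_ms",
    "server-response-time": "ttfb_ms",
}


def build_psi_url(url: str, device: str, api_key: str) -> str:
    """Build the GET URL; urlencode escapes the tested URL safely."""
    query = urlencode(
        {"url": url, "strategy": device, "category": "performance", "key": api_key}
    )
    return f"{PSI_ENDPOINT}?{query}"


def parse_psi_response(payload: dict) -> dict:
    """Pull the score, metrics and opportunities out of a PSI v5 response."""
    lhr = payload["lighthouseResult"]
    score = lhr["categories"]["performance"]["score"]  # 0-1, may be None
    audits = lhr.get("audits", {})
    metrics = {
        name: audits.get(audit_id, {}).get("numericValue")
        for audit_id, name in AUDIT_TO_METRIC.items()
    }
    opportunities = []
    for key, audit in audits.items():
        details = audit.get("details") or {}
        if details.get("type") != "opportunity":
            continue
        opportunities.append({
            "opportunity_key": key,
            "display_label": audit.get("title"),
            "savings_ms": details.get("overallSavingsMs"),
            "savings_bytes": details.get("overallSavingsBytes"),
        })
    return {
        "performance_score": round(score * 100) if score is not None else None,
        "metrics": metrics,
        "opportunities": opportunities,
    }
```

Audits missing from the response simply yield `None` for that metric, so the caller can store partial results instead of failing the whole run.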
### 3. Test Orchestration (src/perf/runner.py)
**Purpose:** Run every engine × device combination and persist the results
```python
run_full_test(
    site_id=3,
    url="https://rds.ink/endangered",
    engines=["sitespeed", "psi"],
    devices=["mobile", "desktop"],
)

# For engine in ["sitespeed", "psi"]:
#     For device in ["mobile", "desktop"]:
#         1. Run the appropriate test (sitespeed or psi)
#         2. Call _persist_run() to write results:
#              INSERT perf_runs (site_id, url, engine, device, ...)
#              INSERT perf_audits (performance_score, all metrics)
#              INSERT perf_opportunities (for each opportunity)
#              INSERT perf_resources (for each resource)
#         3. Log result (success or error)
#
# Return summary:
{
    "url": "https://...",
    "results": {
        "sitespeed_mobile": {"run_id": 1, "score": 77, "success": True},
        "sitespeed_desktop": {"run_id": 2, "score": 82, "success": True},
        "psi_mobile": {"run_id": 3, "score": 95, "success": True},
        "psi_desktop": {"run_id": 4, "score": 93, "success": True},
    },
}
```
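The loop above can be sketched with the engine functions and the persistence step passed in as parameters, which makes the orchestration testable without Docker or network access. `run_full_test_sketch`, `engine_runners` and `persist` are hypothetical names, and recording a failed run instead of aborting is an assumption (suggested by the `success` and `error_message` fields).

```python
def run_full_test_sketch(site_id, url, engines, devices, engine_runners, persist):
    """Run every engine x device pair, persist each result, return a summary.

    engine_runners maps an engine name to a callable(url=..., device=...) -> dict.
    persist is a callable(site_id, url, engine, device, result) -> run_id.
    Both are injected here for illustration only.
    """
    results = {}
    for engine in engines:
        for device in devices:
            try:
                result = engine_runners[engine](url=url, device=device)
            except Exception as exc:  # one failing engine should not stop the rest
                result = {"success": False, "device": device,
                          "error_message": str(exc)}
            run_id = persist(site_id, url, engine, device, result)
            results[f"{engine}_{device}"] = {
                "run_id": run_id,
                "score": result.get("performance_score"),
                "success": bool(result.get("success")),
            }
    return {"url": url, "results": results}
```

Failed runs are still persisted, so the dashboard can show that a test was attempted and why it produced no score.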
### 4. Portfolio Sweep (src/perf/batch.py)
**Purpose:** Weekly automated test of all sites × top URLs
**Scheduled:** Monday 04:00 AEST (hard-coded in template)
```python
run_weekly_perf_sweep(db)

# For each site in SITES (13 sites):
#     1. resolve_url_list(domain):
#          - Get homepage: https://{domain}/
#          - Query ranking_snapshots for the last 30 days
#          - Get top 5 URLs by impressions
#          - Return [homepage, url1, url2, url3, url4, url5] (6 URLs max)
#     2. For each URL:
#          - Call run_full_test(site_id, url, engines=["sitespeed", "psi"], devices=["mobile", "desktop"])
#          - Wait 5 seconds (inter-URL delay to avoid rate limits)
#     3. Log site completion (e.g., "dayboro.au: 6 URLs × 4 runs = 24 tests complete")
#
# Total: ~13 sites × 6 URLs × 4 runs = ~312 tests, ~5 hours
```
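The URL selection step can be sketched as a pure function over snapshot rows. It assumes the rows are already limited to the last 30 days and carry `url` and `impressions` keys; summing impressions per URL and skipping a duplicate homepage are also assumptions, not confirmed behaviour of `batch.py`.

```python
def resolve_url_list(domain: str, snapshot_rows: list[dict], top_n: int = 5) -> list[str]:
    """Homepage first, then up to top_n other URLs ranked by total impressions."""
    homepage = f"https://{domain}/"
    impressions: dict[str, int] = {}
    for row in snapshot_rows:
        impressions[row["url"]] = impressions.get(row["url"], 0) + row["impressions"]
    ranked = sorted(impressions, key=impressions.get, reverse=True)
    urls = [homepage]
    for url in ranked:
        if url != homepage and len(urls) < top_n + 1:
            urls.append(url)
    return urls
```

With 13 sites, at most 6 URLs each and 4 runs per URL (2 engines × 2 devices), the upper bound is 13 × 6 × 4 = 312 tests per sweep, which matches the total quoted above.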
### 5. Database Persistence (src/models/perf.py)
Four tables, one per concept:
| Table | Purpose | Key Fields |
|-------|---------|-----------|
| `perf_runs` | Test execution records | site_id, url, engine, device, completed_at, success |
| `perf_audits` | Core Web Vitals metrics | perf_run_id, performance_score, lcp_ms, cls, tbt_ms, etc. |
| `perf_opportunities` | Lighthouse audit opportunities | perf_run_id, opportunity_key, savings_ms, savings_bytes |
| `perf_resources` | HAR resource list | perf_run_id, resource_url, type, size_bytes, duration_ms |
Each perf_run can have:
- 1 perf_audit (metrics)
- 0+ perf_opportunities (if PSI ran)
- 0+ perf_resources (if HAR captured)
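To make these relationships concrete, here is a sketch of the four tables as plain SQLite DDL. Only the column names listed in the table above come from this document; the column types, the `id` keys, the `UNIQUE` constraint that enforces one audit per run, and the `open_db` / `latest_score` helpers are assumptions. The real definitions live in `src/models/perf.py` and may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE perf_runs (
    id            INTEGER PRIMARY KEY,
    site_id       INTEGER NOT NULL,
    url           TEXT NOT NULL,
    engine        TEXT NOT NULL,
    device        TEXT NOT NULL,
    completed_at  TEXT,
    success       INTEGER NOT NULL,
    error_message TEXT
);
CREATE TABLE perf_audits (
    id                INTEGER PRIMARY KEY,
    perf_run_id       INTEGER NOT NULL UNIQUE REFERENCES perf_runs(id),
    performance_score INTEGER,
    lcp_ms REAL, fcp_ms REAL, cls REAL, tbt_ms REAL, ttfb_ms REAL
);
CREATE TABLE perf_opportunities (
    id              INTEGER PRIMARY KEY,
    perf_run_id     INTEGER NOT NULL REFERENCES perf_runs(id),
    opportunity_key TEXT NOT NULL,
    savings_ms      REAL,
    savings_bytes   INTEGER
);
CREATE TABLE perf_resources (
    id           INTEGER PRIMARY KEY,
    perf_run_id  INTEGER NOT NULL REFERENCES perf_runs(id),
    resource_url TEXT NOT NULL,
    type         TEXT,
    size_bytes   INTEGER,
    duration_ms  REAL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a database with the sketched schema (in memory by default)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def latest_score(conn: sqlite3.Connection, site_id: int, device: str):
    """Score of the most recent successful run for a site and device, or None."""
    row = conn.execute(
        """
        SELECT a.performance_score
        FROM perf_runs r
        JOIN perf_audits a ON a.perf_run_id = r.id
        WHERE r.site_id = ? AND r.device = ? AND r.success = 1
        ORDER BY r.completed_at DESC, r.id DESC
        LIMIT 1
        """,
        (site_id, device),
    ).fetchone()
    return row[0] if row else None
```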
### 6. Web Interface (templates/performance.html, performance_site.html)
**Portfolio view** (performance.html):
- Table of all sites
- Latest mobile + desktop scores per site
- Slowest URL per site
- Last tested timestamp
- "Run portfolio sweep now" button (HTMX trigger)
**Per-site view** (performance_site.html):
- CWV metrics for latest run (mobile + desktop side-by-side)
- 12-week trend sparkline chart (two bars per week)
- Top 5 opportunities from PSI
- Top 10 slowest resources from sitespeed
- Per-URL breakdown table with test buttons
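The 12-week trend behind the per-site view can be sketched as a weekly average per device. This assumes the "two bars per week" are mobile and desktop and that weeks are ISO calendar weeks; `weekly_trend` and its input shape are hypothetical, and the real `_site_trend()` computes its averages in SQL.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weekly_trend(runs, now: datetime, weeks: int = 12) -> dict:
    """Average score per (ISO year, ISO week, device) over the last `weeks` weeks.

    runs is an iterable of (completed_at: datetime, device: str, score) tuples.
    """
    cutoff = now - timedelta(weeks=weeks)
    buckets = defaultdict(list)
    for completed_at, device, score in runs:
        if score is None or completed_at < cutoff:
            continue  # skip failed runs and anything older than the window
        iso = completed_at.isocalendar()
        buckets[(iso[0], iso[1], device)].append(score)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}
```

Each key then maps to one bar in the chart; a week with no runs for a device has no key at all, so the template has to decide how to draw the gap.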
## Why Two Engines?
| Aspect | Sitespeed | PSI |
|--------|-----------|-----|
| **What it measures** | Real browser (Browsertime) + HAR waterfall | Official Lighthouse audit |
| **Speed** | ~60s per device | ~30s per device |
| **Score source** | Approximated from CWV thresholds | Official Google Lighthouse |
| **Opportunities** | None (no Lighthouse) | Yes (full audit) |
| **Resource list** | Yes (full HAR) | No (limited) |
| **Use case** | Trend tracking, resource diagnosis | Official benchmarking, opportunities |
**Strategy:** Run both engines on every test. Sitespeed provides the resource waterfall and trend data; PSI provides the official score and a list of what to fix.
---
See also:
- [Score Calculation](02-score-calculation.md) — How the 0-100 score is derived
- [Testing Engines](04-testing-engines.md) — Deep dive into each engine
- [Database Schema](../code-refs/database-schema.md) — All fields, all relationships