Initial SEO-INTEL documentation: architecture, scoring, code structure

Add comprehensive documentation for the dual-engine performance evaluation system:
- System architecture and data flow
- Score calculation methodology (0-100 approximation from CWV thresholds)
- Detailed metrics reference (LCP, FCP, CLS, TBT, TTFB)
- Testing engines comparison (Sitespeed vs PSI)
- Complete code structure map (file-by-file breakdown)
- Case study: rds.ink 77 score with actionable fixes
- Quick reference guides for interpreting results

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-14 05:56:49 +10:00
commit 335d9a76e1
6 changed files with 1385 additions and 0 deletions

docs/01-architecture.md

@@ -0,0 +1,329 @@
# System Architecture
## High-Level Overview
SEO-INTEL is a performance measurement system with three main layers:
```
┌─────────────────────────────────────────────────────────┐
│ USER LAYER │
│ Web dashboard (HTMX-driven) on port 8765 │
│ - Portfolio scorecard │
│ - Per-site detail (CWV, trend, opportunities) │
│ - On-demand test buttons │
└────────────────────┬────────────────────────────────────┘
┌────────────────────▼────────────────────────────────────┐
│ API LAYER (FastAPI) │
│ - GET /performance/ (portfolio view) │
│ - GET /performance/<site_id> (per-site view) │
│ - POST /performance/api/perf/test (trigger test) │
│ - POST /performance/api/perf/sweep (portfolio sweep) │
└────────────────────┬────────────────────────────────────┘
┌────────────────────▼────────────────────────────────────┐
│ TESTING LAYER (Dual Engines) │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Sitespeed.io (Docker) │ │
│ │ - Real browser via headless Chrome │ │
│ │ - 3 runs per test, median metrics │ │
│ │ - HAR export (resource waterfall) │ │
│ │ - CWV: LCP, FCP, CLS, TBT, TTFB │ │
│ │ - Duration: ~60s per device │ │
│ └──────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Google PageSpeed Insights (API) │ │
│ │ - Official Lighthouse audit │ │
│ │ - Opportunities (what to fix) │ │
│ │ - Official performance score (0-100) │ │
│ │ - Duration: ~30s per device │ │
│ └──────────────────────────────────────────────────┘ │
└────────────────────┬────────────────────────────────────┘
┌────────────────────▼────────────────────────────────────┐
│ PERSISTENCE LAYER (SQLite) │
│ - perf_runs (test execution records) │
│ - perf_audits (Core Web Vitals metrics) │
│ - perf_opportunities (Lighthouse opportunities) │
│ - perf_resources (HAR resource list) │
└─────────────────────────────────────────────────────────┘
```
## Data Flow: User Clicks "Test Now"
```
1. User clicks "Test Now" button
2. HTMX POST to /performance/api/perf/test
Body: { site_id: 3, url: "https://...",
engines: ["sitespeed", "psi"],
devices: ["mobile", "desktop"] }
3. FastAPI endpoint (performance.py:api_perf_test)
├─ Validate inputs
├─ Spawn background task (ThreadPool)
└─ Return 202 (Accepted) immediately
4. Background task runs src/perf/runner.py:run_full_test()
├─ For each engine in engines:
│ └─ For each device in devices:
│ ├─ If sitespeed:
│ │ └─ Call src/perf/sitespeed.py:run_sitespeed_test()
│ │ ├─ Docker: sitespeedio/sitespeed.io:40.4.0
│ │ ├─ 3 runs (N=3), median metrics
│ │ ├─ Parse HAR: /tmp/sitespeed-output/{run_id}/.../browsertime.har
│ │ ├─ Extract: lcp_ms, fcp_ms, cls, tbt_ms, ttfb_ms, page weight
│ │ └─ Approximate score from CWV thresholds
│ │
│ └─ If psi:
│ └─ Call src/perf/psi.py:run_psi_test()
│ ├─ HTTP GET to googleapis.com/pagespeedonline/v5/runPagespeed
│ ├─ Parse Lighthouse audits from response
│ ├─ Extract: opportunities (what to fix + savings)
│ └─ Return official performance_score
├─ For each result: _persist_run() writes to database
│ ├─ perf_runs (engine, device, success, error_message)
│ ├─ perf_audits (performance_score, all CWV metrics)
│ ├─ perf_opportunities (opportunity_key, savings_ms, savings_bytes)
│ └─ perf_resources (url, type, size, load time)
└─ Log completion summary
5. User refreshes dashboard after ~90s
6. FastAPI queries database
├─ _portfolio_rows() — SELECT latest score per site
├─ _site_url_rows() — SELECT latest score per URL
├─ _site_latest_audit() — SELECT full metrics for latest run
├─ _site_trend() — SELECT weekly AVG scores (12 weeks)
├─ _site_opportunities() — SELECT top PSI opportunities
└─ _site_slow_resources() — SELECT top 10 slowest resources
7. Jinja2 templates render HTML with results
├─ performance.html (portfolio scorecard)
└─ performance_site.html (per-site detail with CWV, trend, opps)
8. User sees updated scores, metrics, trend chart, opportunities
```
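Step 3 of the flow (validate, hand off to a background worker, answer with 202) can be sketched without any web framework. This is a minimal illustration, not the code in `performance.py`: `handle_test_request`, the allowed-value sets, the pool size, and the 422 error bodies are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

ALLOWED_ENGINES = {"sitespeed", "psi"}
ALLOWED_DEVICES = {"mobile", "desktop"}

# Hypothetical shared pool; the real endpoint may size or create it differently.
_executor = ThreadPoolExecutor(max_workers=2)


def handle_test_request(payload, run_full_test, executor=_executor):
    """Validate the payload, queue the test, and answer without waiting.

    Returns (status_code, body, future); future is None when validation fails.
    """
    url = str(payload.get("url") or "")
    engines = payload.get("engines") or []
    devices = payload.get("devices") or []

    if not isinstance(payload.get("site_id"), int):
        return 422, {"error": "site_id must be an integer"}, None
    if not url.startswith(("http://", "https://")):
        return 422, {"error": "url must be absolute"}, None
    if not engines or not set(engines) <= ALLOWED_ENGINES:
        return 422, {"error": "unknown engine"}, None
    if not devices or not set(devices) <= ALLOWED_DEVICES:
        return 422, {"error": "unknown device"}, None

    # submit() returns immediately; the test itself runs on a worker thread.
    future = executor.submit(run_full_test, payload["site_id"], url, engines, devices)
    return 202, {"status": "accepted"}, future
```

Because the caller gets the 202 before any test has finished, the dashboard only shows new numbers after the background work has written to the database, which is why step 5 is a manual refresh.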
## Component Breakdown
### 1. Sitespeed.io Testing (src/perf/sitespeed.py)
**Purpose:** Capture real browser performance metrics via headless Chrome
**Process:**
```
run_sitespeed_test(url="https://rds.ink/endangered", device="mobile")
Generate unique run_id (UUID)
Create output dir: /tmp/sitespeed-output/{run_id}/
Build Docker command:
docker run --rm \
-v /tmp/sitespeed-output:/sitespeed.io \
sitespeedio/sitespeed.io:40.4.0 \
{url} \
--mobile --connectivity 4g \ (if device=="mobile")
--n 3 \ (3 runs, median taken)
--outputFolder /sitespeed.io/{run_id} \
--summary --summary-detail
Wait for Docker container to complete (~60s)
Parse HAR: /sitespeed.io/{run_id}/.../browsertime.har
Extract pages[]._googleWebVitals (LCP, FCP, CLS, TTFB)
Extract pages[]._cpu.longTasks.totalBlockingTime (TBT)
Compute medians across N=3 runs
Extract resource list (URL, type, size, timing)
Categorise resources (script, stylesheet, image, font, xhr, other)
Calculate page weight breakdown:
total_bytes = sum of all response bodySize
image_bytes = sum where Content-Type contains "image"
js_bytes = sum where Content-Type contains "javascript"
css_bytes = sum where Content-Type contains "css"
font_bytes = sum where Content-Type contains "font"
Approximate performance score:
_approx_score(lcp_ms, fcp_ms, cls, tbt_ms, ttfb_ms)
(See Section 2: Score Calculation)
Return dict:
{
"success": true,
"device": "mobile",
"performance_score": 77,  # approximated, not Lighthouse
"metrics": {
"lcp_ms": null,
"fcp_ms": 2116,
"cls": 0.0,
"tbt_ms": 1807,
"ttfb_ms": 144,
...
},
"resources": [
{"resource_url": "...", "size_bytes": 12345, ...},
...
]
}
```
**Key note:** Sitespeed v40 does NOT run Lighthouse. Performance score is approximated from CWV thresholds. For official Lighthouse, use PSI.
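The page weight breakdown in the process above can be sketched against a parsed HAR file. This is an illustration under assumptions, not the code in `sitespeed.py`: the function name is hypothetical, and it reads the content type from each entry's `response.content.mimeType` field, treating the HAR placeholder `-1` (size unknown) as zero.

```python
def page_weight_breakdown(har):
    """Sum response body sizes per content category from a parsed HAR dict."""
    totals = {
        "total_bytes": 0,
        "image_bytes": 0,
        "js_bytes": 0,
        "css_bytes": 0,
        "font_bytes": 0,
    }
    # Substring looked for in the MIME type, per category.
    markers = {
        "image_bytes": "image",
        "js_bytes": "javascript",
        "css_bytes": "css",
        "font_bytes": "font",
    }
    for entry in har["log"]["entries"]:
        response = entry["response"]
        # HAR uses -1 when the size is not available; count that as 0.
        size = max(response.get("bodySize") or 0, 0)
        mime = (response.get("content", {}).get("mimeType") or "").lower()
        totals["total_bytes"] += size
        for key, marker in markers.items():
            if marker in mime:
                totals[key] += size
                break
    return totals
```

Responses that match none of the markers (HTML, JSON, and so on) still count towards `total_bytes`, so the four category totals can add up to less than the total.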
### 2. PageSpeed Insights Testing (src/perf/psi.py)
**Purpose:** Get official Google Lighthouse audit + opportunities
**Process:**
```
run_psi_test(url="https://rds.ink/endangered", device="mobile")
Build API request:
GET https://www.googleapis.com/pagespeedonline/v5/runPagespeed
?url={url}&strategy=mobile&category=performance&key={api_key}
Wait for Google to run Lighthouse (~30s)
Parse response.lighthouseResult:
Extract categories.performance.score (0-1) → multiply by 100
Extract audits[]:
"largest-contentful-paint" → lcp_ms
"first-contentful-paint" → fcp_ms
"cumulative-layout-shift" → cls
"total-blocking-time" → tbt_ms
"interaction-to-next-paint" → inp_ms
"server-response-time" → ttfb_ms
For each audit with details.type == "opportunity":
Extract display title
Extract overallSavingsMs (potential speed gain)
Extract overallSavingsBytes (potential size reduction)
Store for recommendations
Return dict:
{
"success": true,
"device": "mobile",
"performance_score": 95,  # official Lighthouse
"metrics": { ... },  # same structure as sitespeed
"opportunities": [
{
"opportunity_key": "unused-javascript",
"display_label": "Reduce unused JavaScript",
"savings_ms": 400,  # potential gain
"savings_bytes": 150000
},
...
]
}
```
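The request and parsing steps above can be sketched as two small functions. This is a hedged illustration rather than the contents of `psi.py`: `build_psi_url` and `parse_psi_response` are hypothetical names, the metric values are read from each audit's `numericValue`, and sorting opportunities by time saved is an assumption added for the example.

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

# Lighthouse audit id -> field name used in the result dict above.
AUDIT_TO_FIELD = {
    "largest-contentful-paint": "lcp_ms",
    "first-contentful-paint": "fcp_ms",
    "cumulative-layout-shift": "cls",
    "total-blocking-time": "tbt_ms",
    "interaction-to-next-paint": "inp_ms",
    "server-response-time": "ttfb_ms",
}


def build_psi_url(url, device, api_key):
    """Build the GET URL; urlencode escapes the page URL safely."""
    query = urlencode(
        {"url": url, "strategy": device, "category": "performance", "key": api_key}
    )
    return f"{PSI_ENDPOINT}?{query}"


def parse_psi_response(data, device):
    """Pull the score, metrics, and opportunities out of a PSI JSON body."""
    lighthouse = data["lighthouseResult"]
    audits = lighthouse.get("audits", {})
    raw_score = lighthouse["categories"]["performance"]["score"]  # 0-1 or None

    metrics = {
        field: audits.get(audit_id, {}).get("numericValue")
        for audit_id, field in AUDIT_TO_FIELD.items()
    }

    opportunities = []
    for key, audit in audits.items():
        details = audit.get("details") or {}
        if details.get("type") != "opportunity":
            continue
        opportunities.append({
            "opportunity_key": key,
            "display_label": audit.get("title", key),
            "savings_ms": details.get("overallSavingsMs") or 0,
            "savings_bytes": details.get("overallSavingsBytes") or 0,
        })
    # Assumption: biggest time saving first, so the top of the list is the best fix.
    opportunities.sort(key=lambda item: item["savings_ms"], reverse=True)

    return {
        "success": True,
        "device": device,
        "performance_score": round(raw_score * 100) if raw_score is not None else None,
        "metrics": metrics,
        "opportunities": opportunities,
    }
```

Audits missing from a response simply yield `None` for that metric, which keeps the result shape identical to the sitespeed one.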
### 3. Test Orchestration (src/perf/runner.py)
**Purpose:** Run all engines × devices combinations and persist results
```
run_full_test(
site_id=3,
url="https://rds.ink/endangered",
engines=["sitespeed", "psi"],
devices=["mobile", "desktop"]
)
For engine in ["sitespeed", "psi"]:
For device in ["mobile", "desktop"]:
Run the appropriate test (sitespeed or psi)
Call _persist_run() to write results:
INSERT perf_runs (site_id, url, engine, device, ...)
INSERT perf_audits (performance_score, all metrics)
INSERT perf_opportunities (for each opportunity)
INSERT perf_resources (for each resource)
Log result (success or error)
Return summary:
{
"url": "https://...",
"results": {
"sitespeed_mobile": { "run_id": 1, "score": 77, "success": true },
"sitespeed_desktop": { "run_id": 2, "score": 82, "success": true },
"psi_mobile": { "run_id": 3, "score": 95, "success": true },
"psi_desktop": { "run_id": 4, "score": 93, "success": true }
}
}
```
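The engines × devices loop can be sketched with the engine functions and the persistence step passed in as arguments. This is an illustration only: `run_full_test_sketch` and its `runners` and `persist` parameters are hypothetical, chosen so the example runs without Docker, network access, or a database. Catching the exception per combination is also an assumption, made so one failing engine does not abort the rest.

```python
from itertools import product


def run_full_test_sketch(site_id, url, engines, devices, runners, persist):
    """Run every engine/device pair, persist each result, and summarise."""
    results = {}
    for engine, device in product(engines, devices):
        try:
            result = runners[engine](url=url, device=device)
        except Exception as exc:
            # Record the failure as a result so it is still persisted.
            result = {"success": False, "error_message": str(exc)}
        run_id = persist(site_id, url, engine, device, result)
        results[f"{engine}_{device}"] = {
            "run_id": run_id,
            "score": result.get("performance_score"),
            "success": bool(result.get("success")),
        }
    return {"url": url, "results": results}
```

`itertools.product` yields the pairs in the same order as the nested loops above: all devices for the first engine, then all devices for the second.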
### 4. Portfolio Sweep (src/perf/batch.py)
**Purpose:** Weekly automated test of all sites × top URLs
**Scheduled:** Monday 04:00 AEST (hard-coded in template)
```
run_weekly_perf_sweep(db)
For each site in SITES (13 sites):
resolve_url_list(domain):
Get homepage: https://{domain}/
Query ranking_snapshots last 30 days
Get top 5 URLs by impressions
Return: [homepage, url1, url2, url3, url4, url5] (6 URLs max)
For each URL:
Call run_full_test(site_id, url, engines=["sitespeed", "psi"], devices=["mobile", "desktop"])
Wait 5 seconds (inter-URL delay to avoid rate limits)
Log site completion (e.g., "dayboro.au: 6 URLs × 4 runs = 24 tests complete")
Total: ~13 sites × 6 URLs × 4 runs = ~312 tests, ~5 hours
```
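The URL selection step (homepage plus the top five URLs by impressions) can be sketched as a pure function. This is an illustration under assumptions: `top_urls_for_site` is a hypothetical name, the `(url, impressions)` row shape stands in for whatever the `ranking_snapshots` query returns, and dropping the homepage from the ranked list to avoid testing it twice is a choice made for the example.

```python
def top_urls_for_site(domain, snapshot_rows, limit=5):
    """Return the homepage followed by the `limit` URLs with most impressions.

    snapshot_rows: iterable of (url, impressions) pairs from the last 30 days.
    """
    homepage = f"https://{domain}/"

    # A URL can appear in several snapshots, so add its impressions up.
    impressions = {}
    for url, count in snapshot_rows:
        impressions[url] = impressions.get(url, 0) + count

    ranked = sorted(impressions, key=lambda u: impressions[u], reverse=True)
    top = [u for u in ranked if u != homepage][:limit]
    return [homepage] + top
```

With at most six URLs per site and four engine/device runs per URL, this is where the 24 tests per site in the log line above come from.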
### 5. Database Persistence (src/models/perf.py)
Four tables, one per concept:
| Table | Purpose | Key Fields |
|-------|---------|-----------|
| `perf_runs` | Test execution records | site_id, url, engine, device, completed_at, success |
| `perf_audits` | Core Web Vitals metrics | perf_run_id, performance_score, lcp_ms, cls, tbt_ms, etc. |
| `perf_opportunities` | Lighthouse audit opportunities | perf_run_id, opportunity_key, savings_ms, savings_bytes |
| `perf_resources` | HAR resource list | perf_run_id, resource_url, type, size_bytes, duration_ms |
Each perf_run can have:
- 1 perf_audit (metrics)
- 0+ perf_opportunities (if PSI ran)
- 0+ perf_resources (if HAR captured)
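The relationships above can be sketched as a plain SQLite schema. This is a hedged illustration, not the definitions in `src/models/perf.py`: the column types, the `id` primary keys, and the `latest_score` helper are assumptions; only the table names and key fields come from the table above. The `UNIQUE` constraint on `perf_audits.perf_run_id` expresses the "1 perf_audit per run" rule.

```python
import sqlite3

SCHEMA = """
CREATE TABLE perf_runs (
    id            INTEGER PRIMARY KEY,
    site_id       INTEGER NOT NULL,
    url           TEXT NOT NULL,
    engine        TEXT NOT NULL,
    device        TEXT NOT NULL,
    completed_at  TEXT,
    success       INTEGER NOT NULL DEFAULT 0,
    error_message TEXT
);
CREATE TABLE perf_audits (
    id                INTEGER PRIMARY KEY,
    perf_run_id       INTEGER NOT NULL UNIQUE REFERENCES perf_runs(id),
    performance_score INTEGER,
    lcp_ms REAL, fcp_ms REAL, cls REAL, tbt_ms REAL, ttfb_ms REAL
);
CREATE TABLE perf_opportunities (
    id              INTEGER PRIMARY KEY,
    perf_run_id     INTEGER NOT NULL REFERENCES perf_runs(id),
    opportunity_key TEXT NOT NULL,
    savings_ms      REAL,
    savings_bytes   INTEGER
);
CREATE TABLE perf_resources (
    id           INTEGER PRIMARY KEY,
    perf_run_id  INTEGER NOT NULL REFERENCES perf_runs(id),
    resource_url TEXT NOT NULL,
    type         TEXT,
    size_bytes   INTEGER,
    duration_ms  REAL
);
"""


def create_schema(conn):
    """Create the four tables on an open sqlite3 connection."""
    conn.executescript(SCHEMA)


def latest_score(conn, site_id, device):
    """Score of the most recent successful run for a site and device."""
    row = conn.execute(
        """
        SELECT a.performance_score
        FROM perf_runs r
        JOIN perf_audits a ON a.perf_run_id = r.id
        WHERE r.site_id = ? AND r.device = ? AND r.success = 1
        ORDER BY r.completed_at DESC, r.id DESC
        LIMIT 1
        """,
        (site_id, device),
    ).fetchone()
    return row[0] if row else None
```

Opportunities and resources carry no uniqueness constraint on `perf_run_id`, which matches the "0+" cardinality in the list above.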
### 6. Web Interface (templates/performance.html, performance_site.html)
**Portfolio view** (performance.html):
- Table of all sites
- Latest mobile + desktop scores per site
- Slowest URL per site
- Last tested timestamp
- "Run portfolio sweep now" button (HTMX trigger)
**Per-site view** (performance_site.html):
- CWV metrics for latest run (mobile + desktop side-by-side)
- 12-week trend sparkline chart (two bars per week)
- Top 5 opportunities from PSI
- Top 10 slowest resources from sitespeed
- Per-URL breakdown table with test buttons
## Why Two Engines?
| Aspect | Sitespeed | PSI |
|--------|-----------|-----|
| **What it measures** | Real browser (Browsertime) + HAR waterfall | Official Lighthouse audit |
| **Speed** | ~60s per device | ~30s per device |
| **Score source** | Approximated from CWV thresholds | Official Google Lighthouse |
| **Opportunities** | None (no Lighthouse) | Yes (full audit) |
| **Resource list** | Yes (full HAR) | No (limited) |
| **Use case** | Trend tracking, resource diagnosis | Official benchmarking, opportunities |
**Strategy:** Run both engines against every URL. Sitespeed gives you the waterfall and the trend; PSI gives you the official score and what to fix.
---
See also:
- [Score Calculation](02-score-calculation.md) — How the 0-100 score is derived
- [Testing Engines](04-testing-engines.md) — Deep dive into each engine
- [Database Schema](../code-refs/database-schema.md) — All fields, all relationships


@@ -0,0 +1,268 @@
# Performance Score Calculation
## The Formula
```
Performance Score = Average of five metric scores (0-100)
Score = (LCP_score + FCP_score + CLS_score + TBT_score + TTFB_score) / 5
where each metric_score is calculated from thresholds:
if metric ≤ good_threshold → metric_score = 100
if metric ≥ poor_threshold → metric_score = 30
if between → metric_score = 100 - ((metric - good) / (poor - good)) × 70
```
## Example: rds.ink/endangered = 77
From the database (sitespeed mobile run on 2026-05-13):
```
LCP: NULL → skipped (no data)
FCP: 2,116ms → score calculation:
good=1,800 poor=3,000
2,116 is between good and poor
ratio = (2116 - 1800) / (3000 - 1800) = 316 / 1200 = 0.263
score = 100 - (0.263 × 70) = 100 - 18.4 = 81.6 → 81 points (int() truncates) ✓
CLS: 0.0 → score = 100 (well below good threshold of 0.1) ✓
TBT: 1,807ms → score calculation:
good=200 poor=600
1,807 >> poor threshold
ratio = (1807 - 200) / (600 - 200) = 1607 / 400 = 4.02
Since ratio > 1: score = floored at 30 points ✗ CRITICAL
TTFB: 144ms → score = 100 (well below good threshold of 800ms) ✓
Average = (81 + 100 + 30 + 100) / 4 = 77.75 → 77 (int() truncates; matches the database value)
```
**Bottom line:** TBT (Total Blocking Time) of 1,807ms is **9 times** the 200ms good threshold and 3 times the 600ms poor threshold. With TBT at or below 200ms and the other metrics unchanged, the score would be 95; this single metric drops it to 77.
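The arithmetic in this example can be checked with a short self-contained script. It is a sketch that mirrors the formula and the truncating `int()` calls of the score algorithm shown later in this file; `metric_score`, `approx_score`, and the `THRESHOLDS` copy are names made up for the example.

```python
import statistics

# Local copy of the (good, poor) thresholds documented in this file.
THRESHOLDS = {
    "lcp": (2500, 4000),
    "fcp": (1800, 3000),
    "cls": (0.1, 0.25),
    "tbt": (200, 600),
    "ttfb": (800, 1800),
}


def metric_score(value, good, poor):
    """100 at or below good, 30 at or above poor, linear in between."""
    if value <= good:
        return 100
    if value >= poor:
        return 30
    return int(100 - (value - good) / (poor - good) * 70)  # int() truncates


def approx_score(values):
    """Truncated mean of the per-metric scores, skipping missing metrics."""
    scores = [
        metric_score(value, *THRESHOLDS[name])
        for name, value in values.items()
        if value is not None
    ]
    return int(statistics.mean(scores)) if scores else None


measured = {"lcp": None, "fcp": 2116, "cls": 0.0, "tbt": 1807, "ttfb": 144}
print(metric_score(2116, 1800, 3000))       # FCP alone: 81
print(approx_score(measured))               # overall: 77
print(approx_score({**measured, "tbt": 200}))  # same page with TBT fixed: 95
```

The last line is the counterfactual: holding everything else constant and bringing TBT down to the good threshold lifts the score from 77 to 95.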
## Thresholds (Hard-Coded)
**File:** `/home/help4bis/seo-intel/src/perf/sitespeed.py` lines 53-60
```python
_THRESHOLDS = {
# (good_max, poor_min)
"lcp": (2500, 4000), # ms
"fcp": (1800, 3000), # ms
"cls": (0.1, 0.25), # unitless
"tbt": (200, 600), # ms
"ttfb": (800, 1800), # ms
}
```
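These thresholds are not arbitrary. The LCP, FCP, CLS, and TBT pairs match the green/orange/red boundaries that **Lighthouse 10** reports for mobile audits, and the TTFB pair follows Google's published TTFB guidance (TTFB is not a weighted metric in the Lighthouse performance score).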
## Metric-by-Metric Breakdown
### 1. LCP (Largest Contentful Paint)
**What it measures:** How long before the largest visible element (image, heading, paragraph) appears on screen.
**Why it matters:** Users need to see that something is happening.
**Thresholds:**
- **Good:** ≤ 2,500ms (2.5 seconds)
- **Poor:** ≥ 4,000ms (4 seconds)
**rds.ink status:** Not measured (NULL)
**Typical fixes:**
- Optimize server response time (TTFB)
- Defer non-critical JavaScript
- Lazy-load below-the-fold images (never the LCP image itself)
- Use a CDN for images
---
### 2. FCP (First Contentful Paint)
**What it measures:** How long before ANY content (text, image, non-white background) appears.
**Why it matters:** The first visual indication that the page is loading.
**Thresholds:**
- **Good:** ≤ 1,800ms (1.8 seconds)
- **Poor:** ≥ 3,000ms (3 seconds)
**rds.ink status:** 2,116ms = AMBER (81/100)
The page shows content after 2.1 seconds, which is acceptable but slower than ideal. With TTFB at only 144ms, the likely cause is render-blocking scripts or stylesheets delaying the first paint.
**Typical fixes:**
- Reduce server response time (TTFB)
- Defer non-critical JavaScript
- Inline critical CSS
- Reduce DOM size
---
### 3. CLS (Cumulative Layout Shift)
**What it measures:** How much the page layout jumps around after initial load.
**Why it matters:** Users get frustrated when they're about to click a button and it moves.
**Thresholds:**
- **Good:** ≤ 0.1 (unitless layout-shift score)
- **Poor:** ≥ 0.25 (unitless layout-shift score)
**rds.ink status:** 0.0 = PERFECT ✓
The page does NOT move after load. Great job. This metric is not the problem.
**Typical fixes:**
- Set explicit dimensions on images
- Avoid inserting content above existing content
- Use transform animations instead of position changes
---
### 4. TBT (Total Blocking Time) 🔴 **THE KILLER METRIC**
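**What it measures:** The total time the main thread is blocked by long tasks (mostly JavaScript) after First Contentful Paint. Each task longer than 50ms contributes the portion beyond 50ms, during which the browser cannot respond to user input (clicks, scrolls, etc.).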
**Why it matters:** A page with 1.8 seconds of TBT feels frozen to the user.
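As a minimal sketch of how TBT adds up: each main-thread task longer than 50ms contributes only its excess over 50ms. The function name and the sample task durations below are made up for illustration; real tools restrict the sum to tasks that run after First Contentful Paint.

```python
def total_blocking_time(task_durations_ms, threshold_ms=50):
    """Sum the part of each main-thread task that exceeds the threshold."""
    return sum(max(0, duration - threshold_ms) for duration in task_durations_ms)


# Four tasks: only the 120ms and 250ms tasks are "long" (over 50ms).
print(total_blocking_time([30, 120, 250, 50]))  # 70 + 200 = 270
```

This is why many short tasks are harmless while a handful of large script evaluations can push TBT into the seconds.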
**Thresholds:**
- **Good:** ≤ 200ms (0.2 seconds)
- **Poor:** ≥ 600ms (0.6 seconds)
**rds.ink status:** 1,807ms = CRITICAL ❌
Long tasks block the main thread for a combined **1.8 seconds** after the first paint. During this time:
- User clicks "Add to cart" → Nothing happens
- User tries to scroll → Page is frozen
- User tries to open menu → Unresponsive
**Impact on score:** 30/100 points (single worst metric)
**Root cause:** Likely WooCommerce plugins, Elementor scripts, and gallery libraries (Lightbox, PhotoSwipe, Slick, etc.) all executing during the initial page load.
**Typical fixes (in priority order):**
1. **Defer non-critical JavaScript** (add `defer` attribute to `<script>` tags)
2. **Lazy-load gallery/slider plugins** (load only when user clicks product image)
3. **Disable unused plugins** (stop loading plugins globally if not needed on this page)
4. **Code-split heavy libraries** (load only what's visible above the fold)
5. **Minify/combine JavaScript** (reduce parsing overhead)
---
### 5. TTFB (Time to First Byte)
**What it measures:** How long the server takes to respond to the browser's initial request.
**Why it matters:** Everything else depends on this. You can't optimize what you haven't received yet.
**Thresholds:**
- **Good:** ≤ 800ms
- **Poor:** ≥ 1,800ms
**rds.ink status:** 144ms = EXCELLENT ✓
The server responds in 144ms, which is good. This is NOT the bottleneck.
**Typical fixes:**
- Optimise server-side code (database queries, etc.)
- Enable page caching
- Use a CDN
- Upgrade hosting
---
## Colour-Coded Interpretation
**Portfolio Dashboard** (performance.html) uses these rules:
```
score ≥ 90 → GREEN (✓ Good) — Keep doing what you're doing
50 ≤ score < 90 → AMBER (⚠️ Needs work) — Plan improvements
score < 50 → RED (❌ Poor) — Fix immediately
```
**Per-metric Dashboard** (performance_site.html) uses thresholds:
```
Metric ≤ good_threshold → GREEN (good)
good < metric < poor → AMBER (needs work)
Metric ≥ poor_threshold → RED (poor)
```
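Both rule sets translate directly into two small functions. This is a sketch of the rules as listed above, not the template logic itself; the function names and the lowercase colour strings are assumptions.

```python
def score_colour(score):
    """Portfolio rule: colour for an overall 0-100 score."""
    if score >= 90:
        return "green"
    if score >= 50:
        return "amber"
    return "red"


def metric_colour(value, good, poor):
    """Per-metric rule: colour for one metric given its two thresholds."""
    if value <= good:
        return "green"
    if value >= poor:
        return "red"
    return "amber"


print(score_colour(77))               # amber
print(metric_colour(1807, 200, 600))  # red (TBT)
print(metric_colour(2116, 1800, 3000))  # amber (FCP)
```

The two views can disagree by design: an amber overall score can hide a single red metric, which is why the per-site page shows each metric separately.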
## Score Algorithm (Python)
**File:** `/home/help4bis/seo-intel/src/perf/sitespeed.py` lines 63-96
```python
import statistics


def _approx_score(lcp_ms, fcp_ms, cls_val, tbt_ms, ttfb_ms) -> int | None:
    """Compute a rough 0-100 performance score from CWV values."""
    vitals = {
        "lcp": lcp_ms,
        "fcp": fcp_ms,
        # CLS stays unscaled so it compares directly with the (0.1, 0.25) thresholds
        "cls": cls_val,
        "tbt": tbt_ms,
        "ttfb": ttfb_ms,
    }
    scores = []
    for key, val in vitals.items():
        if val is None:
            continue  # skip nulls (e.g., LCP if not measured)
        good, poor = _THRESHOLDS[key]
        if val <= good:
            scores.append(100)
        elif val >= poor:
            scores.append(30)
        else:
            # linear interpolation from 100 (at good) down to 30 (at poor)
            ratio = (val - good) / (poor - good)
            scores.append(int(100 - ratio * 70))  # int() truncates
    # int() truncates the mean as well, e.g. 77.75 -> 77
    return int(statistics.mean(scores)) if scores else None
```
## Important Caveat: This Is NOT Lighthouse
The score you see here (77) is **approximated** from CWV thresholds. It's **not** the official Google Lighthouse score.
**Why the approximation?**
- Running Lighthouse locally is heavy (each test needs a full Lighthouse audit on top of the browser run)
- Sitespeed v40 doesn't run Lighthouse by default
- But Sitespeed captures the same CWV metrics that Lighthouse uses
- So we approximate a Lighthouse-like score from those metrics
**Real Lighthouse scores** come from PSI (Google's API), but PSI doesn't return the full HAR waterfall.
**Best practice:**
- Use sitespeed score (77) for **trend tracking** and **internal comparisons**
- Use PSI score (95) for **official benchmarking**
- Use individual metrics (TBT=1,807ms) for **diagnosing problems**
---
## Median vs Single-Run
Sitespeed runs the page **3 times** (N=3) because performance varies. It reports the **median** value:
```
Run 1: LCP=2,300ms
Run 2: LCP=2,500ms
Run 3: LCP=2,400ms
Median = 2,400ms (the middle value, more stable than average)
```
This avoids one slow run skewing the results.
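A short sketch of why the median is preferred, using Python's standard `statistics` module. The `median_metric` helper and the outlier values are made up for illustration; skipping runs that produced no value is an assumption about how missing metrics are handled.

```python
import statistics


def median_metric(values):
    """Median of the runs that produced a value; None if no run did."""
    present = [v for v in values if v is not None]
    return statistics.median(present) if present else None


print(median_metric([2300, 2500, 2400]))  # 2400

# One run hits a slow network: the median is unchanged, the mean is not.
with_outlier = [2300, 9000, 2400]
print(median_metric(with_outlier))             # 2400
print(round(statistics.mean(with_outlier)))    # 4567
```

With three runs, a single bad run can never become the reported value, whereas it would nearly double the mean in this example.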
---
See also:
- [Metrics Reference](03-metrics-reference.md) — Deeper dive into each metric
- [Testing Engines](04-testing-engines.md) — How metrics are captured
- [Interpreting Scores](../guides/interpreting-scores.md) — What to do with your score