Add comprehensive metrics and engines documentation

Complete the documentation suite with:
- Deep-dive metrics reference (LCP, FCP, CLS, TBT, TTFB)
- Detailed testing engines comparison (Sitespeed vs PSI)
- Why TBT is the killer metric for rds.ink
- How to fix each metric using Hummingbird
- Score differences and when to use each engine

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-14 05:58:12 +10:00
parent 335d9a76e1
commit 76523db177
2 changed files with 659 additions and 0 deletions
--- a/docs/04-testing-engines.md
+++ b/docs/04-testing-engines.md
@@ -0,0 +1,333 @@
+# Testing Engines: Sitespeed vs PSI
+
+Detailed comparison of the two independent engines used to measure performance.
+
+## Overview
+
+The seo-intel system uses **two different testing engines** in parallel:
+
+1. **Sitespeed.io** — Real-browser testing with HAR waterfall
+2. **Google PageSpeed Insights (PSI)** — Official Lighthouse audits
+
+This dual approach captures:
+- Lab metrics measured in a real browser via browser instrumentation (Sitespeed)
+- Official Google scores + recommendations (PSI)
+
+---
+
+## Sitespeed.io
+
+### What It Is
+An open-source performance testing framework that runs a **headless Chrome browser** to measure page performance under controlled, emulated conditions (device emulation and network throttling).
+
+**Docker image:** `sitespeedio/sitespeed.io:40.4.0`
+
+### How It Works
+
+```
+1. User clicks "Test" for https://rds.ink/endangered
+    ↓
+2. Sitespeed starts Docker container with headless Chrome
+    ↓
+3. Browser loads page (3 times, N=3)
+    - Run 1: LCP=2300ms, FCP=2100ms, TBT=1800ms
+    - Run 2: LCP=2500ms, FCP=2120ms, TBT=1850ms
+    - Run 3: LCP=2400ms, FCP=2110ms, TBT=1780ms
+    ↓
+4. Sitespeed takes the MEDIAN of the three runs
+    - LCP = 2400ms  (middle value)
+    - FCP = 2110ms
+    - TBT = 1800ms
+    ↓
+5. Browser exports HAR (HTTP Archive) file
+    - Contains: every resource, timing, size
+    - JSON file with full waterfall
+    ↓
+6. seo-intel parses the Sitespeed output (HAR + browsertime.json)
+    - Extracts CWV metrics
+    - Calculates page weight
+    - Identifies resources
+    ↓
+7. seo-intel approximates a 0-100 score from thresholds
+    - NOT official Lighthouse (no Lighthouse plugin)
+    - But uses same CWV thresholds as Lighthouse
+```
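+
+The median step (4) can be sketched in a few lines. This is a minimal illustration using the per-run values from the example above; `median_metrics` is a hypothetical helper name, not a function from Sitespeed or seo-intel.
+
+```python
+from statistics import median
+
+# Per-run values in milliseconds, copied from the example above.
+runs = [
+    {"lcp": 2300, "fcp": 2100, "tbt": 1800},
+    {"lcp": 2500, "fcp": 2120, "tbt": 1850},
+    {"lcp": 2400, "fcp": 2110, "tbt": 1780},
+]
+
+def median_metrics(runs):
+    """Return the per-metric median across all runs (hypothetical helper)."""
+    return {name: median(run[name] for run in runs) for name in runs[0]}
+
+print(median_metrics(runs))
+```
+
+With an odd number of runs the median is always one of the measured values, which is why a single slow outlier run does not move the reported metric.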
+
+### Metrics Captured
+
+**Core Web Vitals (LCP, CLS, INP) and supporting metrics:**
+- LCP (Largest Contentful Paint)
+- FCP (First Contentful Paint)
+- CLS (Cumulative Layout Shift)
+- TBT (Total Blocking Time)
+- TTFB (Time to First Byte)
+- INP (Interaction to Next Paint) — Not captured in v40
+
+**Page breakdown:**
+- Total page size (bytes)
+- Image bytes
+- JavaScript bytes
+- CSS bytes
+- Font bytes
+- Request count
+
+**Resource list:**
+- Every HTTP request made
+- URL, type (script/image/stylesheet/font/xhr)
+- Size, timing
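+
+The page breakdown can be derived from the HAR entries. The sketch below assumes the standard HAR 1.2 layout (`log.entries[].response`); `classify` and `page_breakdown` are hypothetical names, the MIME-type bucketing is a simplification, and the sample data is hand-written rather than taken from a real run.
+
+```python
+from collections import defaultdict
+
+def classify(mime_type):
+    """Map a MIME type to the coarse buckets used in the page breakdown."""
+    if "javascript" in mime_type:
+        return "javascript"
+    if mime_type.startswith("image/"):
+        return "image"
+    if "css" in mime_type:
+        return "css"
+    if "font" in mime_type:
+        return "font"
+    return "other"
+
+def page_breakdown(har):
+    """Return (bytes per bucket, request count) for a parsed HAR dict."""
+    totals = defaultdict(int)
+    entries = har["log"]["entries"]
+    for entry in entries:
+        response = entry["response"]
+        content = response.get("content", {})
+        # HAR uses -1 for "unknown"; fall back to the decoded content size.
+        size = response.get("bodySize", -1)
+        if size < 0:
+            size = max(content.get("size", 0), 0)
+        totals[classify(content.get("mimeType", ""))] += size
+    return dict(totals), len(entries)
+
+# Hand-written sample, not real data from rds.ink.
+sample_har = {"log": {"entries": [
+    {"response": {"bodySize": 120000, "content": {"mimeType": "application/javascript"}}},
+    {"response": {"bodySize": 80000, "content": {"mimeType": "image/webp"}}},
+    {"response": {"bodySize": -1, "content": {"mimeType": "text/css", "size": 5000}}},
+]}}
+
+totals, requests = page_breakdown(sample_har)
+print(totals, requests)
+```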
+
+### Advantages
+- ✅ **Real browser** — Chrome's actual instrumentation
+- ✅ **Full HAR** — See every resource, identify bottlenecks
+- ✅ **Consistent** — Can run anytime, same environment
+- ✅ **Resource timing** — Measure individual script/image load times
+- ✅ **Median metrics** — Run 3 times, use median (more stable than single run)
+
+### Disadvantages
+- ❌ **No Lighthouse** — Score is approximated with a different algorithm, not official
+- ❌ **Slow** — About 60 seconds per device
+- ❌ **No opportunities** — Doesn't tell you "fix this"
+- ❌ **No INP metric** — v40 doesn't capture Interaction to Next Paint
+
+### Device Modes
+
+**Mobile mode** (default):
+```
+--mobile --connectivity 4g
+```
+- Emulates Moto G4 device (360x640 viewport)
+- 4G throttling (simulates real 4G speeds)
+- Mobile user agent
+- **Duration:** ~60s
+
+**Desktop mode:**
+```
+--browsertime.connectivity native --browsertime.viewPort 1366x768 --browsertime.userAgent "Chrome Windows"
+```
+- 1366x768 viewport (typical laptop)
+- Native connectivity (no throttling)
+- Desktop Chrome user agent
+- **Duration:** ~60s
+
+### Output Files
+
+Stored at: `/tmp/sitespeed-output/{run_id}/pages/{domain}/data/`
+
+- `browsertime.har` — Full HTTP Archive (JSON)
+- `browsertime.json` — Detailed metrics (also JSON)
+- `screenShots/` — Video/screenshots of page load
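+
+A small path helper makes the layout above explicit. This is a sketch only: `data_dir` and `expected_files` are hypothetical names, and the run ID and domain in the usage line are placeholders.
+
+```python
+from pathlib import PurePosixPath
+
+# Output root as documented above.
+OUTPUT_ROOT = PurePosixPath("/tmp/sitespeed-output")
+
+def data_dir(run_id, domain):
+    """Build the data directory for one run and one tested domain."""
+    return OUTPUT_ROOT / run_id / "pages" / domain / "data"
+
+def expected_files(run_id, domain):
+    """Return the two JSON artifacts listed above, keyed by purpose."""
+    base = data_dir(run_id, domain)
+    return {
+        "har": base / "browsertime.har",
+        "metrics": base / "browsertime.json",
+    }
+
+print(expected_files("abc123", "rds.ink")["har"])
+```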
+
+### Score Calculation (Sitespeed v40)
+
+Since Sitespeed v40 doesn't run Lighthouse, seo-intel approximates the score from per-metric thresholds:
+
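+```python
+# src/perf/sitespeed.py:_approx_score() (simplified; the helper split and
+# the `metrics` dict argument are assumptions made for readability)
+from statistics import mean
+
+# metric: (good, poor) thresholds; times in ms, CLS is unitless
+_THRESHOLDS = {
+    "lcp":  (2500,  4000),
+    "fcp":  (1800,  3000),
+    "cls":  (0.1,   0.25),
+    "tbt":  (200,   600),
+    "ttfb": (800,   1800),
+}
+
+def _metric_score(value, good, poor):
+    """Score one metric: 100 at or below `good`, 30 at or above `poor`."""
+    if value <= good:
+        return 100.0
+    if value >= poor:
+        return 30.0
+    # Linear interpolation between the two thresholds (100 down to 30)
+    ratio = (value - good) / (poor - good)
+    return 100.0 - ratio * 70.0
+
+def _approx_score(metrics):
+    """Final score = unweighted mean of the five per-metric scores."""
+    return mean(
+        _metric_score(metrics[name], good, poor)
+        for name, (good, poor) in _THRESHOLDS.items()
+    )
+```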
+
+**Important:** This is NOT the real Lighthouse score. It's an approximation for trend tracking.
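+
+As a worked example, the interpolation rule can be applied to the three median values from the Sitespeed run shown earlier (LCP 2400 ms, FCP 2110 ms, TBT 1800 ms). The rule is restated so the snippet runs on its own; `metric_score` is an illustrative name, and CLS and TTFB are left out because this document does not give their values.
+
+```python
+def metric_score(value, good, poor):
+    """100 at or below `good`, 30 at or above `poor`, linear in between."""
+    if value <= good:
+        return 100.0
+    if value >= poor:
+        return 30.0
+    ratio = (value - good) / (poor - good)
+    return 100.0 - ratio * 70.0
+
+lcp = metric_score(2400, 2500, 4000)  # under the "good" threshold
+fcp = metric_score(2110, 1800, 3000)  # between "good" and "poor"
+tbt = metric_score(1800, 200, 600)    # past the "poor" threshold
+
+print(lcp, round(fcp, 1), tbt)  # prints: 100.0 81.9 30.0
+```
+
+TBT sits on the floor of 30 no matter how far past 600 ms it goes, so the approximated score cannot distinguish a TBT of 700 ms from one of 1,800 ms.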
+
+---
+
+## Google PageSpeed Insights (PSI)
+
+### What It Is
+Google's official **Lighthouse audit** service. You submit a URL and Google runs Lighthouse against it.
+
+**API endpoint:** `https://www.googleapis.com/pagespeedonline/v5/runPagespeed`
+
+### How It Works
+
+```
+1. seo-intel calls Google's API
+    GET /pagespeedonline/v5/runPagespeed?url=...&strategy=mobile
+    ↓
+2. Google spins up Lighthouse
+    - Lighthouse categories: performance, accessibility, best practices, SEO
+    - We only use the "performance" category
+    ↓
+3. Lighthouse runs and scores 0-100
+    - This is the OFFICIAL score
+    - Uses Google's real Lighthouse algorithm
+    ↓
+4. Lighthouse audit results returned
+    - Performance score (0-100)
+    - All audit items (100+ audits)
+    - Opportunities (what to fix + savings)
+    ↓
+5. seo-intel parses response
+    - Extracts score
+    - Extracts opportunities
+    - Calculates potential savings (ms + bytes)
+```
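+
+Step 5 might look like the sketch below. It assumes the PSI v5 response shape (`lighthouseResult.categories.performance.score` on a 0-1 scale, and opportunity audits carrying `overallSavingsMs` / `overallSavingsBytes` in their `details`); `parse_psi` is a hypothetical name, not seo-intel's actual parser, and the sample response is hand-written from the example figures in this document.
+
+```python
+def parse_psi(data):
+    """Extract the performance score and opportunity savings from a PSI response."""
+    lighthouse = data["lighthouseResult"]
+    # The API reports the category score as 0-1; convert to 0-100.
+    score = round(lighthouse["categories"]["performance"]["score"] * 100)
+    opportunities = []
+    for audit_id, audit in lighthouse["audits"].items():
+        details = audit.get("details") or {}
+        if details.get("type") != "opportunity":
+            continue  # skip metrics, diagnostics, and passed audits
+        opportunities.append({
+            "id": audit_id,
+            "title": audit.get("title", ""),
+            "savings_ms": details.get("overallSavingsMs", 0),
+            "savings_bytes": details.get("overallSavingsBytes", 0),
+        })
+    # Largest estimated time saving first.
+    opportunities.sort(key=lambda item: item["savings_ms"], reverse=True)
+    return {"score": score, "opportunities": opportunities}
+
+# Hand-written sample response, trimmed to the fields used above.
+sample = {"lighthouseResult": {
+    "categories": {"performance": {"score": 0.95}},
+    "audits": {
+        "unminified-css": {"title": "Minify CSS", "details": {
+            "type": "opportunity", "overallSavingsMs": 50, "overallSavingsBytes": 20000}},
+        "unused-javascript": {"title": "Reduce unused JavaScript", "details": {
+            "type": "opportunity", "overallSavingsMs": 400, "overallSavingsBytes": 150000}},
+        "total-blocking-time": {"title": "Total Blocking Time", "numericValue": 1807},
+    },
+}}
+
+result = parse_psi(sample)
+print(result["score"], [item["id"] for item in result["opportunities"]])
+```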
+
+### Metrics Captured
+
+**Official performance score** (0-100)
+- This is what Google reports
+- Different algorithm than Sitespeed approximation
+
+**Opportunities:**
+- "Reduce unused JavaScript" → 400ms savings, 150KB reduction
+- "Minify CSS" → 50ms, 20KB
+- "Lazy load offscreen images" → 200ms, 500KB
+- "Eliminate render-blocking resources" → 300ms
+- (and ~20 more opportunities)
+
+**Same lab metrics as Sitespeed:**
+- LCP, FCP, CLS, TBT, TTFB
+
+### Advantages
+- ✅ **Official Lighthouse** — What Google actually scores
+- ✅ **Opportunities** — Specific recommendations on what to fix
+- ✅ **Savings estimates** — How much you'd save per fix
+- ✅ **Comprehensive audit** — 100+ checks across performance, UX, SEO
+- ✅ **Credibility** — "Google says you score 95"
+
+### Disadvantages
+- ❌ **No HAR** — Can't see individual resource timings
+- ❌ **Variable duration** — 30-90 seconds per device (depends on Google's load), so not reliably faster than local Sitespeed
+- ❌ **Rate-limited** — Keyless requests share a small anonymous quota; 25,000 queries/day is the default quota with an API key
+- ❌ **No resource breakdown** — Can't identify which JS file is slow
+
+### API Key
+
+Optional, but recommended for automated use. With a key, the default quota is 25,000 queries/day; without one, requests fall under a much smaller shared quota and are throttled sooner.
+
+**Where to set:**
+- `.env` file: `PSI_API_KEY=...`
+- Or environment variable: `export PSI_API_KEY=...`
+
+**Where to get:**
+1. Google Cloud Console
+2. Create project
+3. Enable PageSpeed Insights API
+4. Create API key
+5. Set in `.env`
+
+If not set, seo-intel still works but you might hit rate limits on very high-volume testing.
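+
+A minimal sketch of how the optional key could be attached to the request. `build_psi_url` is a hypothetical helper; `url`, `strategy`, `category`, and `key` are query parameters of the PSI v5 endpoint, and passing `env` explicitly is a design choice made here so the behaviour is easy to test.
+
+```python
+import os
+from urllib.parse import urlencode
+
+PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
+
+def build_psi_url(page_url, strategy="mobile", env=os.environ):
+    """Build the request URL, adding the API key only when one is configured."""
+    params = {"url": page_url, "strategy": strategy, "category": "performance"}
+    api_key = env.get("PSI_API_KEY")
+    if api_key:
+        params["key"] = api_key
+    return f"{PSI_ENDPOINT}?{urlencode(params)}"
+
+print(build_psi_url("https://rds.ink/endangered", env={}))
+```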
+
+### Score Differences from Sitespeed
+
+PSI score ≠ Sitespeed score because they use different algorithms:
+
+| Aspect | Sitespeed Score | PSI Score |
+|--------|---|---|
+| Source | Approximated from thresholds | Official Lighthouse |
+| Algorithm | Linear interpolation between thresholds | Log-normal curve per metric |
+| Weightings | Equal (each metric = 1/5) | Weighted (TBT 30%, LCP 25%, CLS 25%, FCP 10%, Speed Index 10%) |
+| Audits | None | 100+ audits |
+| Opportunities | None | Yes (what to fix) |
+| Example | 77 (this page) | 95 (estimated) |
+
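+- **Sitespeed 77** means: directional score, TBT is the killer
+- **PSI 95** (estimated, not measured) would mean: official Google score in the "good" band. Lighthouse gives TBT the largest weight of any metric, so a lab TBT near 1,800 ms in PSI would normally pull the real score well below this estimate.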
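+
+To make the difference in algorithms concrete, here is a minimal sketch of a Lighthouse-style log-normal scoring curve. The control points used (p10 = 200 ms, median = 600 ms for mobile TBT) and the 30% TBT weight are assumptions based on Lighthouse's published scoring for version 10 and later, not values read from seo-intel; `log_normal_score` is a hypothetical name.
+
+```python
+import math
+
+# erfc^-1(0.2): fixes the curve so that value == p10 scores exactly 0.9.
+INVERSE_ERFC_ONE_FIFTH = 0.9061938024368232
+
+def log_normal_score(value, p10, median):
+    """Score in [0, 1]: 0.9 at the p10 control point, 0.5 at the median."""
+    if value <= 0:
+        return 1.0
+    standardized = (
+        math.log(value / median) * INVERSE_ERFC_ONE_FIFTH / math.log(median / p10)
+    )
+    return (1 - math.erf(standardized)) / 2
+
+# TBT value quoted elsewhere in this document, scored on the assumed mobile curve.
+tbt_score = log_normal_score(1807, p10=200, median=600)
+print(round(tbt_score, 2))
+```
+
+Under these assumptions a TBT of 1,807 ms scores about 0.10, so TBT alone would contribute roughly 3 of its 30 possible points. The linear approximation, by contrast, floors the same metric at 30 out of 100 and gives it only one fifth of the total.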
+
+---
+
+## Comparison Table
+
+| Feature | Sitespeed | PSI |
+|---------|-----------|-----|
+| **Real browser** | ✅ Headless Chrome | ✅ Lighthouse (Chrome) |
+| **Duration** | 60s | 30-90s |
+| **HAR output** | ✅ Full | ❌ None |
+| **Resource timing** | ✅ Per-resource | ❌ Aggregate only |
+| **Official score** | ❌ Approximated | ✅ Real Lighthouse |
+| **Opportunities** | ❌ None | ✅ Full audit |
+| **Savings estimates** | ❌ No | ✅ Yes (ms + bytes) |
+| **Lab metrics** | ✅ LCP, FCP, CLS, TBT, TTFB (no INP in v40) | ✅ LCP, FCP, CLS, TBT, TTFB |
+| **Cost** | Free (Docker) | Free (API key recommended; 25k/day default quota) |
+| **Best for** | Trend tracking, waterfall analysis | Official benchmarking, what to fix |
+
+---
+
+## Which Score Should You Use?
+
+**For trend tracking:**
+Use the **Sitespeed score** (77). It's local and consistent. You can test weekly and see whether the score improves over time.
+
+**For official reporting:**
+Use the **PSI score** (95). It's the score Google officially reports. Client-friendly, credible.
+
+**For diagnosing problems:**
+Use **individual metrics** (TBT=1,807ms). This tells you exactly what's broken. Focus on the worst metric first.
+
+---
+
+## How They Work Together
+
+The dual-engine approach gives you:
+
+1. **Sitespeed** finds the bottleneck (TBT=1,807ms is the killer)
+2. **Sitespeed HAR** shows you the resources causing TBT (JavaScript files)
+3. **PSI** tells you how to fix it (opportunities: defer JS, lazy-load, etc.)
+4. **PSI score** tells you the official Google score (95)
+5. **Trends** show if your fixes actually work (score 77 → 88 → 95)
+
+---
+
+## Docker Details (Sitespeed)
+
+### Image Details
+```
+Docker Hub: sitespeedio/sitespeed.io:40.4.0
+Size: ~1.5 GB
+Base: Node.js + Chrome
+Updated: May 2026
+```
+
+### Why v40.4.0?
+- Latest stable version (verified 2026-05-13)
+- Previous versions have bugs or missing metrics
+- Pinned version ensures reproducible results
+
+### How seo-intel Runs It
+
+```bash
+docker run --rm \
+  --shm-size=1g \
+  -v /tmp/sitespeed-output:/sitespeed.io \
+  sitespeedio/sitespeed.io:40.4.0 \
+  https://rds.ink/endangered \
+  --mobile --connectivity 4g \
+  --n 3 \
+  --outputFolder /sitespeed.io/{run_id} \
+  --summary --summary-detail
+```
+
+**Key flags:**
+- `--rm` — Delete container after run (clean up)
+- `--shm-size=1g` — Allocate 1GB shared memory for Chrome
+- `-v` — Mount output directory so we can read the HAR
+- `--n 3` — Run 3 iterations (use median)
+- `--summary` — Print summary to stdout
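+
+The same invocation can be assembled as an argument list, which avoids shell-quoting problems when the URL or run ID comes from user input. This is a sketch: `sitespeed_command` is a hypothetical helper, the run ID `abc123` is a placeholder, and the flags are copied from the command above rather than taken from seo-intel's source.
+
+```python
+import shlex
+
+IMAGE = "sitespeedio/sitespeed.io:40.4.0"
+HOST_OUTPUT = "/tmp/sitespeed-output"
+
+def sitespeed_command(url, run_id, iterations=3):
+    """Return the docker argv for one mobile test run."""
+    return [
+        "docker", "run", "--rm",
+        "--shm-size=1g",
+        "-v", f"{HOST_OUTPUT}:/sitespeed.io",
+        IMAGE,
+        url,
+        "--mobile", "--connectivity", "4g",
+        "--n", str(iterations),
+        "--outputFolder", f"/sitespeed.io/{run_id}",
+        "--summary", "--summary-detail",
+    ]
+
+command = sitespeed_command("https://rds.ink/endangered", "abc123")
+print(shlex.join(command))
+```
+
+The list can be handed directly to `subprocess.run(command, check=True)`; `shlex.join` is used here only to print a copy-pasteable form.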
+
+---
+
+See also:
+- [System Architecture](01-architecture.md) — How engines fit in the larger system
+- [Score Calculation](02-score-calculation.md) — How Sitespeed approximates scores
+- [Metrics Reference](03-metrics-reference.md) — What each metric means