Add comprehensive metrics and engines documentation

Complete the documentation suite with:
- Deep-dive metrics reference (LCP, FCP, CLS, TBT, TTFB)
- Detailed testing engines comparison (Sitespeed vs PSI)
- Why TBT is the killer metric for rds.ink
- How to fix each metric using Hummingbird
- Score differences and when to use each engine

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-14 05:58:12 +10:00
parent 335d9a76e1
commit 76523db177
2 changed files with 659 additions and 0 deletions
--- a/docs/03-metrics-reference.md
+++ b/docs/03-metrics-reference.md
@@ -0,0 +1,326 @@
 # Performance Metrics Reference
 Deep-dive reference for each performance metric captured by the seo-intel system. Of the five, LCP and CLS are Core Web Vitals; FCP, TBT and TTFB are supporting diagnostic metrics.
 ## The Five Metrics
 | Metric | Full Name | Unit | Good | Poor | What Matters |
 |--------|-----------|------|------|------|--------------|
 | **LCP** | Largest Contentful Paint | milliseconds | ≤2,500ms | ≥4,000ms | When main content appears |
 | **FCP** | First Contentful Paint | milliseconds | ≤1,800ms | ≥3,000ms | When ANY content appears |
 | **CLS** | Cumulative Layout Shift | unitless (0 and up) | ≤0.1 | ≥0.25 | How much page jumps |
 | **TBT** | Total Blocking Time | milliseconds | ≤200ms | ≥600ms | JS blocking interactions |
 | **TTFB** | Time to First Byte | milliseconds | ≤800ms | ≥1,800ms | Server response speed |
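The thresholds in the table can be applied mechanically. The following is a minimal sketch, not code from seo-intel: the `THRESHOLDS` dict and the `rate` helper are hypothetical names, and the sample values are the rds.ink figures quoted later in this reference.

```python
# (good, poor) thresholds from the table above; times in ms, CLS unitless.
THRESHOLDS = {
    "lcp": (2500, 4000),
    "fcp": (1800, 3000),
    "cls": (0.1, 0.25),
    "tbt": (200, 600),
    "ttfb": (800, 1800),
}


def rate(metric: str, value: float) -> str:
    """Return 'good', 'needs improvement' or 'poor' for one measurement."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value >= poor:
        return "poor"
    return "needs improvement"


# Values quoted for rds.ink elsewhere in this document.
for metric, value in {"fcp": 2116, "cls": 0.0, "tbt": 1807, "ttfb": 144}.items():
    print(f"{metric.upper():5} {value:>7} -> {rate(metric, value)}")
```

Only TBT lands in the "poor" band for these inputs, which matches the diagnosis in the TBT section.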
 ---
 ## 1. LCP — Largest Contentful Paint
 ### Definition
 The time when the **largest visible element** (image, heading, paragraph block, video) appears on screen.
 ### Why It Matters
 Users judge a page by when its main content shows up. LCP is the best proxy for "when does the user perceive that the main content has loaded?"
 ### Thresholds
 - **≤ 2.5 seconds:** Good — user feels the page is responding
 - **2.5 – 4.0 seconds:** Needs improvement
 - **≥ 4.0 seconds:** Poor — user thinks the page is slow/broken
 ### What Affects LCP
 1. **Server response time (TTFB)** — If the server is slow, everything downstream is slow
 2. **Large images/videos above the fold** — Unoptimized media delays LCP
 3. **Render-blocking JavaScript** — `<script>` in `<head>` delays rendering
 4. **Render-blocking CSS** — Large CSS files in `<head>` delay rendering
 5. **Font loading** — Web fonts block text rendering (can use `font-display: swap`)
 ### How to Fix
 1. **Optimise TTFB** (server response)
   - Cache dynamic pages
   - Optimise database queries
   - Use a CDN for static content
 2. **Lazy-load below-the-fold images**
   - Use `loading="lazy"` on `<img>` tags
   - Hummingbird has automatic image lazy-loading
 3. **Defer non-critical JavaScript**
   - Add `defer` attribute to scripts that aren't needed for initial render
   - Move analytics/tracking to the bottom
 4. **Critical CSS inlining** (advanced)
   - Inline the CSS needed for above-the-fold content
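   - Load the rest without blocking rendering, for example with `<link rel="preload" as="style">` that is switched to a stylesheet once it has loaded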
 ---
 ## 2. FCP — First Contentful Paint
 ### Definition
 The time when the browser paints the **first piece of non-whitespace content** to the screen. This could be:
 - Text
 - An image
 - An SVG
 - A coloured background
 - Anything that's not white
 ### Why It Matters
 FCP is the user's first visual cue that the page is loading. It happens before LCP.
 ### Timeline Relationship
 ```
 0ms: User clicks link
 |
 50ms: Server starts responding
 |
 144ms (TTFB): Browser receives first bytes
 |
 2,116ms (FCP): Browser paints first content (this page)
 |
 2,500ms (LCP): Browser paints largest content
 |
 4,000ms: Page is fully interactive
 ```
 ### Thresholds
 - **≤ 1.8 seconds:** Good
 - **1.8 – 3.0 seconds:** Needs improvement (this page is at 2.1s)
 - **≥ 3.0 seconds:** Poor
 ### What Affects FCP
 1. **TTFB** — Server has to respond first
 2. **HTML parsing** — Browser must parse HTML to find content
 3. **Render-blocking resources** — CSS/JS in `<head>` delay rendering
 4. **Font loading** — If fonts are slow, text doesn't paint until fonts arrive
 ### How to Fix
 Same as LCP:
 1. Optimise TTFB
 2. Defer render-blocking resources
 3. Lazy-load heavy assets
 ---
 ## 3. CLS — Cumulative Layout Shift
 ### Definition
 **A measure of unexpected layout movement** while the page loads and while it is in use.
 Example: You're reading an article, about to click a button, and an ad loads above the button, pushing the button down. You accidentally click the ad instead. That's layout shift.
 ### Why It Matters
 CLS directly impacts **user experience frustration**. Unexpected layout changes are one of the most annoying things on the web.
 ### Measurement
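 ```
 CLS = the largest burst ("session window") of layout shifts on the page
 Each shift is: (impact fraction) × (distance fraction)
   impact fraction   = share of the viewport covered by the shifted elements (before + after)
   distance fraction = distance moved / viewport's larger dimension (height here)
 Example:
  - Ad loads, pushes content down by 50px
  - Viewport is 800px tall, shifted content covers half of it
  - Shift score = 0.5 × (50 / 800) = 0.03
 If this happens once, CLS = 0.03
 If it happens three times in quick succession, CLS ≈ 0.09
 ```
 Shifts that happen within 500ms of user input are excluded (the user triggered them, so they are expected).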
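A single layout shift is scored as impact fraction × distance fraction, and the reported CLS is the largest "session window" of shifts (shifts less than 1 second apart, window capped at 5 seconds). The sketch below illustrates that arithmetic; the function names and the sample numbers are assumptions made for illustration, not output from the seo-intel pipeline.

```python
def layout_shift_score(impact_fraction: float, distance_fraction: float) -> float:
    """Score of one shift: share of viewport affected x relative distance moved."""
    return impact_fraction * distance_fraction


def cumulative_layout_shift(shifts, max_gap_ms=1000, max_window_ms=5000):
    """Largest session-window total for a time-ordered list of (time_ms, score)."""
    best = current = 0.0
    window_start = last_time = None
    for time_ms, score in shifts:
        starts_new_window = (
            last_time is None
            or time_ms - last_time >= max_gap_ms
            or time_ms - window_start > max_window_ms
        )
        if starts_new_window:
            window_start, current = time_ms, 0.0
        current += score
        last_time = time_ms
        best = max(best, current)
    return best


# Hypothetical ad insertion: content moves 50px in an 800px-tall viewport and
# the shifted elements cover half of the viewport.
shift = layout_shift_score(impact_fraction=0.5, distance_fraction=50 / 800)
print(shift)  # 0.03125

# Two shifts close together form one window; the third starts a new one.
print(cumulative_layout_shift([(0, shift), (500, shift), (5000, shift)]))  # 0.0625
```

Because only the worst window counts, a long-lived page is not penalised for many small, widely spaced shifts.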
 ### Thresholds
 - **≤ 0.1:** Good — page is stable
 - **0.1 – 0.25:** Needs improvement — some shifts happening
 - **≥ 0.25:** Poor — page is jumpy
 ### rds.ink Status: 0.0 = PERFECT ✓
 This page does NOT shift after load. The images load lazily, product cards maintain their size, nothing pops up. Great job.
 ### What Causes CLS
 1. **Unset image/video dimensions** — Browser doesn't know how much space to reserve
 2. **Ads/widgets loading after page render** — Third-party content shifts layout
 3. **Custom fonts** — Text changes size when font finishes loading
 4. **Embeds/iframes** — External content pushes layout
 5. **Animations that move elements** — Animation changing `top` / `left` / `margin`
 ### How to Fix
 1. **Set dimensions on images**
   ```html
   <img src="..." width="800" height="600" />
   <!-- or in CSS -->
   <style>
     img { aspect-ratio: 800 / 600; }
   </style>
   ```
 2. **Reserve space for ads/lazy content**
   ```html
   <div style="width: 300px; height: 250px;">
     <!-- ad will load here -->
   </div>
   ```
 3. **Use `font-display: swap`**
   ```css
   @font-face {
     font-family: 'Custom';
     src: url(...);
     font-display: swap;  /* show fallback first, swap when custom loads */
   }
   ```
 4. **Animations: use `transform` instead of `top`/`left`**
   ```css
   /* GOOD: transform doesn't trigger layout recalc */
   @keyframes slide {
     from { transform: translateX(0); }
     to { transform: translateX(100px); }
   }
   /* BAD: left does trigger layout recalc */
   @keyframes slide {
     from { left: 0; }
     to { left: 100px; }
   }
   ```
 ---
 ## 4. TBT — Total Blocking Time 🔴 **MOST COMMON PROBLEM**
 ### Definition
 **The total time that long main-thread tasks (mostly JavaScript) keep the page from responding to input**, measured between FCP and the point where the page becomes reliably interactive.
 The browser can only do one thing at a time on the main thread:
 - Parse HTML
 - Execute JavaScript
 - Render CSS
 - Handle user input
 If JavaScript is running, the browser **cannot** respond to clicks, scrolls, or keypresses.
 ### Why It Matters
 A page with high TBT **feels frozen**. User clicks a button, nothing happens for 1+ seconds. The page is technically loaded, but unusable.
 ### Measurement
 ```
 JavaScript task execution timeline:
 0ms ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200ms (long task, blocks for 200ms)
  User clicks button during this
  but browser doesn't respond
 200ms (JavaScript done) → Click now registers (if still there)
 ```
 TBT = sum of the "blocking" portion of every long task. A task is long if it runs for more than 50ms, and only the time beyond the first 50ms counts as blocking.
 For example:
 - 5 JavaScript tasks, each 100ms = TBT of 5 × 50ms = 250ms blocking
 - 2 JavaScript tasks, each 1,000ms = TBT of 2 × 950ms = 1,900ms blocking
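The two examples above follow directly from the 50ms rule. A minimal sketch of the calculation, assuming the task durations have already been extracted from a trace (the `total_blocking_time` helper is a hypothetical name):

```python
LONG_TASK_MS = 50  # only time beyond this budget counts as blocking


def total_blocking_time(task_durations_ms):
    """Sum the portion of each main-thread task that exceeds 50ms."""
    return sum(max(0, duration - LONG_TASK_MS) for duration in task_durations_ms)


print(total_blocking_time([100] * 5))     # 250
print(total_blocking_time([1000, 1000]))  # 1900
print(total_blocking_time([40, 45, 50]))  # 0, no task runs longer than 50ms
```

This is why splitting one long task into many short ones lowers TBT even when the total amount of work stays the same.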
 ### Thresholds
 - **≤ 200ms:** Good — page feels snappy
 - **200 – 600ms:** Needs improvement — noticeable sluggishness
 - **≥ 600ms:** Poor — page feels frozen
 ### rds.ink Status: 1,807ms = CRITICAL 🔴
 The page has **1,807ms of main-thread blocking**. During page load, clicks and taps can go unanswered for a combined total of about 1.8 seconds.
 This is the main reason the score is 77 instead of 95.
 ### What Causes High TBT
 1. **Large JavaScript bundles** (1.8 MB on this page)
 2. **Synchronous JavaScript execution** — No chunking or deferring
 3. **Too many plugins** — Each plugin adds code to parse/execute
 4. **Unoptimised heavy libraries** — jQuery, older frameworks
 5. **No code-splitting** — Entire app loads upfront instead of as-needed
 ### How to Fix (for WordPress/Hummingbird)
 1. **Defer non-critical JavaScript** (add `defer` attribute)
   - Page renders first
   - Scripts load in background
   - Script execution no longer holds up the first render
 2. **Lazy-load heavy plugins** (load only when needed)
   - Gallery/lightbox: load when user clicks product
   - Booking widget: load only on booking page
 3. **Disable unused plugins** (every plugin = more JS)
 4. **Code-split large bundles** (Webpack/bundler feature)
   - Don't load everything upfront
   - Load only what's visible
 5. **Minify/compress JavaScript** (smaller download, slightly less parse time)
 ### Expected Impact of Fixes
 - Current: 1,807ms blocking
 - After defer JS: ~400ms
 - After lazy-load: ~150ms
 - After disabling unused: ~100ms
 - **Target: <200ms**
 ---
 ## 5. TTFB — Time to First Byte
 ### Definition
 The time from when the browser makes a request until the server sends the first byte of the response.
 ```
 User hits link at 0ms
    ↓
 0-50ms: Network latency
    ↓
 50-100ms: Server processes request
    ↓
 100-144ms: Server sends response (for this page)
    ↓ This is TTFB
 144ms: Browser receives first byte
 ```
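The phases in this timeline map onto the `timings` object that the HAR format records for every request. As a rough sketch, TTFB for the main document can be approximated by summing the phases up to the first response byte. The `ttfb_ms` helper and the sample values are assumptions for illustration; they are chosen to add up to the 144ms quoted for this page.

```python
def ttfb_ms(timings: dict) -> float:
    """Approximate TTFB as the HAR phases that precede the first response byte."""
    phases = ("blocked", "dns", "connect", "send", "wait")
    # HAR uses -1 for phases that did not apply; skip those.
    # "ssl" is deliberately left out because HAR already counts it inside "connect".
    return sum(timings[p] for p in phases if timings.get(p, -1) > 0)


# Hypothetical timings for the main document request (milliseconds).
sample = {"blocked": -1, "dns": 10, "connect": 40, "ssl": 25,
          "send": 0, "wait": 94, "receive": 12}
print(ttfb_ms(sample))  # 144
```

A large `wait` value points at server processing, while large `dns` or `connect` values point at the network.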
 ### Why It Matters
 TTFB is mostly a **server and network metric**. It measures how fast your infrastructure responds.
 Everything downstream depends on TTFB. You can't optimise FCP if TTFB is 2 seconds.
 ### Thresholds
 - **≤ 0.8 seconds:** Good
 - **0.8 – 1.8 seconds:** Needs improvement
 - **≥ 1.8 seconds:** Poor
 ### rds.ink Status: 144ms = EXCELLENT ✓
 The server responds in 144ms. This is good. Not the bottleneck.
 ### What Affects TTFB
 1. **Server processing time** — Database queries, rendering, etc.
 2. **Network latency** — Distance from client to server
 3. **Server hardware** — CPU/RAM/I/O speed
 4. **Caching** — Is the page/response cached?
 5. **DNS resolution** — Domain lookup time
 ### How to Fix
 1. **Enable page caching** (Hummingbird)
   - Cached pages: TTFB 50ms
   - Uncached: TTFB 144ms
 2. **Optimise database queries** (most common bottleneck)
   - Use database indexing
   - Avoid N+1 queries
   - Query Monitor plugin helps diagnose
 3. **Use a CDN for static assets**
   - CSS, JS, images served from fast edge servers
 4. **Upgrade hosting** (if server is slow)
   - More CPU cores
   - Faster SSD storage
   - Better network connection
 ---
 ## Summary Table
 | Metric | Causes | Quick Fixes |
 |--------|--------|-----------|
 | **LCP** | Slow TTFB, large unoptimised images, render-blocking JS | Defer JS, optimise TTFB |
 | **FCP** | Slow TTFB, render-blocking resources | Defer JS, lazy load |
 | **CLS** | Unset dimensions, ads, fonts, animations | Set dimensions, use `transform` |
 | **TBT** (THE PROBLEM) | Large JS, too many plugins, sync code | Defer JS, lazy-load plugins, disable unused |
 | **TTFB** | Slow server, no cache, slow DB | Enable cache, optimise queries |
 ---
 See also:
 - [Score Calculation](02-score-calculation.md) — How these metrics become a 0-100 score
 - [Testing Engines](04-testing-engines.md) — How metrics are captured
 - [Case Study: rds.ink 77](../case-studies/rds-77-score.md) — Real example with fixes
--- a/docs/04-testing-engines.md
+++ b/docs/04-testing-engines.md
@@ -0,0 +1,333 @@
 # Testing Engines: Sitespeed vs PSI
 Detailed comparison of the two independent engines used to measure performance.
 ## Overview
 The seo-intel system uses **two different testing engines** in parallel:
 1. **Sitespeed.io** — Real-browser testing with HAR waterfall
 2. **Google PageSpeed Insights (PSI)** — Official Lighthouse audits
 This dual approach captures:
 - Lab metrics from a real browser, with per-resource detail (Sitespeed)
 - Official Google scores + recommendations (PSI)
 ---
 ## Sitespeed.io
 ### What It Is
 An open-source performance testing framework that runs a **headless Chrome browser** to measure page performance in real-world conditions.
 **Docker image:** `sitespeedio/sitespeed.io:40.4.0`
 ### How It Works
 ```
 1. User clicks "Test" for https://rds.ink/endangered
    ↓
 2. Sitespeed starts Docker container with headless Chrome
    ↓
 3. Browser loads page (3 times, N=3)
    - Run 1: LCP=2300ms, FCP=2100ms, TBT=1800ms
    - Run 2: LCP=2500ms, FCP=2120ms, TBT=1850ms
    - Run 3: LCP=2400ms, FCP=2110ms, TBT=1780ms
    ↓
 4. Sitespeed takes the MEDIAN of the three runs
    - LCP = 2400ms  (middle value)
    - FCP = 2110ms
    - TBT = 1800ms
    ↓
 5. Browser exports HAR (HTTP Archive) file
    - Contains: every resource, timing, size
    - JSON file with full waterfall
    ↓
 6. Sitespeed parses HAR
    - Extracts CWV metrics
    - Calculates page weight
    - Identifies resources
    ↓
 7. Approximates 0-100 score from thresholds
    - NOT official Lighthouse (no Lighthouse plugin)
    - But uses same CWV thresholds as Lighthouse
 ```
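Step 4 of the walkthrough can be reproduced with the standard library. This is a minimal sketch using the three runs listed above; how seo-intel actually stores the per-run values is not shown in this document, so the `runs` structure is an assumption.

```python
from statistics import median

# The three runs from the walkthrough above (milliseconds).
runs = [
    {"lcp": 2300, "fcp": 2100, "tbt": 1800},
    {"lcp": 2500, "fcp": 2120, "tbt": 1850},
    {"lcp": 2400, "fcp": 2110, "tbt": 1780},
]

# Take the median per metric, not the median run as a whole.
medians = {name: median(run[name] for run in runs) for name in runs[0]}
print(medians)  # {'lcp': 2400, 'fcp': 2110, 'tbt': 1800}
```

With an odd number of runs the median is always one of the measured values, which keeps a single outlier run from skewing the result.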
 ### Metrics Captured
 **Core Web Vitals:**
 - LCP (Largest Contentful Paint)
 - FCP (First Contentful Paint)
 - CLS (Cumulative Layout Shift)
 - TBT (Total Blocking Time)
 - TTFB (Time to First Byte)
 - INP (Interaction to Next Paint) — Not captured in v40
 **Page breakdown:**
 - Total page size (bytes)
 - Image bytes
 - JavaScript bytes
 - CSS bytes
 - Font bytes
 - Request count
 **Resource list:**
 - Every HTTP request made
 - URL, type (script/image/stylesheet/font/xhr)
 - Size, timing
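The page breakdown can be derived from the HAR alone. The sketch below groups response sizes by MIME type; `page_breakdown` is a hypothetical helper rather than the seo-intel implementation, and the inline sample stands in for a HAR that would normally be loaded from disk with `json.load`.

```python
from collections import defaultdict


def page_breakdown(har: dict) -> dict:
    """Sum response bytes per coarse resource type from a parsed HAR."""
    totals = defaultdict(int)
    entries = har["log"]["entries"]
    for entry in entries:
        response = entry["response"]
        mime = response["content"].get("mimeType", "")
        # bodySize is -1 when unknown; fall back to the decoded content size.
        size = response.get("bodySize", -1)
        if size < 0:
            size = response["content"].get("size", 0)
        if "javascript" in mime:
            kind = "js"
        elif mime.startswith("image/"):
            kind = "image"
        elif mime.startswith("text/css"):
            kind = "css"
        elif "font" in mime:
            kind = "font"
        else:
            kind = "other"
        totals[kind] += max(size, 0)
    return {"requests": len(entries), "total": sum(totals.values()), **totals}


# Tiny hand-written HAR fragment (hypothetical sizes).
sample_har = {"log": {"entries": [
    {"response": {"bodySize": 1200,
                  "content": {"mimeType": "text/html", "size": 4000}}},
    {"response": {"bodySize": 50000,
                  "content": {"mimeType": "application/javascript", "size": 180000}}},
    {"response": {"bodySize": -1,
                  "content": {"mimeType": "image/webp", "size": 30000}}},
]}}
print(page_breakdown(sample_har))
```

Sorting the same entries by size is usually the quickest way to find the scripts behind a high TBT.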
 ### Advantages
 - ✅ **Real browser** — Chrome's actual instrumentation
 - ✅ **Full HAR** — See every resource, identify bottlenecks
 - ✅ **Consistent** — Can run anytime, same environment
 - ✅ **Resource timing** — Measure individual script/image load times
 - ✅ **Median metrics** — Run 3 times, use median (more stable than single run)
 ### Disadvantages
 - ❌ **No Lighthouse** — Score is approximated with a different algorithm, not official
 - ❌ **Slow** — About 60 seconds per device
 - ❌ **No opportunities** — Doesn't tell you "fix this"
 - ❌ **No INP metric** — v40 doesn't capture Interaction to Next Paint
 ### Device Modes
 **Mobile mode** (default):
 ```
 --mobile --connectivity 4g
 ```
 - Emulates Moto G4 device (412x732 viewport)
 - 4G throttling (simulates real 4G speeds)
 - Mobile user agent
 - **Duration:** ~60s
 **Desktop mode:**
 ```
 --browsertime.connectivity native --browsertime.viewPort 1366x768 --browsertime.userAgent "Chrome Windows"
 ```
 - 1366x768 viewport (typical laptop)
 - Native connectivity (no throttling)
 - Desktop Chrome user agent
 - **Duration:** ~60s
 ### Output Files
 Stored at: `/tmp/sitespeed-output/{run_id}/pages/{domain}/data/`
 - `browsertime.har` — Full HTTP Archive (JSON)
 - `browsertime.json` — Detailed metrics (also JSON)
 - `screenShots/` — Video/screenshots of page load
 ### Score Calculation (Sitespeed v40)
 Since Sitespeed v40 doesn't run Lighthouse, it approximates the score:
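 ```python
 # src/perf/sitespeed.py:_approx_score() (simplified)
 from statistics import mean

 # (good, poor) thresholds per metric; times in ms, CLS unitless
 _THRESHOLDS = {
     "lcp":  (2500,  4000),
     "fcp":  (1800,  3000),
     "cls":  (0.1,   0.25),
     "tbt":  (200,   600),
     "ttfb": (800,   1800),
 }

 def _metric_score(value, good, poor):
     """100 at or below good, 30 at or above poor, linear in between."""
     if value <= good:
         return 100
     if value >= poor:
         return 30
     ratio = (value - good) / (poor - good)
     return 100 - (ratio * 70)

 def _approx_score(metrics):
     """Final score = unweighted average of all metric scores."""
     return mean(
         _metric_score(metrics[name], good, poor)
         for name, (good, poor) in _THRESHOLDS.items()
     )
 ```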
 **Important:** This is NOT the real Lighthouse score. It's an approximation for trend tracking.
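The interpolation described above can be exercised on concrete numbers. The sketch below restates the rule so that it runs on its own; `metric_score` is a hypothetical name, and the inputs are illustrative (the median LCP, FCP and TBT from the Sitespeed walkthrough plus the CLS and TTFB quoted for rds.ink), so the result is not necessarily the exact breakdown behind the reported 77.

```python
THRESHOLDS = {
    "lcp": (2500, 4000),
    "fcp": (1800, 3000),
    "cls": (0.1, 0.25),
    "tbt": (200, 600),
    "ttfb": (800, 1800),
}


def metric_score(value: float, good: float, poor: float) -> float:
    """100 at or below good, 30 at or above poor, linear in between."""
    if value <= good:
        return 100.0
    if value >= poor:
        return 30.0
    return 100.0 - (value - good) / (poor - good) * 70.0


# Illustrative inputs, see the note above.
measured = {"lcp": 2400, "fcp": 2110, "cls": 0.0, "tbt": 1800, "ttfb": 144}
scores = {name: round(metric_score(measured[name], *THRESHOLDS[name]), 1)
          for name in THRESHOLDS}
print(scores)
# {'lcp': 100.0, 'fcp': 81.9, 'cls': 100.0, 'tbt': 30.0, 'ttfb': 100.0}
```

Because every metric carries equal weight and the floor is 30, a single very bad metric can cost at most 14 points of the final average.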
 ---
 ## Google PageSpeed Insights (PSI)
 ### What It Is
 Google's official **Lighthouse audit** service. You submit a URL and Google runs Lighthouse against it.
 **API endpoint:** `https://www.googleapis.com/pagespeedonline/v5/runPagespeed`
 ### How It Works
 ```
 1. seo-intel calls Google's API
    GET /pagespeedonline/v5/runPagespeed?url=...&strategy=mobile
    ↓
 2. Google spins up Lighthouse
    - Full audit: performance, accessibility, best practices, SEO, PWA
    - We only care about "performance" category
    ↓
 3. Lighthouse runs and scores 0-100
    - This is the OFFICIAL score
    - Uses Google's real Lighthouse algorithm
    ↓
 4. Lighthouse audit results returned
    - Performance score (0-100)
    - All audit items (100+ audits)
    - Opportunities (what to fix + savings)
    ↓
 5. seo-intel parses response
    - Extracts score
    - Extracts opportunities
    - Calculates potential savings (ms + bytes)
 ```
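Steps 1 and 5 can be sketched without touching the network. The request uses the documented `url`, `strategy`, `category` and `key` query parameters; the parser assumes the Lighthouse JSON layout in which the performance score is a 0 to 1 float and opportunity audits carry `overallSavingsMs` and `overallSavingsBytes`. The function names and the trimmed sample response are hypothetical, and newer Lighthouse versions may report savings differently.

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url, strategy="mobile", api_key=None):
    """Build the PSI request URL, limited to the performance category."""
    params = {"url": page_url, "strategy": strategy, "category": "performance"}
    if api_key:
        params["key"] = api_key
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


def parse_psi(response: dict) -> dict:
    """Pull the 0-100 score and the opportunity savings out of a PSI response."""
    lighthouse = response["lighthouseResult"]
    score = round(lighthouse["categories"]["performance"]["score"] * 100)
    opportunities = []
    for audit in lighthouse.get("audits", {}).values():
        details = audit.get("details") or {}
        if details.get("type") == "opportunity":
            opportunities.append({
                "title": audit.get("title"),
                "savings_ms": details.get("overallSavingsMs", 0),
                "savings_bytes": details.get("overallSavingsBytes", 0),
            })
    opportunities.sort(key=lambda item: item["savings_ms"], reverse=True)
    return {"score": score, "opportunities": opportunities}


# Trimmed, hand-written response using figures quoted in this document.
sample_response = {"lighthouseResult": {
    "categories": {"performance": {"score": 0.95}},
    "audits": {
        "unminified-css": {"title": "Minify CSS", "details": {
            "type": "opportunity", "overallSavingsMs": 50, "overallSavingsBytes": 20000}},
        "unused-javascript": {"title": "Reduce unused JavaScript", "details": {
            "type": "opportunity", "overallSavingsMs": 400, "overallSavingsBytes": 150000}},
        "first-contentful-paint": {"title": "First Contentful Paint"},
    },
}}

print(build_psi_url("https://rds.ink/endangered"))
print(parse_psi(sample_response))
```

Sorting by `savings_ms` puts the fix with the largest estimated payoff first, which matches how the opportunities are meant to be read.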
 ### Metrics Captured
 **Official performance score** (0-100)
 - This is what Google reports
 - Different algorithm than Sitespeed approximation
 **Opportunities:**
 - "Reduce unused JavaScript" → 400ms savings, 150KB reduction
 - "Minify CSS" → 50ms, 20KB
 - "Lazy load offscreen images" → 200ms, 500KB
 - "Eliminate render-blocking resources" → 300ms
 - (and ~20 more opportunities)
 **Same CWV metrics as Sitespeed:**
 - LCP, FCP, CLS, TBT, TTFB
 ### Advantages
 - ✅ **Official Lighthouse** — What Google actually scores
 - ✅ **Opportunities** — Specific recommendations on what to fix
 - ✅ **Savings estimates** — How much you'd save per fix
 - ✅ **Comprehensive audit** — 100+ checks across performance, UX, SEO
 - ✅ **Credibility** — "Google says you score 95"
 ### Disadvantages
 - ❌ **No HAR** — Can't see individual resource timings or identify which JS file is slow
 - ❌ **Variable duration** — 30-90 seconds per device, depending on Google's load and outside your control
 - ❌ **Rate-limited** — ~25k tests/day without API key
 ### API Key
 Optional. Without it, you get ~25,000 tests/day. With it, you get much higher limits.
 **Where to set:**
 - `.env` file: `PSI_API_KEY=...`
 - Or environment variable: `export PSI_API_KEY=...`
 **Where to get:**
 1. Google Cloud Console
 2. Create project
 3. Enable PageSpeed Insights API
 4. Create API key
 5. Set in `.env`
 If not set, seo-intel still works but you might hit rate limits on very high-volume testing.
 ### Score Differences from Sitespeed
 PSI score ≠ Sitespeed score because they use different algorithms:
 | Aspect | Sitespeed Score | PSI Score |
 |--------|---|---|
 | Source | Approximated from thresholds | Official Lighthouse |
 | Algorithm | Linear interpolation | Complex weighting |
 | Weightings | Equal (each metric = 1/5) | Weighted (some metrics matter more) |
 | Audits | None | 100+ audits |
 | Opportunities | None | Yes (what to fix) |
 | Example | 77 (this page) | 95 (estimated) |
 **Sitespeed 77** means: directional score, TBT is the killer
 **PSI 95** means: official Google score, page is good but TBT hurts it slightly
 ---
 ## Comparison Table
 | Feature | Sitespeed | PSI |
 |---------|-----------|-----|
 | **Real browser** | ✅ Headless Chrome | ✅ Lighthouse (Chrome) |
 | **Duration** | 60s | 30-90s |
 | **HAR output** | ✅ Full | ❌ Limited |
 | **Resource timing** | ✅ Per-resource | ❌ Aggregate only |
 | **Official score** | ❌ Approximated | ✅ Real Lighthouse |
 | **Opportunities** | ❌ None | ✅ Full audit |
 | **Savings estimates** | ❌ No | ✅ Yes (ms + bytes) |
 | **CWV metrics** | ✅ LCP, FCP, CLS, TBT, TTFB (no INP in v40) | ✅ LCP, FCP, CLS, TBT, TTFB |
 | **Cost** | Free (Docker) | Free (25k/day) or API key |
 | **Best for** | Trend tracking, waterfall analysis | Official benchmarking, what to fix |
 ---
 ## Which Score Should You Use?
 **For trend tracking:**
 Use **Sitespeed score** (77). It's local, consistent, and not rate-limited. You can test weekly and see whether the score improves over time.
 **For official reporting:**
 Use **PSI score** (95). It's what Google officially scores you. Client-friendly, credible.
 **For diagnosing problems:**
 Use **individual metrics** (TBT=1,807ms). This tells you exactly what's broken. Focus on the worst metric first.
 ---
 ## How They Work Together
 The dual-engine approach gives you:
 1. **Sitespeed** finds the bottleneck (TBT=1,807ms is the killer)
 2. **Sitespeed HAR** shows you the resources causing TBT (JavaScript files)
 3. **PSI** tells you how to fix it (opportunities: defer JS, lazy-load, etc.)
 4. **PSI score** tells you the official Google score (95)
 5. **Trends** show if your fixes actually work (score 77 → 88 → 95)
 ---
 ## Docker Details (Sitespeed)
 ### Image Details
 ```
 Docker Hub: sitespeedio/sitespeed.io:40.4.0
 Size: ~1.5 GB
 Base: Node.js + Chrome
 Updated: May 2026
 ```
 ### Why v40.4.0?
 - Latest stable version (verified 2026-05-13)
 - Previous versions have bugs or missing metrics
 - Pinned version ensures reproducible results
 ### How seo-intel Runs It
 ```bash
 docker run --rm \
  --shm-size=1g \
  -v /tmp/sitespeed-output:/sitespeed.io \
  sitespeedio/sitespeed.io:40.4.0 \
  https://rds.ink/endangered \
  --mobile --connectivity 4g \
  --n 3 \
  --outputFolder /sitespeed.io/{run_id} \
  --summary --summary-detail
 ```
 **Key flags:**
 - `--rm` — Delete container after run (clean up)
 - `--shm-size=1g` — Allocate 1GB shared memory for Chrome
 - `-v` — Mount output directory so we can read the HAR
 - `--n 3` — Run 3 iterations (use median)
 - `--summary` — Print summary to stdout
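For callers that launch the container from Python, the command can be assembled as an argument list. This is a sketch under assumptions: `sitespeed_command` is a hypothetical helper, it mirrors only the core flags listed in this document (the summary flags and the desktop user agent are left out), and the actual seo-intel invocation may differ.

```python
import shlex


def sitespeed_command(url, run_id, mobile=True, iterations=3,
                      image="sitespeedio/sitespeed.io:40.4.0",
                      host_output="/tmp/sitespeed-output"):
    """Return the docker argv list for one Sitespeed run."""
    cmd = [
        "docker", "run", "--rm",
        "--shm-size=1g",                       # shared memory for Chrome
        "-v", f"{host_output}:/sitespeed.io",  # expose results on the host
        image,
        url,
    ]
    if mobile:
        cmd += ["--mobile", "--connectivity", "4g"]
    else:
        cmd += ["--browsertime.connectivity", "native",
                "--browsertime.viewPort", "1366x768"]
    cmd += ["--n", str(iterations), "--outputFolder", f"/sitespeed.io/{run_id}"]
    return cmd


cmd = sitespeed_command("https://rds.ink/endangered", run_id="example-run")
print(shlex.join(cmd))
# To execute: subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with URLs that contain `&` or `?`.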
 ---
 See also:
 - [System Architecture](01-architecture.md) — How engines fit in the larger system
 - [Score Calculation](02-score-calculation.md) — How Sitespeed approximates scores
 - [Metrics Reference](03-metrics-reference.md) — What each metric means