Add comprehensive metrics and engines documentation

Complete the documentation suite with:
- Deep-dive metrics reference (LCP, FCP, CLS, TBT, TTFB)
- Detailed testing engines comparison (Sitespeed vs PSI)
- Why TBT is the killer metric for rds.ink
- How to fix each metric using Hummingbird
- Score differences and when to use each engine

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-14 05:58:12 +10:00
parent 335d9a76e1
commit 76523db177
2 changed files with 659 additions and 0 deletions


@@ -0,0 +1,326 @@
# Performance Metrics Reference
Deep-dive reference for each performance metric captured by the seo-intel system: the Core Web Vitals (LCP, CLS) plus the supporting lab metrics (FCP, TBT, TTFB).
## The Five Metrics
| Metric | Full Name | Unit | Good | Poor | What Matters |
|--------|-----------|------|------|------|--------------|
| **LCP** | Largest Contentful Paint | milliseconds | ≤2,500ms | ≥4,000ms | When main content appears |
| **FCP** | First Contentful Paint | milliseconds | ≤1,800ms | ≥3,000ms | When ANY content appears |
| **CLS** | Cumulative Layout Shift | unitless (0 or higher) | ≤0.1 | ≥0.25 | How much page jumps |
| **TBT** | Total Blocking Time | milliseconds | ≤200ms | ≥600ms | JS blocking interactions |
| **TTFB** | Time to First Byte | milliseconds | ≤800ms | ≥1,800ms | Server response speed |
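
As a minimal sketch of how the thresholds in this table turn into a rating, the snippet below classifies a single value as good, needs improvement, or poor. The names `THRESHOLDS` and `rate_metric` are illustrative assumptions, not part of seo-intel; the sample values are the rds.ink figures quoted elsewhere in this document.

```python
# Illustrative sketch: names are hypothetical, thresholds copied from the table above.
THRESHOLDS = {
    "lcp": (2500, 4000),   # milliseconds
    "fcp": (1800, 3000),   # milliseconds
    "cls": (0.1, 0.25),    # unitless
    "tbt": (200, 600),     # milliseconds
    "ttfb": (800, 1800),   # milliseconds
}


def rate_metric(name: str, value: float) -> str:
    """Return 'good', 'needs improvement', or 'poor' for one metric value."""
    good, poor = THRESHOLDS[name]
    if value <= good:
        return "good"
    if value >= poor:
        return "poor"
    return "needs improvement"


if __name__ == "__main__":
    samples = [("lcp", 2500), ("fcp", 2116), ("cls", 0.0), ("tbt", 1807), ("ttfb", 144)]
    for name, value in samples:
        print(f"{name.upper()}: {value} -> {rate_metric(name, value)}")
```

Keeping the boundaries in one dictionary means the rating and any score approximation can share the same numbers.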
---
## 1. LCP — Largest Contentful Paint
### Definition
The time when the **largest visible element** (image, heading, paragraph block, video) appears on screen.
### Why It Matters
Users need to see that the page is loading. LCP is the best metric for "when does the user perceive that the main content has loaded?"
### Thresholds
- **≤ 2.5 seconds:** Good — user feels the page is responding
- **2.5–4.0 seconds:** Needs improvement
- **≥ 4.0 seconds:** Poor — user thinks the page is slow/broken
### What Affects LCP
1. **Server response time (TTFB)** — If the server is slow, everything downstream is slow
2. **Large images/videos above the fold** — Unoptimized media delays LCP
3. **Render-blocking JavaScript:** a synchronous `<script>` in `<head>` delays rendering
4. **Render-blocking CSS** — Large CSS files in `<head>` delay rendering
5. **Font loading** — Web fonts block text rendering (can use `font-display: swap`)
### How to Fix
1. **Optimise TTFB** (server response)
   - Cache dynamic pages
   - Optimise database queries
   - Use a CDN for static content
2. **Lazy-load below-the-fold images**
   - Use `loading="lazy"` on `<img>` tags
   - Hummingbird has automatic image lazy-loading
3. **Defer non-critical JavaScript**
   - Add the `defer` attribute to scripts that aren't needed for initial render
   - Move analytics/tracking to the bottom
4. **Critical CSS inlining** (advanced)
   - Inline the CSS needed for above-the-fold content
   - Defer the rest with `<link rel="preload">`
---
## 2. FCP — First Contentful Paint
### Definition
The time when the browser paints the **first piece of non-whitespace content** to the screen. This could be:
- Text
- An image
- An SVG
- A coloured background
- Anything that's not white
### Why It Matters
FCP is the user's first visual cue that the page is loading. It happens before LCP.
### Timeline Relationship
```
0ms: User clicks link
|
50ms: Server starts responding
|
144ms (TTFB): Browser receives first bytes
|
2,116ms (FCP): Browser paints first content (this page)
|
2,500ms (LCP): Browser paints largest content
|
4,000ms: Page is fully interactive
```
### Thresholds
- **≤ 1.8 seconds:** Good
- **1.8–3.0 seconds:** Needs improvement (this page is at 2.1s)
- **≥ 3.0 seconds:** Poor
### What Affects FCP
1. **TTFB** — Server has to respond first
2. **HTML parsing** — Browser must parse HTML to find content
3. **Render-blocking resources** — CSS/JS in `<head>` delay rendering
4. **Font loading** — If fonts are slow, text doesn't paint until fonts arrive
### How to Fix
Same as LCP:
1. Optimise TTFB
2. Defer render-blocking resources
3. Lazy-load heavy assets
---
## 3. CLS — Cumulative Layout Shift
### Definition
**Measure of unexpected layout shifts** that occur while the page is loading and in use.
Example: You're reading an article, about to click a button, and an ad loads above the button, pushing the button down. You accidentally click the ad instead. That's layout shift.
### Why It Matters
CLS directly impacts **user experience frustration**. Unexpected layout changes are one of the most annoying things on the web.
### Measurement
```
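Each shift is scored as: (impact fraction) × (distance fraction)
- impact fraction = share of the viewport touched by the moving content
- distance fraction = distance moved ÷ viewport height (or width, if larger)
Example:
- Ad loads, pushes content down by 50px
- Viewport is 800px tall; the moving content touches 50% of it
- Shift score = 0.5 × (50 / 800) = 0.03125
If this happens once, CLS = 0.03125
If it happens three times in quick succession, CLS = 0.09375
CLS = the worst burst of shifts (gaps under 1s, burst capped at 5s), not the lifetime total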
```
Shifts that happen within 500ms of user input are excluded (they don't surprise the user).
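
The sketch below shows the arithmetic, assuming the current Chrome definition of CLS: each shift scores impact fraction × distance fraction, and CLS is the largest "session window" of shifts (less than 1 second between shifts, window capped at 5 seconds). The function names and the sample timestamps are hypothetical.

```python
from typing import List, Optional, Tuple


def shift_score(impact_fraction: float, distance_fraction: float) -> float:
    """Score of one layout shift: impact fraction times distance fraction."""
    return impact_fraction * distance_fraction


def cumulative_layout_shift(
    shifts: List[Tuple[float, float]],
    gap_ms: float = 1000,
    max_window_ms: float = 5000,
) -> float:
    """Return the largest session-window sum.

    `shifts` is a time-sorted list of (timestamp_ms, score) pairs.
    """
    best = 0.0
    window_sum = 0.0
    window_start: Optional[float] = None
    prev_time: Optional[float] = None
    for timestamp, score in shifts:
        starts_new_window = (
            prev_time is None
            or timestamp - prev_time >= gap_ms
            or timestamp - window_start >= max_window_ms
        )
        if starts_new_window:
            window_start = timestamp
            window_sum = 0.0
        window_sum += score
        prev_time = timestamp
        best = max(best, window_sum)
    return best


if __name__ == "__main__":
    # Content covering 50% of an 800px-tall viewport moves down by 50px.
    one_shift = shift_score(0.5, 50 / 800)
    burst = [(0, one_shift), (300, one_shift), (600, one_shift), (10_000, one_shift)]
    print(one_shift, cumulative_layout_shift(burst))
```

The shift at 10,000ms starts a new window, so it does not add to the earlier burst.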
### Thresholds
- **≤ 0.1:** Good — page is stable
- **0.1–0.25:** Needs improvement (some shifts happening)
- **≥ 0.25:** Poor — page is jumpy
### rds.ink Status: 0.0 = PERFECT ✓
This page does NOT shift after load. The images load lazily, product cards maintain their size, nothing pops up. Great job.
### What Causes CLS
1. **Unset image/video dimensions** — Browser doesn't know how much space to reserve
2. **Ads/widgets loading after page render** — Third-party content shifts layout
3. **Custom fonts** — Text changes size when font finishes loading
4. **Embeds/iframes** — External content pushes layout
5. **Animations that move elements** — Animation changing `top` / `left` / `margin`
### How to Fix
1. **Set dimensions on images**
   ```html
   <img src="..." width="800" height="600" />
   <!-- or in CSS -->
   img { aspect-ratio: 800 / 600; }
   ```
2. **Reserve space for ads/lazy content**
   ```html
   <div style="width: 300px; height: 250px;">
     <!-- ad will load here -->
   </div>
   ```
3. **Use `font-display: swap`**
   ```css
   @font-face {
     font-family: 'Custom';
     src: url(...);
     font-display: swap; /* show fallback first, swap when custom loads */
   }
   ```
4. **Animations: use `transform` instead of `top`/`left`**
   ```css
   /* GOOD: transform doesn't trigger layout recalc */
   @keyframes slide {
     from { transform: translateX(0); }
     to { transform: translateX(100px); }
   }
   /* BAD: left does trigger layout recalc */
   @keyframes slide {
     from { left: 0; }
     to { left: 100px; }
   }
   ```
---
## 4. TBT — Total Blocking Time 🔴 **MOST COMMON PROBLEM**
### Definition
**How long JavaScript is executing on the main thread, blocking all user interactions.**
The browser can only do one thing at a time on the main thread:
- Parse HTML
- Execute JavaScript
- Render CSS
- Handle user input
If JavaScript is running, the browser **cannot** respond to clicks, scrolls, or keypresses.
### Why It Matters
A page with high TBT **feels frozen**. User clicks a button, nothing happens for 1+ seconds. The page is technically loaded, but unusable.
### Measurement
```
JavaScript task execution timeline:
0ms ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200ms (long task, blocks for 200ms)
User clicks button during this
but browser doesn't respond
200ms (JavaScript done) → Click now registers (if still there)
```
TBT = sum of the blocking portion of every long task. A task is "long" if it runs for more than 50ms, and only the time beyond the first 50ms counts as blocking.
For example:
- 5 JavaScript tasks, each 100ms = TBT of 5 × 50ms = 250ms blocking
- 2 JavaScript tasks, each 1,000ms = TBT of 2 × 950ms = 1,900ms blocking
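
The two examples above can be reproduced with a few lines of Python. This is a simplified sketch: `total_blocking_time` is a hypothetical helper, and it sums over whatever task list it is given, whereas Lighthouse only counts tasks in a specific part of the page load.

```python
from typing import Iterable


def total_blocking_time(task_durations_ms: Iterable[float], threshold_ms: float = 50) -> float:
    """Sum the portion of each main-thread task that exceeds the 50ms threshold."""
    return sum(
        duration - threshold_ms
        for duration in task_durations_ms
        if duration > threshold_ms
    )


if __name__ == "__main__":
    print(total_blocking_time([100] * 5))    # five 100ms tasks
    print(total_blocking_time([1000] * 2))   # two 1,000ms tasks
    print(total_blocking_time([30, 40, 50])) # no task exceeds 50ms
```

Splitting one 1,000ms task into twenty 50ms tasks brings its contribution to zero, which is why chunking work helps TBT even when total execution time is unchanged.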
### Thresholds
- **≤ 200ms:** Good — page feels snappy
- **200–600ms:** Needs improvement (noticeable sluggishness)
- **≥ 600ms:** Poor — page feels frozen
### rds.ink Status: 1,807ms = CRITICAL 🔴
The page has **about 1,800ms of main-thread blocking time**. During page load, clicks and taps can go unanswered for a cumulative 1.8 seconds.
This is why the score is 77 instead of 95.
### What Causes High TBT
1. **Large JavaScript bundles** (1.8 MB on this page)
2. **Synchronous JavaScript execution** — No chunking or deferring
3. **Too many plugins** — Each plugin adds code to parse/execute
4. **Unoptimised heavy libraries** — jQuery, older frameworks
5. **No code-splitting** — Entire app loads upfront instead of as-needed
### How to Fix (for WordPress/Hummingbird)
1. **Defer non-critical JavaScript** (add `defer` attribute)
   - Page renders first
   - Scripts download in the background without blocking HTML parsing
   - Script execution moves to after the HTML is parsed instead of interrupting it
2. **Lazy-load heavy plugins** (load only when needed)
   - Gallery/lightbox: load when user clicks product
   - Booking widget: load only on booking page
3. **Disable unused plugins** (every plugin = more JS)
4. **Code-split large bundles** (Webpack/bundler feature)
   - Don't load everything upfront
   - Load only what's visible
5. **Minify/compress JavaScript** (reduce parse time)
### Expected Impact of Fixes
- Current: 1,807ms blocking
- After defer JS: ~400ms
- After lazy-load: ~150ms
- After disabling unused: ~100ms
- **Target: <200ms**
---
## 5. TTFB — Time to First Byte
### Definition
The time from when the browser makes a request until the server sends the first byte of the response.
```
User hits link at 0ms
0-50ms: Network latency
50-100ms: Server processes request
100-144ms: Server sends response (for this page)
↓ This is TTFB
144ms: Browser receives first byte
```
### Why It Matters
TTFB is mostly a **server-side metric**: it reflects how fast your infrastructure responds, plus the network latency between client and server.
Everything downstream depends on TTFB. You can't optimise FCP if TTFB is 2 seconds.
### Thresholds
- **≤ 0.8 seconds:** Good
- **0.8–1.8 seconds:** Needs improvement
- **≥ 1.8 seconds:** Poor
### rds.ink Status: 144ms = EXCELLENT ✓
The server responds in 144ms. This is good. Not the bottleneck.
### What Affects TTFB
1. **Server processing time** — Database queries, rendering, etc.
2. **Network latency** — Distance from client to server
3. **Server hardware** — CPU/RAM/I/O speed
4. **Caching** — Is the page/response cached?
5. **DNS resolution** — Domain lookup time
### How to Fix
1. **Enable page caching** (Hummingbird)
   - Cached pages: TTFB 50ms
   - Uncached: TTFB 144ms
2. **Optimise database queries** (most common bottleneck)
   - Use database indexing
   - Avoid N+1 queries
   - Query Monitor plugin helps diagnose
3. **Use a CDN for static assets**
   - CSS, JS, images served from fast edge servers
4. **Upgrade hosting** (if server is slow)
   - More CPU cores
   - Faster SSD storage
   - Better network connection
---
## Summary Table
| Metric | Causes | Quick Fixes |
|--------|--------|-----------|
| **LCP** | Slow TTFB, large images, render-blocking JS | Defer JS, optimise TTFB |
| **FCP** | Slow TTFB, render-blocking resources | Defer JS, lazy load |
| **CLS** | Unset dimensions, ads, fonts, animations | Set dimensions, use `transform` |
| **TBT** (THE PROBLEM) | Large JS, too many plugins, sync code | Defer JS, lazy-load plugins, disable unused |
| **TTFB** | Slow server, no cache, slow DB | Enable cache, optimise queries |
---
See also:
- [Score Calculation](02-score-calculation.md) — How these metrics become a 0-100 score
- [Testing Engines](04-testing-engines.md) — How metrics are captured
- [Case Study: rds.ink 77](../case-studies/rds-77-score.md) — Real example with fixes

docs/04-testing-engines.md Normal file

@@ -0,0 +1,333 @@
# Testing Engines: Sitespeed vs PSI
Detailed comparison of the two independent engines used to measure performance.
## Overview
The seo-intel system uses **two different testing engines** in parallel:
1. **Sitespeed.io** — Real-browser testing with HAR waterfall
2. **Google PageSpeed Insights (PSI)** — Official Lighthouse audits
This dual approach captures:
- Lab metrics measured in a real browser (Sitespeed)
- Official Google scores + recommendations (PSI)
---
## Sitespeed.io
### What It Is
An open-source performance testing framework that runs a **headless Chrome browser** to measure page performance in real-world conditions.
**Docker image:** `sitespeedio/sitespeed.io:40.4.0`
### How It Works
```
1. User clicks "Test" for https://rds.ink/endangered
2. Sitespeed starts Docker container with headless Chrome
3. Browser loads page (3 times, N=3)
- Run 1: LCP=2300ms, FCP=2100ms, TBT=1800ms
- Run 2: LCP=2500ms, FCP=2120ms, TBT=1850ms
- Run 3: LCP=2400ms, FCP=2110ms, TBT=1780ms
4. Sitespeed takes the MEDIAN of the three runs
- LCP = 2400ms (middle value)
- FCP = 2110ms
- TBT = 1800ms
5. Browser exports HAR (HTTP Archive) file
- Contains: every resource, timing, size
- JSON file with full waterfall
6. Sitespeed parses HAR
- Extracts CWV metrics
- Calculates page weight
- Identifies resources
7. Approximates 0-100 score from thresholds
- NOT official Lighthouse (no Lighthouse plugin)
- But uses same CWV thresholds as Lighthouse
```
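
Step 4 can be checked with the standard library. Sitespeed computes the median itself; this sketch only reproduces the arithmetic for the three sample runs listed above, and `median_metrics` is a hypothetical helper name.

```python
from statistics import median
from typing import Dict, List

# The three sample runs from the walkthrough above (milliseconds).
RUNS: List[Dict[str, float]] = [
    {"lcp": 2300, "fcp": 2100, "tbt": 1800},
    {"lcp": 2500, "fcp": 2120, "tbt": 1850},
    {"lcp": 2400, "fcp": 2110, "tbt": 1780},
]


def median_metrics(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Take the per-metric median across runs, so one outlier run cannot skew the result."""
    return {name: median(run[name] for run in runs) for name in runs[0]}


if __name__ == "__main__":
    print(median_metrics(RUNS))
```

The median is taken per metric, so the reported LCP, FCP and TBT may each come from a different run.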
### Metrics Captured
**Core Web Vitals:**
- LCP (Largest Contentful Paint)
- FCP (First Contentful Paint)
- CLS (Cumulative Layout Shift)
- TBT (Total Blocking Time)
- TTFB (Time to First Byte)
- INP (Interaction to Next Paint) — Not captured in v40
**Page breakdown:**
- Total page size (bytes)
- Image bytes
- JavaScript bytes
- CSS bytes
- Font bytes
- Request count
**Resource list:**
- Every HTTP request made
- URL, type (script/image/stylesheet/font/xhr)
- Size, timing
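
A page breakdown like this can be derived from any HAR 1.2 file. The following is a rough sketch rather than seo-intel's actual parser: the `classify` and `page_breakdown` names and the MIME-type rules are assumptions, and only standard HAR fields (`log.entries[].response.bodySize` and `response.content.mimeType`) are read.

```python
from collections import defaultdict
from typing import Dict


def classify(mime_type: str) -> str:
    """Map a response MIME type to a coarse resource type (illustrative rules)."""
    mime = mime_type.lower()
    if "javascript" in mime or "ecmascript" in mime:
        return "script"
    if mime.startswith("image/"):
        return "image"
    if "css" in mime:
        return "stylesheet"
    if "font" in mime:
        return "font"
    return "other"


def page_breakdown(har: dict) -> Dict[str, int]:
    """Sum transferred body bytes per resource type and count requests."""
    totals: Dict[str, int] = defaultdict(int)
    entries = har["log"]["entries"]
    for entry in entries:
        response = entry["response"]
        # HAR uses -1 when the size is unknown; treat that as 0 bytes.
        size = max(response.get("bodySize", -1), 0)
        totals[classify(response["content"].get("mimeType", ""))] += size
    result = dict(totals)
    result["total_bytes"] = sum(totals.values())
    result["request_count"] = len(entries)
    return result


if __name__ == "__main__":
    sample = {"log": {"entries": [
        {"response": {"bodySize": 1000, "content": {"mimeType": "application/javascript"}}},
        {"response": {"bodySize": 500, "content": {"mimeType": "image/webp"}}},
        {"response": {"bodySize": -1, "content": {"mimeType": "text/css"}}},
    ]}}
    print(page_breakdown(sample))
```

For a real run, load the file with `json.load()` and pass the resulting dictionary to `page_breakdown`.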
### Advantages
- **Real browser:** Chrome's actual instrumentation
- **Full HAR:** See every resource, identify bottlenecks
- **Consistent:** Can run anytime, same environment
- **Resource timing:** Measure individual script/image load times
- **Median metrics:** Run 3 times, use median (more stable than single run)
### Disadvantages
- **No Lighthouse:** Score is approximated, not official
- **Slower:** 60 seconds per device
- **No opportunities:** Doesn't tell you "fix this"
- **No INP metric:** v40 doesn't capture Interaction to Next Paint
- **Approximate score:** Different algorithm than real Lighthouse
### Device Modes
**Mobile mode** (default):
```
--mobile --connectivity 4g
```
- Emulates Moto G4 device (412x732 viewport)
- 4G throttling (simulates real 4G speeds)
- Mobile user agent
- **Duration:** ~60s
**Desktop mode:**
```
--browsertime.connectivity native --browsertime.viewPort 1366x768 --browsertime.userAgent "Chrome Windows"
```
- 1366x768 viewport (typical laptop)
- Native connectivity (no throttling)
- Desktop Chrome user agent
- **Duration:** ~60s
### Output Files
Stored at: `/tmp/sitespeed-output/{run_id}/pages/{domain}/data/`
- `browsertime.har` — Full HTTP Archive (JSON)
- `browsertime.json` — Detailed metrics (also JSON)
- `screenShots/` — Video/screenshots of page load
### Score Calculation (Sitespeed v40)
Since Sitespeed v40 doesn't run Lighthouse, it approximates the score:
```python
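# Sketch of src/perf/sitespeed.py:_approx_score(); the helper name
# _metric_score and the function signatures are illustrative.
from statistics import mean

# (good, poor) thresholds: milliseconds, except CLS which is unitless
_THRESHOLDS = {
    "lcp": (2500, 4000),
    "fcp": (1800, 3000),
    "cls": (0.1, 0.25),
    "tbt": (200, 600),
    "ttfb": (800, 1800),
}


def _metric_score(value, good, poor):
    """100 at or below `good`, 30 at or above `poor`, linear in between."""
    if value <= good:
        return 100
    if value >= poor:
        return 30
    ratio = (value - good) / (poor - good)
    return 100 - (ratio * 70)


def _approx_score(metrics):
    """Final score = average of all metric scores."""
    return mean(
        _metric_score(metrics[name], good, poor)
        for name, (good, poor) in _THRESHOLDS.items()
    )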
```
**Important:** This is NOT the real Lighthouse score. It's an approximation for trend tracking.
---
## Google PageSpeed Insights (PSI)
### What It Is
Google's official **Lighthouse audit** service. You submit a URL and Google runs Lighthouse against it.
**API endpoint:** `https://www.googleapis.com/pagespeedonline/v5/runPagespeed`
### How It Works
```
1. seo-intel calls Google's API
GET /pagespeedonline/v5/runPagespeed?url=...&strategy=mobile
2. Google spins up Lighthouse
- Available audit categories: performance, accessibility, best practices, SEO (only performance runs unless others are requested)
- We only care about "performance" category
3. Lighthouse runs and scores 0-100
- This is the OFFICIAL score
- Uses Google's real Lighthouse algorithm
4. Lighthouse audit results returned
- Performance score (0-100)
- All audit items (100+ audits)
- Opportunities (what to fix + savings)
5. seo-intel parses response
- Extracts score
- Extracts opportunities
- Calculates potential savings (ms + bytes)
```
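
Step 5 might look roughly like the sketch below. It assumes the documented PSI v5 response shape, where the score lives at `lighthouseResult.categories.performance.score` on a 0 to 1 scale and opportunity audits carry `details.type == "opportunity"` with `overallSavingsMs`. The `parse_psi` function and its output layout are hypothetical, not seo-intel's actual parser, and newer Lighthouse versions may report some recommendations in a different form.

```python
from typing import Any, Dict, List


def parse_psi(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the 0-100 performance score and opportunities sorted by time saved."""
    lighthouse = response["lighthouseResult"]
    raw_score = lighthouse["categories"]["performance"]["score"]  # 0..1 or None

    opportunities: List[Dict[str, Any]] = []
    for audit_id, audit in lighthouse.get("audits", {}).items():
        details = audit.get("details") or {}
        if details.get("type") != "opportunity":
            continue
        savings_ms = details.get("overallSavingsMs") or 0
        if savings_ms <= 0:
            continue  # nothing to gain from this audit
        opportunities.append({
            "id": audit_id,
            "title": audit.get("title", audit_id),
            "savings_ms": savings_ms,
            "savings_bytes": details.get("overallSavingsBytes") or 0,
        })

    opportunities.sort(key=lambda item: item["savings_ms"], reverse=True)
    score = None if raw_score is None else round(raw_score * 100)
    return {"score": score, "opportunities": opportunities}


if __name__ == "__main__":
    sample = {"lighthouseResult": {
        "categories": {"performance": {"score": 0.95}},
        "audits": {
            "unminified-css": {"title": "Minify CSS", "details": {
                "type": "opportunity", "overallSavingsMs": 50, "overallSavingsBytes": 20000}},
            "unused-javascript": {"title": "Reduce unused JavaScript", "details": {
                "type": "opportunity", "overallSavingsMs": 400, "overallSavingsBytes": 150000}},
            "first-contentful-paint": {"title": "First Contentful Paint"},
        },
    }}
    print(parse_psi(sample))
```

Sorting by `savings_ms` puts the fix with the largest estimated time saving first.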
### Metrics Captured
**Official performance score** (0-100)
- This is what Google reports
- Different algorithm than Sitespeed approximation
**Opportunities:**
- "Reduce unused JavaScript" → 400ms savings, 150KB reduction
- "Minify CSS" → 50ms, 20KB
- "Lazy load offscreen images" → 200ms, 500KB
- "Eliminate render-blocking resources" → 300ms
- (and ~20 more opportunities)
**Same CWV metrics as Sitespeed:**
- LCP, FCP, CLS, TBT, TTFB
### Advantages
- **Official Lighthouse:** What Google actually scores
- **Opportunities:** Specific recommendations on what to fix
- **Savings estimates:** How much you'd save per fix
- **Comprehensive audit:** 100+ checks across performance, UX, SEO
- **Credibility:** "Google says you score 95"
### Disadvantages
- **No HAR:** Can't see individual resource timings
- **Slower:** 30-90 seconds per device (depends on Google's load)
- **Rate-limited:** ~25k tests/day without API key
- **Slower infrastructure:** Google's API is slower than local Sitespeed
- **No resource breakdown:** Can't identify which JS file is slow
### API Key
Optional. Without it, you get ~25,000 tests/day. With it, you get much higher limits.
**Where to set:**
- `.env` file: `PSI_API_KEY=...`
- Or environment variable: `export PSI_API_KEY=...`
**Where to get:**
1. Google Cloud Console
2. Create project
3. Enable PageSpeed Insights API
4. Create API key
5. Set in `.env`
If not set, seo-intel still works but you might hit rate limits on very high-volume testing.
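
A minimal sketch of the optional-key behaviour, assuming the key is read from the `PSI_API_KEY` environment variable as described above and passed as the API's `key` query parameter. The `build_psi_url` helper is hypothetical; seo-intel's own request code may differ.

```python
import os
from typing import Optional
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url: str, strategy: str = "mobile", api_key: Optional[str] = None) -> str:
    """Build the request URL; the `key` parameter is added only when a key is set."""
    params = {"url": page_url, "strategy": strategy}
    if api_key:
        params["key"] = api_key
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # An unset or empty PSI_API_KEY falls back to a keyless request.
    print(build_psi_url("https://rds.ink/endangered", api_key=os.environ.get("PSI_API_KEY")))
```

Because the key is optional in the helper, the same code path serves both keyed and keyless setups.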
### Score Differences from Sitespeed
PSI score ≠ Sitespeed score because they use different algorithms:
| Aspect | Sitespeed Score | PSI Score |
|--------|---|---|
| Source | Approximated from thresholds | Official Lighthouse |
| Algorithm | Linear interpolation | Complex weighting |
| Weightings | Equal (each metric = 1/5) | Weighted (some metrics matter more) |
| Audits | None | 100+ audits |
| Opportunities | None | Yes (what to fix) |
| Example | 77 (this page) | 95 (estimated) |
**Sitespeed 77** means: directional score, TBT is the killer
**PSI 95** means: official Google score, page is good but TBT hurts it slightly
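
To illustrate why equal and weighted averaging diverge, the sketch below applies both to made-up per-metric scores. The weights are the ones published for the Lighthouse 10 performance score (FCP 10%, Speed Index 10%, LCP 25%, TBT 30%, CLS 25%); Lighthouse also converts each raw metric to a score with its own curve, which this sketch does not attempt to reproduce. All input scores and function names here are hypothetical.

```python
from typing import Dict

# Lighthouse 10 performance weights, in percent. "si" is Speed Index.
LIGHTHOUSE_WEIGHTS: Dict[str, int] = {"fcp": 10, "si": 10, "lcp": 25, "tbt": 30, "cls": 25}


def equal_weight_score(scores: Dict[str, float]) -> float:
    """Sitespeed-style approximation: every metric counts the same."""
    return sum(scores.values()) / len(scores)


def weighted_score(scores: Dict[str, float], weights: Dict[str, int]) -> float:
    """Lighthouse-style combination: each metric score is scaled by its weight."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight


if __name__ == "__main__":
    # Hypothetical per-metric scores (0-100), for illustration only.
    sitespeed_scores = {"lcp": 80, "fcp": 90, "cls": 100, "tbt": 60, "ttfb": 100}
    lighthouse_scores = {"fcp": 90, "si": 85, "lcp": 80, "tbt": 60, "cls": 100}
    print(equal_weight_score(sitespeed_scores))
    print(weighted_score(lighthouse_scores, LIGHTHOUSE_WEIGHTS))
```

The two engines also combine different metric sets (TTFB on one side, Speed Index on the other), so identical raw measurements can still produce different scores.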
---
## Comparison Table
| Feature | Sitespeed | PSI |
|---------|-----------|-----|
| **Real browser** | ✅ Headless Chrome | ✅ Lighthouse (Chrome) |
| **Duration** | 60s | 30-90s |
| **HAR output** | ✅ Full | ❌ Limited |
| **Resource timing** | ✅ Per-resource | ❌ Aggregate only |
| **Official score** | ❌ Approximated | ✅ Real Lighthouse |
| **Opportunities** | ❌ None | ✅ Full audit |
| **Savings estimates** | ❌ No | ✅ Yes (ms + bytes) |
| **CWV metrics** | ✅ LCP, FCP, CLS, TBT, TTFB (no INP in v40) | ✅ LCP, FCP, CLS, TBT, TTFB |
| **Cost** | Free (Docker) | Free (25k/day) or API key |
| **Best for** | Trend tracking, waterfall analysis | Official benchmarking, what to fix |
---
## Which Score Should You Use?
**For trend tracking:**
Use **Sitespeed score** (77). It's fast, local, consistent. You can test weekly and see if score improves over time.
**For official reporting:**
Use **PSI score** (95). It's what Google officially scores you. Client-friendly, credible.
**For diagnosing problems:**
Use **individual metrics** (TBT=1,807ms). This tells you exactly what's broken. Focus on the worst metric first.
---
## How They Work Together
The dual-engine approach gives you:
1. **Sitespeed** finds the bottleneck (TBT=1,807ms is the killer)
2. **Sitespeed HAR** shows you the resources causing TBT (JavaScript files)
3. **PSI** tells you how to fix it (opportunities: defer JS, lazy-load, etc.)
4. **PSI score** tells you the official Google score (95)
5. **Trends** show if your fixes actually work (score 77 → 88 → 95)
---
## Docker Details (Sitespeed)
### Image Details
```
Docker Hub: sitespeedio/sitespeed.io:40.4.0
Size: ~1.5 GB
Base: Node.js + Chrome
Updated: May 2026
```
### Why v40.4.0?
- Latest stable version (verified 2026-05-13)
- Previous versions have bugs or missing metrics
- Pinned version ensures reproducible results
### How seo-intel Runs It
```bash
docker run --rm \
  --shm-size=1g \
  -v /tmp/sitespeed-output:/sitespeed.io \
  sitespeedio/sitespeed.io:40.4.0 \
  https://rds.ink/endangered \
  --mobile --connectivity 4g \
  --n 3 \
  --outputFolder /sitespeed.io/{run_id} \
  --summary --summary-detail
**Key flags:**
- `--rm` — Delete container after run (clean up)
- `--shm-size=1g` — Allocate 1GB shared memory for Chrome
- `-v` — Mount output directory so we can read the HAR
- `--n 3` — Run 3 iterations (use median)
- `--summary` — Print summary to stdout
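
From Python, the same invocation could be assembled as an argument list and handed to `subprocess.run`. This is only a sketch that mirrors the flags shown above: `build_sitespeed_command` and its parameters are hypothetical, and it prints the command rather than running Docker.

```python
import shlex
from typing import List


def build_sitespeed_command(
    url: str,
    run_id: str,
    iterations: int = 3,
    output_root: str = "/tmp/sitespeed-output",
) -> List[str]:
    """Assemble the docker invocation shown above as an argument list."""
    return [
        "docker", "run", "--rm",
        "--shm-size=1g",
        "-v", f"{output_root}:/sitespeed.io",
        "sitespeedio/sitespeed.io:40.4.0",
        url,
        "--mobile", "--connectivity", "4g",
        "--n", str(iterations),
        "--outputFolder", f"/sitespeed.io/{run_id}",
        "--summary", "--summary-detail",
    ]


if __name__ == "__main__":
    command = build_sitespeed_command("https://rds.ink/endangered", "example-run")
    print(shlex.join(command))
```

Passing a list to `subprocess.run(command, check=True)` avoids shell quoting problems with URLs that contain `&` or `?`.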
---
See also:
- [System Architecture](01-architecture.md) — How engines fit in the larger system
- [Score Calculation](02-score-calculation.md) — How Sitespeed approximates scores
- [Metrics Reference](03-metrics-reference.md) — What each metric means