Add comprehensive metrics and engines documentation
Complete the documentation suite with:

- Deep-dive metrics reference (LCP, FCP, CLS, TBT, TTFB)
- Detailed testing engines comparison (Sitespeed vs PSI)
- Why TBT is the killer metric for rds.ink
- How to fix each metric using Hummingbird
- Score differences and when to use each engine

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
333
docs/04-testing-engines.md
Normal file
@@ -0,0 +1,333 @@
# Testing Engines: Sitespeed vs PSI

Detailed comparison of the two independent engines used to measure performance.

## Overview

The seo-intel system runs **two different testing engines** in parallel:

1. **Sitespeed.io**: real-browser testing with a HAR waterfall
2. **Google PageSpeed Insights (PSI)**: official Lighthouse audits

This dual approach captures:
- Lab metrics from real-browser instrumentation (Sitespeed)
- Official Google scores and recommendations (PSI)

---

## Sitespeed.io

### What It Is
An open-source performance testing framework that drives a **headless Chrome browser** to measure page performance under controlled, repeatable lab conditions.

**Docker image:** `sitespeedio/sitespeed.io:40.4.0`

### How It Works

```
1. User clicks "Test" for https://rds.ink/endangered
   ↓
2. Sitespeed starts Docker container with headless Chrome
   ↓
3. Browser loads page (3 times, N=3)
   - Run 1: LCP=2300ms, FCP=2100ms, TBT=1800ms
   - Run 2: LCP=2500ms, FCP=2120ms, TBT=1850ms
   - Run 3: LCP=2400ms, FCP=2110ms, TBT=1780ms
   ↓
4. Sitespeed takes the MEDIAN of the three runs
   - LCP = 2400ms (middle value)
   - FCP = 2110ms
   - TBT = 1800ms
   ↓
5. Browser exports HAR (HTTP Archive) file
   - Contains: every resource, timing, size
   - JSON file with full waterfall
   ↓
6. Sitespeed parses HAR
   - Extracts CWV metrics
   - Calculates page weight
   - Identifies resources
   ↓
7. Approximates 0-100 score from thresholds
   - NOT official Lighthouse (no Lighthouse plugin)
   - But uses same CWV thresholds as Lighthouse
```
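
The median step in the walkthrough can be reproduced in a few lines. This is only an illustrative sketch: `median_metrics` is a hypothetical helper, not a function from seo-intel or Sitespeed, and the run values are the example numbers from the walkthrough.

```python
from statistics import median

# Example runs copied from the walkthrough (values in milliseconds).
runs = [
    {"lcp": 2300, "fcp": 2100, "tbt": 1800},
    {"lcp": 2500, "fcp": 2120, "tbt": 1850},
    {"lcp": 2400, "fcp": 2110, "tbt": 1780},
]


def median_metrics(runs):
    """Return the per-metric median across runs (hypothetical helper)."""
    return {name: median(run[name] for run in runs) for name in runs[0]}


print(median_metrics(runs))
```

With an odd number of runs the median is always one of the measured values, which is why a single slow outlier run does not drag the reported number the way a mean would.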

### Metrics Captured

**Core Web Vitals and related metrics:**
- LCP (Largest Contentful Paint)
- FCP (First Contentful Paint)
- CLS (Cumulative Layout Shift)
- TBT (Total Blocking Time)
- TTFB (Time to First Byte)
- INP (Interaction to Next Paint): not captured in v40

**Page breakdown:**
- Total page size (bytes)
- Image bytes
- JavaScript bytes
- CSS bytes
- Font bytes
- Request count

**Resource list:**
- Every HTTP request made
- URL, type (script/image/stylesheet/font/xhr)
- Size, timing
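
As a rough illustration of how a page breakdown like this could be derived from a HAR file, the sketch below sums response bytes per resource type. It relies only on standard HAR 1.2 fields (`log.entries[].response.bodySize` and `response.content.mimeType`). The `classify` mapping and the `page_breakdown` helper are assumptions made for this example, not the seo-intel implementation.

```python
from collections import defaultdict


def classify(mime_type):
    """Map a MIME type to a coarse resource bucket (hypothetical mapping)."""
    mime = (mime_type or "").lower()
    if "javascript" in mime or "ecmascript" in mime:
        return "script"
    if mime.startswith("image/"):
        return "image"
    if "css" in mime:
        return "stylesheet"
    if "font" in mime:
        return "font"
    return "other"


def page_breakdown(har):
    """Sum response bytes per bucket and count requests in a HAR dict."""
    bytes_by_type = defaultdict(int)
    entries = har["log"]["entries"]
    for entry in entries:
        response = entry["response"]
        # HAR uses -1 for "unknown"; fall back to the decoded content size.
        size = response.get("bodySize", -1)
        if size < 0:
            size = max(response["content"].get("size", 0), 0)
        bucket = classify(response["content"].get("mimeType"))
        bytes_by_type[bucket] += size
    return {
        "total_bytes": sum(bytes_by_type.values()),
        "by_type": dict(bytes_by_type),
        "request_count": len(entries),
    }


# Tiny hand-written HAR fragment, for illustration only.
sample_har = {"log": {"entries": [
    {"request": {"url": "https://example.com/app.js"},
     "response": {"bodySize": 1500,
                  "content": {"mimeType": "application/javascript", "size": 4000}}},
    {"request": {"url": "https://example.com/hero.png"},
     "response": {"bodySize": -1,
                  "content": {"mimeType": "image/png", "size": 2500}}},
]}}

print(page_breakdown(sample_har))
```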

### Advantages
- ✅ **Real browser**: Chrome's actual instrumentation
- ✅ **Full HAR**: see every resource, identify bottlenecks
- ✅ **Consistent**: can run anytime in the same environment
- ✅ **Resource timing**: measure individual script/image load times
- ✅ **Median metrics**: run 3 times and use the median (more stable than a single run)

### Disadvantages
- ❌ **No Lighthouse**: the score is approximated with a different algorithm, not the official one
- ❌ **Slow**: about 60 seconds per device
- ❌ **No opportunities**: doesn't tell you what to fix
- ❌ **No INP metric**: v40 doesn't capture Interaction to Next Paint

### Device Modes

**Mobile mode** (default):
```
--mobile --connectivity 4g
```
- Emulates Moto G4 device (412x732 viewport)
- 4G throttling (simulates real 4G speeds)
- Mobile user agent
- **Duration:** ~60s

**Desktop mode:**
```
--browsertime.connectivity native --browsertime.viewPort 1366x768 --browsertime.userAgent "Chrome Windows"
```
- 1366x768 viewport (typical laptop)
- Native connectivity (no throttling)
- Desktop Chrome user agent
- **Duration:** ~60s
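
One way to keep the two flag sets in a single place is a small lookup, sketched below. The flag strings are copied verbatim from the two modes in this section; `DEVICE_FLAGS` and `device_args` are hypothetical names, and how seo-intel actually selects the flags is not shown in this document.

```python
import shlex

# Flag strings copied verbatim from the two device modes in this section.
DEVICE_FLAGS = {
    "mobile": "--mobile --connectivity 4g",
    "desktop": (
        "--browsertime.connectivity native "
        "--browsertime.viewPort 1366x768 "
        '--browsertime.userAgent "Chrome Windows"'
    ),
}


def device_args(mode="mobile"):
    """Return the CLI arguments for a device mode as a list (hypothetical helper)."""
    if mode not in DEVICE_FLAGS:
        raise ValueError(f"unknown device mode: {mode!r}")
    # shlex.split keeps the quoted user agent together as one argument.
    return shlex.split(DEVICE_FLAGS[mode])


print(device_args("desktop"))
```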

### Output Files

Stored at: `/tmp/sitespeed-output/{run_id}/pages/{domain}/data/`

- `browsertime.har` — Full HTTP Archive (JSON)
- `browsertime.json` — Detailed metrics (also JSON)
- `screenShots/` — Video/screenshots of page load
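
Because the `{domain}` segment changes per test, code that reads these files may prefer to search for them instead of hard-coding the full path. The sketch below is one way to do that, assuming the layout described above; `find_output_files` is a hypothetical helper and not part of seo-intel.

```python
from pathlib import Path


def find_output_files(run_dir):
    """Locate browsertime.har and browsertime.json under one run's output folder."""
    names = ("browsertime.har", "browsertime.json")
    pages = Path(run_dir) / "pages"
    if not pages.is_dir():
        return {name: None for name in names}
    found = {}
    for name in names:
        # Search recursively so the {domain} segment does not need to be known.
        matches = sorted(pages.rglob(name))
        found[name] = matches[0] if matches else None
    return found


# Usage (path is illustrative):
#   files = find_output_files("/tmp/sitespeed-output/<run_id>")
#   har_path = files["browsertime.har"]
```

Returning `None` for a missing file lets the caller decide whether a missing HAR is an error or just a run that has not finished yet.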

### Score Calculation (Sitespeed v40)

Because this Sitespeed v40 setup doesn't run Lighthouse, seo-intel approximates the score:

```python
# Runnable sketch of the logic in src/perf/sitespeed.py:_approx_score().
# The helper name and function signatures below are illustrative.
from statistics import mean

# Per-metric (good, poor) thresholds. Times are in ms; CLS is unitless.
_THRESHOLDS = {
    "lcp": (2500, 4000),
    "fcp": (1800, 3000),
    "cls": (0.1, 0.25),
    "tbt": (200, 600),
    "ttfb": (800, 1800),
}


def _metric_score(value, good, poor):
    """Score one metric: 100 at or below `good`, 30 at or above `poor`."""
    if value <= good:
        return 100.0
    if value >= poor:
        return 30.0
    # Linear interpolation between the two thresholds.
    ratio = (value - good) / (poor - good)
    return 100 - (ratio * 70)


def _approx_score(metrics):
    """Final score = average of all metric scores."""
    return mean(
        _metric_score(metrics[name], good, poor)
        for name, (good, poor) in _THRESHOLDS.items()
    )
```
**Important:** This is NOT the real Lighthouse score. It's an approximation for trend tracking.
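
To make the interpolation concrete, the sketch below applies the same thresholds to the three median values from the earlier walkthrough (LCP 2400 ms, FCP 2110 ms, TBT 1800 ms). The function name `metric_score` is an assumption for this example. CLS and TTFB are left out because the walkthrough gives no values for them, so no overall score is computed here.

```python
# (good, poor) thresholds for the three metrics used in this example.
THRESHOLDS = {
    "lcp": (2500, 4000),
    "fcp": (1800, 3000),
    "tbt": (200, 600),
}


def metric_score(value, good, poor):
    """100 at or below `good`, 30 at or above `poor`, linear in between."""
    if value <= good:
        return 100.0
    if value >= poor:
        return 30.0
    ratio = (value - good) / (poor - good)
    return 100 - ratio * 70


# Median values from the "How It Works" walkthrough (ms).
medians = {"lcp": 2400, "fcp": 2110, "tbt": 1800}

scores = {
    name: round(metric_score(value, *THRESHOLDS[name]), 1)
    for name, value in medians.items()
}
print(scores)
```

LCP is under its "good" threshold and scores 100, FCP sits about a quarter of the way between its thresholds and scores roughly 82, and TBT is far past its "poor" threshold, so it bottoms out at 30. That floor is why TBT dominates the approximated score for this page.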

---

## Google PageSpeed Insights (PSI)

### What It Is
Google's official **Lighthouse audit** service. You submit a URL and Google runs Lighthouse against it.

**API endpoint:** `https://www.googleapis.com/pagespeedonline/v5/runPagespeed`

### How It Works

```
1. seo-intel calls Google's API
   GET /pagespeedonline/v5/runPagespeed?url=...&strategy=mobile
   ↓
2. Google spins up Lighthouse
   - Available categories: performance, accessibility, best practices, SEO
   - We only care about the "performance" category
   ↓
3. Lighthouse runs and scores 0-100
   - This is the OFFICIAL score
   - Uses Google's real Lighthouse algorithm
   ↓
4. Lighthouse audit results returned
   - Performance score (0-100)
   - All audit items (100+ audits)
   - Opportunities (what to fix + savings)
   ↓
5. seo-intel parses response
   - Extracts score
   - Extracts opportunities
   - Calculates potential savings (ms + bytes)
```
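
Step 5 can be sketched as a pure function over the decoded JSON response. The field names used here (`lighthouseResult.categories.performance.score` on a 0-1 scale, and audits whose `details.type` is `"opportunity"` with `overallSavingsMs` / `overallSavingsBytes`) follow the PSI v5 response shape as commonly documented; newer Lighthouse versions may report some of these audits differently, so treat this as an assumption to verify against a live response. `parse_psi_response` is a hypothetical name, not the seo-intel parser.

```python
def parse_psi_response(data):
    """Extract the performance score and opportunities from a PSI v5 response dict."""
    lighthouse = data["lighthouseResult"]
    # Lighthouse reports category scores on a 0-1 scale; convert to 0-100.
    raw_score = lighthouse["categories"]["performance"]["score"]
    score = round(raw_score * 100) if raw_score is not None else None

    opportunities = []
    for audit_id, audit in lighthouse.get("audits", {}).items():
        details = audit.get("details") or {}
        if details.get("type") != "opportunity":
            continue
        opportunities.append({
            "id": audit_id,
            "title": audit.get("title", audit_id),
            "savings_ms": details.get("overallSavingsMs") or 0,
            "savings_bytes": details.get("overallSavingsBytes") or 0,
        })
    # Largest estimated time savings first.
    opportunities.sort(key=lambda item: item["savings_ms"], reverse=True)
    return {
        "score": score,
        "opportunities": opportunities,
        "total_savings_ms": sum(o["savings_ms"] for o in opportunities),
        "total_savings_bytes": sum(o["savings_bytes"] for o in opportunities),
    }


# Hand-written fragment shaped like a PSI response, for illustration only.
sample_response = {"lighthouseResult": {
    "categories": {"performance": {"score": 0.95}},
    "audits": {
        "unused-javascript": {
            "title": "Reduce unused JavaScript",
            "details": {"type": "opportunity",
                        "overallSavingsMs": 400,
                        "overallSavingsBytes": 150000},
        },
        "first-contentful-paint": {"title": "First Contentful Paint"},
    },
}}

print(parse_psi_response(sample_response))
```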

### Metrics Captured

**Official performance score** (0-100)
- This is what Google reports
- Different algorithm than the Sitespeed approximation

**Opportunities:**
- "Reduce unused JavaScript" → 400ms savings, 150KB reduction
- "Minify CSS" → 50ms, 20KB
- "Defer offscreen images" → 200ms, 500KB
- "Eliminate render-blocking resources" → 300ms
- (and ~20 more opportunities)

**Same metrics as Sitespeed:**
- LCP, FCP, CLS, TBT, TTFB

### Advantages
- ✅ **Official Lighthouse**: what Google actually scores
- ✅ **Opportunities**: specific recommendations on what to fix
- ✅ **Savings estimates**: how much you'd save per fix
- ✅ **Comprehensive audit**: 100+ checks across performance, UX, SEO
- ✅ **Credibility**: "Google says you score 95"

### Disadvantages
- ❌ **No HAR**: can't see individual resource timings
- ❌ **Unpredictable duration**: 30-90 seconds per device depending on Google's load, so it can be slower than local Sitespeed
- ❌ **Rate-limited**: ~25k tests/day without API key
- ❌ **No resource breakdown**: can't identify which JS file is slow

### API Key

Optional. Without it, you get ~25,000 tests/day. With it, you get much higher limits.

**Where to set:**
- `.env` file: `PSI_API_KEY=...`
- Or environment variable: `export PSI_API_KEY=...`

**Where to get:**
1. Google Cloud Console
2. Create project
3. Enable PageSpeed Insights API
4. Create API key
5. Set in `.env`

If not set, seo-intel still works but you might hit rate limits on very high-volume testing.
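
The optional-key behaviour can be sketched as a URL builder that only adds the `key` parameter when `PSI_API_KEY` is present. The endpoint and the `url` / `strategy` parameters come from this document; `key` is the standard Google API key query parameter. `build_psi_url` and its `env` argument are assumptions made for this example, and the sketch assumes any `.env` file has already been loaded into the process environment.

```python
import os
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url, strategy="mobile", api_key=None, env=None):
    """Build the PSI request URL, adding the API key only if one is available."""
    env = os.environ if env is None else env
    params = {"url": page_url, "strategy": strategy}
    key = api_key or env.get("PSI_API_KEY")
    if key:
        params["key"] = key
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


print(build_psi_url("https://rds.ink/endangered", env={}))
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.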

### Score Differences from Sitespeed

PSI score ≠ Sitespeed score because they use different algorithms:

| Aspect | Sitespeed Score | PSI Score |
|--------|---|---|
| Source | Approximated from thresholds | Official Lighthouse |
| Algorithm | Linear interpolation | Complex weighting |
| Weightings | Equal (each metric = 1/5) | Weighted (some metrics matter more) |
| Audits | None | 100+ audits |
| Opportunities | None | Yes (what to fix) |
| Example | 77 (this page) | 95 (estimated) |

**Sitespeed 77** means: directional score, TBT is the killer
**PSI 95** means: official Google score, page is good but TBT hurts it slightly
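
The "Weightings" row can be illustrated by averaging the same per-metric scores two ways. The weights below are the ones published for the Lighthouse 10 performance category (FCP 10%, Speed Index 10%, LCP 25%, TBT 30%, CLS 25%); the version PSI currently runs may differ, so check before relying on them. The per-metric scores are invented purely for illustration and are not measurements of any page. Real Lighthouse also derives each metric score from a log-normal curve instead of linear interpolation, which this sketch does not model.

```python
# Weights published for the Lighthouse 10 performance category.
LIGHTHOUSE_WEIGHTS = {"fcp": 0.10, "si": 0.10, "lcp": 0.25, "tbt": 0.30, "cls": 0.25}

# Hypothetical per-metric scores (0-100), for illustration only.
example_scores = {"fcp": 82, "si": 90, "lcp": 100, "tbt": 30, "cls": 100}


def equal_average(scores):
    """Every metric counts the same, as in the Sitespeed approximation."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, weights):
    """Metrics count in proportion to their weight, as in Lighthouse."""
    return sum(scores[name] * weight for name, weight in weights.items())


print(round(equal_average(example_scores), 1))
print(round(weighted_average(example_scores, LIGHTHOUSE_WEIGHTS), 1))
```

With these made-up inputs the equal average is 80.4 and the weighted average is 76.2, because the weak TBT score carries 30% of the weight instead of 20%. Identical metric values can therefore still produce different headline scores.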

---

## Comparison Table

| Feature | Sitespeed | PSI |
|---------|-----------|-----|
| **Real browser** | ✅ Headless Chrome | ✅ Lighthouse (Chrome) |
| **Duration** | 60s | 30-90s |
| **HAR output** | ✅ Full | ❌ None |
| **Resource timing** | ✅ Per-resource | ❌ Aggregate only |
| **Official score** | ❌ Approximated | ✅ Real Lighthouse |
| **Opportunities** | ❌ None | ✅ Full audit |
| **Savings estimates** | ❌ No | ✅ Yes (ms + bytes) |
| **CWV metrics** | ✅ LCP, FCP, CLS, TBT, TTFB (no INP in v40) | ✅ LCP, FCP, CLS, TBT, TTFB |
| **Cost** | Free (Docker) | Free (25k/day) or API key |
| **Best for** | Trend tracking, waterfall analysis | Official benchmarking, what to fix |

---

## Which Score Should You Use?

**For trend tracking:**
Use the **Sitespeed score** (77). It runs locally in a consistent environment, so you can test weekly and see whether the score improves over time.

**For official reporting:**
Use the **PSI score** (95). It's what Google officially scores you. Client-friendly, credible.

**For diagnosing problems:**
Use **individual metrics** (TBT=1,807ms). These tell you exactly what's broken. Focus on the worst metric first.

---

## How They Work Together

The dual-engine approach gives you:

1. **Sitespeed** finds the bottleneck (TBT=1,807ms is the killer)
2. **Sitespeed HAR** shows you the resources causing TBT (JavaScript files)
3. **PSI** tells you how to fix it (opportunities: defer JS, lazy-load, etc.)
4. **PSI score** tells you the official Google score (95)
5. **Trends** show if your fixes actually work (score 77 → 88 → 95)

---

## Docker Details (Sitespeed)

### Image Details
```
Docker Hub: sitespeedio/sitespeed.io:40.4.0
Size: ~1.5 GB
Base: Node.js + Chrome
Updated: May 2026
```

### Why v40.4.0?
- Latest stable version (verified 2026-05-13)
- Previous versions have bugs or missing metrics
- Pinned version ensures reproducible results

### How seo-intel Runs It

```bash
docker run --rm \
  --shm-size=1g \
  -v /tmp/sitespeed-output:/sitespeed.io \
  sitespeedio/sitespeed.io:40.4.0 \
  https://rds.ink/endangered \
  --mobile --connectivity 4g \
  --n 3 \
  --outputFolder /sitespeed.io/{run_id} \
  --summary --summary-detail
```

**Key flags:**
- `--rm` — Delete container after run (clean up)
- `--shm-size=1g` — Allocate 1GB shared memory for Chrome
- `-v` — Mount output directory so we can read the HAR
- `--n 3` — Run 3 iterations (use median)
- `--summary` — Print summary to stdout
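
From Python, the same command is usually safer to build as an argument list than as a shell string. The sketch below mirrors the flags in the command shown in this section; `build_sitespeed_cmd` and the constant names are hypothetical, and the actual invocation code in seo-intel is not shown in this document.

```python
import shlex

IMAGE = "sitespeedio/sitespeed.io:40.4.0"
HOST_OUTPUT = "/tmp/sitespeed-output"


def build_sitespeed_cmd(url, run_id, iterations=3):
    """Return the docker command as an argument list (hypothetical helper)."""
    return [
        "docker", "run", "--rm",
        "--shm-size=1g",
        "-v", f"{HOST_OUTPUT}:/sitespeed.io",
        IMAGE,
        url,
        "--mobile", "--connectivity", "4g",
        "--n", str(iterations),
        "--outputFolder", f"/sitespeed.io/{run_id}",
        "--summary", "--summary-detail",
    ]


cmd = build_sitespeed_cmd("https://rds.ink/endangered", "example-run")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True), ideally with a timeout.
```

Passing a list avoids shell quoting problems when the URL contains characters such as `&` or `?`.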

---

See also:
- [System Architecture](01-architecture.md) — How engines fit in the larger system
- [Score Calculation](02-score-calculation.md) — How Sitespeed approximates scores
- [Metrics Reference](03-metrics-reference.md) — What each metric means