seo-intel-docs/docs/04-testing-engines.md
help4bis 76523db177 Add comprehensive metrics and engines documentation
Complete the documentation suite with:
- Deep-dive metrics reference (LCP, FCP, CLS, TBT, TTFB)
- Detailed testing engines comparison (Sitespeed vs PSI)
- Why TBT is the killer metric for rds.ink
- How to fix each metric using Hummingbird
- Score differences and when to use each engine

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-14 05:58:12 +10:00

Testing Engines: Sitespeed vs PSI

Detailed comparison of the two independent engines used to measure performance.

Overview

The seo-intel system uses two different testing engines in parallel:

  1. Sitespeed.io — Real-browser testing with HAR waterfall
  2. Google PageSpeed Insights (PSI) — Official Lighthouse audits

This dual approach captures:

  • Lab metrics measured in a real browser via browser instrumentation (Sitespeed)
  • Official Google scores + recommendations (PSI)

Sitespeed.io

What It Is

An open-source performance testing framework that drives a real Chrome browser in headless mode to measure page performance under controlled, repeatable conditions.

Docker image: sitespeedio/sitespeed.io:40.4.0

How It Works

1. User clicks "Test" for https://rds.ink/endangered
    ↓
2. Sitespeed starts Docker container with headless Chrome
    ↓
3. Browser loads page (3 times, N=3)
    - Run 1: LCP=2300ms, FCP=2100ms, TBT=1800ms
    - Run 2: LCP=2500ms, FCP=2120ms, TBT=1850ms
    - Run 3: LCP=2400ms, FCP=2110ms, TBT=1780ms
    ↓
4. Sitespeed takes the MEDIAN of the three runs
    - LCP = 2400ms  (middle value)
    - FCP = 2110ms
    - TBT = 1800ms
    ↓
5. Browser exports HAR (HTTP Archive) file
    - Contains: every resource, timing, size
    - JSON file with full waterfall
    ↓
6. Sitespeed parses HAR
    - Extracts CWV metrics
    - Calculates page weight
    - Identifies resources
    ↓
7. seo-intel approximates a 0-100 score from thresholds
    - NOT official Lighthouse (no Lighthouse plugin)
    - But uses same CWV thresholds as Lighthouse
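
The median step in the flow above can be sketched in a few lines of Python. This is only an illustration using the three example runs; the function name `median_metrics` is hypothetical and is not taken from the seo-intel source.

```python
from statistics import median

# Per-run metrics in milliseconds, copied from the example runs above.
runs = [
    {"lcp": 2300, "fcp": 2100, "tbt": 1800},
    {"lcp": 2500, "fcp": 2120, "tbt": 1850},
    {"lcp": 2400, "fcp": 2110, "tbt": 1780},
]


def median_metrics(runs):
    """Return the median of each metric across all runs (hypothetical helper)."""
    names = runs[0].keys()
    return {name: median(run[name] for run in runs) for name in names}


summary = median_metrics(runs)
print(summary)  # {'lcp': 2400, 'fcp': 2110, 'tbt': 1800}
```

Note that the median is taken per metric, so the reported values can come from different runs: here the LCP median is from run 3 while the TBT median is from run 1.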

Metrics Captured

Core Web Vitals and related lab metrics:

  • LCP (Largest Contentful Paint)
  • FCP (First Contentful Paint)
  • CLS (Cumulative Layout Shift)
  • TBT (Total Blocking Time)
  • TTFB (Time to First Byte)
  • INP (Interaction to Next Paint) — Not captured in v40

Page breakdown:

  • Total page size (bytes)
  • Image bytes
  • JavaScript bytes
  • CSS bytes
  • Font bytes
  • Request count

Resource list:

  • Every HTTP request made
  • URL, type (script/image/stylesheet/font/xhr)
  • Size, timing
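
As a rough sketch of how the page breakdown could be derived from a HAR file, the snippet below sums response body sizes per resource type. It relies only on standard HAR 1.2 fields (`log.entries`, `response.bodySize`, `response.content.mimeType`); the function names and the MIME-to-bucket mapping are assumptions for illustration, not the seo-intel parser.

```python
from collections import defaultdict


def classify(mime_type):
    """Map a MIME type to a coarse resource bucket (illustrative mapping)."""
    mime = (mime_type or "").lower()
    if mime.startswith("image/"):
        return "image"
    if "javascript" in mime or "ecmascript" in mime:
        return "script"
    if mime.startswith("text/css"):
        return "stylesheet"
    if "font" in mime:
        return "font"
    return "other"


def page_breakdown(har):
    """Sum response body sizes per resource type from a parsed HAR dict."""
    bytes_by_type = defaultdict(int)
    entries = har["log"]["entries"]
    for entry in entries:
        response = entry["response"]
        # HAR uses -1 when the size is unknown; count those as 0 bytes.
        size = max(response.get("bodySize", 0), 0)
        bucket = classify(response["content"].get("mimeType"))
        bytes_by_type[bucket] += size
    return {
        "requests": len(entries),
        "total_bytes": sum(bytes_by_type.values()),
        "by_type": dict(bytes_by_type),
    }


# Tiny in-memory HAR with made-up sizes, used only to exercise the function.
sample_har = {"log": {"entries": [
    {"request": {"url": "https://example.com/app.js"},
     "response": {"bodySize": 150000,
                  "content": {"mimeType": "application/javascript"}}},
    {"request": {"url": "https://example.com/hero.webp"},
     "response": {"bodySize": 500000,
                  "content": {"mimeType": "image/webp"}}},
    {"request": {"url": "https://example.com/site.css"},
     "response": {"bodySize": -1,
                  "content": {"mimeType": "text/css"}}},
]}}

breakdown = page_breakdown(sample_har)
print(breakdown)
```

With a real run, the dict would come from `json.load()` on the exported `browsertime.har` file instead of the inline sample.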

Advantages

  • Real browser — Chrome's actual instrumentation
  • Full HAR — See every resource, identify bottlenecks
  • Consistent — Can run anytime, same environment
  • Resource timing — Measure individual script/image load times
  • Median metrics — Run 3 times, use median (more stable than single run)

Disadvantages

  • No Lighthouse — Score is approximated, not official
  • Slow — About 60 seconds per device, because each test loads the page three times
  • No opportunities — Doesn't tell you "fix this"
  • No INP metric — v40 doesn't capture Interaction to Next Paint
  • Approximate score — Different algorithm than real Lighthouse

Device Modes

Mobile mode (default):

--mobile --connectivity 4g
  • Emulates Moto G4 device (412x732 viewport)
  • 4G throttling (simulates real 4G speeds)
  • Mobile user agent
  • Duration: ~60s

Desktop mode:

--browsertime.connectivity native --browsertime.viewPort 1366x768 --browsertime.userAgent "Chrome Windows"
  • 1366x768 viewport (typical laptop)
  • Native connectivity (no throttling)
  • Desktop Chrome user agent
  • Duration: ~60s
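
One way to keep the two device modes in a single place is a small lookup table. The sketch below copies the flags verbatim from the two modes above and assumes nothing else about the sitespeed.io CLI; `DEVICE_FLAGS` and `device_flags` are hypothetical names.

```python
# Flags copied verbatim from the mobile and desktop modes described above.
DEVICE_FLAGS = {
    "mobile": ["--mobile", "--connectivity", "4g"],
    "desktop": [
        "--browsertime.connectivity", "native",
        "--browsertime.viewPort", "1366x768",
        "--browsertime.userAgent", "Chrome Windows",
    ],
}


def device_flags(device="mobile"):
    """Return a copy of the CLI flags for a device mode (hypothetical helper)."""
    try:
        return list(DEVICE_FLAGS[device])
    except KeyError:
        raise ValueError(f"unknown device mode: {device!r}") from None


print(device_flags("desktop"))
```

Returning a copy means a caller can append run-specific flags without mutating the shared table.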

Output Files

Stored at: /tmp/sitespeed-output/{run_id}/pages/{domain}/data/

  • browsertime.har — Full HTTP Archive (JSON)
  • browsertime.json — Detailed metrics (also JSON)
  • screenShots/ — Video/screenshots of page load

Score Calculation (Sitespeed v40)

Because this Sitespeed v40 setup doesn't run Lighthouse, seo-intel approximates the score:

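# src/perf/sitespeed.py:_approx_score()
# Sketch of the scoring logic. The helper name _metric_score and the
# metrics-dict argument are illustrative, not confirmed signatures.
from statistics import mean

# (good, poor) thresholds per metric: times in ms, CLS is unitless
_THRESHOLDS = {
    "lcp":  (2500,  4000),
    "fcp":  (1800,  3000),
    "cls":  (0.1,   0.25),
    "tbt":  (200,   600),
    "ttfb": (800,   1800),
}

def _metric_score(value, good, poor):
    # 100 at or below "good", 30 at or above "poor", linear in between
    if value <= good:
        return 100.0
    if value >= poor:
        return 30.0
    ratio = (value - good) / (poor - good)
    return 100.0 - (ratio * 70.0)

def _approx_score(metrics):
    # Final score = unweighted average of all metric scores
    return mean(
        _metric_score(metrics[name], good, poor)
        for name, (good, poor) in _THRESHOLDS.items()
    )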

Important: This is NOT the real Lighthouse score. It's an approximation for trend tracking.
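
To make the interpolation concrete, here is a small worked example that applies the threshold rule to the three median values from the earlier run (LCP 2400 ms, FCP 2110 ms, TBT 1800 ms). It is a self-contained sketch; `metric_score` is an illustrative name, and CLS and TTFB are left out because the example run does not list them.

```python
# (good, poor) thresholds in milliseconds, as listed in the table above.
THRESHOLDS = {"lcp": (2500, 4000), "fcp": (1800, 3000), "tbt": (200, 600)}


def metric_score(value, good, poor):
    """100 at or below good, 30 at or above poor, linear in between."""
    if value <= good:
        return 100.0
    if value >= poor:
        return 30.0
    ratio = (value - good) / (poor - good)
    return 100.0 - ratio * 70.0


medians = {"lcp": 2400, "fcp": 2110, "tbt": 1800}
scores = {
    name: round(metric_score(value, *THRESHOLDS[name]), 1)
    for name, value in medians.items()
}
print(scores)  # {'lcp': 100.0, 'fcp': 81.9, 'tbt': 30.0}
```

FCP lands between the thresholds: (2110 - 1800) / (3000 - 1800) is about 0.258, so it loses about 18 of the 70 available points. TBT is far past the poor threshold and bottoms out at 30, which is why it dominates the averaged score.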


Google PageSpeed Insights (PSI)

What It Is

Google's official Lighthouse audit service. You submit a URL and Google runs Lighthouse against it.

API endpoint: https://www.googleapis.com/pagespeedonline/v5/runPagespeed

How It Works

1. seo-intel calls Google's API
    GET /pagespeedonline/v5/runPagespeed?url=...&strategy=mobile
    ↓
2. Google spins up Lighthouse
    - Lighthouse can audit performance, accessibility, best practices, and SEO
    - Only the "performance" category is needed here (the API's default when no category is requested)
    ↓
3. Lighthouse runs and scores 0-100
    - This is the OFFICIAL score
    - Uses Google's real Lighthouse algorithm
    ↓
4. Lighthouse audit results returned
    - Performance score (0-100)
    - All audit items (100+ audits)
    - Opportunities (what to fix + savings)
    ↓
5. seo-intel parses response
    - Extracts score
    - Extracts opportunities
    - Calculates potential savings (ms + bytes)
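
The request and parsing steps above might look roughly like the following. This is a sketch rather than the seo-intel implementation: the function names are hypothetical, and the response fields used (`lighthouseResult.categories.performance.score`, audits whose `details.type` is `"opportunity"`, `overallSavingsMs`, `overallSavingsBytes`) are assumed from the Lighthouse result JSON and may differ between Lighthouse versions.

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url, strategy="mobile"):
    """Build the GET URL for a PSI run (no network call is made here)."""
    return f"{PSI_ENDPOINT}?{urlencode({'url': page_url, 'strategy': strategy})}"


def parse_psi_response(data):
    """Extract the 0-100 score and the opportunities from a PSI response dict."""
    lighthouse = data["lighthouseResult"]
    # Lighthouse reports category scores on a 0-1 scale.
    score = round(lighthouse["categories"]["performance"]["score"] * 100)
    opportunities = []
    for audit in lighthouse["audits"].values():
        details = audit.get("details") or {}
        if details.get("type") != "opportunity":
            continue
        opportunities.append({
            "title": audit.get("title", ""),
            "savings_ms": details.get("overallSavingsMs", 0),
            "savings_bytes": details.get("overallSavingsBytes", 0),
        })
    # Largest estimated time saving first.
    opportunities.sort(key=lambda item: item["savings_ms"], reverse=True)
    return {"score": score, "opportunities": opportunities}


# Hand-written sample shaped like a PSI response; the values are made up.
sample = {"lighthouseResult": {
    "categories": {"performance": {"score": 0.95}},
    "audits": {
        "render-blocking-resources": {
            "title": "Eliminate render-blocking resources",
            "details": {"type": "opportunity", "overallSavingsMs": 300}},
        "unused-javascript": {
            "title": "Reduce unused JavaScript",
            "details": {"type": "opportunity", "overallSavingsMs": 400,
                        "overallSavingsBytes": 150000}},
        "first-contentful-paint": {"title": "First Contentful Paint"},
    },
}}

result = parse_psi_response(sample)
print(build_psi_url("https://rds.ink/endangered"))
print(result["score"], [o["title"] for o in result["opportunities"]])
```

Keeping URL construction and parsing as pure functions makes both testable without calling Google's API; the actual HTTP request can be made with any client.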

Metrics Captured

Official performance score (0-100)

  • This is what Google reports
  • Different algorithm than Sitespeed approximation

Opportunities:

  • "Reduce unused JavaScript" → 400ms savings, 150KB reduction
  • "Minify CSS" → 50ms, 20KB
  • "Lazy load offscreen images" → 200ms, 500KB
  • "Eliminate render-blocking resources" → 300ms
  • (and ~20 more opportunities)

Same CWV metrics as Sitespeed:

  • LCP, FCP, CLS, TBT, TTFB

Advantages

  • Official Lighthouse — What Google actually scores
  • Opportunities — Specific recommendations on what to fix
  • Savings estimates — How much you'd save per fix
  • Comprehensive audit — 100+ checks across performance, UX, SEO
  • Credibility — "Google says you score 95"

Disadvantages

  • No HAR — Can't see individual resource timings
  • Variable duration — 30-90 seconds per device (depends on Google's load)
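  • Rate-limited — Google enforces API quotas, and requests without an API key are throttled much more tightly
  • Remote infrastructure — Run time depends on Google's servers, so it is less predictable than local Sitespeed
  • No resource breakdown — Can't identify which JS file is slow

API Key

Optional. Without it, requests are throttled much more tightly. With a key, the default project quota is about 25,000 queries per day (check the Google Cloud Console for current limits).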

Where to set:

  • .env file: PSI_API_KEY=...
  • Or environment variable: export PSI_API_KEY=...

Where to get:

  1. Google Cloud Console
  2. Create project
  3. Enable PageSpeed Insights API
  4. Create API key
  5. Set in .env

If not set, seo-intel still works but you might hit rate limits on very high-volume testing.
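
A minimal sketch of the optional-key behaviour: read `PSI_API_KEY` from the environment and add it to the query only when present. The function name is hypothetical, and the sketch assumes any `.env` file has already been loaded into the process environment by whatever loader the project uses. `key` is the standard query parameter for Google API keys.

```python
import os


def psi_query_params(page_url, strategy="mobile", environ=os.environ):
    """Build PSI query parameters, adding the API key only if one is set."""
    params = {"url": page_url, "strategy": strategy}
    api_key = environ.get("PSI_API_KEY", "").strip()
    if api_key:
        params["key"] = api_key
    return params


# Passing a plain dict as `environ` keeps the example independent of the shell.
print(psi_query_params("https://rds.ink/endangered", environ={}))
print(psi_query_params("https://rds.ink/endangered",
                       environ={"PSI_API_KEY": "example-key"}))
```

Treating an empty or whitespace-only value as "not set" avoids sending a blank `key` parameter, which the API would reject rather than ignore.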

Score Differences from Sitespeed

PSI score ≠ Sitespeed score because they use different algorithms:

| Aspect | Sitespeed Score | PSI Score |
|---|---|---|
| Source | Approximated from thresholds | Official Lighthouse |
| Algorithm | Linear interpolation | Complex weighting |
| Weightings | Equal (each metric = 1/5) | Weighted (some metrics matter more) |
| Audits | None | 100+ audits |
| Opportunities | None | Yes (what to fix) |
| Example | 77 (this page) | 95 (estimated) |

Sitespeed 77 means: directional score, TBT is the killer.

PSI 95 means: official Google score, page is good but TBT hurts it slightly.
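
The effect of weighting alone can be illustrated with a short comparison. The weights below are the performance-category weights published for Lighthouse 10 (FCP 10%, Speed Index 10%, LCP 25%, TBT 30%, CLS 25%); the per-metric scores are hypothetical and are not the rds.ink measurements. Real Lighthouse also maps raw values to per-metric scores with log-normal curves rather than linear interpolation, which this sketch does not model.

```python
# Hypothetical per-metric scores (0-100), chosen only for illustration.
metric_scores = {"fcp": 82, "si": 90, "lcp": 100, "tbt": 30, "cls": 100}

# Lighthouse 10 performance-category weights (they sum to 1.0).
LIGHTHOUSE_WEIGHTS = {"fcp": 0.10, "si": 0.10, "lcp": 0.25, "tbt": 0.30, "cls": 0.25}


def equal_weight(scores):
    """Unweighted mean, as in the threshold-based approximation."""
    return sum(scores.values()) / len(scores)


def weighted(scores, weights):
    """Weighted sum, as in a Lighthouse-style category score."""
    return sum(scores[name] * weight for name, weight in weights.items())


equal_result = equal_weight(metric_scores)
weighted_result = weighted(metric_scores, LIGHTHOUSE_WEIGHTS)
print(round(equal_result, 1), round(weighted_result, 1))  # 80.4 76.2
```

With identical inputs the two schemes already disagree by several points, because a weak TBT counts for 30% under the weighted scheme but only 20% under equal weighting.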


Comparison Table

| Feature | Sitespeed | PSI |
|---|---|---|
| Real browser | Headless Chrome | Lighthouse (Chrome) |
| Duration | ~60s | 30-90s |
| HAR output | Full | None |
| Resource timing | Per-resource | Aggregate only |
| Official score | Approximated | Real Lighthouse |
| Opportunities | None | Full audit |
| Savings estimates | No | Yes (ms + bytes) |
| Metrics | LCP, FCP, CLS, TBT, TTFB (no INP in v40) | LCP, FCP, CLS, TBT, TTFB |
| Cost | Free (Docker) | Free (quota-limited; API key optional) |
| Best for | Trend tracking, waterfall analysis | Official benchmarking, what to fix |

Which Score Should You Use?

For trend tracking: Use Sitespeed score (77). It's fast, local, consistent. You can test weekly and see if score improves over time.

For official reporting: Use PSI score (95). It's what Google officially scores you. Client-friendly, credible.

For diagnosing problems: Use individual metrics (TBT=1,807ms). This tells you exactly what's broken. Focus on the worst metric first.


How They Work Together

The dual-engine approach gives you:

  1. Sitespeed finds the bottleneck (TBT=1,807ms is the killer)
  2. Sitespeed HAR shows you the resources causing TBT (JavaScript files)
  3. PSI tells you how to fix it (opportunities: defer JS, lazy-load, etc.)
  4. PSI score tells you the official Google score (95)
  5. Trends show if your fixes actually work (score 77 → 88 → 95)

Docker Details (Sitespeed)

Image Details

Docker Hub: sitespeedio/sitespeed.io:40.4.0
Size: ~1.5 GB
Base: Node.js + Chrome
Updated: May 2026

Why v40.4.0?

  • Latest stable version (verified 2026-05-13)
  • Previous versions have bugs or missing metrics
  • Pinned version ensures reproducible results

How seo-intel Runs It

docker run --rm \
  --shm-size=1g \
  -v /tmp/sitespeed-output:/sitespeed.io \
  sitespeedio/sitespeed.io:40.4.0 \
  https://rds.ink/endangered \
  --mobile --connectivity 4g \
  --n 3 \
  --outputFolder /sitespeed.io/{run_id} \
  --summary --summary-detail

Key flags:

  • --rm — Delete container after run (clean up)
  • --shm-size=1g — Allocate 1GB shared memory for Chrome
  • -v — Mount output directory so we can read the HAR
  • --n 3 — Run 3 iterations (use median)
  • --summary — Print summary to stdout
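
For reference, the same command can be assembled as an argument list in Python. This is a sketch under the assumption that seo-intel shells out to Docker; the function name, the `run-001` identifier, and the constants are illustrative, and every flag is copied from the command above.

```python
import shlex

IMAGE = "sitespeedio/sitespeed.io:40.4.0"
HOST_OUTPUT = "/tmp/sitespeed-output"


def build_docker_cmd(url, run_id, iterations=3):
    """Return the docker invocation as an argument list (hypothetical helper)."""
    return [
        "docker", "run", "--rm",
        "--shm-size=1g",
        "-v", f"{HOST_OUTPUT}:/sitespeed.io",
        IMAGE,
        url,
        "--mobile", "--connectivity", "4g",
        "--n", str(iterations),
        "--outputFolder", f"/sitespeed.io/{run_id}",
        "--summary", "--summary-detail",
    ]


cmd = build_docker_cmd("https://rds.ink/endangered", "run-001")
print(shlex.join(cmd))
# To execute for real: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with URLs that contain `&` or `?`; `shlex.join` is used here only to print a copy-pasteable form.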

See also: