seo-intel-docs/code-refs/file-structure.md
help4bis 335d9a76e1 Initial SEO-INTEL documentation: architecture, scoring, code structure
Add comprehensive documentation for the dual-engine performance evaluation system:
- System architecture and data flow
- Score calculation methodology (0-100 approximation from CWV thresholds)
- Detailed metrics reference (LCP, FCP, CLS, TBT, TTFB)
- Testing engines comparison (Sitespeed vs PSI)
- Complete code structure map (file-by-file breakdown)
- Case study: rds.ink 77 score with actionable fixes
- Quick reference guides for interpreting results

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-14 05:56:49 +10:00

Code Structure Map

Complete file-by-file breakdown of the seo-intel repository.

Directory Layout

/home/help4bis/seo-intel/
├── README.md                    # Project overview (v1.1.0)
├── pyproject.toml              # Python project config (dependencies, build)
├── requirements.txt            # Python package list
├── run.sh                       # Launch script (runs main.py)
├── .env                         # Secrets: PSI_API_KEY, DB path, etc.
│
├── src/                         # Python package
│   ├── __init__.py
│   ├── main.py                  # FastAPI app entry point
│   ├── config.py                # Settings, site list (SITES config)
│   ├── db.py                    # SQLAlchemy setup, migrations, session factory
│   │
│   ├── models/
│   │   ├── __init__.py
│   │   ├── perf.py              # ORM models: PerfRun, PerfAudit, PerfOpportunity, PerfResource
│   │   ├── site.py              # Site model (name, domain, priority)
│   │   ├── ranking.py           # Ranking snapshot model (SEO keyword rankings)
│   │   └── ...                  # Other models (not perf-related)
│   │
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── performance.py       # GET /performance/, /performance/<site_id>, POST /api/perf/test, /api/perf/sweep
│   │   ├── dashboard.py         # GET / (main dashboard)
│   │   ├── keywords.py          # Keyword ranking pages
│   │   └── ...                  # Other routers (not perf-related)
│   │
│   ├── perf/                    # Performance testing engines
│   │   ├── __init__.py
│   │   ├── runner.py            # Orchestrator: run_full_test() — runs engines × devices
│   │   ├── sitespeed.py         # Sitespeed.io Docker wrapper + HAR parser
│   │   ├── psi.py               # Google PageSpeed Insights API client
│   │   └── batch.py             # Weekly sweep logic
│   │
│   ├── playbook/                # SEO playbook generation (not perf-related)
│   │   ├── __init__.py
│   │   ├── rules.py
│   │   └── llm.py
│   │
│   └── ...                      # Other modules (keyword analysis, etc.)
│
├── templates/                   # Jinja2 HTML templates
│   ├── base.html                # Base template (nav, styling)
│   ├── performance.html         # Portfolio scorecard
│   ├── performance_site.html    # Per-site detail dashboard
│   ├── dashboard.html           # Main dashboard
│   └── ...                      # Other templates
│
├── data/
│   └── seo-intel.db            # SQLite database (perf_runs, perf_audits, etc.)
│
├── docs/                        # Documentation (this repo)
│
└── ops/                         # Operations scripts
    ├── schema.sql              # Database schema
    └── ...

Performance System Files (Perf Tier)

src/routers/performance.py

Purpose: FastAPI routes for the performance dashboard

Key functions:

  • performance_home(request, db): GET /performance/ → portfolio scorecard
  • performance_site(site_id, request, db): GET /performance/<site_id> → per-site detail
  • api_perf_test(body, background_tasks, db): POST /api/perf/test → trigger single URL test
  • api_perf_sweep(background_tasks): POST /api/perf/sweep → trigger portfolio sweep
  • _portfolio_rows(db) — SQL: latest scores per site × device
  • _site_url_rows(db, site_id) — SQL: latest score per URL
  • _site_latest_audit(db, site_id, device) — SQL: full metrics for latest run
  • _site_trend(db, site_id, weeks) — SQL: weekly AVG scores (12 weeks)
  • _site_opportunities(db, site_id, device) — SQL: top PSI opportunities
  • _site_slow_resources(db, site_id) — SQL: top 10 slowest resources
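
To make the "latest scores per site × device" idea concrete, here is a minimal sketch using the standard-library sqlite3 module. The table and column names follow the models described in this document, but the schema is cut down to the columns the query needs, and the SQL itself is an assumption rather than the query actually used in performance.py. It also ignores the engine column for brevity.

```python
import sqlite3

# Assumed, cut-down schema: only the columns this query needs.
SCHEMA = """
CREATE TABLE perf_runs (
    id INTEGER PRIMARY KEY,
    site_id INTEGER NOT NULL,
    device TEXT NOT NULL,
    completed_at TEXT NOT NULL
);
CREATE TABLE perf_audits (
    id INTEGER PRIMARY KEY,
    perf_run_id INTEGER NOT NULL REFERENCES perf_runs(id),
    performance_score INTEGER
);
"""

# For each (site_id, device) pair, keep only the most recently completed run.
LATEST_SCORES_SQL = """
SELECT r.site_id, r.device, a.performance_score
FROM perf_runs AS r
JOIN perf_audits AS a ON a.perf_run_id = r.id
WHERE r.completed_at = (
    SELECT MAX(r2.completed_at)
    FROM perf_runs AS r2
    WHERE r2.site_id = r.site_id AND r2.device = r.device
)
ORDER BY r.site_id, r.device
"""


def latest_scores(conn):
    """Return (site_id, device, performance_score) for the newest run of each pair."""
    return conn.execute(LATEST_SCORES_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO perf_runs VALUES (?, ?, ?, ?)",
    [
        (1, 3, "mobile", "2026-05-01T10:00:00"),
        (2, 3, "mobile", "2026-05-08T10:00:00"),
        (3, 3, "desktop", "2026-05-08T10:05:00"),
    ],
)
conn.executemany(
    "INSERT INTO perf_audits VALUES (?, ?, ?)",
    [(1, 1, 70), (2, 2, 77), (3, 3, 92)],
)
rows = latest_scores(conn)
print(rows)
```

The older mobile run (score 70) is dropped because a newer mobile run exists for the same site.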

Key imports:

from fastapi import APIRouter, BackgroundTasks, Depends
from sqlalchemy import text
from fastapi.templating import Jinja2Templates
from ..perf.runner import run_full_test
from ..perf.batch import run_weekly_perf_sweep

Size: ~545 lines


src/perf/runner.py

Purpose: Orchestrates test runs across engines and devices

Key functions:

  • run_full_test(site_id, url, db, engines, devices) — Main orchestrator
    • Loops: for engine in engines: for device in devices:
    • Calls appropriate engine (sitespeed or psi)
    • Persists each result via _persist_run()
    • Returns summary dict
  • _persist_run(db, site_id, url, engine, result) — Writes one test result to database
    • Inserts: perf_runs (1), perf_audits (1), perf_opportunities (0+), perf_resources (0+)
    • Commits transaction

Key imports:

from sqlalchemy.orm import Session
from ..models.perf import PerfRun, PerfAudit, PerfOpportunity, PerfResource
from .sitespeed import run_sitespeed_test
from .psi import run_psi_test

Size: ~200 lines


src/perf/sitespeed.py

Purpose: Wraps sitespeed.io Docker container, parses HAR output

Key functions:

  • run_sitespeed_test(url, device) — Execute sitespeed in Docker
    • Builds Docker command with device-specific args (--mobile vs desktop UA)
    • Runs docker run sitespeedio/sitespeed.io:40.4.0 {url} --n 3 ...
    • Waits for output (60s)
    • Calls _parse_har() to extract metrics
    • Calls _approx_score() to calculate performance score
    • Returns: success, performance_score, metrics, resources
  • _parse_har(har_path) — Parse /tmp/sitespeed-output/{run_id}/.../browsertime.har
    • Extracts _googleWebVitals from pages[] (LCP, FCP, CLS, TTFB)
    • Extracts _cpu.longTasks.totalBlockingTime from pages[] (TBT)
    • Sums resource sizes by type (image, script, stylesheet, font)
    • Returns: metrics dict, resources list
  • _approx_score(lcp_ms, fcp_ms, cls, tbt_ms, ttfb_ms) — Calculate 0-100 score
    • Uses _THRESHOLDS (lines 53-60)
    • Linear interpolation between good/poor for each metric
    • Returns: int(mean(all_metric_scores))
  • _guess_resource_type(url, content_type) — Classify resource (script, image, etc.)
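
The HAR extraction described above might look roughly like this sketch. It works on an already-loaded HAR dict. The _googleWebVitals and _cpu.longTasks.totalBlockingTime locations come from this document; the key names inside _googleWebVitals are assumptions, and parse_har and classify are hypothetical stand-ins for _parse_har and _guess_resource_type. Resource sizes use the standard HAR response.content fields.

```python
from collections import defaultdict


def classify(mime_type: str) -> str:
    """Map a MIME type to a coarse resource type."""
    mime = mime_type.split(";")[0].strip().lower()
    if mime.startswith("image/"):
        return "image"
    if "javascript" in mime or mime == "application/ecmascript":
        return "script"
    if mime == "text/css":
        return "stylesheet"
    if "font" in mime:
        return "font"
    return "other"


def parse_har(har: dict):
    """Return (metrics, bytes_by_type) from an already-loaded HAR dict."""
    page = har["log"]["pages"][0]
    vitals = page.get("_googleWebVitals", {})
    long_tasks = page.get("_cpu", {}).get("longTasks", {})
    metrics = {
        # Key names inside _googleWebVitals are assumptions for this sketch.
        "lcp_ms": vitals.get("largestContentfulPaint"),
        "fcp_ms": vitals.get("firstContentfulPaint"),
        "cls": vitals.get("cumulativeLayoutShift"),
        "ttfb_ms": vitals.get("ttfb"),
        "tbt_ms": long_tasks.get("totalBlockingTime"),
    }
    bytes_by_type = defaultdict(int)
    for entry in har["log"].get("entries", []):
        content = entry.get("response", {}).get("content", {})
        kind = classify(content.get("mimeType", ""))
        # HAR uses -1 when the size is unknown; count that as zero bytes.
        bytes_by_type[kind] += max(content.get("size") or 0, 0)
    return metrics, dict(bytes_by_type)


sample_har = {
    "log": {
        "pages": [
            {
                "_googleWebVitals": {
                    "largestContentfulPaint": 2100,
                    "firstContentfulPaint": 900,
                    "cumulativeLayoutShift": 0.02,
                    "ttfb": 310,
                },
                "_cpu": {"longTasks": {"totalBlockingTime": 150}},
            }
        ],
        "entries": [
            {"response": {"content": {"mimeType": "image/png", "size": 1000}}},
            {"response": {"content": {"mimeType": "application/javascript; charset=utf-8", "size": 500}}},
            {"response": {"content": {"mimeType": "image/jpeg", "size": 250}}},
            {"response": {"content": {"mimeType": "text/html", "size": -1}}},
        ],
    }
}
metrics, sizes = parse_har(sample_har)
print(metrics, sizes)
```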

Key constants:

  • SITESPEED_IMAGE = "sitespeedio/sitespeed.io:40.4.0" (pinned version)
  • OUTPUT_BASE = Path("/tmp/sitespeed-output") (Docker output mount point)
  • _THRESHOLDS dict (lines 53-60): (good, poor) for LCP, FCP, CLS, TBT, TTFB
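
As an illustration of the threshold-based scoring, the sketch below interpolates each metric linearly between its good and poor cut-offs and averages the results. Several things here are assumptions: the threshold values are the commonly published Core Web Vitals and Lighthouse cut-offs rather than values read from sitespeed.py, and mapping "good" to 100 and "poor" to 0 is one plausible choice of endpoints. THRESHOLDS, metric_score and approx_score are illustrative names.

```python
from statistics import mean

# (good, poor) pairs. Assumed values; the real _THRESHOLDS may differ.
THRESHOLDS = {
    "lcp_ms": (2500, 4000),
    "fcp_ms": (1800, 3000),
    "cls": (0.1, 0.25),
    "tbt_ms": (200, 600),
    "ttfb_ms": (800, 1800),
}


def metric_score(value: float, good: float, poor: float) -> float:
    """100 at or below `good`, 0 at or above `poor`, linear in between."""
    if value <= good:
        return 100.0
    if value >= poor:
        return 0.0
    return 100.0 * (poor - value) / (poor - good)


def approx_score(**metrics):
    """Average the per-metric scores, skipping metrics that were not measured."""
    scores = [
        metric_score(value, *THRESHOLDS[name])
        for name, value in metrics.items()
        if value is not None
    ]
    return int(mean(scores)) if scores else None


all_good = approx_score(lcp_ms=2000, fcp_ms=1000, cls=0.05, tbt_ms=100, ttfb_ms=500)
slow_lcp = approx_score(lcp_ms=3250, fcp_ms=1800, cls=0.1, tbt_ms=200, ttfb_ms=800)
print(all_good, slow_lcp)
```

In the second call only LCP is off target: it sits halfway between good and poor, scores 50, and pulls the five-metric mean down to 90.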

Size: ~450 lines


src/perf/psi.py

Purpose: Calls Google PageSpeed Insights API, parses Lighthouse results

Key functions:

  • run_psi_test(url, device) — Call PageSpeed Insights API
    • GET https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=...&strategy={device}
    • Parses response.lighthouseResult
    • Calls _parse_lighthouse_audits() (shared with sitespeed)
    • Returns: success, performance_score (official), metrics, opportunities
  • _parse_lighthouse_audits(audits) — Extract metrics + opportunities from Lighthouse JSON
    • Maps audit keys (largest-contentful-paint, etc.) to metric values
    • Extracts opportunities (audit.details.type == "opportunity")
    • Calculates savings_ms and savings_bytes for each opportunity
    • Returns: metrics dict, opportunities list
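
A minimal sketch of the audit parsing step, run against a hand-written fragment shaped like lighthouseResult.audits. The audit ids, numericValue, and the details.overallSavingsMs / overallSavingsBytes fields are standard Lighthouse output; the mapping of TTFB to the server-response-time audit, the sort order, and the function name are assumptions made for this example.

```python
# Lighthouse audit id -> column name used in this document's PerfAudit model.
METRIC_AUDITS = {
    "largest-contentful-paint": "lcp_ms",
    "first-contentful-paint": "fcp_ms",
    "cumulative-layout-shift": "cls",
    "total-blocking-time": "tbt_ms",
    "server-response-time": "ttfb_ms",  # assumed source for TTFB
}


def parse_lighthouse_audits(audits: dict):
    """Return (metrics, opportunities) from a Lighthouse `audits` mapping."""
    metrics = {
        field: audits.get(audit_id, {}).get("numericValue")
        for audit_id, field in METRIC_AUDITS.items()
    }
    opportunities = []
    for audit_id, audit in audits.items():
        details = audit.get("details") or {}
        if details.get("type") != "opportunity":
            continue
        opportunities.append(
            {
                "opportunity_key": audit_id,
                "display_label": audit.get("title", audit_id),
                "savings_ms": details.get("overallSavingsMs", 0),
                "savings_bytes": details.get("overallSavingsBytes", 0),
            }
        )
    # Assumption: largest time saving first, so a "top N" is a simple slice.
    opportunities.sort(key=lambda o: o["savings_ms"], reverse=True)
    return metrics, opportunities


sample_audits = {
    "largest-contentful-paint": {"title": "Largest Contentful Paint", "numericValue": 3100.0},
    "cumulative-layout-shift": {"title": "Cumulative Layout Shift", "numericValue": 0.04},
    "unused-javascript": {
        "title": "Reduce unused JavaScript",
        "details": {"type": "opportunity", "overallSavingsMs": 450, "overallSavingsBytes": 120000},
    },
    "render-blocking-resources": {
        "title": "Eliminate render-blocking resources",
        "details": {"type": "opportunity", "overallSavingsMs": 900, "overallSavingsBytes": 0},
    },
    "dom-size": {"title": "Avoid an excessive DOM size", "details": {"type": "table"}},
}
psi_metrics, psi_opportunities = parse_lighthouse_audits(sample_audits)
print(psi_metrics, [o["opportunity_key"] for o in psi_opportunities])
```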

Key constants:

  • PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
  • PSI_TIMEOUT = 90 (Google's API can be slow)

Size: ~150 lines


src/perf/batch.py

Purpose: Weekly portfolio performance sweep

Key functions:

  • run_weekly_perf_sweep(db) — Main sweep orchestrator
    • Loops: for each site in SITES:
    • Calls resolve_url_list() to get up to 6 URLs (homepage plus the top 5 ranked URLs)
    • For each URL: calls run_full_test() (sitespeed + psi, mobile + desktop)
    • Logs completion summary
  • resolve_url_list(db, domain) — Get URLs for a site
    • Always: homepage
    • Plus: top 5 URLs from ranking_snapshots (last 30 days, sorted by impressions)
    • Returns: list of at most 6 URLs
  • _get_top_urls(db, site_id, limit) — Query ranking_snapshots for impressions
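
The "homepage plus top 5" rule can be sketched as a small pure function. Here build_url_list is a hypothetical helper that takes URLs already sorted by impressions (the database query is left out), and de-duplicating a homepage that also appears in the ranked list is an assumption.

```python
from typing import List


def build_url_list(domain: str, ranked_urls: List[str], limit: int = 6) -> List[str]:
    """Homepage first, then ranked URLs in order, without duplicates, capped at `limit`."""
    homepage = f"https://{domain}/"
    urls = [homepage]
    for url in ranked_urls:
        if len(urls) >= limit:
            break
        if url not in urls:  # skip the homepage if it also ranks
            urls.append(url)
    return urls


ranked = [
    "https://example.com/",
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
    "https://example.com/d",
    "https://example.com/e",
    "https://example.com/f",
]
url_list = build_url_list("example.com", ranked)
print(url_list)
```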

Size: ~150 lines


src/models/perf.py

Purpose: SQLAlchemy ORM models for performance data

Models:

  • PerfRun — Test execution record
    • Fields: id, site_id, url, engine, device, started_at, completed_at, success, error_message
    • Relations: audits (1-to-many), opportunities (1-to-many), resources (1-to-many)
  • PerfAudit — Core Web Vitals metrics for one run
    • Fields: id, perf_run_id, performance_score, lcp_ms, cls, inp_ms, tbt_ms, fcp_ms, ttfb_ms, total_byte_weight, image_bytes, js_bytes, css_bytes, font_bytes, requests_count, dom_size
    • Relations: run (many-to-1)
  • PerfOpportunity — Lighthouse audit opportunity
    • Fields: id, perf_run_id, opportunity_key, display_label, savings_ms, savings_bytes, details_json
    • Relations: run (many-to-1)
  • PerfResource — HAR resource entry
    • Fields: id, perf_run_id, resource_url, resource_type, size_bytes, transfer_size_bytes, start_time_ms, end_time_ms, is_render_blocking
    • Relations: run (many-to-1)
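
To show how one run fans out into child rows, the sketch below writes a run, its audit, and its opportunities in a single transaction. It uses the standard-library sqlite3 module instead of the SQLAlchemy models so that it runs on its own. The schema is a trimmed-down assumption based on the field lists above, and persist_run is an illustrative stand-in, not the project's _persist_run().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE perf_runs (
        id INTEGER PRIMARY KEY, site_id INTEGER, url TEXT,
        engine TEXT, device TEXT, success INTEGER
    );
    CREATE TABLE perf_audits (
        id INTEGER PRIMARY KEY, perf_run_id INTEGER REFERENCES perf_runs(id),
        performance_score INTEGER, lcp_ms REAL, cls REAL
    );
    CREATE TABLE perf_opportunities (
        id INTEGER PRIMARY KEY, perf_run_id INTEGER REFERENCES perf_runs(id),
        opportunity_key TEXT, savings_ms REAL
    );
    """
)


def persist_run(conn, site_id, url, engine, device, result):
    """Insert one run with its audit and opportunities; returns the new run id."""
    with conn:  # commits on success, rolls back if any insert raises
        cur = conn.execute(
            "INSERT INTO perf_runs (site_id, url, engine, device, success) "
            "VALUES (?, ?, ?, ?, ?)",
            (site_id, url, engine, device, int(result["success"])),
        )
        run_id = cur.lastrowid
        metrics = result.get("metrics", {})
        conn.execute(
            "INSERT INTO perf_audits (perf_run_id, performance_score, lcp_ms, cls) "
            "VALUES (?, ?, ?, ?)",
            (run_id, result.get("performance_score"), metrics.get("lcp_ms"), metrics.get("cls")),
        )
        conn.executemany(
            "INSERT INTO perf_opportunities (perf_run_id, opportunity_key, savings_ms) "
            "VALUES (?, ?, ?)",
            [(run_id, o["opportunity_key"], o["savings_ms"]) for o in result.get("opportunities", [])],
        )
    return run_id


run_id = persist_run(
    conn, 3, "https://example.com/", "psi", "mobile",
    {
        "success": True,
        "performance_score": 77,
        "metrics": {"lcp_ms": 3100.0, "cls": 0.04},
        "opportunities": [
            {"opportunity_key": "unused-javascript", "savings_ms": 450},
            {"opportunity_key": "render-blocking-resources", "savings_ms": 900},
        ],
    },
)
audit_count = conn.execute("SELECT COUNT(*) FROM perf_audits").fetchone()[0]
opp_count = conn.execute(
    "SELECT COUNT(*) FROM perf_opportunities WHERE perf_run_id = ?", (run_id,)
).fetchone()[0]
print(run_id, audit_count, opp_count)
```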

Size: ~100 lines


Templates

templates/performance.html

Purpose: Portfolio performance scorecard

Features:

  • Table of all sites (13 rows)
  • Columns: domain, score_mobile, score_desktop, lcp_ms, cls, slowest_url, last_tested
  • Colour-coded scores (green ≥90, amber ≥50, red <50)
  • "Run portfolio sweep now" button (HTMX POST to /api/perf/sweep)
  • Sweep status display (idle | running | ok | error)
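
The colour bands above reduce to a small helper. score_colour is a hypothetical name, and returning "grey" for a missing score is an assumption; the template may handle untested sites differently.

```python
from typing import Optional


def score_colour(score: Optional[int]) -> str:
    """Map a 0-100 performance score to the scorecard colour band."""
    if score is None:
        return "grey"  # assumption: no test result yet
    if score >= 90:
        return "green"
    if score >= 50:
        return "amber"
    return "red"


bands = [score_colour(s) for s in (95, 90, 77, 50, 49, None)]
print(bands)
```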

Size: ~200 lines


templates/performance_site.html

Purpose: Per-site performance detail dashboard

Features:

  • Latest CWV metrics (mobile + desktop side-by-side)
  • 12-week trend sparkline chart (mobile + desktop bars per week)
  • Top 5 optimisation opportunities (PSI)
  • Top 10 slowest resources (sitespeed HAR)
  • Per-URL breakdown table with test buttons
    • Columns: URL, score, LCP, CLS, requests, tested_at, test_now_buttons
    • Test buttons: Both (mobile+desktop), Mob, Dsk

Interactive elements:

  • HTMX buttons that queue tests
  • Coloured metric badges (green/amber/red)
  • Tooltips for long URLs

Size: ~390 lines


Supporting Files

src/config.py

What it contains:

  • Settings class (Pydantic)
  • SITES — list of 13 sites to monitor
    • Each site: domain, priority (sorting order)

Size: ~50 lines


src/db.py

What it contains:

  • SQLAlchemy engine + session factory
  • Base (declarative base for all models)
  • Database URI from .env
  • Migration logic (auto-create tables on startup)

Size: ~60 lines


requirements.txt

Key dependencies for performance testing:

  • fastapi, uvicorn (web framework)
  • sqlalchemy (ORM)
  • httpx (for PSI API calls)
  • docker (for sitespeed execution)
  • jinja2 (templates)

File Interaction Map

FastAPI Request
        ↓
performance.py (routers)
        ↓
[Query] perf_audits table via SQL
        ├─→ db.py (SQLAlchemy session)
        │
[Render] templates (Jinja2)
        ├─→ performance_site.html
        └─→ performance.html

[Background Task] api_perf_test()
        ↓
runner.py:run_full_test()
        ├─ For each engine:
        │  ├─ sitespeed.py:run_sitespeed_test() → Docker
        │  │  ├─ subprocess.run("docker run sitespeedio/...")
        │  │  ├─ _parse_har(browsertime.har)
        │  │  └─ _approx_score(metrics) → 0-100
        │  │
        │  └─ psi.py:run_psi_test() → Google API
        │     ├─ httpx.get(googleapis.com/...)
        │     ├─ _parse_lighthouse_audits(audits)
        │     └─ opportunities + official_score
        │
        ├─ runner.py:_persist_run() for each result
        │  ├─ INSERT perf_runs
        │  ├─ INSERT perf_audits
        │  ├─ INSERT perf_opportunities
        │  └─ INSERT perf_resources
        │
        └─ models/perf.py (ORM objects)
           └─ db.py (commit via SQLAlchemy session)

Deployment

All files live in /home/help4bis/seo-intel/ on george (192.168.0.117).

To start the service:

cd /home/help4bis/seo-intel
./run.sh
# or
uvicorn src.main:app --host 0.0.0.0 --port 8765 --reload

To run a performance test manually:

cd /home/help4bis/seo-intel
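python -c "
from src.perf.runner import run_full_test
from src.db import SessionLocal

db = SessionLocal()
try:
    # Run both engines against both device profiles for one URL
    result = run_full_test(
        site_id=3,
        url='https://rds.ink/endangered',
        db=db,
        engines=['sitespeed', 'psi'],
        devices=['mobile', 'desktop'],
    )
    print(result)
finally:
    db.close()  # release the session even if the test run raises
"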

See also: