# Code Structure Map

Complete file-by-file breakdown of the seo-intel repository.

## Directory Layout

```
/home/help4bis/seo-intel/
├── README.md                  # Project overview (v1.1.0)
├── pyproject.toml             # Python project config (dependencies, build)
├── requirements.txt           # Python package list
├── run.sh                     # Launch script (runs main.py)
├── .env                       # Secrets: PSI_API_KEY, DB path, etc.
│
├── src/                       # Python package
│   ├── __init__.py
│   ├── main.py                # FastAPI app entry point
│   ├── config.py              # Settings, site list (SITES config)
│   ├── db.py                  # SQLAlchemy setup, migrations, session factory
│   │
│   ├── models/
│   │   ├── __init__.py
│   │   ├── perf.py            # ORM models: PerfRun, PerfAudit, PerfOpportunity, PerfResource
│   │   ├── site.py            # Site model (name, domain, priority)
│   │   ├── ranking.py         # Ranking snapshot model (SEO keyword rankings)
│   │   └── ...                # Other models (not perf-related)
│   │
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── performance.py     # GET /performance/, /performance/, POST /api/perf/test, /api/perf/sweep
│   │   ├── dashboard.py       # GET / (main dashboard)
│   │   ├── keywords.py        # Keyword ranking pages
│   │   └── ...                # Other routers (not perf-related)
│   │
│   ├── perf/                  # Performance testing engines
│   │   ├── __init__.py
│   │   ├── runner.py          # Orchestrator: run_full_test() runs engines × devices
│   │   ├── sitespeed.py       # Sitespeed.io Docker wrapper + HAR parser
│   │   ├── psi.py             # Google PageSpeed Insights API client
│   │   └── batch.py           # Weekly sweep logic
│   │
│   ├── playbook/              # SEO playbook generation (not perf-related)
│   │   ├── __init__.py
│   │   ├── rules.py
│   │   └── llm.py
│   │
│   └── ...                    # Other modules (keyword analysis, etc.)
│
├── templates/                 # Jinja2 HTML templates
│   ├── base.html              # Base template (nav, styling)
│   ├── performance.html       # Portfolio scorecard
│   ├── performance_site.html  # Per-site detail dashboard
│   ├── dashboard.html         # Main dashboard
│   └── ...                    # Other templates
│
├── data/
│   └── seo-intel.db           # SQLite database (perf_runs, perf_audits, etc.)
│
├── docs/                      # Documentation (this repo)
│   └── ops/                   # Operations scripts
├── schema.sql                 # Database schema
└── ...
```

---

## Performance System Files (Perf Tier)

### src/routers/performance.py

**Purpose:** FastAPI routes for the performance dashboard

**Key functions:**
- `performance_home(request, db)`: `GET /performance/` → portfolio scorecard
- `performance_site(site_id, request, db)`: `GET /performance/` → per-site detail
- `api_perf_test(body, background_tasks, db)`: `POST /api/perf/test` → trigger single URL test
- `api_perf_sweep(background_tasks)`: `POST /api/perf/sweep` → trigger portfolio sweep
- `_portfolio_rows(db)`: SQL, latest scores per site × device
- `_site_url_rows(db, site_id)`: SQL, latest score per URL
- `_site_latest_audit(db, site_id, device)`: SQL, full metrics for latest run
- `_site_trend(db, site_id, weeks)`: SQL, weekly AVG scores (12 weeks)
- `_site_opportunities(db, site_id, device)`: SQL, top PSI opportunities
- `_site_slow_resources(db, site_id)`: SQL, top 10 slowest resources

**Key imports:**
```python
from fastapi import APIRouter, BackgroundTasks, Depends
from sqlalchemy import text
from fastapi.templating import Jinja2Templates

# Two dots: this module lives in src/routers/, the engines in src/perf/
from ..perf.runner import run_full_test
from ..perf.batch import run_weekly_perf_sweep
```

**Size:** ~545 lines

---

### src/perf/runner.py

**Purpose:** Orchestrates test runs across engines and devices

**Key functions:**
- `run_full_test(site_id, url, db, engines, devices)`: Main orchestrator
  - Loops: for engine in engines: for device in devices:
    - Calls appropriate engine (sitespeed or psi)
    - Persists each result via `_persist_run()`
  - Returns summary dict
- `_persist_run(db, site_id, url, engine, result)`: Writes one test result to database
  - Inserts: perf_runs (1), perf_audits (1), perf_opportunities (0+), perf_resources (0+)
  - Commits transaction

**Key imports:**
```python
from sqlalchemy.orm import Session

# Two dots: models/ is a sibling package of perf/ under src/
from ..models.perf import PerfRun, PerfAudit, PerfOpportunity, PerfResource
from .sitespeed import run_sitespeed_test
from .psi import run_psi_test
```

**Size:** ~200 lines

---

### src/perf/sitespeed.py

**Purpose:** Wraps sitespeed.io Docker container, parses HAR output

**Key functions:**
- `run_sitespeed_test(url, device)`: Execute sitespeed in Docker
  - Builds Docker command with device-specific args (--mobile vs desktop UA)
  - Runs `docker run sitespeedio/sitespeed.io:40.4.0 {url} --n 3 ...`
  - Waits for output (60s)
  - Calls `_parse_har()` to extract metrics
  - Calls `_approx_score()` to calculate performance score
  - Returns: success, performance_score, metrics, resources
- `_parse_har(har_path)`: Parse `/tmp/sitespeed-output/{run_id}/.../browsertime.har`
  - Extracts _googleWebVitals from pages[] (LCP, FCP, CLS, TTFB)
  - Extracts _cpu.longTasks.totalBlockingTime from pages[] (TBT)
  - Sums resource sizes by type (image, script, stylesheet, font)
  - Returns: metrics dict, resources list
- `_approx_score(lcp_ms, fcp_ms, cls, tbt_ms, ttfb_ms)`: Calculate 0-100 score
  - Uses _THRESHOLDS (lines 53–60)
  - Linear interpolation between good/poor for each metric
  - Returns: int(mean(all_metric_scores))
- `_guess_resource_type(url, content_type)`: Classify resource (script, image, etc.)
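
The scoring rule described for `_approx_score()` (linear interpolation between a good and a poor threshold per metric, then the integer mean) can be sketched as follows. This is a minimal illustration, not the code in `sitespeed.py`: the names `EXAMPLE_THRESHOLDS`, `metric_score`, and `approx_score_sketch` are hypothetical, the threshold values are commonly cited Core Web Vitals boundaries chosen for the example rather than the actual `_THRESHOLDS` contents, and skipping missing metrics is an assumption.

```python
from statistics import mean
from typing import Optional

# Illustrative (good, poor) pairs only. The real _THRESHOLDS values in
# src/perf/sitespeed.py are not shown in this document and may differ.
EXAMPLE_THRESHOLDS = {
    "lcp_ms": (2500, 4000),
    "fcp_ms": (1800, 3000),
    "cls": (0.1, 0.25),
    "tbt_ms": (200, 600),
    "ttfb_ms": (800, 1800),
}


def metric_score(value: float, good: float, poor: float) -> float:
    """Map one metric to 0-100: 100 at or below `good`, 0 at or above `poor`."""
    if value <= good:
        return 100.0
    if value >= poor:
        return 0.0
    # Linear interpolation between the two thresholds.
    return 100.0 * (poor - value) / (poor - good)


def approx_score_sketch(**metrics: Optional[float]) -> Optional[int]:
    """Integer mean of the per-metric scores; metrics passed as None are skipped."""
    scores = [
        metric_score(value, *EXAMPLE_THRESHOLDS[name])
        for name, value in metrics.items()
        if value is not None
    ]
    return int(mean(scores)) if scores else None
```

With these example thresholds, an LCP halfway between 2500 ms and 4000 ms contributes a metric score of 50, so one slow metric lowers the overall score gradually instead of failing the whole run.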

**Key constants:**
- `SITESPEED_IMAGE = "sitespeedio/sitespeed.io:40.4.0"` (pinned version)
- `OUTPUT_BASE = Path("/tmp/sitespeed-output")` (Docker output mount point)
- `_THRESHOLDS` dict (lines 53–60): (good, poor) for LCP, FCP, CLS, TBT, TTFB

**Size:** ~450 lines

---

### src/perf/psi.py

**Purpose:** Calls Google PageSpeed Insights API, parses Lighthouse results

**Key functions:**
- `run_psi_test(url, device)`: Call PageSpeed Insights API
  - GET `https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=...&strategy={device}`
  - Parses response.lighthouseResult
  - Calls `_parse_lighthouse_audits()` (shared with sitespeed)
  - Returns: success, performance_score (official), metrics, opportunities
- `_parse_lighthouse_audits(audits)`: Extract metrics + opportunities from Lighthouse JSON
  - Maps audit keys (largest-contentful-paint, etc.) to metric values
  - Extracts opportunities (audit.details.type == "opportunity")
  - Calculates savings_ms and savings_bytes for each opportunity
  - Returns: metrics dict, opportunities list

**Key constants:**
- `PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"`
- `PSI_TIMEOUT = 90` (Google's API can be slow)

**Size:** ~150 lines

---

### src/perf/batch.py

**Purpose:** Weekly portfolio performance sweep

**Key functions:**
- `run_weekly_perf_sweep(db)`: Main sweep orchestrator
  - Loops: for each site in SITES:
    - Calls `resolve_url_list()` to get top 6 URLs
    - For each URL: calls `run_full_test()` (sitespeed + psi, mobile + desktop)
  - Logs completion summary
- `resolve_url_list(db, domain)`: Get URLs for a site
  - Always: homepage
  - Plus: top 5 URLs from ranking_snapshots (last 30 days, sorted by impressions)
  - Returns: list of 6 URLs max
- `_get_top_urls(db, site_id, limit)`: Query ranking_snapshots for impressions

**Size:** ~150 lines

---

### src/models/perf.py

**Purpose:** SQLAlchemy ORM models for performance data

**Models:**
- `PerfRun`: Test execution record
  - Fields: id, site_id, url, engine, device, started_at, completed_at, success, error_message
  - Relations: audits (1-to-many), opportunities (1-to-many), resources (1-to-many)
- `PerfAudit`: Core Web Vitals metrics for one run
  - Fields: id, perf_run_id, performance_score, lcp_ms, cls, inp_ms, tbt_ms, fcp_ms, ttfb_ms, total_byte_weight, image_bytes, js_bytes, css_bytes, font_bytes, requests_count, dom_size
  - Relations: run (many-to-1)
- `PerfOpportunity`: Lighthouse audit opportunity
  - Fields: id, perf_run_id, opportunity_key, display_label, savings_ms, savings_bytes, details_json
  - Relations: run (many-to-1)
- `PerfResource`: HAR resource entry
  - Fields: id, perf_run_id, resource_url, resource_type, size_bytes, transfer_size_bytes, start_time_ms, end_time_ms, is_render_blocking
  - Relations: run (many-to-1)

**Size:** ~100 lines

---

## Templates

### templates/performance.html

**Purpose:** Portfolio performance scorecard

**Features:**
- Table of all sites (13 rows)
  - Columns: domain, score_mobile, score_desktop, lcp_ms, cls, slowest_url, last_tested
  - Colour-coded scores (green ≥90, amber ≥50, red <50)
- "Run portfolio sweep now" button (HTMX POST to /api/perf/sweep)
- Sweep status display (idle | running | ok | error)

**Size:** ~200 lines

---

### templates/performance_site.html

**Purpose:** Per-site performance detail dashboard

**Features:**
- Latest CWV metrics (mobile + desktop side-by-side)
- 12-week trend sparkline chart (mobile + desktop bars per week)
- Top 5 optimisation opportunities (PSI)
- Top 10 slowest resources (sitespeed HAR)
- Per-URL breakdown table with test buttons
  - Columns: URL, score, LCP, CLS, requests, tested_at, test_now_buttons
  - Test buttons: Both (mobile+desktop), Mob, Dsk

**Interactive elements:**
- HTMX buttons that queue tests
- Coloured metric badges (green/amber/red)
- Tooltips for long URLs

**Size:** ~390 lines

---

## Supporting Files

### src/config.py

**What it contains:**
- `Settings` class (Pydantic)
- `SITES`: list of 13 sites to monitor
  - Each site: domain, priority (sorting order)

**Size:** ~50 lines

---

### src/db.py

**What it contains:**
- SQLAlchemy engine + session factory
- `Base` (declarative base for all models)
- Database URI from .env
- Migration logic (auto-create tables on startup)

**Size:** ~60 lines

---

### requirements.txt

Key dependencies for performance testing:
- fastapi, uvicorn (web framework)
- sqlalchemy (ORM)
- httpx (for PSI API calls)
- docker (for sitespeed execution)
- jinja2 (templates)

---

## File Interaction Map

```
FastAPI Request
    ↓
performance.py (routers)
    ↓
[Query] perf_audits table via SQL
    ├─→ db.py (SQLAlchemy session)
    │
[Create] templates (Jinja2)
    ├─→ performance_site.html
    └─→ performance.html

[Background Task] api_perf_test()
    ↓
runner.py:run_full_test()
    ├─ For each engine:
    │   ├─ sitespeed.py:run_sitespeed_test() → Docker
    │   │   ├─ subprocess.run("docker run sitespeedio/...")
    │   │   ├─ _parse_har(browsertime.har)
    │   │   └─ _approx_score(metrics) → 0-100
    │   │
    │   └─ psi.py:run_psi_test() → Google API
    │       ├─ httpx.get(googleapis.com/...)
    │       ├─ _parse_lighthouse_audits(audits)
    │       └─ opportunities + official_score
    │
    ├─ runner.py:_persist_run() for each result
    │   ├─ INSERT perf_runs
    │   ├─ INSERT perf_audits
    │   ├─ INSERT perf_opportunities
    │   └─ INSERT perf_resources
    │
    └─ models/perf.py (ORM objects)
        └─ db.py (commit to SQLAlchemy)
```

---

## Deployment

All files live in `/home/help4bis/seo-intel/` on george (192.168.0.117).

**To start the service:**
```bash
cd /home/help4bis/seo-intel
./run.sh
# or
uvicorn src.main:app --host 0.0.0.0 --port 8765 --reload
```

**To run tests manually:**
```bash
cd /home/help4bis/seo-intel
python -c "
from src.perf.runner import run_full_test
from src.db import SessionLocal

db = SessionLocal()
result = run_full_test(
    site_id=3,
    url='https://rds.ink/endangered',
    db=db,
    engines=['sitespeed', 'psi'],
    devices=['mobile', 'desktop']
)
print(result)
"
```

---

See also:
- [Database Schema](database-schema.md): All tables and fields
- [API Endpoints](api-endpoints.md): HTTP routes and payloads