seo-intel-docs/code-refs/file-structure.md
help4bis 335d9a76e1 Initial SEO-INTEL documentation: architecture, scoring, code structure
Add comprehensive documentation for the dual-engine performance evaluation system:
- System architecture and data flow
- Score calculation methodology (0-100 approximation from CWV thresholds)
- Detailed metrics reference (LCP, FCP, CLS, TBT, TTFB)
- Testing engines comparison (Sitespeed vs PSI)
- Complete code structure map (file-by-file breakdown)
- Case study: rds.ink 77 score with actionable fixes
- Quick reference guides for interpreting results

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-05-14 05:56:49 +10:00

# Code Structure Map
Complete file-by-file breakdown of the seo-intel repository.
## Directory Layout
```
/home/help4bis/seo-intel/
├── README.md # Project overview (v1.1.0)
├── pyproject.toml # Python project config (dependencies, build)
├── requirements.txt # Python package list
├── run.sh # Launch script (runs main.py)
├── .env # Secrets: PSI_API_KEY, DB path, etc.
├── src/ # Python package
│ ├── __init__.py
│ ├── main.py # FastAPI app entry point
│ ├── config.py # Settings, site list (SITES config)
│ ├── db.py # SQLAlchemy setup, migrations, session factory
│ │
│ ├── models/
│ │ ├── __init__.py
│ │ ├── perf.py # ORM models: PerfRun, PerfAudit, PerfOpportunity, PerfResource
│ │ ├── site.py # Site model (name, domain, priority)
│ │ ├── ranking.py # Ranking snapshot model (SEO keyword rankings)
│ │ └── ... # Other models (not perf-related)
│ │
│ ├── routers/
│ │ ├── __init__.py
│ │ ├── performance.py # GET /performance/, /performance/<site_id>, POST /api/perf/test, /api/perf/sweep
│ │ ├── dashboard.py # GET / (main dashboard)
│ │ ├── keywords.py # Keyword ranking pages
│ │ └── ... # Other routers (not perf-related)
│ │
│ ├── perf/ # Performance testing engines
│ │ ├── __init__.py
│ │ ├── runner.py # Orchestrator: run_full_test() — runs engines × devices
│ │ ├── sitespeed.py # Sitespeed.io Docker wrapper + HAR parser
│ │ ├── psi.py # Google PageSpeed Insights API client
│ │ └── batch.py # Weekly sweep logic
│ │
│ ├── playbook/ # SEO playbook generation (not perf-related)
│ │ ├── __init__.py
│ │ ├── rules.py
│ │ └── llm.py
│ │
│ └── ... # Other modules (keyword analysis, etc.)
├── templates/ # Jinja2 HTML templates
│ ├── base.html # Base template (nav, styling)
│ ├── performance.html # Portfolio scorecard
│ ├── performance_site.html # Per-site detail dashboard
│ ├── dashboard.html # Main dashboard
│ └── ... # Other templates
├── data/
│ └── seo-intel.db # SQLite database (perf_runs, perf_audits, etc.)
├── docs/ # Documentation (this repo)
└── ops/ # Operations scripts
    ├── schema.sql # Database schema
    └── ...
```
---
## Performance System Files (Perf Tier)
### src/routers/performance.py
**Purpose:** FastAPI routes for the performance dashboard
**Key functions:**
- `performance_home(request, db)`: `GET /performance/` → portfolio scorecard
- `performance_site(site_id, request, db)`: `GET /performance/<site_id>` → per-site detail
- `api_perf_test(body, background_tasks, db)`: `POST /api/perf/test` → trigger single URL test
- `api_perf_sweep(background_tasks)`: `POST /api/perf/sweep` → trigger portfolio sweep
- `_portfolio_rows(db)` — SQL: latest scores per site × device
- `_site_url_rows(db, site_id)` — SQL: latest score per URL
- `_site_latest_audit(db, site_id, device)` — SQL: full metrics for latest run
- `_site_trend(db, site_id, weeks)` — SQL: weekly AVG scores (12 weeks)
- `_site_opportunities(db, site_id, device)` — SQL: top PSI opportunities
- `_site_slow_resources(db, site_id)` — SQL: top 10 slowest resources
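
The "latest scores per site × device" lookup can be sketched as a correlated subquery. This is a minimal illustration using the standard-library `sqlite3` module, not the query in `_portfolio_rows()`: the SQL text, the helper names `latest_scores` and `demo_connection`, and the cut-down table definitions are assumptions. Only the table and column names come from the model descriptions in this document.

```python
import sqlite3

# Hypothetical query: the real SQL in _portfolio_rows() may differ
# (for example, it may also filter by engine).
LATEST_SCORES_SQL = """
SELECT r.site_id, r.device, a.performance_score
FROM perf_runs AS r
JOIN perf_audits AS a ON a.perf_run_id = r.id
WHERE r.success = 1
  AND r.started_at = (
        SELECT MAX(r2.started_at)
        FROM perf_runs AS r2
        WHERE r2.site_id = r.site_id
          AND r2.device = r.device
          AND r2.success = 1
  )
ORDER BY r.site_id, r.device
"""


def latest_scores(conn):
    """Return (site_id, device, score) for the newest successful run of each pair."""
    return conn.execute(LATEST_SCORES_SQL).fetchall()


def demo_connection():
    """Build an in-memory database with a cut-down version of the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE perf_runs (
            id INTEGER PRIMARY KEY, site_id INTEGER, device TEXT,
            started_at TEXT, success INTEGER
        );
        CREATE TABLE perf_audits (
            id INTEGER PRIMARY KEY, perf_run_id INTEGER, performance_score INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO perf_runs VALUES (?, ?, ?, ?, ?)",
        [
            (1, 3, "mobile", "2026-05-01T02:00:00", 1),
            (2, 3, "mobile", "2026-05-08T02:00:00", 1),
            (3, 3, "desktop", "2026-05-08T02:05:00", 1),
            (4, 3, "mobile", "2026-05-09T02:00:00", 0),  # failed run, ignored
        ],
    )
    conn.executemany(
        "INSERT INTO perf_audits (perf_run_id, performance_score) VALUES (?, ?)",
        [(1, 70), (2, 77), (3, 91)],
    )
    return conn


if __name__ == "__main__":
    for row in latest_scores(demo_connection()):
        print(row)
```

In the application the same statement would presumably be wrapped in `sqlalchemy.text()` and executed through the session, as the imports listed for this file suggest.
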
**Key imports:**
```python
from fastapi import APIRouter, BackgroundTasks, Depends
from sqlalchemy import text
from fastapi.templating import Jinja2Templates
# Two dots: routers/ and perf/ are sibling packages under src/
from ..perf.runner import run_full_test
from ..perf.batch import run_weekly_perf_sweep
```
**Size:** ~545 lines
---
### src/perf/runner.py
**Purpose:** Orchestrates test runs across engines and devices
**Key functions:**
- `run_full_test(site_id, url, db, engines, devices)` — Main orchestrator
  - Loops: for engine in engines: for device in devices:
  - Calls appropriate engine (sitespeed or psi)
  - Persists each result via `_persist_run()`
  - Returns summary dict
- `_persist_run(db, site_id, url, engine, result)` — Writes one test result to database
  - Inserts: perf_runs (1), perf_audits (1), perf_opportunities (0+), perf_resources (0+)
  - Commits transaction
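
The engines × devices loop described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code in `runner.py`: the function name `run_full_test_sketch`, the `engine_funcs` registry, the `persist` callback, and the shape of the summary dict are all hypothetical.

```python
from itertools import product


def run_full_test_sketch(url, engines, devices, engine_funcs, persist):
    """Run every engine on every device and persist each result.

    engine_funcs maps an engine name to a callable (url, device) -> result dict.
    persist is a callable (engine, device, result) standing in for _persist_run().
    Both parameters are hypothetical and exist only for this sketch.
    """
    summary = {"url": url, "runs": 0, "failures": []}
    for engine, device in product(engines, devices):
        try:
            result = engine_funcs[engine](url, device)
        except Exception as exc:
            # Record the failure instead of aborting the remaining combinations.
            result = {"success": False, "error_message": str(exc)}
        persist(engine, device, result)
        summary["runs"] += 1
        if not result.get("success"):
            summary["failures"].append((engine, device))
    return summary
```

Catching the exception per combination is a design choice made for this sketch: one failing engine should not prevent the other results from being stored.
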
**Key imports:**
```python
from sqlalchemy.orm import Session
from ..models.perf import PerfRun, PerfAudit, PerfOpportunity, PerfResource  # models/ is a sibling of perf/
from .sitespeed import run_sitespeed_test
from .psi import run_psi_test
```
**Size:** ~200 lines
---
### src/perf/sitespeed.py
**Purpose:** Wraps sitespeed.io Docker container, parses HAR output
**Key functions:**
- `run_sitespeed_test(url, device)` — Execute sitespeed in Docker
  - Builds Docker command with device-specific args (--mobile vs desktop UA)
  - Runs `docker run sitespeedio/sitespeed.io:40.4.0 {url} --n 3 ...`
  - Waits for output (60s)
  - Calls `_parse_har()` to extract metrics
  - Calls `_approx_score()` to calculate performance score
  - Returns: success, performance_score, metrics, resources
- `_parse_har(har_path)` — Parse `/tmp/sitespeed-output/{run_id}/.../browsertime.har`
  - Extracts `_googleWebVitals` from pages[] (LCP, FCP, CLS, TTFB)
  - Extracts `_cpu.longTasks.totalBlockingTime` from pages[] (TBT)
  - Sums resource sizes by type (image, script, stylesheet, font)
  - Returns: metrics dict, resources list
- `_approx_score(lcp_ms, fcp_ms, cls, tbt_ms, ttfb_ms)` — Calculate 0-100 score
  - Uses `_THRESHOLDS` (lines 53-60)
  - Linear interpolation between good/poor for each metric
  - Returns: int(mean(all_metric_scores))
- `_guess_resource_type(url, content_type)` — Classify resource (script, image, etc.)
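
The resource classification and per-type byte totals described above can be sketched against the standard HAR 1.2 layout (`log.entries[].request.url`, `response.bodySize`, `response.content.mimeType`). This is a rough sketch, not the code in `sitespeed.py`: the function names, the extension lists, and the fallback from `bodySize` to `content.size` are assumptions.

```python
from collections import defaultdict
from urllib.parse import urlparse


def guess_resource_type(url, content_type):
    """Classify a resource from its MIME type, falling back to the file extension."""
    ct = (content_type or "").lower()
    path = urlparse(url).path.lower()  # drops any query string
    if ct.startswith("image/") or path.endswith((".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg")):
        return "image"
    if "javascript" in ct or path.endswith((".js", ".mjs")):
        return "script"
    if ct.startswith("text/css") or path.endswith(".css"):
        return "stylesheet"
    if ct.startswith("font/") or path.endswith((".woff", ".woff2", ".ttf", ".otf")):
        return "font"
    return "other"


def sum_bytes_by_type(har):
    """Total response bytes per resource type for a parsed HAR dict."""
    totals = defaultdict(int)
    for entry in har["log"]["entries"]:
        response = entry["response"]
        content = response.get("content", {})
        # HAR uses -1 for "unknown"; fall back to the decoded content size.
        size = max(response.get("bodySize", -1), 0) or max(content.get("size", 0), 0)
        rtype = guess_resource_type(entry["request"]["url"], content.get("mimeType", ""))
        totals[rtype] += size
    return dict(totals)
```

A caller would load the file with `json.load()` and pass the resulting dict to `sum_bytes_by_type()`.
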
**Key constants:**
- `SITESPEED_IMAGE = "sitespeedio/sitespeed.io:40.4.0"` (pinned version)
- `OUTPUT_BASE = Path("/tmp/sitespeed-output")` (Docker output mount point)
- `_THRESHOLDS` dict (lines 53-60): (good, poor) for LCP, FCP, CLS, TBT, TTFB
**Size:** ~450 lines
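
The scoring approach described for `_approx_score()` (linear interpolation between a good and a poor threshold per metric, then the integer mean) can be illustrated as below. Treat this as a sketch under assumptions: the threshold values are the commonly published Core Web Vitals and Lighthouse cut-offs rather than the contents of `_THRESHOLDS`, and the mapping of "good" to 100 and "poor" to 0, as well as the skipping of missing metrics, are guesses.

```python
from statistics import mean

# (good, poor) pairs. ASSUMPTION: public Core Web Vitals style cut-offs;
# the values stored in _THRESHOLDS may be different.
THRESHOLDS = {
    "lcp_ms": (2500, 4000),
    "fcp_ms": (1800, 3000),
    "cls": (0.1, 0.25),
    "tbt_ms": (200, 600),
    "ttfb_ms": (800, 1800),
}


def metric_score(value, good, poor):
    """Map one metric to 0-100: 100 at or below good, 0 at or above poor."""
    if value <= good:
        return 100.0
    if value >= poor:
        return 0.0
    return 100.0 * (poor - value) / (poor - good)


def approx_score(**metrics):
    """Integer mean of the per-metric scores; metrics passed as None are skipped."""
    scores = [
        metric_score(value, *THRESHOLDS[name])
        for name, value in metrics.items()
        if value is not None
    ]
    return int(mean(scores)) if scores else None
```

With these assumed thresholds, an LCP of 3250 ms sits halfway between good and poor and scores 50; if the other four metrics are at their good thresholds, the overall result is `int((50 + 4 * 100) / 5)`, which is 90.
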
---
### src/perf/psi.py
**Purpose:** Calls Google PageSpeed Insights API, parses Lighthouse results
**Key functions:**
- `run_psi_test(url, device)` — Call PageSpeed Insights API
  - GET `https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=...&strategy={device}`
  - Parses response.lighthouseResult
  - Calls `_parse_lighthouse_audits()` (shared with sitespeed)
  - Returns: success, performance_score (official), metrics, opportunities
- `_parse_lighthouse_audits(audits)` — Extract metrics + opportunities from Lighthouse JSON
  - Maps audit keys (largest-contentful-paint, etc.) to metric values
  - Extracts opportunities (audit.details.type == "opportunity")
  - Calculates savings_ms and savings_bytes for each opportunity
  - Returns: metrics dict, opportunities list
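
The parsing step can be sketched on an already-decoded `lighthouseResult` dict, leaving the HTTP call out. This is a sketch, not the code in `psi.py`: the function name, the `METRIC_AUDITS` mapping, and the output field names are assumptions. It also assumes a Lighthouse version that still reports `details.type == "opportunity"` with `overallSavingsMs` and `overallSavingsBytes`.

```python
# ASSUMPTION: which audit feeds which metric column; the real mapping may differ.
METRIC_AUDITS = {
    "lcp_ms": "largest-contentful-paint",
    "fcp_ms": "first-contentful-paint",
    "cls": "cumulative-layout-shift",
    "tbt_ms": "total-blocking-time",
    "ttfb_ms": "server-response-time",
}


def parse_lighthouse_result(lighthouse_result):
    """Return (score, metrics, opportunities) from a lighthouseResult dict."""
    audits = lighthouse_result.get("audits", {})

    # The category score is reported on a 0-1 scale; convert to 0-100.
    raw = lighthouse_result.get("categories", {}).get("performance", {}).get("score")
    score = round(raw * 100) if raw is not None else None

    metrics = {
        name: audits.get(key, {}).get("numericValue")
        for name, key in METRIC_AUDITS.items()
    }

    opportunities = []
    for key, audit in audits.items():
        details = audit.get("details") or {}
        if details.get("type") != "opportunity":
            continue
        opportunities.append(
            {
                "opportunity_key": key,
                "display_label": audit.get("title", key),
                "savings_ms": details.get("overallSavingsMs", 0),
                "savings_bytes": details.get("overallSavingsBytes", 0),
            }
        )
    # Largest estimated time saving first.
    opportunities.sort(key=lambda item: item["savings_ms"], reverse=True)
    return score, metrics, opportunities
```
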
**Key constants:**
- `PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"`
- `PSI_TIMEOUT = 90` (Google's API can be slow)
**Size:** ~150 lines
---
### src/perf/batch.py
**Purpose:** Weekly portfolio performance sweep
**Key functions:**
- `run_weekly_perf_sweep(db)` — Main sweep orchestrator
  - Loops: for each site in SITES:
  - Calls `resolve_url_list()` to get top 6 URLs
  - For each URL: calls `run_full_test()` (sitespeed + psi, mobile + desktop)
  - Logs completion summary
- `resolve_url_list(db, domain)` — Get URLs for a site
  - Always: homepage
  - Plus: top 5 URLs from ranking_snapshots (last 30 days, sorted by impressions)
  - Returns: list of 6 URLs max
- `_get_top_urls(db, site_id, limit)` — Query ranking_snapshots for impressions
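
The "homepage plus top 5" rule can be sketched as a pure function that takes the ranked URLs as input. This is an illustration only: the name `resolve_url_list_sketch`, the `ranked_urls` parameter (standing in for the database query), the `https://` homepage form, and the trailing-slash de-duplication are assumptions.

```python
def resolve_url_list_sketch(domain, ranked_urls, max_urls=6):
    """Return the homepage followed by the highest-ranked distinct URLs.

    ranked_urls is assumed to be already sorted by impressions, highest first.
    """
    homepage = f"https://{domain}/"
    urls = [homepage]
    seen = {homepage.rstrip("/")}  # compare without the trailing slash
    for url in ranked_urls:
        if len(urls) >= max_urls:
            break
        key = url.rstrip("/")
        if key not in seen:
            seen.add(key)
            urls.append(url)
    return urls
```

De-duplicating against the homepage matters because the homepage is likely to appear among the top-ranked URLs as well, and testing it twice would waste one of the six slots.
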
**Size:** ~150 lines
---
### src/models/perf.py
**Purpose:** SQLAlchemy ORM models for performance data
**Models:**
- `PerfRun` — Test execution record
  - Fields: id, site_id, url, engine, device, started_at, completed_at, success, error_message
  - Relations: audits (1-to-many), opportunities (1-to-many), resources (1-to-many)
- `PerfAudit` — Core Web Vitals metrics for one run
  - Fields: id, perf_run_id, performance_score, lcp_ms, cls, inp_ms, tbt_ms, fcp_ms, ttfb_ms, total_byte_weight, image_bytes, js_bytes, css_bytes, font_bytes, requests_count, dom_size
  - Relations: run (many-to-1)
- `PerfOpportunity` — Lighthouse audit opportunity
  - Fields: id, perf_run_id, opportunity_key, display_label, savings_ms, savings_bytes, details_json
  - Relations: run (many-to-1)
- `PerfResource` — HAR resource entry
  - Fields: id, perf_run_id, resource_url, resource_type, size_bytes, transfer_size_bytes, start_time_ms, end_time_ms, is_render_blocking
  - Relations: run (many-to-1)
**Size:** ~100 lines
---
## Templates
### templates/performance.html
**Purpose:** Portfolio performance scorecard
**Features:**
- Table of all sites (13 rows)
- Columns: domain, score_mobile, score_desktop, lcp_ms, cls, slowest_url, last_tested
- Colour-coded scores (green ≥90, amber ≥50, red <50)
- "Run portfolio sweep now" button (HTMX POST to /api/perf/sweep)
- Sweep status display (idle | running | ok | error)
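
The colour bands listed above amount to a two-threshold lookup. A minimal sketch follows; the helper name `score_colour` is hypothetical, and the template may implement the same rule directly in Jinja2.

```python
def score_colour(score):
    """Map a 0-100 performance score to the badge colour used on the scorecard."""
    if score >= 90:
        return "green"
    if score >= 50:
        return "amber"
    return "red"
```

Under this rule the rds.ink score of 77 mentioned in the commit notes falls in the amber band.
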
**Size:** ~200 lines
---
### templates/performance_site.html
**Purpose:** Per-site performance detail dashboard
**Features:**
- Latest CWV metrics (mobile + desktop side-by-side)
- 12-week trend sparkline chart (mobile + desktop bars per week)
- Top 5 optimisation opportunities (PSI)
- Top 10 slowest resources (sitespeed HAR)
- Per-URL breakdown table with test buttons
  - Columns: URL, score, LCP, CLS, requests, tested_at, test_now_buttons
  - Test buttons: Both (mobile+desktop), Mob, Dsk
**Interactive elements:**
- HTMX buttons that queue tests
- Coloured metric badges (green/amber/red)
- Tooltips for long URLs
**Size:** ~390 lines
---
## Supporting Files
### src/config.py
**What it contains:**
- `Settings` class (Pydantic)
- `SITES` — list of 13 sites to monitor
  - Each site: domain, priority (sorting order)
**Size:** ~50 lines
---
### src/db.py
**What it contains:**
- SQLAlchemy engine + session factory
- `Base` (declarative base for all models)
- Database URI from .env
- Migration logic (auto-create tables on startup)
**Size:** ~60 lines
---
### requirements.txt
Key dependencies for performance testing:
- fastapi, uvicorn (web framework)
- sqlalchemy (ORM)
- httpx (for PSI API calls)
- docker (for sitespeed execution)
- jinja2 (templates)
---
## File Interaction Map
```
FastAPI Request
    ↓
performance.py (routers)
    ↓
[Query] perf_audits table via SQL
    ├─→ db.py (SQLAlchemy session)
    ↓
[Create] templates (Jinja2)
    ├─→ performance_site.html
    └─→ performance.html

[Background Task] api_perf_test()
    ↓
runner.py:run_full_test()
    ├─ For each engine:
    │   ├─ sitespeed.py:run_sitespeed_test() → Docker
    │   │   ├─ subprocess.run("docker run sitespeedio/...")
    │   │   ├─ _parse_har(browsertime.har)
    │   │   └─ _approx_score(metrics) → 0-100
    │   │
    │   └─ psi.py:run_psi_test() → Google API
    │       ├─ httpx.get(googleapis.com/...)
    │       ├─ _parse_lighthouse_audits(audits)
    │       └─ opportunities + official_score
    ├─ runner.py:_persist_run() for each result
    │   ├─ INSERT perf_runs
    │   ├─ INSERT perf_audits
    │   ├─ INSERT perf_opportunities
    │   └─ INSERT perf_resources
    └─ models/perf.py (ORM objects)
        └─ db.py (commit to SQLAlchemy)
```
---
## Deployment
All files live in `/home/help4bis/seo-intel/` on the host `george` (192.168.0.117).
**To start the service:**
```bash
cd /home/help4bis/seo-intel
./run.sh
# or
uvicorn src.main:app --host 0.0.0.0 --port 8765 --reload
```
**To run a performance test manually:**
```bash
cd /home/help4bis/seo-intel
python -c "
from src.perf.runner import run_full_test
from src.db import SessionLocal

db = SessionLocal()
try:
    # Run both engines on both devices for a single URL
    result = run_full_test(
        site_id=3,
        url='https://rds.ink/endangered',
        db=db,
        engines=['sitespeed', 'psi'],
        devices=['mobile', 'desktop'],
    )
    print(result)
finally:
    db.close()  # always release the session
"
```
---
See also:
- [Database Schema](database-schema.md) — All tables and fields
- [API Endpoints](api-endpoints.md) — HTTP routes and payloads