
test(etl): Wave 4E — real-data e2e + per-route latency regression#82

Merged
0bserver07 merged 1 commit into main from feat/etl-real-data-tests on May 6, 2026

Conversation

@0bserver07
Owner

Summary

Adds two slow-marker test files under a new tests/stackunderflow/integration/ package and registers a slow pytest marker (opt-in via pytest -m slow; default pytest tests/ -q keeps the existing 1474-test collection untouched).

  • test_etl_pipeline_e2e.py — builds a 10K-message synthetic store across 5 providers (claude/codex/cursor/gemini/cline) over 30 days × 20 projects, runs every registered Normalizer end-to-end, refreshes every mart, asserts cost conservation across daily/session/project/provider_day/model_day, then sweeps every dashboard route via FastAPI's TestClient asserting 200 + non-empty + <500ms.
  • test_route_perf_regression.py — parametrises every dashboard route against a pre-populated marts fixture (100K daily / 50K session / 1K project / 2K provider_day / 5K model_day rows) plus a small 1K-message set so aggregator-driven routes stay tight. Each route runs 1 warmup + 5 cold + 5 warm requests; max(warm) must beat the per-route budget.
  • pyproject.toml — new [tool.pytest.ini_options] section registers the slow marker and adds addopts = "-m 'not slow'" so the default suite skips the slow tests automatically. Run the new suite with pytest -m slow.
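The marker registration described in the last bullet could look roughly like this in pyproject.toml (a sketch reconstructed from the description above; the marker help text is an assumption, not copied from the diff):

```toml
[tool.pytest.ini_options]
markers = [
    "slow: long-running integration tests (opt in with `pytest -m slow`)",
]
# Deselect slow tests by default; an explicit `-m slow` on the
# command line overrides this addopts expression.
addopts = "-m 'not slow'"
```

With this in place, `pytest tests/ -q` deselects the slow files while `pytest -m slow` runs them, since a command-line `-m` takes precedence over the one injected via `addopts`.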

/api/etl/status is listed for forward compatibility — the route isn't implemented yet on main, so the test accepts a 404 (e2e) / pytest.skip (regression) until the endpoint lands.
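A 404-tolerant sweep of that kind might be implemented as follows (a minimal sketch; `FORWARD_COMPAT`, `check_route`, and the client wiring are hypothetical names, not the actual test code):

```python
import pytest

# Routes listed for forward compatibility: they may not exist on main yet.
FORWARD_COMPAT = {"/api/etl/status"}


def check_route(client, path, expect_nonempty=True):
    """Assert 200 + non-empty body, but skip on 404 for not-yet-implemented routes."""
    resp = client.get(path)
    if path in FORWARD_COMPAT and resp.status_code == 404:
        pytest.skip(f"{path} not implemented yet; accepting 404 until it lands")
    assert resp.status_code == 200
    if expect_nonempty:
        assert resp.content  # payload must be non-empty
```

The e2e variant would tolerate the 404 inline; the regression variant skips, so the placeholder shows up as `1 skipped` rather than a silent pass.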

Latency table (dev box, M-series, Python 3.12)

route                                cold(p50)  warm(p50)  warm(max)  budget
/api/projects?include_stats=true        5.8ms      5.8ms      6.2ms    100ms
/api/dashboard-data                     8.6ms      7.1ms      7.7ms    100ms
/api/cost-data?period=month            12.1ms     11.9ms     16.0ms    100ms
/api/cost-data/by-provider              1.4ms      1.1ms      2.6ms     50ms
/api/compare?period=month               1.7ms      1.7ms      1.8ms    100ms
/api/yield?period=week                  1.3ms      1.2ms      1.2ms    200ms
/api/optimize?period=month             81.9ms    100.7ms    153.2ms    200ms
/api/messages/summary                   1.8ms      1.6ms      1.6ms     50ms
/api/etl/status                                              (404 — route not yet implemented)
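The 1 warmup + 5 cold + 5 warm loop behind this table could be sketched as below (stdlib-only; `measure_route` and the `reset_fn` cache-clearing hook are assumptions standing in for the real fixtures):

```python
import statistics
import time


def measure_route(request_fn, reset_fn=None, warmup=1, cold_runs=5, warm_runs=5):
    """Time one route: cold runs reset state first, warm runs reuse it.

    request_fn: callable issuing one request, e.g. lambda: client.get(path)
    reset_fn:   optional callable clearing caches between cold runs
    Returns (cold_p50_ms, warm_p50_ms, warm_max_ms).
    """
    def timed():
        t0 = time.perf_counter()
        request_fn()
        return (time.perf_counter() - t0) * 1000.0

    for _ in range(warmup):
        request_fn()

    cold = []
    for _ in range(cold_runs):
        if reset_fn is not None:
            reset_fn()
        cold.append(timed())
    warm = [timed() for _ in range(warm_runs)]

    return statistics.median(cold), statistics.median(warm), max(warm)


# Per-route budget check: max(warm) must beat the budget, e.g.
#   _, _, warm_max = measure_route(lambda: client.get("/api/compare?period=month"))
#   assert warm_max < 100.0
```

Using max(warm) rather than a percentile makes the gate strict: a single slow warm request fails the budget, which is what you want from a regression fence on a quiet CI box.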

Test counts

  • Default pytest tests/ -q: 1472 passed, 2 skipped, 11 deselected (slow tests). Total collection 1474, matching baseline.
  • pytest -m slow tests/stackunderflow/integration -q: 10 passed, 1 skipped (the /api/etl/status placeholder).

Test plan

  • pytest -m slow tests/stackunderflow/integration -q — 10 passed + 1 skipped (etl_status placeholder).
  • pytest tests/ -q — 1472 passed + 2 skipped + 11 deselected (unchanged from baseline).
  • ruff check tests/stackunderflow/integration/ — all checks passed.
  • No source code touched (Wave 4E scope discipline) — only test files + pyproject.toml marker config + CHANGELOG.

🤖 Generated with Claude Code

Adds two slow-marker test files under a new
``tests/stackunderflow/integration/`` package:

* ``test_etl_pipeline_e2e.py`` — builds a 10K-message synthetic store
  across 5 providers (claude, codex, cursor, gemini, cline) over 30
  days × 20 projects, runs every registered Normalizer end-to-end,
  refreshes every mart, and asserts cost-conservation across all five
  marts. Then mounts the production routers behind a TestClient and
  hits every dashboard route asserting 200 + non-empty + <500ms.

* ``test_route_perf_regression.py`` — parametrises every dashboard
  route against a pre-populated synthetic marts fixture (100K daily,
  50K session, 1K project, 2K provider_day, 5K model_day rows) plus
  a small 1K-message set so aggregator-driven routes stay quick.
  Each route gets 1 warmup + 5 cold + 5 warm runs; max(warm) must
  beat the per-route budget. Prints a cold/warm/budget table to the
  log so future regressions can be calibrated from CI output alone.

Both files are gated on the new ``slow`` pytest marker registered in
``pyproject.toml``. Default ``pytest tests/ -q`` keeps its 1474-test
collection unchanged (11 slow tests deselected by ``addopts =
"-m 'not slow'"``); run the integration suite explicitly with
``pytest -m slow tests/stackunderflow/integration -q``.

``/api/etl/status`` is listed for forward compatibility — the route
isn't yet implemented in the current main, so the test accepts a 404
in lieu of a 200 (e2e) / pytest.skip (regression) until the route
lands. Latency table from a recent dev-box run:

  projects_with_stats               cold 5.8ms   warm 5.8ms   budget 100
  dashboard_data                    cold 8.6ms   warm 7.1ms   budget 100
  cost_data                         cold 12.1ms  warm 11.9ms  budget 100
  cost_data_by_provider             cold 1.4ms   warm 1.1ms   budget 50
  compare                           cold 1.7ms   warm 1.7ms   budget 100
  yield                             cold 1.3ms   warm 1.2ms   budget 200
  optimize                          cold 81.9ms  warm 100.7ms budget 200
  messages_summary                  cold 1.8ms   warm 1.6ms   budget 50

Synthetic stores live in ``tmp_path`` — the user's real
``~/.stackunderflow/store.db`` is never touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0bserver07 force-pushed the feat/etl-real-data-tests branch from 0b36218 to c34a329 on May 6, 2026 at 21:18
@0bserver07 merged commit 1e61554 into main on May 6, 2026
9 checks passed
@0bserver07 deleted the feat/etl-real-data-tests branch on May 6, 2026 at 21:20
0bserver07 added a commit that referenced this pull request on May 6, 2026:
Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from
the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82)
into a single [0.7.0] section.

New: docs/HANDOFF.md — state-of-the-codebase walkthrough for incoming
agents. Architecture map, recent history, key gotchas, what's left,
files-to-read-first.

End-state on the maintainer's real store:
  150,337 usage_events
  Marts populated and watermarks in sync
  Dashboard cold-load 2.5s → <50ms warm
  Watcher 155ms end-to-end source-file-write → dashboard-data-fresh

1598 backend tests passing, 2 skipped, 11 deselected (slow suite).
Frontend typecheck + build clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>