
feat(etl): Wave 4A — compare/yield/optimize/messages-summary migrate to marts (3B redo)#81

Merged
0bserver07 merged 1 commit into main from feat/etl-3b-redo on May 6, 2026

@0bserver07
Owner

Summary

Wave 4A finishes what Wave 3B started: it migrates four analytical surfaces from the per-request aggregator pass over raw messages onto the ETL mart layer shipped in Waves 1, 2, and 3A. Empty-mart fallback to the aggregator path is preserved on every migrated surface, so users on un-materialised stores keep working.
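
The mart-first dispatch with empty-mart fallback can be sketched roughly as follows. This is a minimal illustration, not the stackunderflow implementation. It assumes a SQLite store, and the trimmed-down table schemas, the `model_totals` function, and its `(source, totals)` return value are all hypothetical.

```python
import sqlite3
from typing import Dict, Optional, Tuple

# Hypothetical, trimmed-down tables; the real schemas carry more columns.
SCHEMA = """
CREATE TABLE messages (model TEXT, project_id TEXT, cost_usd REAL);
CREATE TABLE model_day_mart (day TEXT, model TEXT, calls INTEGER, cost_usd REAL);
"""

Totals = Dict[str, Dict[str, float]]


def _mart_has_rows(conn: sqlite3.Connection) -> bool:
    # Existence gate: EXISTS stops at the first row it finds.
    return bool(conn.execute("SELECT EXISTS(SELECT 1 FROM model_day_mart)").fetchone()[0])


def _totals_from_mart(conn: sqlite3.Connection) -> Totals:
    rows = conn.execute(
        "SELECT model, SUM(calls), SUM(cost_usd) FROM model_day_mart GROUP BY model"
    )
    return {m: {"calls": c, "cost_usd": cost} for m, c, cost in rows}


def _totals_from_messages(conn: sqlite3.Connection, project_filter: Optional[str]) -> Totals:
    # Legacy aggregator path: aggregate raw messages on every request.
    sql = "SELECT model, COUNT(*), SUM(cost_usd) FROM messages"
    params: tuple = ()
    if project_filter is not None:
        sql += " WHERE project_id = ?"
        params = (project_filter,)
    rows = conn.execute(sql + " GROUP BY model", params)
    return {m: {"calls": c, "cost_usd": cost} for m, c, cost in rows}


def model_totals(conn: sqlite3.Connection, project_filter: Optional[str] = None) -> Tuple[str, Totals]:
    # model_day_mart has no project_id column, so a project filter
    # cannot be answered from it: drop to the aggregator path.
    if project_filter is None and _mart_has_rows(conn):
        return "mart", _totals_from_mart(conn)
    # Empty (un-materialised) mart or project filter: answer from raw messages.
    return "aggregator", _totals_from_messages(conn, project_filter)
```

With an empty `model_day_mart`, the call answers from `messages`. Once the mart has rows, it answers from the mart unless a project filter is passed.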

  • services/compare.compare_models() reads model_day_mart for per-model totals (calls, tokens, cost) and session_mart for primary-model attribution (sessions, one-shot %, retry rate, $/session). When project_filter is set, the call drops to the aggregator path because model_day_mart does not carry project_id.
  • services/yield_tracker._query_sessions() reads session_mart for the session list (cwd / started_at / cost_usd / primary_model). Cwd is still pulled from messages.raw_json, because the v1 session_mart leaves the cwd column NULL (per the builder docstring).
  • reports/optimize._detect_cache_overhead() reads pre-summed per-session input_tokens + cache_create from session_mart. The other six detectors stay on the aggregator path — docstrings now spell out why (signals like tool_use blocks, raw_json parsing, per-message byte counts aren't on any mart yet).
  • routes/data /api/messages/summary: the top-level total (and a new total_sessions field) come from project_mart. The detail blocks (by_type / by_model / total_tokens) still use the messages list because those dimensions aren't materialised.
  • routes/yield_route.get_yield gains the same Query-sentinel sanitization that routes/compare.py already had, so direct test calls don't leak the FastAPI sentinel into the service layer.

store/mart_queries.py grows new helpers (kept consolidated, no inline SQL in routes): model_day_totals, session_mart_rows_for_compare, session_mart_rows_for_yield, session_mart_cache_overhead, project_mart_messages_summary_totals, plus mart_has_session_rows / mart_has_model_day_rows existence gates.
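
An existence gate and a pre-summed session read might look roughly like this. The sketch assumes a SQLite store and hypothetical `session_mart` columns (`session_id`, `input_tokens`, `cache_create_tokens`). The real helper signatures in store/mart_queries.py may differ, and the `min_input_tokens` parameter is illustrative.

```python
import sqlite3
from typing import List, Tuple


def mart_has_session_rows(conn: sqlite3.Connection) -> bool:
    # EXISTS stops at the first row, so the gate stays cheap on a large mart.
    return bool(conn.execute("SELECT EXISTS(SELECT 1 FROM session_mart)").fetchone()[0])


def session_mart_cache_overhead(
    conn: sqlite3.Connection, min_input_tokens: int = 0
) -> List[Tuple[str, int, int]]:
    # Per-session totals were already summed by the ETL, so there is no
    # GROUP BY over raw messages here. Column names are assumptions.
    return conn.execute(
        """
        SELECT session_id, input_tokens, cache_create_tokens
        FROM session_mart
        WHERE input_tokens >= ?
        ORDER BY cache_create_tokens DESC, session_id
        """,
        (min_input_tokens,),
    ).fetchall()
```

Keeping both queries in one module matches the "no inline SQL in routes" rule: a caller checks the gate, then either reads the mart or falls back to the aggregator path.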

Test plan

  • pytest tests/ -q — 1489 passed, 2 skipped (was 1472 pre-PR; +17 new)
  • ruff check stackunderflow/services/compare.py stackunderflow/services/yield_tracker.py stackunderflow/reports/optimize.py stackunderflow/routes/data.py stackunderflow/routes/yield_route.py stackunderflow/store/mart_queries.py — all pass
  • ruff check on the 4 new test files — passes
  • Compare provider attribution: codex sessions surface as provider="codex" from session_mart, not the legacy JOIN projects.
  • Compare provider filter: provider_filter="claude" excludes codex entries even when model_day_mart has rows for both.
  • Yield cwd parity: session_mart.cwd=NULL does not break the cwd fetch — the service still pulls cwd from messages.raw_json.
  • Yield project_filter: slug-based filter pushes into the mart's JOIN projects clause.
  • Optimize cache_overhead: severity ladder + finding shape (incl. details.sessions schema) identical between mart and aggregator paths.
  • Messages-summary: total_messages=4242 in project_mart overrides the messages-list count of 1.
  • Empty-mart fallback: each migrated surface has a parity test that seeds the legacy tables only and confirms the aggregator path still answers correctly.
  • Speed: 50K mart rows answer compare / yield / cache-overhead within budget.
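
The yield cwd parity case can be illustrated with a sketch that reads sessions from session_mart and backfills a NULL cwd from the first messages.raw_json payload that carries one. The table layouts, the `cwd` JSON key, and the `sessions_with_cwd` name are assumptions for illustration only.

```python
import json
import sqlite3
from typing import Any, Dict, List, Optional


def _cwd_from_raw_json(conn: sqlite3.Connection, session_id: str) -> Optional[str]:
    # Scan the session's messages in insertion order and take the first cwd found.
    rows = conn.execute(
        "SELECT raw_json FROM messages WHERE session_id = ? ORDER BY rowid",
        (session_id,),
    )
    for (raw,) in rows:
        try:
            payload = json.loads(raw)
        except (TypeError, ValueError):
            continue  # NULL or malformed raw_json: skip this message
        if isinstance(payload, dict) and payload.get("cwd"):
            return payload["cwd"]
    return None


def sessions_with_cwd(conn: sqlite3.Connection) -> List[Dict[str, Any]]:
    sessions = conn.execute(
        "SELECT session_id, started_at, cost_usd, cwd FROM session_mart ORDER BY started_at"
    ).fetchall()
    result = []
    for session_id, started_at, cost_usd, cwd in sessions:
        if cwd is None:
            # Per the PR description, the v1 session_mart leaves cwd NULL.
            cwd = _cwd_from_raw_json(conn, session_id)
        result.append(
            {"session_id": session_id, "started_at": started_at, "cost_usd": cost_usd, "cwd": cwd}
        )
    return result
```

This issues one messages query per session with a NULL cwd. A production version would more likely batch the lookup for all such sessions in a single query.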

🤖 Generated with Claude Code

…to marts (3B redo)

Migrate four analytical surfaces to the ETL mart layer with empty-mart
fallback to the legacy aggregator path:

- services/compare.compare_models() reads model_day_mart for per-model
  totals (calls, tokens, cost) and session_mart for primary-model
  attribution (sessions, one-shot %, retry rate, $/session).
- services/yield_tracker._query_sessions() reads session_mart for the
  session list (cwd/started_at/cost_usd/primary_model). Cwd is still
  pulled from messages.raw_json since v1 session_mart leaves cwd NULL.
- reports/optimize._detect_cache_overhead() reads session_mart pre-summed
  per-session input_tokens + cache_create. Other detectors stay on the
  aggregator path — their signals (tool-call shape, raw_json parsing,
  per-message payload sizes) aren't materialised on any mart.
- routes/data /api/messages/summary's top-level total comes from
  project_mart.total_messages; by_type/by_model breakdowns still use the
  messages list (not in any mart).
- routes/yield_route.get_yield gains the same Query-sentinel coercion
  routes/compare.py already uses.

mart_queries.py grows new helpers: model_day_totals,
session_mart_rows_for_compare, session_mart_rows_for_yield,
session_mart_cache_overhead, project_mart_messages_summary_totals,
plus mart_has_session_rows / mart_has_model_day_rows existence gates.

Same JSON contract on all 4 routes. ?provider= / ?model= filter wiring
preserved. 17 new tests (4 files); existing 1472 still pass — total
1489.
0bserver07 merged commit cbcce69 into main on May 6, 2026
9 checks passed
0bserver07 deleted the feat/etl-3b-redo branch on May 6, 2026 at 21:10
0bserver07 added a commit that referenced this pull request on May 6, 2026
Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from
the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82)
into a single [0.7.0] section.

New: docs/HANDOFF.md — state-of-the-codebase walkthrough for incoming
agents. Architecture map, recent history, key gotchas, what's left,
files-to-read-first.

End-state on the maintainer's real store:
  150,337 usage_events
  Marts populated and watermarks in sync
  Dashboard cold-load 2.5s → <50ms warm
  Watcher 155ms end-to-end source-file-write → dashboard-data-fresh

1598 backend tests passing, 2 skipped, 11 deselected (slow suite).
Frontend typecheck + build clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>