feat(etl): Wave 4A — compare/yield/optimize/messages-summary migrate to marts (3B redo) #81
Merged
…to marts (3B redo)

Migrate four analytical surfaces to the ETL mart layer with empty-mart fallback to the legacy aggregator path:

- services/compare.compare_models() reads model_day_mart for per-(model) totals (calls, tokens, cost) and session_mart for primary-model attribution (sessions, one-shot %, retry rate, $/session).
- services/yield_tracker._query_sessions() reads session_mart for the session list (cwd/started_at/cost_usd/primary_model). Cwd is still pulled from messages.raw_json since v1 session_mart leaves cwd NULL.
- reports/optimize._detect_cache_overhead() reads session_mart pre-summed per-session input_tokens + cache_create. Other detectors stay on the aggregator path: their signals (tool-call shape, raw_json parsing, per-message payload sizes) aren't materialised on any mart.
- routes/data /api/messages/summary's top-level total comes from project_mart.total_messages; by_type/by_model breakdowns still use the messages list (not in any mart).
- routes/yield_route.get_yield gains the same Query-sentinel coercion routes/compare.py already uses.

mart_queries.py grows new helpers: model_day_totals, session_mart_rows_for_compare, session_mart_rows_for_yield, session_mart_cache_overhead, project_mart_messages_summary_totals, plus mart_has_session_rows / mart_has_model_day_rows existence gates.

Same JSON contract on all 4 routes. ?provider= / ?model= filter wiring preserved.

17 new tests (4 files); existing 1472 still pass, for a total of 1489.
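A minimal sketch of the empty-mart fallback pattern the commit describes, using an in-memory SQLite database. The schemas, the `per_model_totals` / `totals_from_aggregator` / `build_demo_db` / `materialise_mart` helpers, and this particular body for a `mart_has_model_day_rows`-style gate are simplified, hypothetical stand-ins for illustration, not the code in `store/mart_queries.py` or `services/compare.py`.

```python
import sqlite3
from collections import defaultdict

# Hypothetical, simplified schemas; the real tables carry more columns.
SCHEMA = """
CREATE TABLE messages (project TEXT, model TEXT, tokens INTEGER, cost REAL);
CREATE TABLE model_day_mart (day TEXT, model TEXT, calls INTEGER, tokens INTEGER, cost REAL);
"""


def mart_has_model_day_rows(conn: sqlite3.Connection) -> bool:
    """Existence gate (assumed implementation): True once the mart holds any row."""
    return conn.execute("SELECT 1 FROM model_day_mart LIMIT 1").fetchone() is not None


def totals_from_aggregator(conn, project_filter=None):
    """Legacy-style path: one pass over raw message rows."""
    sql, params = "SELECT model, tokens, cost FROM messages", ()
    if project_filter is not None:
        sql += " WHERE project = ?"
        params = (project_filter,)
    totals = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost": 0.0})
    for model, tokens, cost in conn.execute(sql, params):
        entry = totals[model]
        entry["calls"] += 1
        entry["tokens"] += tokens
        entry["cost"] += cost
    return dict(totals)


def per_model_totals(conn, project_filter=None):
    """Return (totals, source); both paths produce the same shape."""
    # The mart has no project column, so a project filter always falls back.
    if project_filter is None and mart_has_model_day_rows(conn):
        rows = conn.execute(
            "SELECT model, SUM(calls), SUM(tokens), SUM(cost) "
            "FROM model_day_mart GROUP BY model"
        )
        mart = {m: {"calls": c, "tokens": t, "cost": cost} for m, c, t, cost in rows}
        return mart, "mart"
    return totals_from_aggregator(conn, project_filter), "aggregator"


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        [("p1", "m-a", 100, 0.5), ("p1", "m-a", 50, 0.25), ("p2", "m-b", 10, 0.1)],
    )
    return conn


def materialise_mart(conn):
    # Stand-in for the ETL build step: one row per (day, model).
    conn.executemany(
        "INSERT INTO model_day_mart VALUES (?, ?, ?, ?, ?)",
        [
            ("2026-05-05", "m-a", 1, 100, 0.5),
            ("2026-05-06", "m-a", 1, 50, 0.25),
            ("2026-05-06", "m-b", 1, 10, 0.1),
        ],
    )


if __name__ == "__main__":
    conn = build_demo_db()
    print(per_model_totals(conn)[1])  # aggregator (mart still empty)
    materialise_mart(conn)
    print(per_model_totals(conn)[1])  # mart
    print(per_model_totals(conn, project_filter="p1")[1])  # aggregator
```

Returning the source next to the totals only makes the chosen path observable in the demo; the property that matters is that both branches yield the same structure, which is what keeps the JSON contract unchanged whether or not the mart has been materialised.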
a1dbd08 to a921764
0bserver07 added a commit that referenced this pull request on May 6, 2026
Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82) into a single [0.7.0] section.

New: docs/HANDOFF.md, a state-of-the-codebase walkthrough for incoming agents: architecture map, recent history, key gotchas, what's left, files to read first.

End-state on the maintainer's real store:
- 150,337 usage_events
- Marts populated and watermarks in sync
- Dashboard cold-load 2.5s → <50ms warm
- Watcher 155ms end-to-end source-file-write → dashboard-data-fresh
- 1598 backend tests passing, 2 skipped, 11 deselected (slow suite)
- Frontend typecheck + build clean

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Wave 4A finishes what Wave 3B started: it migrates four analytical surfaces from the per-request aggregator pass over raw `messages` onto the ETL mart layer shipped in Waves 1, 2, and 3A. Empty-mart fallback to the aggregator path is preserved on every migrated surface, so users on un-materialised stores keep working.

- `services/compare.compare_models()` reads `model_day_mart` for per-(model) totals (calls, tokens, cost) and `session_mart` for primary-model attribution (sessions, one-shot %, retry rate, `$/session`). When `project_filter` is set, the call drops to the aggregator path because `model_day_mart` does not carry `project_id`.
- `services/yield_tracker._query_sessions()` reads `session_mart` for the session list (`cwd`/`started_at`/`cost_usd`/`primary_model`). Cwd is still pulled from `messages.raw_json` since the v1 `session_mart` leaves `cwd` NULL, per the builder docstring.
- `reports/optimize._detect_cache_overhead()` reads pre-summed per-session `input_tokens` + `cache_create` from `session_mart`. The other six detectors stay on the aggregator path; their docstrings now spell out why (signals like `tool_use` blocks, `raw_json` parsing, and per-message byte counts aren't on any mart yet).
- `routes/data` `/api/messages/summary`'s top-level `total` (and a new bonus `total_sessions` field) come from `project_mart`. The detail blocks (`by_type`/`by_model`/`total_tokens`) still use the messages list because those dimensions aren't materialised.
- `routes/yield_route.get_yield` gains the same `Query` sentinel sanitization `routes/compare.py` already had, so direct test calls don't leak the FastAPI sentinel into the service layer.
- `store/mart_queries.py` grows new helpers (kept consolidated, no inline SQL in routes): `model_day_totals`, `session_mart_rows_for_compare`, `session_mart_rows_for_yield`, `session_mart_cache_overhead`, `project_mart_messages_summary_totals`, plus `mart_has_session_rows` / `mart_has_model_day_rows` existence gates.

Test plan
- `pytest tests/ -q`: 1489 passed, 2 skipped (was 1472 pre-PR; +17 new)
- `ruff check stackunderflow/services/compare.py stackunderflow/services/yield_tracker.py stackunderflow/reports/optimize.py stackunderflow/routes/data.py stackunderflow/routes/yield_route.py stackunderflow/store/mart_queries.py`: all pass
- `ruff check` on the 4 new test files: passes
- … `provider="codex"` from `session_mart`, not the legacy `JOIN projects`.
- `provider_filter="claude"` excludes codex entries even when `model_day_mart` has rows for both.
- `session_mart.cwd=NULL` does not break the cwd fetch: the service still pulls cwd from `messages.raw_json`.
- … `JOIN projects` clause.
- … `details.sessions` schema) identical between mart and aggregator paths.
- `total_messages=4242` in `project_mart` overrides the messages-list count of 1.

🤖 Generated with Claude Code
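The `project_mart` override checked in the test plan (`total_messages=4242` beating a one-item messages list) can be sketched roughly as follows. The `messages_summary` function, the message dict keys, and the three-column `project_mart` layout are hypothetical simplifications, not the route's actual code; in particular, emitting `total_sessions` only on the mart path is an assumption of this sketch.

```python
import sqlite3
from collections import Counter


def messages_summary(conn: sqlite3.Connection, project_id: str, messages: list[dict]) -> dict:
    """Hypothetical builder for an /api/messages/summary-style payload."""
    # Detail blocks are not materialised on any mart, so they come from the list.
    summary = {
        "total": len(messages),
        "by_type": dict(Counter(m["type"] for m in messages)),
        "by_model": dict(Counter(m["model"] for m in messages if m.get("model"))),
        "total_tokens": sum(m.get("tokens", 0) for m in messages),
    }
    row = conn.execute(
        "SELECT total_messages, total_sessions FROM project_mart WHERE project_id = ?",
        (project_id,),
    ).fetchone()
    if row is not None:
        # Mart populated: its pre-computed counts override the list length.
        # Assumption: total_sessions is only added on the mart path here.
        summary["total"], summary["total_sessions"] = row
    return summary


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE project_mart (project_id TEXT, total_messages INTEGER, total_sessions INTEGER)"
    )
    conn.execute("INSERT INTO project_mart VALUES ('p1', 4242, 7)")
    one = [{"type": "assistant", "model": "m-a", "tokens": 12}]
    print(messages_summary(conn, "p1", one)["total"])  # 4242 (from the mart)
    print(messages_summary(conn, "p2", one)["total"])  # 1 (no mart row: list length)
```

Keeping the breakdowns on the list while taking only the headline count from the mart is what lets the response shape stay identical on both paths.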