feat(etl): Wave 3A — hot-path routes migrate to marts (projects, dashboard-data, cost-data)#76
Merged
…aily_mart

Routes prefer materialised mart rows when present, falling back to the existing aggregator path when project_mart is empty. Same JSON shape either way: the data source swaps, the contract holds.

* projects.py: project_mart → ProjectStats UI shape; bulk SQL helpers stay as the fallback for projects not yet in the mart.
* data.py: project_mart + daily_mart → dashboard statistics block. Tools/errors/hourly_pattern/sessions/user_interactions return shape-stable empties when marts drive the response; those blocks already load lazily via /api/cost-data, /api/commands, etc.
* store/mart_queries.py: read helpers for project_mart, daily_mart, provider_day_mart with provider/model filter parity to the existing Annotated[list[str] | None, Query()] pattern.

Cost route migration arrives in the next commit.
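The mart-first, aggregator-fallback behaviour described in this commit could be sketched roughly as follows. This is an illustration only, not the code in projects.py: the project_mart columns, the mart_project_stats and project_stats helpers, and the fallback callable are all hypothetical (the real schema lives in docs/specs/etl-architecture.md and is not reproduced here).

```python
from __future__ import annotations

import sqlite3
from typing import Callable

# Hypothetical schema for illustration; the real project_mart columns
# are defined by the ETL spec and may differ.
SCHEMA = """
CREATE TABLE project_mart (
    project_id TEXT NOT NULL,
    provider   TEXT NOT NULL,
    messages   INTEGER NOT NULL,
    cost_usd   REAL NOT NULL
);
"""


def mart_project_stats(conn: sqlite3.Connection, project_id: str) -> dict | None:
    """Merge all provider rows for one project; None when the mart has no row."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(messages), 0), COALESCE(SUM(cost_usd), 0.0) "
        "FROM project_mart WHERE project_id = ?",
        (project_id,),
    ).fetchone()
    if row[0] == 0:
        return None  # empty mart: the caller should fall back
    return {"messages": row[1], "cost_usd": row[2]}


def project_stats(
    conn: sqlite3.Connection,
    project_id: str,
    fallback: Callable[[str], dict],
) -> dict:
    """Prefer the mart; run the fallback aggregator only when the mart is empty."""
    stats = mart_project_stats(conn, project_id)
    if stats is None:
        stats = fallback(project_id)
    # Both paths produce the same keys, so the response shape does not change.
    return {"project_id": project_id, **stats}
```

Summing across provider rows in one query is one way to get the "multi-provider duplicate merge" behaviour the tests mention; the actual helper may merge differently.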
…ds provider_day_mart

When project_mart has a row, /api/cost-data overlays the token_composition.daily/totals blocks with daily_mart-derived values. Per-session/per-command/per-tool detail blocks (session_costs, command_costs, tool_costs, outliers, retry_signals, session_efficiency, error_cost, trends) stay aggregator-driven; they need lower-grain marts that ship in Wave 4.

/api/cost-data/by-provider switches to provider_day_mart when populated; the messages-table rollup stays as the empty-mart fallback. Same JSON contract either way.
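The overlay step can be pictured as replacing two keys in the aggregator's payload while leaving every other block alone. The sketch below assumes a hypothetical row shape (date, input_tokens, output_tokens) and a hypothetical helper name, overlay_token_composition; the real route and field names may differ.

```python
from __future__ import annotations

import copy


def overlay_token_composition(payload: dict, mart_daily: list[dict]) -> dict:
    """Return a copy of payload with token_composition.daily/totals taken from
    mart rows. With no mart rows the aggregator payload is returned untouched."""
    if not mart_daily:
        return payload  # empty mart: keep the aggregator-driven values

    out = copy.deepcopy(payload)

    # Sum every numeric column across days; "date" is the only non-numeric
    # key in this hypothetical row shape.
    totals: dict[str, int] = {}
    for row in mart_daily:
        for key, value in row.items():
            if key != "date":
                totals[key] = totals.get(key, 0) + value

    composition = out.setdefault("token_composition", {})
    composition["daily"] = list(mart_daily)
    composition["totals"] = totals
    return out
```

Working on a deep copy keeps the aggregator's own payload intact, which makes the no-overlay-when-empty case easy to compare against in a parity test.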
…hetic rows

13 new tests covering:

* projects: mart-driven stats path, fallback, multi-provider duplicate merge, < 100ms speed at 100K daily_mart rows
* dashboard-data: overview from project_mart, daily_stats from daily_mart, models from daily_mart GROUP BY model, < 100ms speed
* cost-data: token_composition.daily/totals overlay, no-overlay-when-empty fallback parity
* cost-data/by-provider: mart fast-path, ?provider= filter narrowing, messages-table fallback when mart empty, < 100ms speed at 100K provider_day_mart rows
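A speed regression test of the kind listed above might seed a mart table with synthetic rows and time a single indexed query. The following is a sketch under assumptions: the provider_day_mart columns, the index, and both helper names are hypothetical, and the PR's actual tests go through the HTTP routes rather than raw SQL.

```python
import sqlite3
import time


def seed_provider_day_mart(conn: sqlite3.Connection, n_rows: int) -> None:
    """Create a hypothetical provider_day_mart and fill it with synthetic rows
    spread over 100 projects and two providers."""
    conn.execute(
        "CREATE TABLE provider_day_mart ("
        "project_id TEXT, day TEXT, provider TEXT, cost_usd REAL)"
    )
    conn.execute(
        "CREATE INDEX idx_pdm_project ON provider_day_mart (project_id, provider)"
    )
    rows = (
        (
            f"p{i % 100}",
            f"day-{i // 100}",
            "anthropic" if (i // 100) % 2 else "openai",
            0.01,
        )
        for i in range(n_rows)
    )
    conn.executemany("INSERT INTO provider_day_mart VALUES (?, ?, ?, ?)", rows)


def timed_by_provider(conn: sqlite3.Connection, project_id: str):
    """Run one indexed rollup and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    rows = conn.execute(
        "SELECT provider, COUNT(*) FROM provider_day_mart "
        "WHERE project_id = ? GROUP BY provider ORDER BY provider",
        (project_id,),
    ).fetchall()
    return rows, time.perf_counter() - start
```

A test would then seed 100,000 rows, call timed_by_provider, and assert that the elapsed time stays under the 0.1 s budget; the threshold belongs in the test rather than the helper so it can be tuned per environment.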
0bserver07 added a commit that referenced this pull request on May 6, 2026
Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82) into a single [0.7.0] section.

New: docs/HANDOFF.md, a state-of-the-codebase walkthrough for incoming agents. Architecture map, recent history, key gotchas, what's left, files-to-read-first.

End-state on the maintainer's real store:

* 150,337 usage_events
* Marts populated and watermarks in sync
* Dashboard cold-load 2.5s → <50ms warm
* Watcher 155ms end-to-end source-file-write → dashboard-data-fresh

1598 backend tests passing, 2 skipped, 11 deselected (slow suite). Frontend typecheck + build clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
* …docs/specs/etl-architecture.md): three highest-traffic dashboard routes swap their data source from per-request aggregator passes against raw messages to one indexed SELECT per request against project_mart / daily_mart / provider_day_mart.
* When the project_mart row is missing, the route falls back to the existing aggregator/bulk-SQL path, so stores that haven't been backfilled yet keep working unchanged. Provider/model filter parity is preserved: the wiring from PR #66 (provider/model filter bar with URL-persisted filters across all dashboard tabs) and PR #70 (fix for provider/model filters being silently ignored and startup blocking HTTP for the full reindex) still works.

Routes migrated
* GET /api/projects?include_stats=true: reads project_mart for the per-project totals; the bulk SQL helpers from PR #65 (bulk SQL replacing the N+1 stats loop on /api/projects, 26s → 3s cold) stay as the empty-mart fallback.
* GET /api/dashboard-data: project_mart drives statistics.overview, daily_mart drives statistics.daily_stats + statistics.models. Tools/errors/hourly_pattern/sessions/user_interactions return shape-stable empties when the mart drives the response; those blocks already load lazily via /api/cost-data, /api/commands, /api/tool-distribution.
* GET /api/cost-data: token_composition.daily/totals come from daily_mart when materialised. Per-session / per-command / per-tool detail blocks (session_costs, command_costs, tool_costs, outliers, retry_signals, session_efficiency, error_cost, trends) stay aggregator-driven; they need lower-grain marts deferred to Wave 4.
* GET /api/cost-data/by-provider: switches to provider_day_mart when populated; messages-table sweep stays as the empty-mart fallback.

Routes deferred to Wave 3B
* /api/compare, /api/yield, /api/optimize, /api/context-budget: owned by Wave 3B per the spec.
* /api/messages, /api/messages/summary, /api/sessions, /api/jsonl-files: need raw rows the marts don't carry.

What's new under the hood
* stackunderflow/store/mart_queries.py: read helpers for project_mart, daily_mart, provider_day_mart. Empty-mart safe (returns [] / None) so callers can gate on mart_has_project_row().
* mart_has_project_row(...) gate: when the project is materialised the route serves the response from marts; when it isn't, the existing aggregator path runs unchanged.

Tests
* …tests/stackunderflow/routes/test_*_uses_mart.py covering: mart-driven happy path, empty-mart fallback, multi-provider duplicate merge, provider/model filter pass-through, parity (empty store → mart populated), and < 100ms speed regression at 100K synthetic mart rows.
* …queries.get_project_stats which the empty-mart fallback path still calls, so the contract is preserved.

Test plan
* pytest tests/ -q clean (1472 passed, 2 skipped)
* ruff check stackunderflow/routes/ stackunderflow/store/mart_queries.py clean

Constraints honoured
* …docs/specs/etl-architecture.md) binding: mart schema and column names unchanged.
* …?provider=, ?model=) preserved on every migrated route via the same Annotated[list[str] | None, Query()] = None pattern that PR #66 (provider/model filter bar) and PR #70 (provider/model filter fix) introduced; mart helpers add WHERE LOWER(provider) IN (?) / WHERE LOWER(model) IN (?).
* …--no-verify / --no-gpg-sign.
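The case-insensitive filter clauses mentioned above need one placeholder per value, so a helper along these lines is one plausible way to build them. The names filter_clause and sum_tokens and the three-column daily_mart used here are hypothetical; only the WHERE LOWER(...) IN (...) shape comes from the PR description.

```python
from __future__ import annotations

import sqlite3


def filter_clause(
    providers: list[str] | None,
    models: list[str] | None,
) -> tuple[str, list[str]]:
    """Build a WHERE clause with one bound placeholder per filter value.
    None or an empty list means "no filter" for that column."""
    parts: list[str] = []
    params: list[str] = []
    # Column names are fixed literals here, never user input.
    for column, values in (("provider", providers), ("model", models)):
        if values:
            marks = ", ".join("?" for _ in values)
            parts.append(f"LOWER({column}) IN ({marks})")
            params.extend(v.lower() for v in values)
    where = (" WHERE " + " AND ".join(parts)) if parts else ""
    return where, params


def sum_tokens(
    conn: sqlite3.Connection,
    providers: list[str] | None = None,
    models: list[str] | None = None,
) -> int:
    """Example read helper: total tokens in a hypothetical daily_mart."""
    where, params = filter_clause(providers, models)
    sql = "SELECT COALESCE(SUM(tokens), 0) FROM daily_mart" + where
    return conn.execute(sql, params).fetchone()[0]
```

Lower-casing both the column and the bound values means ?provider=Anthropic and ?provider=anthropic select the same rows, and binding values as parameters keeps query-string input out of the SQL text.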