
feat(etl): Wave 4C — /api/etl/status route + CLI command#78

Merged
0bserver07 merged 1 commit into main from feat/etl-status on May 6, 2026


@0bserver07
Owner

Summary

Wave 4C ships the ETL pipeline health surface — one endpoint + one CLI
command — so the dashboard can show a status badge and CLI users get a
one-line health check.

  • GET /api/etl/status — watcher state, mart watermarks vs the max
    usage_events.id, per-provider event counts, per-cost-source
    breakdown, and a health enum (live / syncing / stale / error).
    <50ms response on a 200K-event store; all counts are indexed
    COUNT(*).
  • stackunderflow etl status [--format text|json] — same payload,
    rendered as text by default. Works without a running server.
  • Shared assembler at stackunderflow/etl/status.py so both surfaces
    call the same SQL. The CLI degrades gracefully when no live watcher
    handle is available (reports running="unknown").
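
To make the "watermarks vs max usage_events.id" comparison concrete, here is a minimal sketch of what a shared assembler could compute. It is not the code in `stackunderflow/etl/status.py`. It assumes a SQLite store, and the `mart_watermarks` table, the `provider` column, the index name, the function name `assemble_counts`, and the returned keys are all hypothetical; only the `usage_events` table name comes from this PR.

```python
import sqlite3


def assemble_counts(conn: sqlite3.Connection) -> dict:
    """Collect the max event id, per-mart lag, and per-provider counts."""
    # MAX(id) on an INTEGER PRIMARY KEY is served by the rowid index.
    max_id = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM usage_events"
    ).fetchone()[0]

    # Lag is how far each mart's watermark trails the newest event id.
    # (mart_watermarks is a hypothetical table for this sketch.)
    marts = {
        name: {"watermark": watermark, "lag": max(max_id - watermark, 0)}
        for name, watermark in conn.execute(
            "SELECT mart_name, last_event_id FROM mart_watermarks "
            "ORDER BY mart_name"
        )
    }

    # Per-provider COUNT(*) can use the index on the provider column.
    providers = dict(
        conn.execute(
            "SELECT provider, COUNT(*) FROM usage_events "
            "GROUP BY provider ORDER BY provider"
        )
    )
    return {
        "max_event_id": max_id,
        "marts": marts,
        "events_by_provider": providers,
    }


# Tiny in-memory store with made-up rows, just to exercise the queries.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE usage_events (
        id INTEGER PRIMARY KEY,
        provider TEXT NOT NULL
    );
    CREATE INDEX idx_usage_events_provider ON usage_events(provider);
    CREATE TABLE mart_watermarks (
        mart_name TEXT PRIMARY KEY,
        last_event_id INTEGER NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO usage_events (provider) VALUES (?)",
    [("provider_a",), ("provider_a",), ("provider_b",)],
)
conn.execute("INSERT INTO mart_watermarks VALUES ('daily', 1)")

status = assemble_counts(conn)
print(status)
```

Keeping the queries in one function is what lets the route and the CLI report identical numbers: each surface only has to supply a connection.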

Health rules (from the wave-4C spec)

  • error — a mart is more than 100 events behind and the watcher reports running=false.
  • stale — a mart is more than 100 events behind, watcher still alive (catch-up may still happen).
  • syncing — lag > 0 and a refresh ran in the last 10 seconds.
  • live — zero lag, or no events at all.
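
The rules above can be sketched as a small pure function. This is an illustration under stated assumptions, not the shipped implementation: the function name, parameter names, and rule precedence (the 100-event threshold is checked first) are guesses, and the rules as listed do not say what to report when lag is between 1 and 100 with no recent refresh, so the sketch uses a clearly marked placeholder for that case.

```python
from typing import Optional, Union

LAG_THRESHOLD = 100          # "more than 100 events behind"
SYNC_WINDOW_SECONDS = 10.0   # "a refresh ran in the last 10 seconds"

# ASSUMPTION: the listed rules do not cover a small lag (1..100) with no
# recent refresh. This placeholder is a guess, not part of the spec.
UNSPECIFIED_FALLBACK = "syncing"


def classify_health(
    max_lag: int,
    watcher_running: Union[bool, str],
    seconds_since_refresh: Optional[float],
) -> str:
    """Map the worst mart lag and the watcher state to a health value.

    watcher_running is True, False, or the string "unknown".
    seconds_since_refresh is None when no refresh has been observed.
    """
    if max_lag > LAG_THRESHOLD:
        # Only an explicit running=False escalates to "error"; an alive
        # or unknown watcher leaves room for catch-up, hence "stale".
        return "error" if watcher_running is False else "stale"
    if max_lag == 0:
        # Zero lag also covers an empty store (no events at all).
        return "live"
    recently_refreshed = (
        seconds_since_refresh is not None
        and seconds_since_refresh <= SYNC_WINDOW_SECONDS
    )
    if recently_refreshed:
        return "syncing"
    return UNSPECIFIED_FALLBACK


print(classify_health(150_000, "unknown", None))
print(classify_health(101, False, None))
print(classify_health(5, True, 2.0))
print(classify_health(0, True, None))
```

Treating an unknown watcher as "not confirmed stopped" matches the CLI smoke test in this PR, where marts 150K events behind with an unknown watcher are reported as `stale` rather than `error`.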

Test plan

  • 17 new tests pass (10 route, 7 CLI) covering response shape, every
    health transition, and watcher graceful-degrade.
  • `pytest tests/ -q` clean: 1489 passed, 2 skipped (up from 1472).
  • HTTP smoke test against the user's real store: 200 OK, 5.9ms wall
    time end-to-end.
  • CLI smoke test: `stackunderflow etl status` correctly reports
    `stale` (watcher state unknown, marts 150K events behind) on the
    user's machine where ingest is current but the marts haven't been
    refreshed.

Files touched

  • new: `stackunderflow/routes/etl.py` (thin FastAPI shell)
  • new: `stackunderflow/etl/status.py` (shared assembler — keeps SQL out of the CLI)
  • new: `tests/stackunderflow/routes/test_etl_status.py`
  • new: `tests/stackunderflow/cli/test_etl_status.py`
  • modified: `stackunderflow/server.py` (register router — 2 lines)
  • modified: `stackunderflow/cli.py` (add `etl status` subcommand)
  • modified: `docs/cli-reference.md` (new `## ETL Commands` section)
  • modified: `docs/api-reference.md` (new `### GET /api/etl/status`)
  • modified: `CHANGELOG.md`

Single endpoint surfaces watcher health, mart watermarks vs max event
id, per-provider event counts, and a `health` enum (live / syncing /
stale / error) so the dashboard can show a status badge and the CLI
a one-line health check. <50ms response — all counts are indexed
COUNT(*) on usage_events / mart tables.

- New `stackunderflow/etl/status.py` shared assembler (single source of
  truth for both surfaces; route + CLI never disagree).
- New `stackunderflow/routes/etl.py` thin FastAPI shell.
- New `etl_status_cmd` in `stackunderflow/cli.py` with text + json
  formats; the text render mirrors the spec example.
- Watcher graceful degrade: when `deps.watcher_handle` is None (CLI,
  --no-watcher, or pre-server boot) status reports
  `running="unknown"` rather than crashing.
- 17 new tests (10 route, 7 CLI) cover shape, every health transition
  (live → syncing → stale → error), and watcher-degrade fallback.
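
The graceful-degrade behaviour and the two output formats could look roughly like the sketch below. Everything here is hypothetical: `FakeWatcher`, `watcher_state`, `render`, the payload keys, and the one-line text layout are invented for illustration (the real text render mirrors the spec example, which is not reproduced in this PR).

```python
import json
from typing import Any, Optional, Union


class FakeWatcher:
    """Hypothetical stand-in for a live watcher handle."""

    def __init__(self, running: bool) -> None:
        self.running = running


def watcher_state(handle: Optional[Any]) -> Union[bool, str]:
    """Return the watcher's running flag, or "unknown" without a handle."""
    if handle is None:
        # CLI, --no-watcher, or pre-server boot: report, do not crash.
        return "unknown"
    return bool(handle.running)


def render(payload: dict, fmt: str = "text") -> str:
    """Render a status payload as JSON or as a one-line text summary."""
    if fmt == "json":
        return json.dumps(payload, indent=2, sort_keys=True)
    if fmt != "text":
        raise ValueError(f"unsupported format: {fmt!r}")
    running = payload["watcher"]["running"]
    return f"etl: {payload['health']} (watcher running={running})"


# No handle available, as when the CLI runs without a server.
payload = {"health": "stale", "watcher": {"running": watcher_state(None)}}
print(render(payload))
print(render(payload, "json"))
```

Resolving the watcher state in one helper means the rest of the payload assembly never has to special-case a missing handle.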

Test count delta: 1472 → 1489 passed, 2 skipped (+17).
@0bserver07 0bserver07 merged commit 61b4ce3 into main May 6, 2026
9 checks passed
@0bserver07 0bserver07 deleted the feat/etl-status branch May 6, 2026 21:15
0bserver07 added a commit that referenced this pull request May 6, 2026
Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from
the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82)
into a single [0.7.0] section.

New: docs/HANDOFF.md — state-of-the-codebase walkthrough for incoming
agents. Architecture map, recent history, key gotchas, what's left,
files-to-read-first.

End-state on the maintainer's real store:
  150,337 usage_events
  Marts populated and watermarks in sync
  Dashboard cold-load 2.5s → <50ms warm
  Watcher 155ms end-to-end source-file-write → dashboard-data-fresh

1598 backend tests passing, 2 skipped, 11 deselected (slow suite).
Frontend typecheck + build clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>