feat(etl): Wave 4D — 12 beta provider normalizers (full codeburn catalog coverage) by 0bserver07 · Pull Request #79 · 0bserver07/StackUnderflow

0bserver07 · 2026-05-06T21:02:39Z

Summary

Wave 4D ships the 12 beta provider normalizers: Codeium (stub), Continue, Copilot, Cursor Agent, Droid, Gemini, KiloCode, Kiro, OpenClaw, OpenCode, Pi/OMP, Qwen, Roo Code.
ETL pipeline now covers all 16 providers from the codeburn catalog. Each normalizer subclasses Normalizer from stackunderflow/etl/normalize/base.py, calls _build_event() for the canonical row shape, stamps cost_source per spec, and preserves provider-specific fields in raw_extras.
Beta providers stay opt-in via the existing STACKUNDERFLOW_BETA_* env flags — registering at import time is harmless when those adapters are off because no rows ever land with the matching provider value.
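
For readers new to the normalizer layer, here is a minimal sketch of the subclass-and-register pattern the summary describes. It is not the code in this PR: `Event`, `ExampleNormalizer`, `REGISTRY`, `register()`, the `"reported"` cost_source label, and the `_build_event()` signature are all hypothetical stand-ins, since the real base class in stackunderflow/etl/normalize/base.py is not shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Iterator


@dataclass
class Event:
    """Hypothetical canonical row; the real shape comes from _build_event()."""
    provider: str
    input_tokens: int
    output_tokens: int
    cost_source: str
    raw_extras: dict[str, Any] = field(default_factory=dict)


class Normalizer:
    """Stand-in for the real base class (signature assumed for illustration)."""
    provider_name = "base"

    def _build_event(self, *, input_tokens: int, output_tokens: int,
                     cost_source: str, raw_extras: dict[str, Any]) -> Event:
        return Event(self.provider_name, input_tokens, output_tokens,
                     cost_source, raw_extras)

    def normalize(self, records: Iterable[dict]) -> Iterator[Event]:
        raise NotImplementedError


class ExampleNormalizer(Normalizer):
    """Hypothetical provider: maps two known fields, keeps the rest in raw_extras."""
    provider_name = "example"
    _KNOWN = {"in", "out"}

    def normalize(self, records: Iterable[dict]) -> Iterator[Event]:
        for rec in records:
            yield self._build_event(
                input_tokens=int(rec.get("in", 0)),
                output_tokens=int(rec.get("out", 0)),
                cost_source="reported",  # assumed label, not from the spec
                raw_extras={k: v for k, v in rec.items() if k not in self._KNOWN},
            )


# Hypothetical registry: provider value -> normalizer instance.
REGISTRY: dict[str, Normalizer] = {}


def register(normalizer: Normalizer, *names: str) -> None:
    """Register under the given names, or under provider_name if none are given."""
    for name in names or (normalizer.provider_name,):
        REGISTRY[name] = normalizer


register(ExampleNormalizer())
events = list(REGISTRY["example"].normalize([{"in": 10, "out": 3, "model": "m"}]))
```

In this sketch a registry entry does nothing until a row with the matching provider value is looked up, which is the sense in which import-time registration is harmless while an adapter is off.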

Implementation notes

Cached-subtraction rule applied to Gemini + Qwen exactly per catalog: input = promptTokenCount - cachedContentTokenCount, output = candidatesTokenCount + thoughtsTokenCount, cache_read = cachedContentTokenCount, cache_create = 0.
Reasoning fold-in applied to OpenCode: output = tokens.output + tokens.reasoning. Droid does the same for its thinkingTokens slot.
KiloCode + RooCode subclass ClineNormalizer directly — same on-disk format, only provider_name differs.
Codeium is a discovery-only stub that yields zero events; the registry entry exists so lookups never KeyError when the beta flag is on.
Pi covers OMP — same parser logic, different on-disk roots (~/.pi/agent/sessions/ vs. ~/.omp/agent/sessions/); we register PiNormalizer under both provider names.
Cursor Agent + Kiro always stamp cost_source='estimated' because their sources never carry per-message tokens.
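
The token rules above can be written out as a short sketch. The field names are taken from the notes; the function names and the choice to treat a missing counter as 0 are assumptions made for illustration, not the PR's actual code.

```python
from typing import Mapping


def cached_subtraction_tokens(usage: Mapping[str, int]) -> dict[str, int]:
    """Cached-subtraction rule as stated for Gemini and Qwen.

    Missing counters default to 0 (an assumption, not part of the spec).
    """
    prompt = usage.get("promptTokenCount", 0)
    cached = usage.get("cachedContentTokenCount", 0)
    return {
        "input": prompt - cached,
        "output": usage.get("candidatesTokenCount", 0)
        + usage.get("thoughtsTokenCount", 0),
        "cache_read": cached,
        "cache_create": 0,
    }


def reasoning_folded_output(tokens: Mapping[str, int]) -> int:
    """Reasoning fold-in as stated for OpenCode: output + reasoning."""
    return tokens.get("output", 0) + tokens.get("reasoning", 0)


sample = cached_subtraction_tokens({
    "promptTokenCount": 1200,
    "cachedContentTokenCount": 200,
    "candidatesTokenCount": 300,
    "thoughtsTokenCount": 50,
})
folded = reasoning_folded_output({"output": 40, "reasoning": 15})
```

With these sample numbers, 200 of the 1200 prompt tokens are counted as cache_read rather than input, so the row carries input = 1000, output = 350, cache_read = 200, cache_create = 0.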

Files touched

stackunderflow/etl/normalize/{codeium,continue_,copilot,cursor_agent,droid,gemini,kilocode,kiro,openclaw,opencode,pi,qwen,roocode}.py (13 new)
stackunderflow/etl/normalize/__init__.py (appended 14 registrations: pi → pi+omp)
tests/stackunderflow/etl/normalize/test_<provider>.py (13 new files, 77 new tests)
CHANGELOG.md

No routes, marts, watcher, or backfill code touched — strictly per scope.

Test plan

pytest tests/ -q — 1551 passed, 2 skipped (was 1474, +77 new tests)
ruff check stackunderflow/etl/normalize/ tests/stackunderflow/etl/normalize/ — clean
All 4 default-on normalizers (claude/codex/cursor/cline) still pass unchanged
Sample event rows verified for every provider — token math matches catalog spec, cost_usd computes correctly via the rate card

🤖 Generated with Claude Code

…log coverage)

Adds Normalizer subclasses for the 12 beta providers Wave 2A left for later (Codeium, Continue, Copilot, Cursor Agent, Droid, Gemini, KiloCode, Kiro, OpenClaw, OpenCode, Pi/OMP, Qwen, Roo Code). The ETL pipeline now covers all 16 providers from the codeburn catalog.

Token semantics match the catalog spec exactly: Gemini and Qwen apply the cached-subtraction rule (input = promptTokenCount - cachedContentTokenCount, output = candidatesTokenCount + thoughtsTokenCount). OpenCode folds reasoning into output. Cursor Agent and Kiro stamp cost_source='estimated' unconditionally because their sources don't carry per-message tokens. Codeium is a discovery-only stub that yields zero events. KiloCode + RooCode subclass Cline since they share the on-disk format (api_req_started.text JSON blob).

Beta providers stay opt-in via the existing STACKUNDERFLOW_BETA_* adapter flags; registering normalizers here is harmless when the matching adapter is off because no rows ever land with that provider value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82) into a single [0.7.0] section.

New: docs/HANDOFF.md, a state-of-the-codebase walkthrough for incoming agents. Architecture map, recent history, key gotchas, what's left, files-to-read-first.

End-state on the maintainer's real store:
150,337 usage_events
Marts populated and watermarks in sync
Dashboard cold-load 2.5s → <50ms warm
Watcher 155ms end-to-end source-file-write → dashboard-data-fresh
1598 backend tests passing, 2 skipped, 11 deselected (slow suite). Frontend typecheck + build clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

0bserver07 merged commit c678b2d into main May 6, 2026
9 checks passed

0bserver07 merged 1 commit into main from feat/etl-beta-normalizers