feat(etl): Wave 4D — 12 beta provider normalizers (full codeburn catalog coverage)#79
Merged
…log coverage)

Adds Normalizer subclasses for the 12 beta providers Wave 2A left for later (Codeium, Continue, Copilot, Cursor Agent, Droid, Gemini, KiloCode, Kiro, OpenClaw, OpenCode, Pi/OMP, Qwen, Roo Code). The ETL pipeline now covers all 16 providers from the codeburn catalog.

Token semantics match the catalog spec exactly. Gemini and Qwen apply the cached-subtraction rule (input = promptTokenCount - cachedContentTokenCount, output = candidatesTokenCount + thoughtsTokenCount). OpenCode folds reasoning into output. Cursor Agent and Kiro stamp cost_source='estimated' unconditionally because their sources don't carry per-message tokens. Codeium is a discovery-only stub that yields zero events. KiloCode and RooCode subclass Cline since they share the on-disk format (api_req_started.text JSON blob).

Beta providers stay opt-in via the existing STACKUNDERFLOW_BETA_* adapter flags. Registering normalizers here is harmless when the matching adapter is off because no rows ever land with that provider value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0bserver07 added a commit that referenced this pull request on May 6, 2026
Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82) into a single [0.7.0] section.

New: docs/HANDOFF.md, a state-of-the-codebase walkthrough for incoming agents. It covers the architecture map, recent history, key gotchas, what's left, and files to read first.

End-state on the maintainer's real store:
- 150,337 usage_events
- Marts populated and watermarks in sync
- Dashboard cold-load 2.5s → <50ms warm
- Watcher 155ms end-to-end, source-file-write → dashboard-data-fresh

1598 backend tests passing, 2 skipped, 11 deselected (slow suite). Frontend typecheck + build clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Each new normalizer subclasses `Normalizer` from `stackunderflow/etl/normalize/base.py`, calls `_build_event()` for the canonical row shape, stamps `cost_source` per spec, and preserves provider-specific fields in `raw_extras`.
- Beta providers stay opt-in via the existing `STACKUNDERFLOW_BETA_*` env flags; registering at import time is harmless when those adapters are off because no rows ever land with the matching `provider` value.

Implementation notes
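The token rules described in this PR can be sketched as plain functions. This is a minimal illustration, not the code in the PR: `TokenCounts`, `gemini_style_tokens`, and `opencode_style_output` are hypothetical names, and the input dicts are assumed to carry the field names quoted in the description (`promptTokenCount`, `tokens.output`, and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenCounts:
    """Hypothetical container for the four token buckets named in the PR."""
    input: int
    output: int
    cache_read: int
    cache_create: int


def gemini_style_tokens(usage: dict) -> TokenCounts:
    """Cached-subtraction rule as stated for Gemini and Qwen.

    Assumes `usage` is a dict with the camelCase keys quoted in the PR;
    missing keys are treated as 0 (an assumption of this sketch).
    """
    prompt = int(usage.get("promptTokenCount") or 0)
    cached = int(usage.get("cachedContentTokenCount") or 0)
    candidates = int(usage.get("candidatesTokenCount") or 0)
    thoughts = int(usage.get("thoughtsTokenCount") or 0)
    return TokenCounts(
        input=prompt - cached,          # cached tokens are not billed as input
        output=candidates + thoughts,   # thinking tokens count as output
        cache_read=cached,
        cache_create=0,
    )


def opencode_style_output(tokens: dict) -> int:
    """Reasoning folded into output, as stated for OpenCode."""
    return int(tokens.get("output") or 0) + int(tokens.get("reasoning") or 0)


if __name__ == "__main__":
    usage = {
        "promptTokenCount": 1000,
        "cachedContentTokenCount": 400,
        "candidatesTokenCount": 50,
        "thoughtsTokenCount": 25,
    }
    print(gemini_style_tokens(usage))
    print(opencode_style_output({"output": 10, "reasoning": 5}))
```

The sketch applies the subtraction literally; whether the real normalizers clamp a negative `input` when the cached count exceeds the prompt count is not stated here.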
- Gemini and Qwen apply the cached-subtraction rule: `input = promptTokenCount - cachedContentTokenCount`, `output = candidatesTokenCount + thoughtsTokenCount`, `cache_read = cachedContentTokenCount`, `cache_create = 0`.
- OpenCode folds reasoning into output: `output = tokens.output + tokens.reasoning`. Droid does the same for its `thinkingTokens` slot.
- KiloCode and Roo Code subclass `ClineNormalizer` directly: same on-disk format, only `provider_name` differs.
- Pi and OMP use different session directories (`~/.pi/agent/sessions/` vs. `~/.omp/agent/sessions/`); we register `PiNormalizer` under both provider names.
- Cursor Agent and Kiro stamp `cost_source='estimated'` because their sources never carry per-message tokens.

Files touched
- `stackunderflow/etl/normalize/{codeium,continue_,copilot,cursor_agent,droid,gemini,kilocode,kiro,openclaw,opencode,pi,qwen,roocode}.py` (13 new)
- `stackunderflow/etl/normalize/__init__.py` (appended 14 registrations: pi → pi+omp)
- `tests/stackunderflow/etl/normalize/test_<provider>.py` (13 new files, 77 new tests)
- `CHANGELOG.md`

No routes, marts, watcher, or backfill code touched; strictly per scope.
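To illustrate why registering a normalizer is harmless when its adapter is off, and how one normalizer can serve two provider names, here is a minimal sketch of a provider-keyed registry. It is an assumption about the general pattern, not the contents of `stackunderflow/etl/normalize/__init__.py`: `REGISTRY`, `register`, `normalize_rows`, and the toy normalizer functions are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical registry: provider name -> normalizer callable.
REGISTRY: Dict[str, Callable[[dict], dict]] = {}
CALLS: Dict[str, int] = {}  # counts invocations, for demonstration only


def register(provider: str, normalizer: Callable[[dict], dict]) -> None:
    """Map a provider name to a normalizer; registering has no side effects on rows."""
    REGISTRY[provider] = normalizer


def pi_normalizer(row: dict) -> dict:
    """Toy stand-in for a Pi/OMP normalizer (hypothetical)."""
    CALLS["pi_normalizer"] = CALLS.get("pi_normalizer", 0) + 1
    return {"provider": row["provider"], "normalized_by": "pi"}


def kiro_normalizer(row: dict) -> dict:
    """Toy stand-in for a Kiro normalizer (hypothetical)."""
    CALLS["kiro_normalizer"] = CALLS.get("kiro_normalizer", 0) + 1
    return {"provider": row["provider"], "normalized_by": "kiro"}


def normalize_rows(rows: Iterable[dict]) -> Iterator[dict]:
    """Dispatch each row by its provider value; unregistered providers are skipped."""
    for row in rows:
        normalizer = REGISTRY.get(row["provider"])
        if normalizer is not None:
            yield normalizer(row)


# One normalizer registered under two provider names, plus one whose
# adapter is assumed to be switched off (so no rows carry its provider).
register("pi", pi_normalizer)
register("omp", pi_normalizer)
register("kiro", kiro_normalizer)

if __name__ == "__main__":
    rows = [{"provider": "pi"}, {"provider": "omp"}]
    print(list(normalize_rows(rows)))
    print(CALLS.get("kiro_normalizer", 0))
```

Because dispatch is driven by the `provider` value on each row, a registered normalizer whose adapter never emits rows is simply never called.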
Test plan
- `pytest tests/ -q`: 1551 passed, 2 skipped (was 1474, +77 new tests)
- `ruff check stackunderflow/etl/normalize/ tests/stackunderflow/etl/normalize/`: clean

🤖 Generated with Claude Code