feat(etl): Wave 2C — filesystem watcher (watchfiles, ~400ms sync)#75
Merged
watchfiles-backed daemon that watches every registered adapter's source paths. On any change: adapter.read(since=watermark) → ingest → (lazy) normalize → (lazy) refresh_all_marts. Debounced 200ms to coalesce JSONL append bursts.

End-to-end latency 155ms measured against the user's real ~/.claude/projects (target: ~400ms).

Steps 4 (normalize) and 5 (mart refresh) are imported lazily and gracefully no-op until Wave 2A and 2B land, which keeps this PR spec-conformant without coupling to in-flight code.

Adds:
- ``stackunderflow/etl/watcher.py`` (start_watcher + WatcherHandle)
- ``BaseAdapter.watch_paths()`` Protocol method (default = []); claude / codex / cursor / cline implementations return canonical roots
- ``stackunderflow start --no-watcher`` flag (STACKUNDERFLOW_DISABLE_WATCHER env override)
- 11 new tests in tests/stackunderflow/etl/

Tests: 1341 → 1352 passing (+11).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
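The "imported lazily and gracefully no-op" behaviour described in this commit message could look roughly like the sketch below. This is not the code in ``watcher.py``: `optional_step` and `run_cycle` are hypothetical names, the steps are called without arguments for simplicity, and the module names in the usage note are stand-ins.

```python
from __future__ import annotations

import importlib
from typing import Callable, Optional


def optional_step(module_name: str, attr: str) -> Optional[Callable[[], object]]:
    """Return module_name.attr, or None when the module is not importable yet."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The wave that provides this module has not landed; the caller
        # treats None as "skip this step".
        return None
    return getattr(module, attr, None)


def run_cycle(ingest: Callable[[], int], steps: list[tuple[str, str]]) -> list[str]:
    """Run ingest, then every optional step that is currently importable.

    `ingest` returns the number of new records. Returns the names of the
    stages that actually ran, in order.
    """
    ran: list[str] = []
    if ingest() == 0:
        return ran  # nothing new, so later stages have nothing to do
    ran.append("ingest")
    for module_name, attr in steps:
        step = optional_step(module_name, attr)
        if step is None:
            continue  # graceful no-op for a step that is not available
        step()
        ran.append(attr)
    return ran
```

For example, `run_cycle(lambda: 3, [("module_that_does_not_exist", "normalize"), ("gc", "collect")])` skips the first step and runs the second, with the stdlib `gc.collect` standing in for a step whose module is present.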
2a8aa1d to 31a52c6
0bserver07 added a commit that referenced this pull request May 6, 2026
Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82) into a single [0.7.0] section.

New: docs/HANDOFF.md, a state-of-the-codebase walkthrough for incoming agents. Architecture map, recent history, key gotchas, what's left, files-to-read-first.

End-state on the maintainer's real store:
- 150,337 usage_events
- Marts populated and watermarks in sync
- Dashboard cold-load 2.5s → <50ms warm
- Watcher 155ms end-to-end source-file-write → dashboard-data-fresh

1598 backend tests passing, 2 skipped, 11 deselected (slow suite). Frontend typecheck + build clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `stackunderflow/etl/watcher.py`: Rust-backed `watchfiles` daemon thread that watches every default-on adapter's source paths. On any change → ingest the new bytes via the existing `ingest_file` writer (scoped to the changed file, not a full enumerate sweep) → lazily run the per-provider Wave 2A normalizer → lazily call `refresh_all_marts(conn)` from Wave 2B. Debounced 200ms to coalesce JSONL append bursts.
- `BaseAdapter.watch_paths()` Protocol method (default `[]`). Concrete implementations on the four default-on adapters: claude (`~/.claude/projects` + `~/.claude-{opus,sonnet,haiku,glm}` variants if present), codex (`~/.codex/sessions`), cursor (the vscdb file directly; `watchfiles` fires on byte change), cline-family (the `globalStorage/<extension>/tasks` root, inherited by KiloCode + Roo Code through the shared base class).
- `stackunderflow start --no-watcher` for headless / debugging runs (also honours `STACKUNDERFLOW_DISABLE_WATCHER=1`).
- Wave 2A (`etl.normalize`) and Wave 2B (`etl.watermark.refresh_all_marts`) are imported lazily with `ImportError` fallback: the watcher works as a pure incremental-ingest trigger today, and the normalize + marts steps activate automatically when those waves land. No coupling to in-flight code.

Smoke test
Run against the user's real `~/.claude/projects` (~150 projects, ~17K msgs):

Spec target: ~400ms. Actual: 155ms, well under target.
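One way a write-to-fresh latency like this could be measured is sketched below. It is an illustration only, not the harness used for the figure above: `measure_latency`, `PROBE_LINE`, and the `is_fresh` callback are hypothetical, and in a real run `is_fresh` would check whatever the dashboard reads rather than the file itself.

```python
from __future__ import annotations

import time
from pathlib import Path
from typing import Callable

# Hypothetical probe record appended to a JSONL source file.
PROBE_LINE = '{"probe": true}\n'


def measure_latency(
    path: Path,
    is_fresh: Callable[[], bool],
    poll_s: float = 0.005,
    timeout_s: float = 2.0,
) -> float:
    """Append one line to `path` and return seconds until `is_fresh()` is true.

    Raises TimeoutError if freshness is not observed within `timeout_s`.
    """
    start = time.perf_counter()
    with path.open("a", encoding="utf-8") as fh:
        fh.write(PROBE_LINE)
    deadline = start + timeout_s
    while time.perf_counter() < deadline:
        if is_fresh():
            return time.perf_counter() - start
        time.sleep(poll_s)  # short poll so the measurement stays fine-grained
    raise TimeoutError(f"data not fresh within {timeout_s}s")
```

The polling interval bounds the resolution of the measurement, so it should be kept well below the latency being measured.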
Test plan
- `pytest tests/ -q` clean: 1341 → 1352 passing (+11), 2 skipped.
- `ruff check stackunderflow/etl/ tests/stackunderflow/etl/` passes (no new warnings on modified files; pre-existing UP017/UP038 issues on cursor.py / cline.py left untouched per the "Wave 2C only" scope).
- `tests/stackunderflow/etl/`:
  - `test_adapter_watch_paths.py` (4): every default-on adapter exposes the right roots.
  - `test_watcher.py` (7): `watch_paths_for` filters missing roots; `_adapter_for_path` prefix-matching; idle-handle when no roots; append → refresh within 2s; burst-of-5 collapses into one cycle (debounce); `WatcherHandle.stop()` joins within timeout.

Constraints honored
except.ImportErrorfallback).

Platform notes
Verified on macOS (FSEvents-driven). `watchfiles` claims cross-platform support, but the Cline / Cursor paths are macOS-only in v1 anyway, so cross-platform parity is limited by the existing adapter constraints, not by the watcher.
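Because adapter roots may simply not exist on a given machine, the set of paths handed to the watcher has to be filtered first. The sketch below shows one way to do that under stated assumptions: `BaseAdapter.watch_paths()` with a default of `[]` comes from this PR, while `FixedRootsAdapter` and `existing_watch_paths` are hypothetical names and are not the PR's `watch_paths_for` implementation.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


class BaseAdapter:
    """Stand-in for the adapter base class; the default is no watch roots."""

    name = "base"

    def watch_paths(self) -> list[Path]:
        return []


class FixedRootsAdapter(BaseAdapter):
    """Hypothetical adapter that reports a fixed list of roots."""

    def __init__(self, name: str, roots: list[Path]) -> None:
        self.name = name
        self._roots = roots

    def watch_paths(self) -> list[Path]:
        return list(self._roots)


def existing_watch_paths(adapters: Iterable[BaseAdapter]) -> list[Path]:
    """Collect each adapter's roots, dropping missing paths and duplicates."""
    seen: set[Path] = set()
    roots: list[Path] = []
    for adapter in adapters:
        for raw in adapter.watch_paths():
            path = raw.expanduser()  # resolve a leading "~" if present
            if path.exists() and path not in seen:
                seen.add(path)
                roots.append(path)
    return roots
```

With this shape, an adapter whose roots are absent on the current platform contributes nothing, and an empty result can be treated as "start an idle handle".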