feat(etl): Wave 2C — filesystem watcher (watchfiles, ~400ms sync)#75

Merged
0bserver07 merged 1 commit into main from feat/etl-watcher
May 5, 2026
@0bserver07
Owner

Summary

  • New stackunderflow/etl/watcher.py — Rust-backed watchfiles daemon thread that watches every default-on adapter's source paths. On any change → ingest the new bytes via the existing ingest_file writer (scoped to the changed file, not a full enumerate sweep) → lazily run the per-provider Wave 2A normalizer → lazily call refresh_all_marts(conn) from Wave 2B. Debounced 200ms to coalesce JSONL append bursts.
  • New BaseAdapter.watch_paths() Protocol method (default []). Concrete implementations on the four default-on adapters: claude (~/.claude/projects + ~/.claude-{opus,sonnet,haiku,glm} variants if present), codex (~/.codex/sessions), cursor (the vscdb file directly — watchfiles fires on byte change), cline-family (the globalStorage/<extension>/tasks root, inherited by KiloCode + Roo Code through the shared base class).
  • New CLI flag stackunderflow start --no-watcher for headless / debugging runs (also honours STACKUNDERFLOW_DISABLE_WATCHER=1).
  • Wave 2A (etl.normalize) and Wave 2B (etl.watermark.refresh_all_marts) are imported lazily with ImportError fallback — the watcher works as a pure incremental-ingest trigger today, and the normalize + marts steps activate automatically when those waves land. No coupling to in-flight code.
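
As a rough illustration of the `watch_paths()` contract described above, the sketch below shows a Protocol method whose default is an empty list, with one adapter overriding it. Only the method name, its `[]` default, and the `~/.codex/sessions` root come from this PR; the class bodies, the `name` attribute, and `QuietAdapter` are hypothetical.

```python
from pathlib import Path
from typing import Protocol


class BaseAdapter(Protocol):
    """Hypothetical slice of the adapter Protocol; only watch_paths() is shown."""

    name: str

    def watch_paths(self) -> list[Path]:
        # Default: nothing to watch, so adapters that do not opt in are skipped.
        return []


class CodexAdapter(BaseAdapter):
    """Illustrative adapter that opts in with one root (path taken from the PR)."""

    name = "codex"

    def watch_paths(self) -> list[Path]:
        return [Path.home() / ".codex" / "sessions"]


class QuietAdapter(BaseAdapter):
    """Hypothetical adapter that inherits the empty default."""

    name = "quiet"
```

Keeping the default on the Protocol means adapters that are not default-on need no change to stay out of the watcher.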

Smoke test

Run against the user's real ~/.claude/projects (~150 projects, ~17K msgs):

00:29:20.237 stackunderflow.etl.watcher etl.watcher: refreshed marts in 72ms — 1 events
=== SMOKE TEST RESULT ===
  end-to-end latency (append → row in DB): 155ms

Spec target: ~400ms. Actual: 155ms, well under target.

Test plan

  • pytest tests/ -q clean: 1341 → 1352 passing (+11), 2 skipped.
  • ruff check stackunderflow/etl/ tests/stackunderflow/etl/ passes (no new warnings on modified files; pre-existing UP017/UP038 issues on cursor.py / cline.py left untouched per the "Wave 2C only" scope).
  • Smoke test against real ~/.claude/projects: append → DB row in 155ms.
  • 11 new tests in tests/stackunderflow/etl/:
    • test_adapter_watch_paths.py (4): every default-on adapter exposes the right roots.
    • test_watcher.py (7): watch_paths_for filters missing roots; _adapter_for_path prefix-matching; idle-handle when no roots; append → refresh within 2s; burst-of-5 collapses into one cycle (debounce); WatcherHandle.stop() joins within timeout.
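
The two path helpers exercised by these tests might look roughly like the sketch below. The signatures and the longest-prefix tie-break are assumptions made for illustration; the real `watch_paths_for` and `_adapter_for_path` in `watcher.py` may differ.

```python
from pathlib import Path


def watch_paths_for(adapters) -> list[Path]:
    """Collect every adapter's roots, dropping ones that do not exist on disk."""
    roots: list[Path] = []
    for adapter in adapters:
        for root in adapter.watch_paths():
            if root.exists() and root not in roots:
                roots.append(root)
    return roots


def adapter_for_path(changed: Path, adapters):
    """Return the adapter whose watch root is the longest prefix of `changed`.

    Returns None when no adapter claims the path.
    """
    best, best_depth = None, -1
    for adapter in adapters:
        for root in adapter.watch_paths():
            # A root matches if it is the changed path itself or one of its parents.
            if changed == root or root in changed.parents:
                depth = len(root.parts)
                if depth > best_depth:
                    best, best_depth = adapter, depth
    return best
```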

Constraints honored

  • No version bump (still v0.6.1).
  • 1341 backend tests still pass.
  • Watcher never crashes on a bad event — every step wrapped in broad except.
  • Builds to the spec, not to in-flight Wave 2A/2B branches (lazy imports + ImportError fallback).
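
One way to get the "never crashes on a bad event" behaviour is to run each pipeline step through a small guard like the one below. This is a sketch only: `run_step` and the logger name are hypothetical, and the actual watcher may wrap its steps inline instead.

```python
import logging

log = logging.getLogger("etl.watcher.sketch")  # hypothetical logger name


def run_step(name, fn, *args):
    """Run one pipeline step; log and swallow any failure so the loop survives.

    Returns (ok, result) so the caller can decide whether to run later steps.
    """
    try:
        return True, fn(*args)
    except Exception:  # deliberately broad: one bad event must not kill the thread
        log.exception("step %s failed; skipping", name)
        return False, None
```

Catching `Exception` rather than `BaseException` still lets `KeyboardInterrupt` and `SystemExit` propagate, which matters for a clean shutdown.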

Platform notes

Verified on macOS (FSEvents-driven). watchfiles claims cross-platform support, but the Cline / Cursor paths are macOS-only in v1 anyway, so any cross-platform gap comes from the existing adapter constraints, not from the watcher.

watchfiles-backed daemon that watches every registered adapter's
source paths. On any change → adapter.read(since=watermark) →
ingest → (lazy) normalize → (lazy) refresh_all_marts. Debounced 200ms
to coalesce JSONL append bursts.
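
To illustrate what the 200ms debounce buys, the sketch below models coalescing as a pure function: events closer together than the window fall into one batch, so a burst of appends triggers one ingest cycle. This models the effect only, not how watchfiles implements debouncing; `coalesce` and its event tuples are hypothetical.

```python
def coalesce(events, window_ms=200):
    """Group (timestamp_ms, path) events into batches of unique paths.

    A batch closes once more than `window_ms` passes without a new event.
    """
    batches = []
    current = []
    last_ts = None
    for ts, path in sorted(events):
        if last_ts is not None and ts - last_ts > window_ms:
            batches.append(sorted({p for _, p in current}))
            current = []
        current.append((ts, path))
        last_ts = ts
    if current:
        batches.append(sorted({p for _, p in current}))
    return batches


# Five appends to the same file within 40ms collapse into a single batch.
burst = [(0, "a.jsonl"), (10, "a.jsonl"), (20, "a.jsonl"), (30, "a.jsonl"), (40, "a.jsonl")]
print(coalesce(burst))
```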

End-to-end latency 155ms measured against the user's real
~/.claude/projects (target: ~400ms).

Steps 4 (normalize) and 5 (mart refresh) are imported lazily and
gracefully no-op until Wave 2A and 2B land — keeps this PR
spec-conformant without coupling to in-flight code.
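
The lazy-import fallback could be sketched as below. `optional_callable` is a hypothetical helper; only the module path `stackunderflow.etl.watermark` and the function name `refresh_all_marts` come from this PR.

```python
import importlib


def optional_callable(module_name, attr):
    """Return module.attr if it can be imported, else a no-op returning None."""
    def noop(*args, **kwargs):
        return None

    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return noop
    return getattr(module, attr, noop)


# Resolves to the real function once Wave 2B lands; until then it is a no-op.
refresh_all_marts = optional_callable("stackunderflow.etl.watermark", "refresh_all_marts")
```

Resolving at call time rather than at module import time is what lets the step activate without a change to the watcher once the dependency exists.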

Adds:
- ``stackunderflow/etl/watcher.py`` (start_watcher + WatcherHandle)
- ``BaseAdapter.watch_paths()`` Protocol method (default = []);
  claude / codex / cursor / cline implementations return canonical
  roots
- ``stackunderflow start --no-watcher`` flag
  (STACKUNDERFLOW_DISABLE_WATCHER env override)
- 11 new tests in tests/stackunderflow/etl/

Tests: 1341 → 1352 passing (+11).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0bserver07 0bserver07 merged commit b3c0572 into main May 5, 2026
7 of 9 checks passed
@0bserver07 0bserver07 deleted the feat/etl-watcher branch May 5, 2026 04:46
0bserver07 added a commit that referenced this pull request May 6, 2026
Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from
the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82)
into a single [0.7.0] section.

New: docs/HANDOFF.md — state-of-the-codebase walkthrough for incoming
agents. Architecture map, recent history, key gotchas, what's left,
files-to-read-first.

End-state on the maintainer's real store:
  150,337 usage_events
  Marts populated and watermarks in sync
  Dashboard cold-load 2.5s → <50ms warm
  Watcher 155ms end-to-end source-file-write → dashboard-data-fresh

1598 backend tests passing, 2 skipped, 11 deselected (slow suite).
Frontend typecheck + build clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>