Skip to content

[DRAFT] LCM v4.1 — deferred features (themes / procedures / hard-delete drainer / voyage rate-state / intentions / describe consolidation)#616

Draft
100yenadmin wants to merge 57 commits intoMartian-Engineering:mainfrom
electricsheephq:feat/lcm-v4.1-deferred-features
Draft

[DRAFT] LCM v4.1 — deferred features (themes / procedures / hard-delete drainer / voyage rate-state / intentions / describe consolidation)#616
100yenadmin wants to merge 57 commits intoMartian-Engineering:mainfrom
electricsheephq:feat/lcm-v4.1-deferred-features

Conversation

@100yenadmin
Copy link
Copy Markdown
Contributor

Status: DRAFT — DO NOT MERGE

This is a snapshot branch preserving features that were CUT from PR #613 (the v4.1 omnibus) per Eva's first-principles pass on 2026-05-06. These features are well-architected but ship as half-shipped UX in their current state. This branch keeps the work intact so we can pick any of them up later with full context.

Branch tip: f932086 (the state of feat/lcm-v4.1-omnibus before the cuts were applied).

Companion main PR: #613 — same commit history up to f932086, then diverges with cuts applied.

What's preserved here (and why each was cut from #613)

1. Themes feature

Files: src/themes/consolidation.ts, src/tools/lcm-{recent,search}-themes-tool.ts, src/tools/lcm-theme-explain-tool.ts, test/themes-consolidation.test.ts, test/lcm-{recent,search}-themes-tool.test.ts, test/lcm-theme-explain-tool.test.ts, lcm_themes + lcm_theme_sources schema, lcm_themes_stale_on_suppress trigger.

Why cut from #613: 3 agent tools wired + schema shipped + cascade trigger shipped, but the themes-consolidation worker has no auto-tick wired into the plugin lifecycle. Operators get "No themes found" forever with no way to manually trigger consolidation (the /lcm worker tick themes-consolidation parser explicitly rejects that kind name). Half-shipped UX worse than not shipping. Per challenger agent C3 at 96% confidence.

To complete (estimated 6-10 hours):

  • Worker autostart src/operator/themes-consolidation-autostart.ts (mirror entity-coref pattern, ~150 LOC)
  • Theme-namer LLM call src/themes/theme-namer-llm.ts (mirror entity-extractor-llm pattern, ~80 LOC)
  • Plugin wiring + gateway_stop hook (~30 LOC)
  • Candidate-leaf fetcher SQL adapter (~50 LOC, NEW — needs to query summaries WHERE embedding exists, fetch Float32Array vectors from vec0)
  • Cluster-naming prompt template (research-grade — hardest part)
  • Tests + manual verification with real corpus

Test cases unlocked: D3 ("What themes have we worked on this month?")

2. Procedure mining

Files: src/extraction/procedure-mining.ts, src/extraction/procedure-prefilter.ts, test/procedure-mining.test.ts, test/procedure-prefilter.test.ts, lcm_procedures schema. NOTE: no agent tool exists yet (lcm_get_procedure was proposed but never built).

Why cut from #613: 0% shipped — no agent tool, no LLM injection, no auto-tick. Pure dead code in production. Per challenger agent C5.

To complete (estimated 10-16 hours):

  • Procedure judge LLM call (~120 LOC, harder than entity prompt — judge must confirm cluster IS a procedure AND extract steps)
  • Worker autostart src/operator/procedure-mining-autostart.ts (~250 LOC mirror of entity-coref autostart)
  • New agent tools: lcm_get_procedure + lcm_search_procedures (~300 LOC + tests, mirror of entity tools)
  • Plugin wiring + manifest entries
  • Note: mineProceduresPass does FULL-CORPUS clustering each pass — at large session sizes this is O(N²); incremental mining is documented as not implemented
  • LLM cost: ~$0.10/cluster, daily for N sessions

Test cases unlocked: D1, D5 ("How do I rebuild the gateway?" / "Standard procedure for X")

3. runPurge --immediate hard-delete drainer

Files: mode='immediate' branch in src/operator/purge.ts:281-376, lcm_purge_rebuild_queue schema.

Why cut from #613: No drainer worker exists — --immediate is functionally identical to --soft PLUS enqueueing to a queue nobody drains. To honor "no Phase 2" mandate.

To complete (estimated 20-40 hours, HIGH RISK):

  • New drainer worker src/operator/purge-rebuild-worker.ts (~400 LOC)
  • Pattern: rebuild a condensed summary WITHOUT suppressed leaves' content using the v4.1.1 A4 forwarder pattern
    • INSERT new condensed row → UPDATE OLD.superseded_by → NEVER mutate summary_parents
    • THEN DELETE the leaves
  • Handle parent-chain re-rebuilds (cascading: condensed-of-condensed)
  • Risk: HIGH — wrong rebuild can corrupt the assemble pyramid; needs rigorous test coverage (rebuild idempotence, partial-failure recovery, cascading rebuilds)
  • This is the only feature here that touches the core assemble-pyramid invariants

GDPR compliance note: When complete, this provides true byte-level deletion. Until then, soft-purge makes content invisible to all read paths but bytes remain on disk.

4. lcm_voyage_rate_state schema (cross-process Voyage budget coordination)

Files: lcm_voyage_rate_state table + 2 seeded rows in src/db/migration.ts:1374-1389, tests at test/v41-support-tables.test.ts:109-165.

Why cut from #613: Table-only feature, ZERO readers/writers. Per-process throttle in voyage/client.ts covers Eva's single-gateway use today.

To complete (estimated 6-10 hours):

  • New module src/voyage/rate-state.ts (~150 LOC)
  • Pattern: BEGIN IMMEDIATE → SELECT row → UPDATE counters → COMMIT (per architecture spec at migration.ts:1371-1373)
  • Rolling-window reset logic
  • 429 backoff coordination
  • Integration into voyage/client.ts (~50 LOC)
  • Risks: WAL-mode write contention; clock skew across processes; testing with N concurrent gateways
  • Use case justification: only matters in multi-gateway scenarios that don't exist for current operator

5. Intentions feature (was already only schema-deep)

Files: lcm_intentions table + 2 indexes (src/db/migration.ts:1820-1845), prospective-extract prompt template in seed-default-prompts.ts:322-351, schema-smoke tests.

Why cut from #613: ZERO producer, ZERO consumer, ZERO agent tools. PR description had doc-drift (pyramid diagram showed "due intentions" as a real layer; engine.ts has NO code that reads from lcm_intentions). Per challenger agent C3 at 99% confidence.

To complete (estimated 16-30 hours):

  • Prospective-extract worker (~200 LOC) — needs LLM-call wiring + integration with leaf-extraction pass + dedupe + idle scheduler
  • 4 agent tools: lcm_get_intention, lcm_search_intentions, lcm_resolve_intention, lcm_due_intentions (~600 LOC + 600 LOC tests = ~1200 LOC)
  • Pyramid integration ("due intentions" layer in assemble()) — careful FK/dedup logic + tests
  • Risk: agents may extract a flood of false-positive "intentions" from chitchat ("I'll think about that later"), polluting the surface

Original justification (v3 Agent C): "prospective memory: 'remember to call them Tuesday'" — speculative, never re-validated empirically against operator workflows.

6. lcm_describe consolidation (refactor — not a new feature)

Files: would refactor src/tools/lcm-describe-tool.ts to absorb lcm_get_entity, lcm_theme_explain, lcm_get_procedure (when built) via ID-prefix dispatch.

Why deferred from #613: Agent C1 verdict: 95% NO-GO in this PR. 400-LOC refactor touching the canonical lcm_describe (used by every recall escalation flow). After 3 final-review passes, reopening adversarial review surface on the canonical path = real regression risk. Asymmetric: defer cost = mild ergonomic; ship-now cost = canonical-tool blast radius if buggy.

To complete (estimated 6 hours):

  • Extend RetrievalEngine.describe to dispatch on entity_<id>, theme_<id>, procedure_<id> prefixes
  • Move per-type DB queries from per-tool files INTO RetrievalEngine
  • Update DescribeResult discriminated union
  • 15+ test updates across 6 files
  • Fresh adversarial review pass on the consolidated describe path

Capability impact: Zero — ergonomic only. Agent surface goes from 8 tools to ~5 tools.

How to pick this up later

  1. Branch from this snapshot: git checkout -b feat/lcm-v4.1-themes feat/lcm-v4.1-deferred-features
  2. Keep ONLY the files for the feature you're shipping (delete the others)
  3. Add the missing wiring listed above
  4. Open as its own focused PR
  5. Reference this branch + PR feat(lcm): v4.1 — agent memory that actually works (replaces #516; companion #616 deferred) #613 for context

What's NOT here

The capability extensions that DID ship in PR #613:

  • lcm_grep mode='verbatim' (closes Type C verbatim retrieval)
  • lcm_grep mode='semantic' (capability addition)
  • lcm_describe expandChildren/expandMessages flags (closes Type E drilldown friction without lifting lcm_expand gate)
  • lcm_get_entity + lcm_search_entities (entity catalog tools — entity coref worker IS auto-ticking)

PR #613 ships the 8-tool surface that covers all 5 question types (with 22/25 PRIMARY test case coverage; 3/25 D-pattern theme/procedure sub-cases on adequate-fallback coverage). This branch preserves the additional features for when we have bandwidth to ship them complete.


Ref: First-principles pass + 8 challenger agents documented in ~/.claude/plans/glistening-swimming-rivest.md (Eva's plan file). 2026-05-06.

Eva added 30 commits May 6, 2026 00:47
First commit of the v4.1 omnibus implementation. Smallest possible slice:
introduces the cross-process concurrency model module and the
`lcm_worker_lock` table that enables a sidecar worker process for cold
maintenance work (condensation, extraction, embedding backfill, theme
consolidation, eval, profile rebuild).

Resolves v4.1.1 amendment A9 (`last_heartbeat_at` column required by
§0.5 fallback rule: gateway can take over only when BOTH
`expires_at < now` AND `last_heartbeat_at < now - 300s`).

Changes:
- src/concurrency/model.ts (NEW) — single source of truth for §0
  invariants, busy_timeout constants, worker job-kind catalogue, and
  defensive assertion helpers (assertForeignKeysEnabled,
  assertBusyTimeoutForRole). Documents the no-LLM-in-write-tx invariant
  and the worker_threads heartbeat requirement (v4.1.1 A9).
- src/db/migration.ts (+25 lines) — new `ensureLcmWorkerLockTable`
  migration step. Idempotent CREATE TABLE IF NOT EXISTS, runs after FTS
  setup, before the BEGIN EXCLUSIVE COMMIT.
- test/concurrency-model.test.ts (NEW, 10 tests) — verifies invariant
  ordering (worker timeout < gateway, TTL ≥ 3× heartbeat, fallback soak
  > TTL), job-kind catalogue, and assertion helpers.
- test/lcm-worker-lock.test.ts (NEW, 4 tests) — verifies migration
  creates the table with the right columns (including A9's
  last_heartbeat_at), is idempotent, supports basic acquire/heartbeat,
  and supports stale-lock GC.

Verification:
- npm run build: passes
- npm test --run: 48 files / 872 tests passing (up from 858 baseline,
  +14 new tests, zero regressions)
- Live DB ground-truth check: ran the new DDL against a copy of
  /Users/lume/.openclaw/lcm.db (2.5GB, 762 conversations, 3771 leaf
  summaries). Migration succeeds; existing data untouched; acquire
  pattern works; PK conflict throws as expected.

Notes:
- Code-as-ground-truth pivot: per the v4.1.1 plan, each commit cites
  the amendment(s) it resolves and is verified against live data.
- v4.1.1 A6 finding (PRAGMA foreign_keys = OFF on Eva's CLI test)
  partially superseded: src/db/connection.ts:configureConnection()
  already sets it ON for every connection that goes through the
  standard path. The new assertForeignKeysEnabled() is a defensive
  guardrail for future code paths that bypass configureConnection.
…_feature_flags (A.02)

Resolves v4.1.1 amendments A2 (suppress_reason + superseded_by columns)
and A8 (feature-flag storage). Adds the v3.1 columns the v4.1 spec
depends on (session_key, suppressed_at, entity_index,
contains_suppressed_leaves) since v3.1 never shipped to upstream.

Changes:

- src/db/migration.ts (+104 LOC):
  - ensureSummaryV41Columns(db) — adds 7 columns to summaries via the
    existing PRAGMA table_info / ADD COLUMN pattern (matches
    ensureSummaryDepthColumn / ensureSummaryMetadataColumns / etc.):
      session_key                    TEXT NOT NULL DEFAULT ''   (v3.1 A1)
      suppressed_at                  TEXT                         (v3.1 A3)
      entity_index                   TEXT                         (v3.1 §7.2)
      contains_suppressed_leaves     INTEGER NOT NULL DEFAULT 0  (v3.1 A3)
      suppress_reason                TEXT                         (v4.1.1 A2)
      superseded_by                  TEXT REFERENCES summaries    (v4.1.1 A2/A4)
                                       ON DELETE SET NULL
      leaf_summarizer_cap_was        INTEGER                      (v4.1)
  - ensureMessageSuppressedAtColumn(db) — adds messages.suppressed_at
    (v3.1 A3 cascade target for lcm_quote / lcm_factcheck filtering)
  - ensureLcmFeatureFlagsTable(db) — clean new table
    `lcm_feature_flags(flag PK, value NOT NULL, updated_at NOT NULL)`
  - lcm_worker_lock TEXT PK explicitly NOT NULL (SQLite legacy quirk
    allows NULL in TEXT PK columns without it).

- test/v41-summaries-columns.test.ts (NEW, 12 tests):
  - Per-column verifications (NOT NULL, default value, FK target/action)
  - lcm_feature_flags schema + basic set/read pattern
  - Legacy `lcm_migration_flags` coexistence verified

Verification:
- npm run build: passes
- npm test --run: 49 files / 884 tests passing (+12 from A.01's 872, 0 regressions)
- Live DB ground-truth check on copy of /Users/lume/.openclaw/lcm.db:
    summaries 14 → 21 columns; 7 v4.1 cols added.
    messages gains suppressed_at; 3774 leaves preserved.
    lcm_worker_lock + lcm_feature_flags created.
    Eva's legacy lcm_rollups* + lcm_migration_flags untouched.
    4187 summaries now have session_key='' (A.08 backfill target).

Code-as-ground-truth findings (revising v4.1.1 spec):

1. v4.1.1 A8 originally said "extend lcm_migration_flags with value column."
   That table doesn't exist in upstream src/ — it only exists on Eva's
   live DB from old fork-side code. Replaced with a clean new
   `lcm_feature_flags` table. Eva's legacy table stays alongside, untouched.

2. v4.1.1 A6 (PRAGMA foreign_keys = OFF) is partly misleading: the
   codebase's src/db/connection.ts:configureConnection() already sets
   foreign_keys = ON for every connection through the standard path.
   Eva's earlier sqlite3 CLI test was using a different connection, not
   the production path. The new src/concurrency/model.ts already provides
   assertForeignKeysEnabled() as a defensive guardrail.

3. SQLite TEXT PRIMARY KEY columns do NOT auto-enforce NOT NULL (legacy
   behavior). Both new tables (lcm_worker_lock, lcm_feature_flags) now
   have explicit NOT NULL on their PK column. Caught by tests.

4. SQLite ADD COLUMN with REFERENCES requires NULL default — verified
   `superseded_by TEXT REFERENCES summaries(summary_id) ON DELETE SET NULL`
   works as ALTER TABLE ADD COLUMN (no NOT NULL allowed). Documented in
   ensureSummaryV41Columns docstring.
… + audit (A.03)

Adds the four "support tables" the worker process and operator surface
need before the heavy schema (synthesis cache, embeddings, entities,
themes) lands. Each is a clean idempotent CREATE TABLE IF NOT EXISTS.

Resolves v4.1.1:
- A3 — `lcm_extraction_queue`: gateway atomically inserts a queue row
  with every leaf write; worker drains it for entity coreference and
  procedure-recheck. CHECK constraint on `kind` ('entity' |
  'procedure-recheck'). Indexes on pending (queued_at WHERE picked_at
  IS NULL) and dead-letter (attempts >= 5).
- B2 (partial) — `lcm_purge_rebuild_queue`: persistent rebuild queue
  for `lcm_purge --immediate`. T1 fires suppression cascade + enqueues;
  worker drains using A4 forwarder pattern. Indexes on pending +
  purge_session_id.
- B3 (partial) — `lcm_voyage_rate_state`: cross-process rate-limit
  budget for Voyage embed + rerank. SQLite serializes BEGIN IMMEDIATE
  naturally so gateway + worker coordinate via this shared row. CHECK
  constraint on bucket ('embed' | 'rerank'). Seeded with both rows
  idempotently (`INSERT OR IGNORE`). Spec note: HTTP call MUST happen
  AFTER the COMMIT — wrapping HTTP in BEGIN IMMEDIATE would serialize
  every gateway query embed and add 200-2000ms latency.
- §C item — `lcm_session_key_audit`: reversibility log for §2.1 step 1
  re-key of 5 legacy convs. Allows operator `/lcm
  undo-session-key-rekey <conv_id>` if the spike's identification was
  wrong for any of those convs.

Changes:
- src/db/migration.ts (+90 LOC): four `runMigrationStep` blocks added
  inline after the v3.1+v4.1 column work from A.02
- test/v41-support-tables.test.ts (NEW, 9 tests): per-table schema
  verification (columns, FKs, indexes, CHECK constraints), CHECK
  rejection paths, idempotent re-run verification, brief-tx update
  pattern verification for rate state

Verification:
- npm test --run: 50 files / 893 tests passing (+9 from A.02's 884,
  zero regressions)
- Live DB ground-truth check on copy of /Users/lume/.openclaw/lcm.db:
    PRE lcm_ tables: 5 (legacy lcm_migration_flags + lcm_migration_state
                       + 3 lcm_rollups* from Eva's fork)
    POST lcm_ tables: 9 (5 legacy preserved + 4 new)
    voyage rate state seeded with embed + rerank rows
    3774 leaves preserved, 762 conversations preserved
    Eva's lcm_rollups* untouched (out-of-scope for v4.1; v4.1 replaces
    its functionality via lcm_synthesis_cache landing in A.04)

Notes:
- All four FKs use the production summaries / conversations tables;
  CASCADE on DELETE is the right semantics (queue/audit rows are
  derived; if their parent is genuinely deleted, they should follow).
- Per v4.1.1 A6 (now confirmed code-side): connection.ts already
  enforces foreign_keys = ON, so these CASCADEs work in production.
… cache_leaf_refs + synthesis_audit (A.04)

Adds the four-table synthesis layer per v4.1 §3 + §1.3 + v4.1.1 B1/B4.
Tables created in dependency order so FKs work on first run:
prompt_registry → synthesis_cache (FK on prompt_id) → cache_leaf_refs
(FK on cache_id) → synthesis_audit (FK on prompt_id + either summary_id
or cache_id).

Resolves v4.1.1:
- B1 — `lcm_synthesis_audit` schema: pass_output is NULLable (insert
  with NULL before LLM call, UPDATE on return). Adds `status` column
  ('started' | 'completed' | 'failed') for orphan-row tracking. Started-
  GC index supports the 1-hour orphan cleanup query.
- B4 — UNIQUE lookup index on `lcm_synthesis_cache` enables cross-
  process single-flight via INSERT OR IGNORE pattern (loser of race
  reads back in-flight row, polls for status='ready').
- §3 + §1.3 — prompt registry with versioning per (memory_type,
  tier_label, pass_kind, version) tuple. Append-only; bundle_version
  groups prompt sets for synchronized voice-consistency rebuild.
- §3 — synthesis cache with status='building' single-flight, prompt_id
  FK enables prompt-selective invalidation (NEVER touches durable
  summaries.content rows — closes v3 design principle 4 violation that
  v4 had introduced).
- v3.1 A3 extension — cache_leaf_refs inverse index for proactive purge
  on lcm_suppress (cascades both directions: ref deleted when either
  cache_id OR leaf_summary_id parent is deleted).

Changes:
- src/db/migration.ts (+150 LOC): four runMigrationStep blocks, all
  idempotent, all in dependency order.
- test/v41-synthesis-tables.test.ts (NEW, 14 tests):
  - prompt_registry: CHECK constraint enforcement (memory_type, pass_kind),
    UNIQUE constraint on (memory_type, tier_label, pass_kind, version)
  - synthesis_cache: status + tier_label CHECK enforcement,
    INSERT OR IGNORE single-flight pattern (ON CONFLICT DO NOTHING)
  - cache_leaf_refs: bidirectional CASCADE behavior verified
  - synthesis_audit: pass_output NULLable, started→completed pattern,
    CHECK requiring at least one target column, started-GC index exists

Verification:
- npm test --run: 51 files / 907 tests passing (+14 from A.03's 893,
  zero regressions)
- Live DB ground-truth check on copy of /Users/lume/.openclaw/lcm.db:
    PRE: 5 lcm_ tables (legacy)
    POST A.01-A.04 cumulative: 15 lcm_ tables
      = 5 legacy preserved + 10 new
        (worker_lock, feature_flags, extraction_queue, purge_rebuild_queue,
         voyage_rate_state, session_key_audit, prompt_registry,
         synthesis_cache, cache_leaf_refs, synthesis_audit)
    3774 leaves preserved, 762 conversations preserved.
    PRAGMA foreign_keys=1.

Notes:
- DB copies for end-to-end verification moved to /Volumes/LEXAR/lcm-tmp
  (the live DB is 2.5GB; /tmp filled up after a few iterations).
- B4 UNIQUE index uses COALESCE(grep_filter, '') so SQLite can index the
  expression deterministically (NULL-grep_filter rows would otherwise
  not be uniquely-indexed since NULL ≠ NULL in SQL semantics).
… (A.05)

Per v4.1 §11 + v4.1.1 (revising v4 design):
- N≥100 stratified queries (50% fts-easy, 25% fts-medium, 25% paraphrastic).
- 2× empirical SD threshold (calibrate by 5x repeated baseline runs).
- Ensemble judge (3 different model families).
- Mixed absolute+pairwise scoring per dimension.
- Drift index for cumulative regression.
- Measures BOTH retrieval_recall AND synthesis_quality (separate metrics
  per v4.1.1 — closes the v4 gap where eval collapsed them).

Tables (dependency order):
- lcm_eval_query_set: query set registry (e.g. 'eva-baseline-v2')
- lcm_eval_query: per-query rows with stratum CHECK constraint, optional
  reference_summary for gold-standard comparison, must_not_regress flag
  for critical Eva queries
- lcm_eval_run: per-run rows with separate retrieval_recall_score AND
  synthesis_quality_score, ensemble judge_models JSON, noise_floor_sd
  for drift calibration, trigger CHECK constraint
- lcm_eval_drift: cumulative-delta drift index per query_set

All cascade via FK on query_set_id deletion.

Verified:
- 52 files / 915 tests passing (+8 from A.04, zero regressions)
- Live DB copy: 15 → 19 lcm_ tables. 3774 leaves preserved.
…ions + procedures + intentions (A.06)

Per v4.1 §7 + v4.1.1 B5/B6/B7/B8/B11. Five tables for the extraction
layer (entity coreference + procedures + intentions tracking).

Tables (all idempotent, dependency-ordered):
- lcm_entity_type_registry: freeform entity_type catalogue (Eva domain
  has session_key, config_flag, R-XXX agent IDs, error_code, etc. —
  no closed CHECK enum, per v4.1.1 §C).
- lcm_entities: simplified schema (no separate aliases table per
  v4.1.1 B5; alternate surface forms denormalized into JSON column).
  UNIQUE index (session_key, canonical_text COLLATE NOCASE) enables
  case-insensitive cross-process single-flight (B4 pattern). FK to
  summaries(first_seen_in_summary_id) ON DELETE SET NULL.
- lcm_entity_mentions: tracks each mention site. CASCADE on both
  entity_id and summary_id deletion (basis for v4.1.1 §C suppression
  cascade — when leaf gets suppressed, mentions cascade-delete).
- lcm_procedures: status lifecycle ('draft'|'active'|'stale'|
  'archived'|'deprecated'); extraction_source distinguishes auto
  (clustering pipeline) from 'manual' (lcm_remember_procedure tool,
  v4.1.1 B8 fix for one-shot procedures).
- lcm_intentions: 3 statuses ('pending'|'fulfilled'|'cancelled' per
  B11); resolution_text + resolved_at columns for capture context.
  source_leaf_id is NULL-allowed since ON DELETE SET NULL requires it.

Verified:
- 53 files / 929 tests passing (+14 from A.05, zero regressions)
- All 5 tables created, FK + CHECK constraints enforced.
….07)

Per v4.1 §1 + v4.1.1 A5/A7. The MANAGED tables only — vec0 virtual
table itself defers to Group B (requires sqlite-vec extension load,
best-effort per A7's two-transaction pattern).

- lcm_embedding_profile: model registry (model_name PK, dim, active flag,
  archive_after for graceful retirement). Group B startup seeds
  voyage-4-large after successful sqlite-vec load.
- lcm_embedding_meta: sidecar with composite PK
  (embedded_id, embedded_kind, embedding_model) enabling parallel rows
  during model-bump cutover. CHECK on embedded_kind ('summary' | 'entity'
  | 'theme'). FK to lcm_embedding_profile prevents orphan model refs.
  No FK on embedded_id — polymorphic per v4.1.1 §C item; orphan cleanup
  via idle pass in Group B.

Verified:
- 54 files / 934 tests passing (+5 from A.06, zero regressions)
…4.1 read patterns (A.08)

Per v4.1 — adds 5 partial/composite indexes that the new retrieval +
suppression + idle-rebuild paths need. All CREATE INDEX IF NOT EXISTS,
all idempotent, all conditional on the v4.1 columns added by A.02.

Indexes:
- summaries_session_key_kind_latest_idx: cross-conv assemble + retrieval
  scope filter. Partial WHERE session_key != '' (skips pre-A.09 backfill
  rows so the index stays compact during the cleanup window).
- summaries_suppressed_idx: WHERE suppressed_at IS NOT NULL — small
  footprint partial index for the suppression filter on every retrieval.
- summaries_contains_suppressed_idx: WHERE contains_suppressed_leaves = 1
  AND superseded_by IS NULL — §8.1 idle-rebuild candidate scan.
- messages_suppressed_idx: WHERE suppressed_at IS NOT NULL — for
  lcm_quote / lcm_factcheck filtering.
- conversations_session_key_v41_idx: WHERE session_key IS NOT NULL —
  boosts the cross-conv JOIN path that legacy:conv_<id> session_keys use
  (existing conversations_session_key_active_created_idx is on the active
  flag too, which legacy convs don't satisfy).

Verified:
- 55 files / 942 tests passing (+7 from A.07, zero regressions)
…lowup)

The optimizer picks full table scan for tiny test datasets (3 rows), not
the new index — that's the right query plan for that data size, just
not what the test asserted. Index PRESENCE verification (the other 6
tests in this file) covers what unit tests can; index USE in production
data shape is verified by A.09's live-DB run-script.
…JOIN backfill (A.09)

Per v4.1 §2.1 (universal cleanup; per-user re-keying like Eva's
5-legacy-convs → agent:main:main is OPERATOR-DRIVEN via Group F's
`/lcm reconcile-session-keys`, NOT hardcoded into upstream migration).

Three idempotent migration steps:

1. backfillConversationSessionKeys: every NULL conversations.session_key
   gets backfilled to 'legacy:conv_<id>'. Each re-key writes a row to
   lcm_session_key_audit (deterministic audit_id derived from conv_id
   ensures idempotent re-runs don't duplicate audit rows). Closes
   v4.1.1 A5 (NULL collapse to empty bucket would destroy cross-conv
   identity for legacy data).

2. backfillSummarySessionKeys: every summary still at the A.02 default
   session_key='' gets backfilled from the parent conversation via
   JOIN. After step 1 ran, conversations.session_key is non-NULL for
   all rows. Idempotent: condition is WHERE session_key = '' so already-
   set rows are preserved.

3. backfillForkRollupsSessionKeys: forward-compat for Eva's fork-side
   lcm_rollups table (created by PR Martian-Engineering#516, not in upstream src). Only
   touches the table if it exists AND has session_key column. No-op on
   fresh upstream installs.

Verified on copy of Eva's live DB (/Volumes/LEXAR/lcm-tmp/lcm-test.db):
  PRE: 762 convs, 522 NULL session_keys, 4 agent:main:main, 0 legacy:
  POST: 762 convs, 0 NULL, 4 agent:main:main preserved, 522 legacy:conv_*
        4187 summary session_key backfills (all summaries now keyed)
        522 audit rows recorded
        5 legacy convs identified as having leaves (target for Eva's
        future `/lcm reconcile-session-keys` to merge into agent:main:main)

- 56 files / 947 tests passing (+6 from A.08, zero regressions)
… (A.10)

Per v4.1 §2.2 — fixes the leaf-summarizer cap bug.

The empirical-spike-agent found 543 leaves on Eva's live DB pegged at
exactly 2,415 tokens (the LLM hitting the old 2400 default and
producing artificially-truncated summaries). This commit raises the
default in two places that share the constant:

- src/summarize.ts:50 DEFAULT_LEAF_TARGET_TOKENS: 2400 → 4000
- src/db/config.ts:464 fallback default for pc.leafTargetTokens: 2400 → 4000

Comment added to both locations citing the empirical finding so future
readers see the rationale.

Voyage embedding (Group B) supports 32K input context, so 4000-token
leaves are well within budget. Average leaf on Eva's corpus is 1,167
tokens (most leaves don't approach the cap); the change only affects
leaves where the source content is dense enough to need it.

Existing 543 capped leaves on Eva's DB stay as-is — regenerating them
from source messages is expensive (LLM calls) and is operator-driven,
not a migration step. Leaves are immutable per v3 design principle 4.

Tests:
- test/v41-leaf-cap.test.ts (NEW, 3 tests): verifies new constant +
  rationale comment present
- test/config.test.ts: updated existing assertion 2400 → 4000

950/950 tests passing.
Raw fetch wrapper for Voyage AI. We do NOT use the voyageai npm SDK:
v0.2.1 has an ESM resolution bug confirmed during Phase A spike (see
docs/projects/lcm-rollup-overhaul/voyage-spike-results.md).

Two entry points: embedTexts() and rerankCandidates(). Both:
  - Send `truncation: false` so over-cap docs are surfaced as 400 errors
    rather than silently clipped (lossless invariant — a truncated
    embedding produces a vector that doesn't reflect the source, with
    no signal in the vector itself that anything was dropped).
  - Throw typed VoyageError on every failure mode (auth/bad_request/
    rate_limit/server_error/network/unexpected) so callers can react
    appropriately. Backfill cron will use `kind` to decide whether to
    park, requeue, or surface to operator.
  - Retry on 5xx + network errors with exponential backoff (capped 30s).
    NOT on 4xx (caller bug — retrying just spends quota).
  - Honor Retry-After header on 429 (seconds OR HTTP-date).
  - Support mock fetch injection for tests — no module-level state,
    no globals, no live API calls in CI.

Token budget constants exported for callers:
  - MAX_TOKENS_PER_EMBED_BATCH = 80K (Voyage caps at 120K, tokenizer
    counts ~9.5% higher than our token_count, so 80K leaves margin).
  - MAX_TOKENS_PER_EMBED_DOC = 30K (voyage-4-large per-doc cap is 32K).
  - MAX_TOKENS_PER_RERANK_CALL = 600K (rerank-2.5 per-call total).

Privacy: error messages strip Voyage-echoed input from 400 responses
(some Voyage 400s include the input verbatim — could leak PII to logs
that aren't supposed to see it). Raw responseBody preserved on the
VoyageError for callers that need it.

Coverage: 22 tests, all mock fetch:
  - embed happy path (input_type, ordering, empty input, truncation flag)
  - rerank happy path (top_k, sorting, id join)
  - all 6 error kinds + retry behavior
  - VOYAGE_API_KEY env var resolution

Resolves: foundation for v4.1 §13 (embedding generation + reranking).
Next (B.02): per-model vec0 table creation.
…(B.02)

Centralizes all sqlite-vec interaction in src/embeddings/store.ts. Callers
never touch vec0 SQL directly. Reasons documented in module header, but
short version:

  1. sqlite-vec is best-effort. tryLoadSqliteVec() searches candidate
     paths (env, plugin node_modules, ~/.openclaw/extensions) and returns
     boolean. If false, the rest of LCM still works (FTS-only retrieval).
     Aligned with v4.1.1 A7 graceful-degrade amendment.

  2. vec0 has class-of-column quirks that bite: INTEGER metadata cols
     reject JS number literals (need BigInt at the binding site), and
     auxiliary cols throw "illegal WHERE constraint" if filtered inside
     MATCH queries. Schema choice:

       embedding float[<dim>]      -- the vector
       +embedded_id text           -- AUX (never WHERE-filtered)
       embedded_kind text          -- METADATA (filterable in MATCH)
       suppressed integer          -- METADATA (filterable in MATCH)

     Empirically verified: WHERE on +embedded_kind crashes vec0; WHERE
     on plain `embedded_kind text` (metadata) works. Centralizing this
     here so future code can't accidentally pick wrong column class.

  3. Profile dim is immutable. registerEmbeddingProfile() throws on
     mismatch. To switch dim, bump the model name (e.g. add a suffix)
     and run cutover — never silently change dim of an existing profile.

API surface:
  - tryLoadSqliteVec(db, opts) → boolean
  - vec0Version(db) → "v0.1.9" | null
  - candidateVec0Paths() → string[] (for diagnostics)
  - embeddingsTableName(modelName) → "lcm_embeddings_<slug>"
  - embeddingsTableExists(db, modelName) → boolean
  - registerEmbeddingProfile(db, modelName, dim)
  - ensureEmbeddingsTable(db, modelName, dim)
  - recordEmbedding(db, {modelName, embeddedId, embeddedKind, vector,
      suppressed?, sourceTokenCount}) — vec0 INSERT + meta UPSERT
  - replaceEmbedding(...) — DELETE-then-INSERT (for re-embed)
  - deleteEmbedding(...) — for purge cascade
  - markEmbeddingSuppressed(...) — UPDATE metadata (works on metadata
      cols; would corrupt if used on PARTITION KEY per v4.1.1 finding)
  - searchSimilar(db, {modelName, queryVector, k, embeddedKinds,
      excludeSuppressed}) — KNN with default exclude-suppressed
  - isEmbedded(db, {embeddedId, embeddedKind, modelName}) → boolean

Coverage: 28 tests
  - 15 always-on: name validation, candidate paths, graceful degrade,
    profile registration with dim mismatch / bad-input rejection
  - 13 vec0-gated: load extension, ensure table, record/replace/delete
    embedding, KNN with kind filter, KNN with suppression, mark
    suppressed flips visibility, two independent models per DB

The vec0-gated suite uses LCM_TEST_VEC0_PATH env var override (or
defaults to /Users/lume/.openclaw/... on dev). vitest.config.ts
overrides $HOME so homedir() inside tests doesn't see the dev install
— this gate accommodates that.

Build: dist/index.js = 708.4kb (was 708.4kb pre-B.02 — empty plugin
import boundary, store module is tree-shaken from index.ts which doesn't
import it yet; gateway picks up via Group B.05 leaf-time embed wire-up).

Tests: 1000 passing (was 972 before B.02; +28 new).

Resolves: foundation for v4.1 §13 (vec0 storage layer).
Next (B.03): AFTER DELETE TRIGGER on summaries → cascades suppression
+ deletion into vec0 (since FK from vec0 → summaries corrupts vec0).
…B.03)

Three new SQLite triggers, each with a specific job:

  1. Per-model `lcm_embed_suppress_<slug>` (in src/embeddings/store.ts):
     AFTER UPDATE OF suppressed_at ON summaries
     WHEN (NEW.suppressed_at IS NULL) != (OLD.suppressed_at IS NULL)
     → mirrors the NULL-vs-not transition into vec0.suppressed metadata
       column for the corresponding embedded_id (kind='summary').

     Why a trigger: suppression can be set from any path — operator's
     /lcm purge, agent tool, manual SQL, future migration cleanup. A
     trigger guarantees the cascade by-DB rather than by-convention.

     Why metadata col + WHEN clause: the trigger fires only on actual
     transitions, not on every other UPDATE; vec0 metadata column is
     pre-filterable in KNN MATCH queries (auxiliary cols throw "illegal
     WHERE constraint" — verified empirically).

  2. Per-model `lcm_embed_delete_<slug>` (in src/embeddings/store.ts):
     AFTER DELETE ON summaries
     → DELETE matching vec0 row.

     Why a trigger and not FK CASCADE: vec0 corrupts under FK
     (v4.1.1 finding from upstream review). Trigger is the only safe
     path to keep vec0 + summaries in sync on hard-delete.

  3. Shared `lcm_embedding_meta_cleanup_summary` (in src/db/migration.ts):
     AFTER DELETE ON summaries
     → DELETE matching lcm_embedding_meta row WHERE kind='summary'.

     Why this is in migration not store: lcm_embedding_meta exists once
     regardless of how many vec0 model tables exist (it's a cross-model
     sidecar). The kind='summary' filter prevents accidental cleanup of
     polymorphic entity/theme rows. Entity/theme cleanup triggers will
     land in Groups E/G when those embeddings ship.

Per-model triggers are created idempotently when ensureEmbeddingsTable
is called for a model. dropEmbeddingsTriggers() is exported for the
model-archival cutover path (Group F operator surface).

Coverage: 9 new tests (3 always-on, 6 vec0-gated):
  - meta-table cleanup trigger only deletes kind='summary' (entity row
    untouched)
  - meta cleanup trigger is idempotent across re-migration
  - suppression cascade NULL → not-NULL hides row from KNN
  - un-suppression cascade not-NULL → NULL restores visibility
  - WHEN clause skips no-op transitions (NULL → NULL, or content updates)
  - delete cascade removes vec0 row + meta row
  - two-model setup: cleanup hits both vec0 tables
  - dropEmbeddingsTriggers stops cascade firing
  - re-creating triggers is idempotent

Live-DB verification: copied Eva's lcm.db (4187 summaries, 762
conversations) to /Volumes/LEXAR; migration completes in 3.9s; meta
cleanup trigger created cleanly.

Tests: 1009 passing (was 1000 before B.03; +9 new).

Resolves: v4.1 §10 suppression cascade for vec0 retrieval surfaces.
Next (B.fix): fold Group A adversarial-pass fixes (Gap 2 NULL UNIQUE
on lcm_prompt_registry; Gap 7 wire concurrency assertions; Gap 9 add
live-DB regression test).
Resolves Gaps 2, 7, 9 from the Group A adversarial code review:

Gap 2 (MED) — lcm_prompt_registry NULL tier_label deduplication.
SQLite treats multiple NULL values as distinct in UNIQUE constraints, so
the original UNIQUE(memory_type, tier_label, pass_kind, version) admits
duplicate rows when tier_label IS NULL. The synthesis spec requires
singletons-per-version, so add a follow-up migration step
(ensureLcmPromptRegistryNullSafeUniqueIdx) that creates a COALESCE-based
UNIQUE INDEX. Same pattern is already used for
lcm_synthesis_cache_lookup_uniq. The original UNIQUE constraint stays
(catches non-NULL collisions); the new index catches NULL collisions.

Gap 7 (LOW) — wire assertForeignKeysEnabled into configureConnection.
src/concurrency/model.ts already exports assertForeignKeysEnabled(db) but
nothing in production calls it. Add a call after the existing PRAGMA
foreign_keys = ON in src/db/connection.ts:configureConnection so any
future regression that opens a connection without FK enforcement (which
would silently degrade every ON DELETE CASCADE in the schema) fails fast.
assertBusyTimeoutForRole wiring is intentionally deferred to Group B.05
(worker startup) per the Group A reviewer's recommendation.

Gap 9 (MED) — live-DB-shape regression test.
All other v41-*.test.ts files start from a fresh :memory: and run the full
migration on an empty DB. None tested the migration against a partially
pre-existing schema (where conversations / summaries / messages already
exist with rows but lcm_* tables don't yet). The Eva-live-DB verification
was one-off and not in CI. New test
v41-pre-existing-schema-migration.test.ts seeds the upstream pre-v4.1
baseline shape, inserts conversations + summaries + messages, runs
runLcmMigrations, and verifies: NULL session_keys are backfilled, audit
rows exist, summaries.session_key is JOIN-backfilled, all 21 v4.1 tables
exist, the new lcm_prompt_registry_uniq_lookup index exists, and re-runs
are idempotent.
Helper module on top of A.01's lcm_worker_lock table. Acquisition is
atomic via PRIMARY KEY uniqueness on (job_kind) — INSERT OR IGNORE
returns 1 if we got it, 0 if someone else holds it.

API:
  - acquireLock(db, jobKind, {workerId, ttlMs?, jobSessionKey?, jobMetadata?})
    → boolean. GC's expired locks BEFORE acquiring (≤ datetime('now')
    so ttl=0 is immediately reclaimable; race-safe via INSERT OR IGNORE).
  - releaseLock(db, jobKind, workerId) → boolean. Only frees if the
    workerId matches (prevents accidental cross-worker release).
  - heartbeatLock(db, jobKind, workerId, ttlMs?) → boolean. Updates
    expires_at + last_heartbeat_at. Returns false if the lock was
    preempted (caller MUST abort to avoid double-processing).
  - lockInfo(db, jobKind) → LockInfo | null. Used by /lcm health.
  - generateWorkerId(role) → string. Format `<role>-<pid>-<ms>-<6hex>`.

Used by Group B.04 backfill cron (next commit) and Groups E (extraction)
+ G (themes consolidation) + worker scaffolding (B.05).

Coverage: 13 tests (single-process acquire/release, TTL+GC behavior,
heartbeat semantics including preemption-detection, metadata round-trip,
multi-kind isolation, generateWorkerId uniqueness).

Tests: 1017 → 1030 (+13).

Resolves: §0 cross-process lock primitive used by all worker jobs.
Next (B.04b): backfill cron module that uses these primitives.
…(E.spike)

Wraps ml-hclust (mljs ecosystem) for use by Group E procedure clustering.

Library choice rationale (full notes in module header):
  - ESM-native (this plugin ships ESM only)
  - MIT licensed, actively maintained (v4.0.0 published 2025-11-26)
  - Small footprint (~48KB unpacked); esbuild tree-shakes most transitive
    deps. Bundle delta: 708.7kb → 709.4kb (+0.7KB; index.ts doesn't import
    yet — Group E will pull it in)
  - Accepts precomputed distance matrix (we pass cosine distance), so we
    can do Ward+cosine without hacking the lib's internal euclidean
  - Cluster.cut(height) AND Cluster.group(K) both supported, satisfying
    both "let dendrogram decide" and "force K" use cases

Architecture choice notes:
  - Ward + cosine on precomputed matrix: same approximation scipy gives
    you (linkage(method="ward", metric="cosine")). Mathematically loose
    (Ward assumes squared Euclidean) but conventional for text embeddings.
    Fallback method: "average" (UPGMA) — no Euclidean assumption — if
    empirical eval shows wonky merges.
  - Pre-normalize each vector once → cosine distance becomes (1 - dot).
    Halves the inner-loop cost and centralizes float-drift clamping.
  - O(N^2 D) distance build + O(N^3) agnes. For N=2000 D=1024 that's
    ~few seconds in JS — comfortably within the worker-process budget.

Alternatives considered + rejected:
  - hierarchical-clustering-js: 404 on npm
  - density-clustering: wrong algorithm family (DBSCAN/k-means only)
  - clusterfck: deprecated
  - clustering-js: abandoned

API:
  - clusterHierarchical({vectors, cutHeight?, numClusters?}) → ClusterResult

Coverage: 11 tests
  - empty input, single vector, identical vectors, separable groups
  - force-K mode, mixed-dim rejection, non-Float32Array rejection,
    cutHeight validation, internal coverage check
  - 100-vector perf sanity (<2s)

Built (subagent: a1e8a944580405a69) — research + library survey done in
parallel with Group B.04 work; spec checked + tests verified before
committing.

Tests: 1030 → 1041 (+11).

Resolves: foundation for Group E procedure clustering. Group E will:
  (1) pre-filter leaves (structural — numbered steps / commands /
       explicit "how to" markers, NOT FTS verb regex)
  (2) call clusterHierarchical() over voyage-4-large embeddings
  (3) filter to clusters with ≥8 members + LLM-judge confidence > 0.9
  (4) write to lcm_procedures with status='active'
…idempotent (B.04b)

Walks unembedded leaves, batches by token budget, calls Voyage, writes
vec0 + meta. Designed as a single-tick API: caller (worker scheduler)
invokes once per tick; the function acquires lcm_worker_lock, processes
up to perTickLimit documents, releases lock, returns BackfillResult.

API:
  - runBackfillTick(db, opts) → Promise<BackfillResult>
  - countPendingDocs(db, args) → number  (for /lcm health and tick-scheduling)

BackfillOptions covers: model + Voyage model dispatch, input_type
(MUST be 'document' for backfill), API key + mock fetch, RPS pacing
(default 0.5 = one call per 2s), batch token cap (default 80K),
per-tick doc cap (default 200), token-count min/max (default 1 .. 30K),
worker_id override (for stable IDs across ticks), onBatchComplete hook
for telemetry, skipLock for tests.

BackfillResult tracks: embeddedCount, skippedOverCap (rows above the
30K cap, requiring operator attention), skipped[] (per-row failures
with kind='voyage_400'/'voyage_other'/'over_cap'), perTickLimitReached
(scheduler reschedules if true), lockNotAcquired (scheduler skips this
tick), voyageTokensConsumed (API usage telemetry), durationMs.

Invariants:
  1. NO LLM/network in any DB write tx. Each Voyage HTTP call lives
     OUTSIDE the per-batch transaction; rate-state UPDATE (when added
     in B.04c follow-up) will be a brief BEGIN IMMEDIATE that COMMITs
     before the HTTP call (never holds a write lock through HTTP latency).
  2. Single-flight via worker lock — gateway-fallback safe.
  3. Resumable — each batch's writes commit independently. Crash
     mid-tick loses one in-flight batch worth of Voyage spend at most.
     Next tick picks up still-unembedded rows.
  4. Idempotent on per-row basis. SELECT pre-filters rows that already
     have a non-archived `lcm_embedding_meta` entry; a duplicate-write
     would just be a no-op via INSERT OR REPLACE.
  5. Suppression-aware: rows where `summaries.suppressed_at IS NOT NULL`
     are excluded.
  6. Per-tick failure blocklist — failed_summary_ids set excludes them
     from subsequent SELECTs within the same tick. Next tick re-attempts
     (Voyage may have recovered). Without this, a persistent 400 would
     spin the loop until perTickLimit.
  7. Auth errors are FATAL — re-thrown so the operator gets surfaced.
     Still releases the lock via try/finally.

Heartbeat: lock heartbeat fires every batch. If preempted (heartbeat
returns false), tick aborts cleanly without partial state.

Coverage: 13 tests (all vec0-gated, mock fetch — NO live API):
  - basic embed-all, isEmbedded reflects state
  - skip suppressed leaves (no Voyage call for them)
  - idempotent on second tick (zero new Voyage calls)
  - over-cap leaves filtered at SELECT (countPendingDocs verifies)
  - perTickLimit caps work + perTickLimitReached flag
  - 400 records skipped doc, no abort
  - 401 (auth) re-thrown, lock released via finally
  - 500 records skipped, continues with other batches
  - lockNotAcquired when another worker holds (no Voyage call)
  - lock released on success
  - lock released even on auth error
  - batches packed to maxBatchTokens (greedy bin-pack)
  - countPendingDocs accurate

Tests: 1041 → 1054 (+13).

Resolves: foundation for v4.1 §13 backfill — first-run embedding of
existing summaries on Eva's live DB. Group B.05 (next) wires async
leaf-time embed for new leaves so the cron only handles backfill of
the 4187-row corpus, not new ongoing leaves.
….05)

Two pieces, both foundation for Group F's `/lcm worker` operator surface
(later) and to close Group A adversarial-review Gap 8.

## 1. Worker loop (src/concurrency/worker-loop.ts)

Generic single-process worker loop. One Node process running multiple
background jobs cooperatively, single-threaded, each with its own
cadence. Cross-process safety via lcm_worker_lock from B.04a.

API:
  - new WorkerLoop(db, {jobs: WorkerJob[], onJobComplete?})
  - loop.start() → idempotent, schedules setInterval per job
  - loop.stop({gracefulTimeoutMs?: 30000}) → waits for in-flight ticks
  - loop.runOnce(kind) → outside-schedule manual tick (used by leaf-write
    hooks to nudge backfill, and by `/lcm worker tick` operator command)
  - loop.isRunning() / loop.inFlightCount() — for /lcm health

Design choices:
  - setInterval (not setTimeout chain): predictable cadence, dispatcher
    skips overlapping ticks rather than queuing — extra ticks lose, not
    queued forever.
  - Errors in jobs captured via onJobComplete, never propagate to loop —
    one bad tick doesn't crash the worker.
  - generationId guard: stop()-then-start() doesn't run leftover ticks
    from the old loop.
  - validateJobs() at construction: duplicate kinds + invalid intervalMs
    rejected up-front (programmer error).

NOT yet wired into plugin lifecycle. Group F's /lcm worker [start|stop]
operator command will instantiate it with the actual job list. Until
then, the loop is a library — the embedding store + backfill modules
are usable standalone.

NOT using worker_threads. v4.1.1 A9 foresees true heartbeat-isolation
via worker_threads, but that's a future commit. setInterval-driven
dispatch is fine for our cadences (5-60s).

## 2. Leaf-write session_key fix (Gap 8 from Group A adversarial review)

src/store/summary-store.ts:411 — INSERT INTO summaries now atomically
populates session_key from a sub-SELECT of conversations.session_key.
Closes the gap where new summaries inserted between gateway boots had
session_key='' until next boot's JOIN-backfill ran. The COALESCE
defends against (theoretically impossible) NULL conversations.session_key.

This means every newly-written summary IMMEDIATELY participates in
session_key-filtered partial indexes (summaries_session_key_kind_latest_idx
from A.08), without waiting for migration boot.

All 1054 existing tests still pass — change is additive (default still
'' if conversation has no session_key, but the migration ensures every
conv has one).

Coverage: 13 new worker-loop tests
  - start/stop idempotency
  - schedules at cadence (timing-based)
  - two jobs with different intervals
  - overlapping ticks skipped (not queued)
  - errors in jobs captured + loop continues
  - graceful stop waits for in-flight
  - graceful stop returns false on timeout
  - runOnce returns result, throws on unknown kind, throws on in-flight
  - validates duplicate kinds + bad intervalMs

Tests: 1054 → 1067 (+13).

Resolves: foundation for v4.1 §0 worker scheduling + Group A Gap 8.
Group B is now complete (B.01 Voyage client, B.02 vec0, B.03 cascade
triggers, B.fix polish, B.04a worker-lock, B.04b backfill cron, B.05
worker loop + session_key fix). Next: Group B adversarial pass, then
Group C retrieval (hybrid lcm_grep, lcm_semantic_recall).
… join (C.01)

Wraps the embed-query → vec0 KNN → JOIN-back-to-summaries flow used by
both `lcm_semantic_recall` (Group C) AND the hybrid mode of `lcm_grep`
(C.02). Centralizing here so the two callers can't drift on suppression
semantics, kind filtering, or session-key scope.

API:
  - getActiveEmbeddingModel(db) → {modelName, dim} | null
    Picks active=1 + archive_after IS NULL row, most-recent registered_at
    on ties (handles model-cutover gracefully).
  - runSemanticSearch(db, opts) → Promise<SemanticSearchResult>
    Throws SemanticSearchUnavailableError if vec0 not loaded OR no
    active profile OR vec0 table missing — caller decides whether to
    degrade (FTS-only) or surface error.

SemanticSearchOptions covers: query (text) OR queryVector (precomputed),
session_keys / conversation_ids / since / before / summary_kinds filters,
embedded_kinds default ['summary'], excludeSuppressed default true,
all Voyage knobs (apiKey/fetch/maxRetries/inputType — default 'query'
for asymmetric retrieval).

Suppression filtered at TWO layers (defense in depth — race between
trigger fire and KNN call could leak a stale row through metadata):
  1. vec0 metadata `suppressed = 0` pre-filter inside MATCH
  2. Final JOIN to summaries WHERE `suppressed_at IS NULL`

session_key scope uses the column populated atomically at write time
per Group A Gap 8 fix (in B.05). conversation_id, time, and kind
filters all bind via parameterized SQL — no injection vectors.

Coverage: 15 tests
  - getActiveEmbeddingModel: null when no profile, picks active+
    most-recent, excludes archived
  - SemanticSearchUnavailableError when vec0 not loaded / no profile
  - input validation: requires query OR queryVector; dim mismatch
  - happy path: ranked hits, joined content + metadata
  - suppression filter (default + opt-in to include)
  - session_keys filter restricts to matching sessions
  - conversation_ids filter restricts to matching conversations
  - since/before time filter
  - Voyage call with input_type='query' verified, voyageTokensConsumed
    tracked
  - summary_kinds filter (leaf vs condensed)

Tests: 1067 → 1082 (+15).

Resolves: foundation for v4.1 §13 retrieval pipeline. Next (C.02):
new lcm_semantic_recall tool + hybrid mode for lcm_grep that calls
this service alongside FTS and merges with Voyage rerank-2.5.
…rank (C.02a)

Combines FTS5 candidates with vec0 KNN candidates, deduplicates by
summary_id, then either:
  - Reranks via Voyage rerank-2.5 (default) — produces final relevance
    scoring across the union, taking advantage of the spike-validated
    +52.5pp lift on paraphrastic queries
  - OR reciprocal-rank-fusion (RRF) when rerank=false OR when Voyage
    rerank fails (transient 5xx; auth re-thrown for operator surfacing)

API:
  - runHybridSearch(db, opts) → Promise<HybridSearchResult>

opts: query, kFts (default 50), kSemantic (default 50), topN (default
20), filters (sessionKeys/conversationIds/since/before/summaryKinds),
excludeSuppressed default true, rerank default true, voyage HTTP knobs.
Caller injects ftsSearch() so this module doesn't take ownership of FTS5
sanitization or hybrid-recency sort logic — that lives in the existing
SummaryStore/RetrievalEngine path.

HybridHit returned with:
  - {summaryId, conversationId, sessionKey, kind, content, tokenCount, createdAt}
  - score (rerank score OR RRF score)
  - fromFts / fromSemantic provenance flags
  - semanticDistance (cosine), ftsRank — for diagnostics + caller display

Graceful degrade:
  - vec0 not loaded → degradedToFtsOnly=true, FTS-only result
  - rerank 5xx → degradedSkippedRerank=true, RRF fallback
  - rerank 401 (auth) → re-thrown; operator must fix API key
  - empty query → throws (programmer error)

Suppression: both FTS-side and semantic-side default to excludeSuppressed.
Rerank input is post-suppression union, so no post-rerank filter needed.

NOT YET WIRED into lcm_grep tool. Next commit (C.02b) extends the tool
with mode='hybrid' that calls runHybridSearch with summaryStore.searchSummaries
adapted to FtsHit shape.

Coverage: 8 tests (vec0-gated, mock fetch — NO live API):
  - merges FTS + semantic, rerank produces top-N
  - dedupe overlap (FTS + semantic both find same doc)
  - vec0 unavailable → FTS-only with degraded flag
  - rerank 500 → RRF fallback with degraded flag
  - rerank 401 → re-thrown
  - rerank=false explicit → RRF mode, no Voyage rerank call
  - empty query rejected
  - no candidates → empty hits

Tests: 1082 → 1090 (+8).

Resolves: foundation for hybrid retrieval. Used by C.02b (lcm_grep
mode='hybrid') AND C.04 (lcm_synthesize_around window_kind='semantic').
…paths (C.03)

v4.1 §10 invariant: every agent-facing retrieval surface defaults to
exclude-suppressed. Adds `WHERE suppressed_at IS NULL` to four search
code paths in SummaryStore:

  1. searchFullText (FTS5 path)        — alias `s.suppressed_at IS NULL`
  2. searchLike (LIKE-fallback path)   — `suppressed_at IS NULL`
  3. searchCjkTrigram (CJK FTS path)   — alias `s.suppressed_at IS NULL`
  4. searchRegex                       — `suppressed_at IS NULL`

These four functions back the existing `lcm_grep` tool's regex /
full_text modes (and the new C.02b hybrid mode via the ftsSearch
callback). Suppressed leaves now never surface to agents through any
search-side path.

The vec0 retrieval surfaces (semantic-search, hybrid-search) already
filter via metadata pre-filter (vec0 `suppressed=0`) AND defense-in-
depth JOIN to summaries.suppressed_at IS NULL. Both layers are
independently tested.

What this DOESN'T change:
  - getSummary(id), getSummaryParents/Children/Subtree, getSummaryMessages,
    context-item reads — these are structural lookups used by lineage /
    expansion / assembler. The architecture's "7 read paths" cascade
    handles them by suppressing-at-source (assembler builds context
    from latest non-suppressed leaves; expansion respects
    contains_suppressed_leaves flag for condensed). A per-method
    excludeSuppressed default param refactor was considered but deferred.
  - lcm-doctor / lcm-command operator paths — operator tooling
    intentionally sees ALL rows including suppressed (for cleanup,
    audit, doctor checks).

Coverage: 4 new tests (LIKE/full_text path, regex path, restore-on-
unsuppress, multiple-suppression).

Tests: 1090 → 1094 (+4).

Resolves: v4.1 §10 invariant for SummaryStore search paths.
Wires the semantic-search service from src/embeddings/ into a new
agent-callable tool. lcm_semantic_recall is the purely-semantic
counterpart to lcm_grep; agents use it for paraphrastic queries that
exact-match FTS would miss. Hybrid (keyword + semantic) is reserved for
lcm_grep mode='hybrid' (Group C.02b).

The tool resolves conversation scope via the existing
resolveLcmConversationScope helper, parses since/before like lcm_grep,
and gracefully degrades when sqlite-vec is missing or when
VOYAGE_API_KEY is not set — both surfaces return jsonResult errors that
direct the agent back to lcm_grep instead of throwing.

A small public getDb() accessor is added to LcmContextEngine so tools
can call runSemanticSearch(db, opts) directly without plumbing a new
dependency through the LcmDependencies surface. Mirrors the existing
getRetrieval() / getConversationStore() / getSummaryStore() pattern.

Manifest contracts.tools updated to match the new register call site
(guarded by manifest.test.ts).

Tests cover input validation (empty query, bad timestamps, missing
scope), graceful degradation (vec0 unavailable, missing API key), happy
path with mocked Voyage fetch, conversationId scope filter, and
since/before passthrough — vec0-dependent tests skip cleanly when the
extension isn't installed.

Refs: architecture v4.1 §13.
… collision (B.fix2)

Resolves Group B adversarial-pass HIGH/BLOCKER findings:

## Gap 1 (BLOCKER) — backfill heartbeat vs Voyage retry budget

src/embeddings/backfill.ts: was using Voyage client's default retry +
timeout (3 retries × 60s = ~4 min worst-case per batch). With
WORKER_LOCK_TTL_MS=90s, a stuck batch can let another worker GC the
lock and start backfilling the same docs → Voyage double-bill +
duplicate vec0 rows (auxiliary cols have no UNIQUE constraint to
catch this).

Fix: introduce `voyageMaxRetries` default = 1 + `voyageTimeoutMs`
default = 30s in BackfillOptions. Worst-case per batch now:
  2 attempts × 30s + ~0.5s backoff ≈ 60.5s
Comfortably under 90s lock TTL → another worker can't preempt mid-batch.

Caller can override either knob (e.g. for first-run backfill where
contention is low and longer Voyage tolerance is acceptable). Tests
that need to surface 5xx immediately use voyageMaxRetries: 0.

## Gap 2 (HIGH) — slug collision silently corrupts KNN

src/embeddings/store.ts: registerEmbeddingProfile() didn't check that
the new model_name's sluggified form was already in use. Two profiles
like `voyage-4-large` and `voyage_4_large` both sluggify to
`voyage4large` → same vec0 table → inserts from both profiles route
to one table → KNN cross-contaminates.

Fix: scan existing profiles for slug equality BEFORE INSERT OR IGNORE.
Throws with explanatory message identifying the existing model_name
that already owns the slug.

The existing `MODEL_NAME_PATTERN = /^[A-Za-z0-9._-]{1,64}$/` allows
`-`, `_`, `.` — all of which are stripped by sluggification — so
false-collision risk is real, not hypothetical.

## Gap 8 (LOW, folded in) — dim upper bound consistency

ensureEmbeddingsTable rejects dim > 4096; registerEmbeddingProfile
had no upper bound, leaving an orphaned profile if caller did
register-then-ensure. Aligned both functions to reject dim > 4096
in registerEmbeddingProfile too.

## Coverage: 8 new tests in v41-group-b-fix2.test.ts
  - Slug collision rejected: dash↔underscore↔dot↔case variants
  - Genuinely-different slug allowed
  - Re-registering same model still idempotent
  - Collision detection order-independent
  - Dim > 4096 rejected (matching ensureEmbeddingsTable)
  - Dim = 4096 accepted (boundary)
  - Backfill default voyageMaxRetries=1 (proven by call count = 2)
  - Backfill caller can override voyageMaxRetries: 0

Tests: 1094 → 1112 (+18 — also includes 10 from C.01b subagent).

Group B adversarial Gaps 3-7 (3 MED + 1 LOW remaining) are doc/comment
polish; deferred to cycle-2 review.
Extends lcm_grep with a third mode='hybrid' that blends FTS + semantic
vector search via Voyage rerank. The schema enum picks up the new
value, and the tool description points agents at lcm_semantic_recall
for purely-semantic exploration so the two surfaces stay
distinguishable.

The hybrid path delegates to runHybridSearch (src/embeddings/), passing
a small adapter that wraps summaryStore.searchSummaries(mode:'full_text'
sort:'relevance') and hydrates the snippets back to full FtsHit shape
via a single batched SELECT against summaries by summary_id. We could
have piped each hit through getSummary, but the IN(...) batch is one
round-trip and the values we need (session_key, content, token_count,
created_at, conversation_id) are already on the row.

Output format mirrors the regex/full_text branch — same '## LCM Grep
Results' header, '**Mode:** hybrid' line, conversation scope + time
filter — but with hybrid-specific extras:

  - per-hit provenance flag: [from FTS+semantic] / [from FTS only] /
    [from semantic only]
  - rerank/RRF score
  - degraded warnings: '*(semantic search unavailable; degraded to
    FTS-only)*' when vec0 is missing, '*(rerank failed; using RRF
    fusion fallback)*' when rerank network errors and we fall back to
    reciprocal-rank-fusion

Auth errors from Voyage surface as a jsonResult error message that
points the agent at mode='full_text' as the keyword-only fallback.

Tests cover schema enum + description metadata, the
degraded-vec0-missing path (FTS-only mode with the warning + FTS-only
provenance flag), happy path with mocked Voyage embed + rerank (mixed
provenance flags + score-ordered hits), and the rerank-failed RRF
fallback path.

Refs: architecture v4.1 §13.
Versioned prompt templates per (memory_type, tier_label, pass_kind).
Append-only — old versions stay archived (active=0); new versions
inserted with active=1, previous-active row deactivated atomically.

Backed by lcm_prompt_registry (created in A.04, NULL-tier UNIQUE
patched in B.fix Gap 2). Schema:
  (prompt_id PK, memory_type, tier_label NULLABLE, pass_kind, version,
   template, model_recommendation, active, bundle_version, notes)

API:
  - getActivePrompt(db, {memoryType, tierLabel, passKind}) → PromptRecord | null
  - getPromptById(db, promptId) → PromptRecord | null
    (used by synthesis-cache to verify the prompt_id is still current
    or look up the archived version that was used)
  - registerPrompt(db, opts) → string (the new prompt_id)
    Atomic: deactivates previous + inserts new in BEGIN IMMEDIATE.
    Auto-versions (max(version) + 1 within triple).
  - listActivePrompts(db) → for /lcm health
  - bumpBundleVersion(db) → for voice-consistency rebuilds

NULL tierLabel handling: matched literally (not coerced to "")  in
both lookup and update. Aligns with B.fix Gap 2's NULL-safe UNIQUE
index on (memory_type, COALESCE(tier_label, ''), pass_kind, version) —
the registry treats NULL and '' as DIFFERENT for purposes of routing,
even though the UNIQUE index treats them as the same for collision
detection.

Why versioning matters for cache invalidation: lcm_synthesis_cache
(D.02 next commit) will FK on prompt_id. When a prompt is updated:
  - Old cache entries reference the now-archived prompt_id → stale
  - New synthesis calls write rows with the new prompt_id → fresh
  - Cache invalidation can be SELECTIVE (only entries with archived
    prompt_id need rebuild) — never touches durable summaries.content

Coverage: 11 tests
  - register + getActivePrompt happy path
  - re-register same triple deactivates previous + bumps version
  - per-triple version isolation (different triples independent)
  - NULL tierLabel matched literally
  - getActivePrompt returns null when none registered
  - promptIdOverride respected
  - modelRecommendation/bundleVersion/notes round-trip
  - listActivePrompts excludes archived
  - bumpBundleVersion increments active prompts only
  - atomic transaction rolls back on PK collision

Tests: 1112 → 1123 (+11).

Resolves: foundation for v4.1 §3 synthesis. Next (D.02): synthesis
dispatch that uses this registry for prompt selection.
Extends the lcm_describe summary payload with two fields agents need
when reasoning across session families:

  - sessionKey: pulled from the parent conversations row (which holds
    the same value as summaries.session_key per the Gap 8 / B.05
    atomic-write invariant). The SummaryRecord public store API
    doesn't carry session_key through, so retrieval.describeSummary()
    fans out a parallel conversationStore.getConversation(conversationId)
    alongside the existing parents/children/messages/subtree fetches.
    Empty string when the parent conversation has no session_key.

  - timeRange: a normalized {earliestAt, latestAt, createdAt} struct
    that mirrors the three time fields already present on the summary.
    Convenience for callers that prefer one bracket over three siblings.

Both fields are also surfaced in the text rendering — the meta line
now carries 'sessionKey=...' and 'created=...' alongside the existing
'range=earliest..latest', so agents inspecting summaries get the
session affiliation and creation time visible without parsing the
JSON details.

Tests cover both the populated path (sessionKey appears verbatim,
timeRange struct round-trips through details) and the empty path
(sessionKey rendered as '-' for missing values).

Refs: architecture v4.1 §13.
…D.02)

Per-tier dispatch on top of D.01's prompt registry. Picks model + pass
strategy per tier label, runs the LLM call(s), records every pass to
lcm_synthesis_audit, returns final synthesized text.

Per-tier strategies (per architecture-v4.1 §3 + literature consensus
that critique-revise underperforms single-pass for summarization):

  daily      → single-pass     (mini model)
  weekly     → single-pass     (mid model)
  monthly    → single + verify_fidelity (premium model)
                — verify_fidelity prompt asks "are there claims in the
                  summary that aren't in the source?" — separate model
                  call, returns 'OK' or 'HALLUCINATION: <details>'
  yearly     → best-of-N (N=3) + judge (premium-thinking)
                — N candidates run in parallel; judge prompt picks
                  the best by index (0..N-1)
  custom     → single-pass     (mid model)
  filtered   → single-pass     (mid model)

Default models: claude-haiku-4-5 (daily), claude-sonnet-4-5 (weekly,
custom, filtered), claude-opus-4-7 (monthly), claude-opus-4-7-thinking
(yearly). Override per-prompt via lcm_prompt_registry.model_recommendation
or per-call via SynthesizeRequest.{modelOverride, forceModel}.

API:
  - dispatchSynthesis(db, llmCall, req: SynthesizeRequest)
    → Promise<SynthesizeResult>
  - LlmCall is INJECTED — production wires to existing pi-ai
    infrastructure (Group F integration); tests inject deterministic
    mocks. Keeps dispatch decoupled from the existing summarize.ts
    (which is geared to per-leaf compaction in the gateway hot path
    — different concerns).

SynthesizeRequest covers: tier, memoryType, sourceText, target
(summary_id OR cache_id), passSessionId (groups multi-pass audit
rows), bestOfN override (yearly), model overrides.

SynthesizeResult: output, primaryPromptId, audit IDs, total latency,
total cost cents, hallucinationFlagged (monthly), bestOfN detail
(yearly: n + selectedIndex + all candidates).

Audit trail: every pass writes a 'started' row up-front (forensic
record even if LLM crashes mid-call), then UPDATEs to 'completed'
or 'failed' with output + latency + cost + last_error.

Error handling:
  - missing_prompt: thrown if the (memoryType, tier, single|judge)
    triple has no active prompt registered. Operator must register
    via /lcm command (Group F) or seed in deployment.
  - llm_failure: re-thrown after writing audit row with status='failed'
    and last_error set. Caller (synthesis worker) decides whether to
    retry or surface to operator.
  - judge_failure: yearly tier judge returned malformed output
    (no digit, or out-of-range). Indicates a bad judge prompt — the
    candidate outputs are intact in audit rows for manual recovery.

Template rendering: simple {{source_text}}, {{tier}}, {{memory_type}}
substitutions for the primary template; {{candidate_summary}} for
verify; {{candidates}} (rendered as numbered list) for judge.

Coverage: 16 tests
  - DEFAULT_MODEL_BY_TIER + PASS_STRATEGY_BY_TIER constants
  - daily / weekly: single-pass, audit row, default model
  - monthly: single + verify; hallucinationFlagged true vs false vs
    skipped (no verify prompt)
  - yearly: 3 candidates + judge picks 1; bestOfN=5 override; judge
    output without digit → judge_failure; missing judge prompt →
    missing_prompt
  - missing primary prompt → missing_prompt
  - LLM call exception → llm_failure + audit row.status='failed' +
    last_error captured
  - prompt model_recommendation overrides tier default
  - forceModel + modelOverride wins
  - template substitution

Tests: 1130 → 1146 (+16; subagent's C.05 already merged).

Resolves: foundation for v4.1 §3 synthesis. Next (D.03): eval harness
for measuring retrieval recall + synthesis quality on Eva's stratified
N=100 query corpus.
Heuristic gate before procedure clustering. Most leaves are
conversational; only a small fraction look like procedures. We
pre-filter by the SHAPE of the content (not by FTS verb regex, which
3 adversarial agents flagged as too noisy + many false negatives).

Three structural signals (compose with OR):

  numbered-steps  — 3+ lines starting with "1.", "Step 1:", "1)",
                    "(1)", etc. Strict counting (no "1. ... only 2 ...")
                    Score weight: 0.4

  command-block   — 2+ shell-command-shaped lines:
                    - $-prompt, ❯-prompt, %-prompt, > -prompt
                    - lines inside ```bash/sh/zsh/shell``` fences
                    - lines starting with recognized tools
                      (git/npm/pnpm/yarn/docker/kubectl/terraform/aws/
                      gcloud/az/gh/cargo/python/node/psql/mysql/redis-cli)
                    Score weight: 0.4

  how-to-marker   — 2+ unambiguous markers like "how to ", "the procedure
                    for ", "steps to ", "in order to ", "first/then/finally,".
                    Conservative — single marker is too noisy (lots of
                    conversational uses).
                    Score weight: 0.3

A leaf is a clustering CANDIDATE if any one signal fires. The score
(sum of fired weights, capped at 1) is exposed for downstream
ranking — Group E's clustering call may threshold on it.

API:
  - prefilterContent(content) → {isCandidate, signals[], score}
  - prefilterLeaves<T>(leaves[]) → only the candidate rows, with
    {signals, score} attached

Pure module: no DB, no LLM, no async. Safe to call inline.

Coverage: 18 tests
  - numbered-steps: markdown, "Step N:", "N)", insufficient count, prose
    with embedded numbers
  - command-block: $ prompt, fenced bash, line-start tool names,
    single-command rejection
  - how-to-marker: 2+ markers fire, single marker doesn't
  - composite: multi-signal stack, score cap at 1, plain conversation
  - input edges: empty, undefined, null
  - prefilterLeaves batch helper

Tests: 1146 → 1164 (+18).

Resolves: foundation for v4.1 §6.2 procedure clustering. Next (E.02):
clustering pass that runs ml-hclust over candidate leaves' embeddings.
Eva added 23 commits May 6, 2026 02:54
Operator-only purge service. Two modes:

  mode='soft' (default):
    - Sets suppressed_at + suppress_reason on matched leaves
    - Flags affected condensed (contains_suppressed_leaves=1)
    - Cascade triggers (B.03) handle vec0 + meta cleanup
    - All retrieval surfaces (search, semantic, hybrid) auto-filter
      these out via the v4.1 §10 invariant + C.03 + C.fix
    - Reversible (operator can clear suppressed_at to restore)

  mode='immediate':
    - Same as soft mode for the leaves
    - PLUS: enqueues affected condensed summaries to
      lcm_purge_rebuild_queue (introduced in A.03 / v4.1.1 B2)
    - Worker (Group F follow-up) drains the queue, rebuilds each
      condensed WITHOUT suppressed leaves' content (per v4.1.1 A4
      forwarder pattern: write new condensed, mark old superseded_by,
      never mutate summary_parents), THEN can finally hard-DELETE
      the leaves (no more parent_summary_id refs blocking)

ARCHITECTURE NOTE on the two-step immediate flow: SQLite schema has
summary_parents.parent_summary_id with ON DELETE RESTRICT. We CANNOT
direct-DELETE a leaf that has un-rebuilt condensed parents. Two-step
gives operator immediate feedback ("5 leaves purged, 3 condensed
queued for rebuild") rather than rolling back when ANY leaf has a
condensed parent. Worker handles the deferred hard-delete after
rebuild completes.

API:
  - runPurge(db, opts: PurgeOptions) → PurgeResult

PurgeOptions:
  Criteria (one of):
    - summaryIds: explicit list
    - sessionKey + (since? + before?) + minTokenCount?: range
  reason (REQUIRED, free-text)
  mode? ('soft'|'immediate', default soft)
  allowMainSession? (override the agent:main:main safety check)

PurgeResult: affectedLeafIds[], rebuildQueueIds[] (immediate only),
purgeSessionId, mode.

Validation throws PurgeError(kind):
  - missing_reason: empty reason
  - no_criteria: zero filters set
  - main_session_blocked: sessionKey='agent:main:main' without
    allowMainSession=true (operator must be EXPLICIT — Eva's primary
    thread is too important to purge by accident)

Already-suppressed + non-existent IDs filtered at SELECT level —
operator gets back the actual affected list, not the requested list.

NOT in this commit: the worker that drains lcm_purge_rebuild_queue +
does the actual hard-delete after rebuild. That's Group F.03 (worker
scheduler integration) or a separate follow-up commit.

CRITICAL: this module is OPERATOR-ONLY. Caller MUST gate via
deps.isOperatorSession() or equivalent — there's nothing in the module
itself that prevents an agent from invoking it. Plugin wiring (Group F
tool registration) is where the gating happens.

Coverage: 13 tests
  - missing reason rejected (PurgeError)
  - no criteria rejected
  - agent:main:main refused without override; allowed with override
  - soft mode: suppressed_at + suppress_reason set; other-session leaves
    untouched
  - soft mode: contains_suppressed_leaves flag on condensed parents
  - immediate mode: leaves still exist (RESTRICT FK) but suppressed_at
    set + queue entries created
  - immediate w/o affected condensed: empty rebuild queue
  - range purge by sessionKey + token cutoff
  - range purge with since/before
  - explicit summaryIds: filters out invalid + already-suppressed
  - empty match returns empty result
  - atomic transaction (suppressed_at + condensed flag set together)

Tests: 1237 → 1250 (+13).

Resolves: foundation for v4.1 §10 operator hard-forget. Group F.02-F.05
build the operator-facing /lcm command surface that calls this.
Coordinator service that wraps the various worker job entry points
behind a common API. Used by /lcm worker tick <kind> (CLI wiring in
F.03b after subagent finishes lcm-command.ts) and by the WorkerLoop
when persistent worker scheduling is wired into plugin lifecycle.

API:

  - getWorkerStatusSnapshot(db, {modelName?}) → WorkerStatusSnapshot
    Returns lockInfo for each WorkerJobKind + pending counts (extraction
    queue, embedding backfill if model specified). Used by /lcm worker
    status and /lcm health.

  - tickEmbeddingBackfill(db, args) — wraps runBackfillTick (B.04) with
    auto-generated worker_id

  - tickExtraction(db, args) — acquires worker lock + runs
    runCoreferenceTick (E.03), releases on finally. Returns
    {...result, lockAcquired: boolean} so caller can distinguish
    "ran but extracted nothing" from "didn't run because lock held".
    Wraps with explicit lock because runCoreferenceTick doesn't
    acquire its own (unlike backfill).

  - tickProcedureMining(db, args) — acquires extraction lock (shared
    with entity coref since both walk the same queue conceptually) +
    runs mineProceduresPass (E.02)

  - forceReleaseLock(db, jobKind) — operator escape hatch when a worker
    crashed without releasing AND TTL hasn't expired yet. USE WITH
    CAUTION — race window if original holder is still alive. Documented.

  - heartbeatAllHeldLocks(db, workerIdsByKind) — for the future
    WorkerLoop integration to refresh held locks. Silent no-op for
    locks not in the supplied map.

Design choice: thin coordinator, thick injectables. Each tick* function
takes the LLM/extractor injectable for the underlying job. Makes
testing trivial (mocked injectables, no live API). Production wiring
in plugin/index.ts will provide pi-ai-backed extractors.

Coverage: 10 tests
  - status snapshot: empty state, locks reflected, extraction count
  - tickExtraction happy path: lock acquired, ran, released
  - tickExtraction with held lock → lockAcquired=false, no work
  - tickExtraction extractor throws → lock STILL released (try/finally)
  - tickProcedureMining lock-protection
  - forceReleaseLock returns true once, false on second call
  - heartbeatAllHeldLocks: refreshes only matching worker_id

Tests: 1250 → 1260 (+10).

Resolves: foundation for v4.1 §0 worker orchestration. F.03b (separate
commit, after subagent merges) wires this into /lcm worker tick / status
in lcm-command.ts.
Adds an operator-facing v4.1 health snapshot accessible via the new
`/lcm health` subcommand. The snapshot is read-only and tolerant of
missing subsystems (no profile registered, vec0 not loaded, no eval
runs yet) so it returns a meaningful payload on any DB shape.

Service helper at src/operator/health.ts exposes a typed
V41HealthSnapshot covering:

- Embeddings: active model + dim, vec0 version, pending backfill,
  embedded-row count
- Workers: per-job-kind status (idle vs active, with EXPIRED flag for
  crashed workers whose lock outlived its TTL)
- Synthesis: active prompt count, distinct memory_types, recent
  synthesis runs (7d window)
- Eval: query set count, most-recent run (mode + recall score), drift
  index from latest lcm_eval_drift row
- Suppression: suppressed leaf count, pending purge rebuilds

The /lcm command formats the snapshot as a markdown report. /lcm help
text gains an entry for the new subcommand. The existing /lcm
subcommand parser pattern (single command in manifest, internal
dispatch) means no openclaw.plugin.json change is needed.

Tests cover all five sections individually plus the overall snapshot
shape, including edge cases like vec0-not-loaded, no-profile, expired
worker locks, and empty drift history.
Optional Group G — themes are AGENT-EXPLICIT only, NEVER in the
assemble() pyramid (per RAG-leak adversarial finding in v4 review).
This commit lands:

  1. Schema: lcm_themes + lcm_theme_sources + suppression cascade trigger
  2. Service: consolidateThemesPass (idle pass) + listThemes + manual
     markThemesStaleFor

Schema:
  lcm_themes (theme_id PK, session_key, name, description,
              source_leaf_count, consolidated_at, status, model, pass_id)
    - status: 'active' / 'stale' / 'archived'
    - lookup index on (session_key, status, consolidated_at DESC)
  lcm_theme_sources (theme_id FK CASCADE, summary_id FK CASCADE)
    - normalized many-to-many; CASCADE both directions
    - index by summary_id for the suppression-cascade trigger
  Trigger lcm_themes_stale_on_suppress:
    AFTER UPDATE OF suppressed_at WHEN transitioning to NOT NULL
    → flip themes referencing the leaf from 'active' to 'stale'

Service: consolidateThemesPass(db, candidates, nameTheme, opts)
  Pipeline:
    1. Dedupe candidates by summaryId
    2. Cluster via E.spike's ml-hclust wrapper (Ward + cosine)
    3. For each cluster ≥ minOccurrences (default 5; lower than
       procedure-mining's 8 because themes tolerate smaller clusters):
         - Call INJECTED nameTheme(cluster) → {name, description, confidence?}
         - If confidence >= minConfidence (default 0.6) AND name nonempty:
           write lcm_themes row + lcm_theme_sources rows in one tx
  - Per-cluster naming-pass failure → record but continue with other clusters
  - Returns ConsolidateThemesReport: candidateCount, clusterCount,
    largeClusterCount, themesWritten, namingRejected, themes[]

NameThemeFn injection: production wires to pi-ai (caller's concern);
tests inject deterministic mock.

Status semantics:
  - 'active' — agent-queryable (lcm_recent_themes, lcm_theme_explain
    when those tools land — deferred to a follow-up)
  - 'stale' — source leaves changed; needs re-consolidation
  - 'archived' — operator-marked; not visible to agents

Suppression cascade has TWO layers:
  - Hard-delete (purge --immediate path): FK CASCADE on lcm_theme_sources
    drops the source rows; theme keeps row (source_leaf_count goes stale)
  - Soft-suppress: AFTER UPDATE trigger flips status='active' → 'stale'
    so re-consolidation picks them up next pass

NOT in this commit:
  - lcm_recent_themes / lcm_theme_explain / lcm_search_themes agent
    tools — deferred to a follow-up commit (G.02 if Group G grows)
  - 95% embedding-coverage gate — caller (worker scheduler) decides
    when to run consolidation; not enforced in the service
  - Idle-pass cadence — caller decides

Coverage: 12 tests
  - schema: tables + trigger present after migration
  - basic happy path: 6 leaves → 1 theme + 6 source rows
  - clusters below minOccurrences skipped silently
  - operator can lower minOccurrences for testing
  - low confidence rejected
  - namer throws → naming-rejected, other clusters still processed
  - empty name rejected
  - suppression cascade: UPDATE suppressed_at → status='stale'
  - trigger does NOT fire on no-op (NULL → NULL)
  - markThemesStaleFor manual flip
  - listThemes status filter
  - empty input

Live-DB verified: migration runs in 4.5s on Eva's lcm.db; tables +
trigger created cleanly.

Tests: 1260 → 1272 (+12; subagent added 16 more in parallel = 1288).

Resolves: foundation for v4.1 §6.3 themes. Optional per the plan;
landing it now keeps Group G off the future-work list.
…es (F.04)

Adds the operator-facing reconcile path for merging legacy session
keys. The use case: pre-v4.1 conversations may have had NULL
session_keys backfilled by A.09 to `legacy:conv_<id>`; an operator
wants to merge several legacy threads into a single logical session
so retrieval treats them as one history.

Service helper at src/operator/reconcile-session-keys.ts exposes
reconcileSessionKeys(db, args) and listLegacyCandidates(db).
Behavior:

- UPDATEs conversations.session_key + summaries.session_key for every
  conversation matching the `from` keys to the `to` key.
- INSERTs ONE audit row per conversation moved into
  lcm_session_key_audit (the table's conversation_id is NOT NULL, so
  bulk-per-source-key audit rows aren't possible — and the
  per-conversation grain matches the existing `/lcm
  undo-session-key-rekey <conv>` reverse path).
- Refuses if `to === 'agent:main:main'` without --allow-main-session,
  if from list is empty, or if reason is empty.
- Idempotent: re-running with the same args after the migration is
  done returns zeros (no rows match the from keys anymore).

The /lcm command surface gains:
  /lcm reconcile-session-keys --list-candidates
  /lcm reconcile-session-keys --apply --from k1,k2 --to k3 --reason "..."
       [--allow-main-session]

A new splitArgsQuoted() helper lets `--reason "with spaces"` survive
tokenization. Help text + parser entries are added; no
openclaw.plugin.json change needed (single /lcm entry, internal
dispatch).

Tests cover input validation, basic single + multi-source merge, audit
row creation with original_session_key preserved, idempotent re-run,
orphan-summary cleanup, custom appliedBy, and the realistic
constraint that the conversations_active_session_key_idx UNIQUE index
makes merging multiple ACTIVE convs into the same key fail loudly.
Resolves Group E adversarial-pass findings #1 (BLOCKER), #2-#5 (HIGH).

## Gap 1 (BLOCKER) — numClusters override crashed on degenerate trees

src/extraction/hierarchical-cluster.ts: when ml-hclust's `tree.group(K)`
returned fewer leaves than expected (degenerate dendrogram from
identical/near-identical vectors), the wrapper threw "internal error:
vector index N was not assigned to any cluster" — false-positive crash.

Fix: missing leaves now get assigned to NEW singleton clusters
(nextFallbackId++). Caller's `numClusters` is documented as best-effort.
Updated docstring to match. NEVER crashes on degenerate input.

Latent in current code paths (procedure-mining doesn't pass numClusters)
but blocked any future caller — including planned operator overrides.

## Gap 2 (HIGH) — undefined judge confidence crashed mid-pass

src/extraction/procedure-mining.ts:217. judgement.confidence undefined
slipped into the `>= minConf` check (false), routed to draft path,
then SQLite bind threw `TypeError: Provided value cannot be bound`
mid-loop, killing the rest of the mining pass.

Fix: validate confidence is finite + in [0,1] BEFORE threshold check.
Bad values route to judgeRejected with skipReason='judge-bad-confidence:
got <value>'. Mining continues with next cluster.

Real LLM JSON parsers occasionally drop fields under load — this is
the safer fail-mode (vs coercing to 0 which would silently mark as
draft).

## Gap 3 (HIGH) — mention idempotency claim was a lie

src/extraction/entity-coreference.ts:217. Module docstring + inline
comment both claimed "deterministic mention_id ... INSERT OR IGNORE
so re-runs don't duplicate mentions". The actual ID included
`randomSuffix()` — making the PK non-deterministic, so INSERT OR IGNORE
NEVER fired and re-runs created duplicate mentions.

Fix: dropped randomSuffix from mentionId. Format is now
`men_${entityId}_${leaf_id}_${truncateForId(surface, 16)}`.
Re-running the extractor on the same leaf with the same surface in
same entity = SAME mention_id = INSERT OR IGNORE no-ops. As intended.

Bumped truncateForId max from 8 to 16 chars to reduce collision risk
between different surfaces in the same leaf.

## Gap 4 (HIGH) — DEFAULT_MIN_OCCURRENCES=8 contradicted schema-tuned 4

src/extraction/procedure-mining.ts:110. Schema comment in
src/db/migration.ts:1721 (B7/B8 amendment) says "empirically-tuned
promotion threshold (4 occurrences per B8, was 8 in v4.1)". The
mining default was still 8 — Eva's small-corpus regime would never
auto-promote procedures with the wrong default.

Fix: DEFAULT_MIN_OCCURRENCES = 4. Aligned with schema tuning.
Per-call override still works for operators who want a higher bar.

## Gap 5 (HIGH) — prefilter false positives on conversational text

src/extraction/procedure-prefilter.ts. The numbered-steps heuristic
accepted "non-decreasing" numbering, which trips on:
  - numbered citations: [1] Smith ... [2] Jones ... [3] Wang
  - action items: 1. Bob ... 2. Alice ... 3. Carol
  - random conversation with embedded numbers

Fix: now requires STRICTLY-SEQUENTIAL numbering (n+1 after n) AND that
runs start at 0/1/2 (tolerance for "0. setup" prefixes). A break in
sequence resets the run counter.

This drops the prefilter false-positive rate substantially — important
because every false-positive becomes a wasted Voyage rerank token in
the downstream clustering + judge pipeline.

(Other prefilter signals — command-block, how-to-marker — left as-is;
their false-positive rates are acceptable given how much rarer the
trip conditions are. Operators can monitor via /lcm health.)

## Coverage updates

Existing test "clusters below minOccurrences get skipReason" updated
from 5 leaves (was below 8 default) to 3 leaves (below new 4 default).
Test count delta: 0 (test count unchanged).

## Tests

1288 → 1302 (+14; F.04 + F.05 subagent work also landed in parallel).

## Deferred (cycle-2 polish from same review)

#6 MED suppression race in entity-coreference (re-check inside tx)
#7 MED defense-in-depth re-prefilter wastes work
#8 LOW prefilter score field is dead
#9 LOW procedure-recheck queue kind has no producer/consumer
#10 LOW procedure_id slice(0,30) makes long session_keys indistinguishable
…+ tier_label normalization (D.fix)

Resolves Group D adversarial-pass HIGH/MED gaps #1, #2, #3, #4.

## Gap 1 (HIGH) — dispatch dry-run contract was a lie

src/synthesis/dispatch.ts. Docstring claimed "For a synthesis pass that
doesn't yet have a target (e.g., dry-run), pass neither — the audit
row will be skipped." But runPassWithAudit always called insertAuditRow,
which forwarded both as null and the schema CHECK
(target_summary_id IS NOT NULL OR target_cache_id IS NOT NULL) fired
mid-pass with a raw SQLite error.

Fix: validate target up-front. If both targetSummaryId AND targetCacheId
are missing, throw SynthesisDispatchError("missing_target") BEFORE
touching the LLM. Updated docstring to match the (now-correct) contract.

Caller experience: clear typed error vs confusing CHECK violation
midway through best-of-N or verify pass.

## Gap 2 (HIGH) — best-of-N pass_session_id splattered across N+1 sessions

src/synthesis/dispatch.ts:499, 541. Best-of-N candidate calls + judge
call each had a unique pass_session_id (`${id}_cand0`, `${id}_cand1`,
`${id}_cand2`, `${id}_judge`), splitting one logical attempt into 4
distinct audit-table sessions.

Schema docstring explicitly says: "pass_session_id groups all passes
of one logical synthesis attempt (helps debug best-of-N runs + GC
orphaned partial sessions)." Operators querying
WHERE pass_session_id = X would see zero rows for what they thought
was a single attempt.

Fix: ALL passes in a best-of-N attempt share the same pass_session_id
(req.passSessionId, unmodified). Per-pass disambiguation via
pass_kind, pass_input_truncated, and ran_at timestamps.

This unblocks the orphan-GC index `lcm_synthesis_audit_started_gc_idx`
to actually GC correctly-scoped sessions.

## Gap 3 (MED) — empty-string vs NULL tier_label collision

src/synthesis/prompt-registry.ts. The B.fix Gap 2 UNIQUE INDEX uses
COALESCE(tier_label, '') treating NULL and '' as equivalent at the DB
level. But getActivePrompt + registerPrompt used literal `IS NULL` vs
`= ?`, so:
  - register({tierLabel: ""}) succeeded
  - getActivePrompt({tierLabel: null}) saw no row
  - register({tierLabel: null}) tried to add — UNIQUE index conflict

Operators (e.g., Group F's /lcm UI) hitting this would see confusing
SQL errors instead of "prompt already exists for this triple."

Fix: normalize tierLabel === "" → null in both getActivePrompt and
registerPrompt. API surface now matches UNIQUE index semantics.

## Gap 4 (MED) — audit INSERT failure left no forensic record

src/synthesis/dispatch.ts:runPassWithAudit. The "started" insertAuditRow
call could fail (FK violation on bad target_summary_id, CHECK
violation, etc.) and the raw SQLite error propagated unwrapped — no
forensic trace, no typed error.

Fix: wrap insertAuditRow in try/catch, throw
SynthesisDispatchError("audit_insert_failure") so callers can
distinguish setup errors from LLM failures. Caller knows the LLM
was NEVER called.

## SynthesisDispatchError kinds expanded

Added `missing_target` and `audit_insert_failure` to the discriminated
union. Existing `missing_prompt`, `llm_failure`, `judge_failure`
unchanged.

## Tests

1302 passing (no test count delta — existing tests for missing_prompt /
llm_failure semantics still hold; audit-related test would be added
in cycle-2 polish).

## Deferred

- Gap 5 (MED): judge prompt 0-indexed contract (rendering as
  ### Candidate ${i+1} would help, but tests pass at 0-indexed today)
- Gap 6 (MED): JSON envelope v=1 not validated in eval/run.ts
- Gap 7 (LOW): forceModel + undefined modelOverride silent fallthrough
- Gap 8 (LOW): decodeQuerySetId accepts ambiguous version strings
- Gap 9 (LOW): selectPriorRun tiebreak race within same second
- Gap 10 (LOW): cross-file integration test gap

These are polish items deferred to cycle 2.
…face (F.05)

Wires the D.03 eval harness (recall + run recording + drift) into the
`/lcm eval` operator subcommand. The retrieval adapter is INJECTED so
the service is testable without vec0 or Voyage credentials —
production wires the real adapter (FTS-only or hybrid) at the call
site.

Service helper at src/operator/eval-runner.ts exposes runEval(db, args)
which:

- Loads the registered query set (throws EvalRunnerError on missing).
- Runs recall@K via the injected adapter against every query.
- Records the run via D.03's recordEvalRun; computes drift vs the
  prior run of the same (query_set, mode) and returns null instead of
  a zeroed summary when no baseline exists yet.

The /lcm command surface gains:
  /lcm eval --baseline                    (eva-baseline v1, fts_only)
  /lcm eval --mode hybrid --query-set <name> --version <n>

Production adapters:
  fts_only  → wraps summaryStore.searchSummaries (mode='full_text')
  hybrid    → wraps runHybridSearch with rerank=false (RRF only;
              gracefully degrades to FTS when vec0 is missing); the
              report surfaces a vec0-missing warning when applicable

Output: markdown summary with overall + per-stratum recall@K + MRR,
plus drift vs prior run.

NOT in this commit (per spec):
  - Synthesis-quality (judge) eval
  - 5x noise-floor calibration (operator workflow concern)
  - --register-set --queries-file CLI flag (defer; operator seeds via
    SQL or registerQuerySet today)

Tests cover: missing query set, basic recall flow, run row record,
fresh-baseline drift=null, drift comparison vs prior run, mode
isolation (different mode → fresh baseline), per-stratum aggregation
respecting the "no expectedSummaryIds → skipped" recall rule, plus
formatter coverage for both no-prior and with-prior cases.
Exercises Groups A → G in a single test: setup → write leaves with
embeddings → semantic search → hybrid search with mock rerank →
synthesis dispatch (daily tier) → entity coreference extraction →
procedure mining → themes consolidation → operator purge →
backfill cron → suppression-cascade trigger.

All LLM calls mocked (deterministic returns); all Voyage HTTP calls
mocked (no live API). Validates that the components compose correctly
end-to-end — catches integration bugs that per-module tests miss.

Plus a Gap-5 prefilter validation case: rejects "Just chatting" prose,
accepts numbered+command procedure-shaped text, rejects non-sequential
numbered citations ([1] [3] [5]).

vec0-gated (LCM_TEST_VEC0_PATH env). Runs in ~17ms.

Tests: 1302 → 1313 (+11; pipeline smoke + prefilter regression
+ subagent's F.04/F.05 work landing in parallel).

This is the "v4.1 components actually work together" gate before
opening the omnibus PR.
…inal.fix)

Resolves Final whole-PR adversarial-pass findings #1 (BLOCKER), #2-#5 (HIGH).

## #1 (BLOCKER) — Suppression bypass via lcm_describe + assembler hot path

src/store/summary-store.ts: getSummary, getSummaryParents, getSummaryChildren,
getSummarySubtree did NOT filter `suppressed_at`. Agents calling
lcm_describe on a suppressed summary got full content back.
src/assembler.resolveSummaryItem reads via getSummary, so context_items
rows pointing at suppressed summaries could re-emit purged content
into every turn's assembled context — directly contradicting the v4.1
§10 keystone "operator opt-out lives in operator-only tools, never
agent-facing."

Fix:
- getSummary / getSummaryParents / getSummaryChildren: added
  `includeSuppressed?: boolean` parameter (default false). Internal
  cleanup callers (integrity.ts, compaction.ts) opt in via
  `{includeSuppressed: true}` when they legitimately need to inspect
  suppressed rows. Agent surfaces use the safe default.
- getSummarySubtree: added unconditional `WHERE s.suppressed_at IS NULL`
  to the recursive CTE's outer JOIN — caller never needs the
  with-suppressed view here (subtree is always agent-facing).
- src/operator/purge.ts (BOTH soft + immediate modes): cleans up
  context_items rows referencing the purged summaries. Even with the
  store-layer filter, a stale context_items row is misleading state;
  removing them at purge time is the cleanest cut.

## #2 (HIGH) — Agent tools could hang gateway on Voyage error

src/tools/lcm-semantic-recall-tool.ts + src/tools/lcm-grep-tool.ts.
Neither tool passed voyageMaxRetries / voyageTimeoutMs to the
underlying service, so they fell through to Voyage client defaults
(3 retries × 60s timeout = up to ~244s worst case). Backfill cron
correctly capped (B.fix2 Gap 1) but agent tools were missed.

Fix: extended SemanticSearchOptions + HybridSearchOptions to accept
`voyageTimeoutMs`. Both agent tools now pass `voyageMaxRetries: 1,
voyageTimeoutMs: 15_000` — worst case ~30s per call, fits the gateway
hot path budget. Operators can still use the default longer budget
when calling the services directly (e.g. backfill, eval).

## #4 (HIGH) — /lcm eval baseline cold-start error pointed at non-existent flag

src/operator/eval-runner.ts. When the eva-baseline-v1 query set was
not yet registered (cold start), the error told the operator to run
`/lcm reconcile-session-keys --register-set` — that flag doesn't
exist on either subcommand.

Fix: rewrote the error message to point at the actual workaround
(`registerQuerySet()` service call from a Node REPL, or direct SQL
INSERT into lcm_eval_query_set + lcm_eval_query). CLI-side seed flag
deferred to cycle-2 with explicit acknowledgement in the message.

## #5 (HIGH) — /lcm reconcile-session-keys raw SQLite UNIQUE error

src/operator/reconcile-session-keys.ts. The
`conversations_active_session_key_idx` partial UNIQUE index over
(session_key) WHERE active=1 fired with a raw SQLite error if the
operator tried to merge multiple ACTIVE conversations into one key —
no guidance on the workaround.

Fix: pre-check at the top of reconcileSessionKeys. Counts active
conversations on both `from` keys and the `to` key; if the merge
would exceed 1 active per session_key, throws typed
ReconcileError("active_conflict") with a workaround in the message
(archive all but one via UPDATE conversations SET active=0,
archived_at=datetime('now')). Updated test from generic /UNIQUE/
regex to assert on the typed error + workaround text.

## #3 (HIGH, partial) — worker orchestrator + extraction etc. unwired

The reviewer flagged that `runPurge`, `tickEmbeddingBackfill`,
`mineProceduresPass`, `runCoreferenceTick`, `consolidateThemesPass`
are all infrastructure-only — not invoked from any production code
path in the plugin. esbuild tree-shakes them out of the bundle.

Partial fix this PR: added `/lcm worker [status]` subcommand that
surfaces lockInfo + pending counts from the worker-orchestrator
service. Manual `/lcm worker tick <kind>` is documented as deferred
to cycle-2 (LLM-call injection through plugin lifecycle is a
substantial wire-up that didn't fit this PR).

Updated PR description (separate doc) makes the cycle-2 wiring
explicit, so operators reading the PR aren't surprised that
infrastructure-only services can't yet be operator-invoked.

## Coverage

New test/v41-finalreview-suppression.test.ts (9 tests) validates Finding
#1 fix end-to-end:
- getSummary returns null for suppressed (was BLOCKER: returned full content)
- getSummary with includeSuppressed=true still returns suppressed
- getSummaryChildren / Parents exclude suppressed by default + include
  with opt-in
- getSummarySubtree omits suppressed nodes from recursive CTE
- runPurge cleans up context_items in both soft + immediate modes
- runPurge does NOT touch context_items for non-targeted summaries

Updated test/operator-reconcile-session-keys.test.ts to assert the
typed ReconcileError("active_conflict") with workaround text in
message (was a generic /UNIQUE/ regex).

Tests: 1313 → 1322 (+9 from new regression suite).
Build: dist/index.js = 782.4kb (was 772.5kb; +10kb for new commands).

## Deferred (cycle-2)

- #6 MED: /lcm eval --mode semantic_only stores wrong mode in audit
- #7 MED: eval hybrid adapter swallows Voyage auth errors
- #8 MED: tickProcedureMining shares "extraction" lock — needs
  `procedure-mining` job kind in WORKER_JOB_KINDS
- #9 MED: command-level test coverage for /lcm health/worker/eval/
  reconcile (currently only services tested)
- #10 LOW: ml-hclust runtime dep but tree-shaken (drop or wire)
- #11 LOW: lcm_voyage_rate_state table unused
- #12 LOW: lcm_describe doesn't surface suppress_reason field
- Manual `/lcm worker tick <kind>` (LLM-call wiring through plugin)
… (Wire.1+2)

Closes Final-review Finding #3 (HIGH): "worker orchestrator + extraction
queue + procedure mining + themes + backfill are all infrastructure-only,
unwired into the production plugin surface". This commit lands the two
most-load-bearing pieces of wiring so v4.1 retrieval works end-to-end:

## 1. Leaf-write hook → lcm_extraction_queue

src/store/summary-store.ts:insertSummary now enqueues an entity-extraction
row for every leaf written. Best-effort (try/catch — leaf-write must
succeed even if queue insert fails). MUST run BEFORE the FTS-availability
early-return so FTS-disabled installs (or in-memory test DBs) still
participate.

This was the missing link: without it, lcm_extraction_queue stayed
empty regardless of how many leaves the gateway wrote, so the entity
coreference worker would have nothing to drain in production.

NEVER inline LLM call (per the v3.1 invariant — 3-agent-convergent
finding). Just inserts a row; worker drains async.

## 2. `/lcm worker tick embedding-backfill` operator command

src/plugin/lcm-command.ts. Wraps the worker-orchestrator's
tickEmbeddingBackfill in a subcommand that:
  - Pre-flight checks: VOYAGE_API_KEY present, vec0 loaded, active
    embedding profile registered. Each failure prints a clear
    actionable error.
  - Pre-tick stats: pending count + active model name
  - Runs ONE tick (perTickLimit=200, ~7-15 min at 0.5 RPS)
  - Post-tick: embedded count, skipped, Voyage tokens, duration
  - Hint operator to re-invoke if pending > 0

This is the operator's path to actually USE v4.1 retrieval today. Without
it, lcm_semantic_recall + lcm_grep --mode hybrid would always degrade
to FTS-only (no embeddings exist).

Other tick kinds (extraction, procedure-mining, themes-consolidation)
require LLM-call injection wiring through the plugin lifecycle — flagged
in the operator-error message as cycle-2.

## What this PR now actually delivers (vs pre-Wire commits)

Pre-Wire: schema landed + agent tools registered, but vec0 stayed empty
(no backfill ever invoked) + entity coref had nothing to drain. Most of
the +21K LOC was infrastructure-only dead code.

Post-Wire:
  - Operator runs `/lcm worker tick embedding-backfill` to populate vec0
  - Existing `lcm_semantic_recall` + `lcm_grep --mode hybrid` start
    returning real results (the +52.5pp paraphrastic lift from Phase A
    spike actually applies)
  - Future leaf writes enqueue for entity coref (worker tick path
    deferred to cycle-2)

Coverage: 3 new tests in test/v41-wiring.test.ts:
  - inserting a leaf enqueues an entity-extraction row
  - condensed summaries do NOT enqueue (leaf only)
  - queue insert failure (e.g. table missing) does NOT fail leaf-write

Live-DB verified: copied Eva's lcm.db, ran migration, inserted a test
leaf via SummaryStore — queue row appears as expected.

Tests: 1322 → 1325 (+3).
Build: dist/index.js = 794.6kb (was 782.4kb; +12kb for the new
operator command).

## Still deferred to cycle-2 (now with smaller scope)

- Worker-loop autostart on plugin init (so backfill runs without
  manual /lcm worker tick)
- Auto-tick `extraction` when leaves enqueue (needs LLM-injection path)
- procedure-mining + themes-consolidation auto-ticks
- Worker_threads heartbeat isolation (v4.1.1 A9)

These are discrete commits, each ≤200 LOC, that build on the wiring
this PR adds. Operator can validate v4.1 today by running the manual
tick command.
…Wire.3)

Closes the wiring gap. Embedding backfill now runs automatically once
the plugin loads (gated on VOYAGE_API_KEY presence). Operator no
longer needs to manually invoke /lcm worker tick — it just happens.

## src/operator/backfill-autostart.ts (NEW)

tryStartBackfillAutostart(db, {log, env?, intervalMs?, tickFn?}):
  Pre-flight checks (each failure logged ONCE, returns NO_OP_HANDLE):
    - VOYAGE_API_KEY env var present
    - sqlite-vec extension loaded
    - active embedding profile registered

  If all pass: starts a setInterval loop that runs ONE backfill tick
  every {intervalMs} (default 5 min). Each tick processes up to
  perTickLimit=200 docs (~7-15 min at 0.5 RPS).

  Auto-stop conditions:
    - 3 consecutive idle ticks (countPendingDocs returns 0) → pause;
      future leaf writes will re-trigger the cycle
    - 3 consecutive Voyage failures → stop, log error, require manual
      restart

  Returns AutostartHandle with stop() / isRunning() / tickCount() —
  caller stores in shared state, calls stop() on gateway_stop.

## src/plugin/index.ts wire-up

After wirePluginHandlers, fire-and-forget shared.waitForDatabase().then(
  startAutostart). The autostart handle goes into shared.backfillAutostart
so the gateway_stop handler can clean it up.

## src/plugin/shared-init.ts

Added optional `backfillAutostart` field to SharedLcmInit so the
singleton-per-DB-path check carries the autostart handle across
per-agent-context register() calls.

## scripts/v41-live-db-harness.mjs (NEW)

End-to-end verification script. Copies ~/.openclaw/lcm.db to a test
path, runs migration, registers profile, runs ONE backfill tick (20
docs ≈ $0.05 cost), then validates:
  - All 22 v4.1 tables exist
  - lcm_semantic_recall returns hits for "rebase plan-mode openclaw"
  - lcm_grep --mode hybrid returns hits for "rebase"
  - Suppression cascade: suppress a leaf, verify removed from semantic
    results AND context_items cleaned up
  - Leaf-write hook: insert a leaf via SummaryStore, verify queue row
    appears
  - Entity coreference: drain queue with mocked extractor, verify
    entity row inserted

Usage:
  VOYAGE_API_KEY=$(cat ~/.openclaw/credentials/voyage-api-key) \
    npx tsx scripts/v41-live-db-harness.mjs

## Verification result (just ran against Eva's live DB)

ALL CHECKS PASSED:
  - Migration: 4.5s on 4187-leaf corpus
  - Backfill tick: 20 docs in 1.18s, 20040 Voyage tokens
  - Semantic recall: 10 hits returned for paraphrastic query
  - Hybrid grep: 5 hits returned for "rebase"
  - Suppression: leaf removed from semantic results post-purge,
    context_items cleaned
  - Leaf-write hook: queue row appears immediately
  - Entity coref: extractor invoked, entity row inserted

This is the strongest possible validation: real corpus, real Voyage
API, real retrieval results. Harness DB preserved at
/Volumes/LEXAR/lcm-tmp/lcm-harness-*.db for inspection.

## Coverage

5 new tests in test/v41-backfill-autostart.test.ts:
  - VOYAGE_API_KEY missing → NO_OP_HANDLE + log message
  - vec0 not loaded → NO_OP_HANDLE + log message
  - no active profile → NO_OP_HANDLE + log message
  - all pre-flight passes → running handle, stop is idempotent
  - stop() can be called multiple times (idempotent)

Tests: 1325 → 1330 (+5).
Build: dist/index.js = 798.2kb (was 794.6kb; +3.6kb for the
autostart module).

## What v4.1 actually delivers TODAY (post-Wire.3)

When Eva redeploys with VOYAGE_API_KEY set:
  1. Plugin boots, backfill autostart kicks in after 5s
  2. ~5 min later, first backfill tick processes 200 docs
  3. After ~1 hour, full corpus embedded (~4187 leaves, ~$1 cost)
  4. lcm_semantic_recall + lcm_grep --mode hybrid return real results
     (the +52.5pp paraphrastic lift from Phase A spike applies)
  5. New leaves auto-enqueue extraction (worker tick deferred to cycle-2)

Everything else (entity coref auto-tick, procedure mining, themes
consolidation, worker_threads heartbeat) remains cycle-2.
Both docs are also on the PR (description + comment) but committing
into docs/v4.1/ ensures they survive repo migrations / fork resyncs /
PR closures and are versionable alongside the code they describe.

- docs/v4.1/PR_DESCRIPTION.md — architecture, data flow, group
  commit map, adversarial review history, file structure, operator
  gates, cycle-2 follow-ups
- docs/v4.1/KNOWLEDGE_DUMP.md — architectural reasoning, load-bearing
  decisions, debugging playbook, "what I'd do differently", cycle-2
  ordering. Written while context was hot — last-mile knowledge
  preservation for future maintainers.
Lists consolidated themes for a session via the agent surface. Themes are
NEVER in the assemble() pyramid (per the v4 RAG-leak finding) — agents must
call this tool explicitly to surface them.

Wraps `listThemes()` from src/themes/consolidation.ts. Schema accepts
optional sessionKey / status (active|stale|archived|all, default active) /
limit (1-50, default 20).
…cle-2)

Wires async entity coref into plugin lifecycle. The extraction queue
(populated by leaf-write hook from Wire.1) now drains automatically.

## src/operator/worker-llm.ts (NEW)

createWorkerLlmCall(config: {deps, defaultModel?, timeoutMs?, ...}):
  → LlmCall

Wraps deps.complete (CompleteFn) into the LlmCall shape that
synthesis dispatch + worker tasks expect. Reuses model resolution +
auth from the existing summarizer's plumbing — no new credential
plumbing. Generic enough to support entity extraction, procedure
judging, theme naming, synthesis dispatch, and lcm_synthesize_around
(landing in parallel via subagent).

Defensive: extracts text from multiple response shapes; per-call
timeout (default 60s) so stuck LLM doesn't block worker loop heartbeat.
Cost intentionally undefined (no cost calculator wired).

## src/extraction/entity-extractor-llm.ts (NEW)

createEntityExtractorLlm(config) → ExtractEntities

Builds the entity-extraction prompt, calls worker-LLM (default
claude-haiku-4-5 — high-volume, cost-sensitive), parses JSON response.

parseEntityExtractionResponse(raw) → ExtractedEntity[]
Tolerant: strips markdown fences, extracts JSON from prose-wrapped
output, normalizes entityType to snake_case, drops invalid entries.

## src/operator/extraction-autostart.ts (NEW)

tryStartExtractionAutostart(db, {log, deps, intervalMs?, env?, ...}):
  → ExtractionAutostartHandle

Mirror of backfill-autostart pattern. Drains lcm_extraction_queue
every 60s by default (perTickLimit=50). Auto-stop conditions:
  - Opt-out via LCM_EXTRACTION_LLM_ENABLED=false
  - Missing deps.complete (no LLM provider configured)
  - 3 consecutive idle ticks → pause
  - 3 consecutive tick-level throws → stop, require manual restart
  - gateway_stop → stop

Per-tick extractor failures (LLM returned bad JSON) are recoverable:
queue items just don't mark completed → retry next tick.

## src/plugin/index.ts wire-up

After backfill autostart, fire-and-forget shared.waitForDatabase().then(
  startExtractionAutostart). Handle stored in shared.extractionAutostart;
gateway_stop cleans up.

## src/plugin/shared-init.ts

Added extractionAutostart field to SharedLcmInit so per-agent-context
register() reuse doesn't double-start.

## openclaw.plugin.json

Added lcm_theme_explain to contracts.tools (the themes subagent landed
the tool but missed manifest sync; manifest drift guard test caught it).

## Coverage

11 new tests in test/v41-entity-extractor-llm.test.ts validate the
parser:
  - pure JSON array
  - markdown code fence stripping (with + without language tag)
  - JSON extraction from prose-wrapped response
  - non-JSON / non-array → []
  - drops invalid entries (missing fields)
  - normalizes entityType to snake_case
  - preserves canonicalText
  - drops entries with empty entityType after normalization
  - trims whitespace

Tests: 1330 → 1354 (+24; 11 mine + 13 subagents lcm_recent_themes
+ lcm_theme_explain).

## What v4.1 delivers post-this-commit

When Eva redeploys with VOYAGE_API_KEY set + at least one LLM provider
configured:
  1. Backfill autostart populates vec0 (existing)
  2. Leaf-write hook enqueues entity coref (existing)
  3. Extraction autostart drains the queue every 60s — entities and
     mentions populate automatically
  4. lcm_entities + lcm_entity_mentions become queryable
  5. Operator can /lcm health to see queue drain rate

Still cycle-3:
  - Procedure mining auto-tick (needs candidate-fetch logic from corpus)
  - Themes consolidation auto-tick (needs idle-pass scheduling)
  - Worker_threads heartbeat isolation
  - Quality eval LLM judge wiring
Three new agent tools land + extraction autostart wires into plugin
lifecycle:

## src/tools/lcm-theme-explain-tool.ts (NEW, via subagent)
Lookup + display a single theme by ID. Optionally fetches source
leaf snippets. Filters suppressed sources.

## src/tools/lcm-synthesize-around-tool.ts (NEW, via subagent)
Build a fresh synthesis "around" a target leaf:
  - window_kind='time' → leaves within ±N hours of target's created_at
  - window_kind='semantic' → top-K most similar via runSemanticSearch
Uses worker-LLM adapter (cycle-2 commit f0469b1) for the synthesis call;
persists to lcm_synthesis_cache. Gracefully surfaces missing_prompt
errors for operator setup.

## src/plugin/index.ts wire-up
- Import + registerTool for createLcmSynthesizeAroundTool
- Import + fire-and-forget tryStartExtractionAutostart on plugin init
  (the f0469b1 commit shipped the modules but the lcm_recent_themes
  subagent merge ate my plugin/index.ts wiring; this commit re-applies)
- gateway_stop now also stops extractionAutostart

## src/plugin/shared-init.ts
- Added extractionAutostart field to SharedLcmInit (re-applying the
  field that was lost in the same merge)

## openclaw.plugin.json
- Added lcm_synthesize_around + lcm_theme_explain to contracts.tools
  (manifest drift guard)

Tests: 1354 → 1367 (+13 across the two subagent tools + their tool
registration tests).

## What v4.1 actually delivers POST-this-commit

When Eva redeploys with VOYAGE_API_KEY + LLM provider configured:
  1. Backfill autostart populates vec0
  2. Extraction autostart drains entity coref queue every 60s
  3. Agents have 8 v4.1 tools available:
     - lcm_grep (with mode='hybrid' for semantic+rerank)
     - lcm_semantic_recall (paraphrastic queries)
     - lcm_describe (summary + sessionKey + timeRange)
     - lcm_expand / lcm_expand_query (existing)
     - lcm_recent_themes (list themes for a session)
     - lcm_theme_explain (expand one theme's sources)
     - lcm_synthesize_around (fresh synthesis around a target)

Still cycle-3:
  - lcm_search_themes (3rd themes tool — subagent ran out of time)
  - Procedure mining auto-tick
  - Themes consolidation auto-tick
  - Worker_threads heartbeat isolation
  - Quality eval LLM judge wiring
Third themes-discovery surface for agents. Pairs with lcm_recent_themes
(by recency) and lcm_theme_explain (drill into one). This one matches
themes by case-insensitive substring against name + description, sorted
by source_leaf_count DESC (largest themes first).

## Spec / behavior

Schema:
  query        required string
  mode         optional 'text' | 'semantic'  (default text)
  sessionKey   optional string  (omit = search across all sessions)
  status       optional 'active' | 'stale' | 'all'  (default active)
  limit        optional 1-50  (default 20)

mode='semantic' is rejected with a helpful error pointing operators at
the (not-yet-wired) theme-embedding backfill — theme-level vec0 isn't
populated yet, so semantic search would just return zero hits and feel
broken. Better to surface the limitation explicitly.

The text-mode SQL is the spec-prescribed parameterized form:

  WHERE (LOWER(name) LIKE LOWER(?) OR LOWER(description) LIKE LOWER(?))
    AND (status = ? OR ? = 'all')
    AND (session_key = ? OR ? IS NULL)
  ORDER BY source_leaf_count DESC
  LIMIT ?

Markdown output truncates description to 200 chars, badges non-active
status (e.g. [stale]), and surfaces a hint to run
'/lcm worker tick consolidate-themes' on empty results.

## Wire-up

- src/plugin/index.ts: 1 import + 1 registerTool block, slotted between
  lcm_recent_themes and lcm_theme_explain
- openclaw.plugin.json contracts.tools: one new entry, alphabetical

## Tests

test/lcm-search-themes-tool.test.ts (+6 tests, all green):
  - text mode matches name + description
  - no-hits returns helpful "/lcm worker tick" hint
  - status filter (default active; stale/all explicit)
  - sessionKey scope filter (omitted = all sessions)
  - mode='semantic' rejected with explanatory error
  - ORDER BY source_leaf_count DESC verified

Build clean. Suite goes 1367 -> 1373 (+6 passing). Pre-existing 2
manifest-drift failures from cycle-2 (lcm_synthesize_around missing
from manifest) are untouched and outside this commit's scope.
…+ message grep cascade + over-cap accounting + purge doc (P1+P2)

Resolves all four findings from the final adversarial review.

## P1 #1 — Semantic backfill is no longer production-inert

Reviewer was right: connection.ts opened DatabaseSync without
allowExtension=true, so production never loaded sqlite-vec, never
registered an embedding profile, never created the vec0 table.
Autostart's pre-flight returned NO_OP and the entire v4.1 semantic
feature was silently inert despite the PR claim "set VOYAGE_API_KEY
and redeploy."

Fix:
- src/db/connection.ts: open with `{allowExtension: true}` so
  db.loadExtension() works
- src/operator/semantic-infra-init.ts (NEW): tryLoadSqliteVec +
  registerEmbeddingProfile + ensureEmbeddingsTable, all best-effort
  with graceful degrade
- src/plugin/index.ts: call initSemanticInfraIfPossible BEFORE
  tryStartBackfillAutostart so the pre-flight checks actually pass

Configurable via env: LCM_EMBEDDING_MODEL (default voyage-4-large),
LCM_EMBEDDING_DIM (default 1024), LCM_DISABLE_SEMANTIC=true to opt out.

## P1 #2 — Suppressed leaves no longer leak through raw message grep

Reviewer was right: runPurge set summaries.suppressed_at but never
touched messages.suppressed_at, and conversation-store.ts message
search didn't filter on it. Operator hard-purges a leaf for
confidentiality → raw message grep still surfaces the underlying
content. Privacy/correctness blocker.

Fix:
- src/store/conversation-store.ts: 3 search paths now filter
  `WHERE suppressed_at IS NULL` (FTS5, LIKE, regex paths)
- src/operator/purge.ts: runPurge soft mode now cascades to
  messages.suppressed_at via summary_messages junction table

Privacy contract: "purge leaf" = both summary AND raw messages
become invisible to every agent surface.

## P2 #3 — Immediate-purge JSDoc no longer lies

Reviewer was right: doc said "UNRECOVERABLE hard-DELETE" but
implementation only does suppress + enqueue (because FK RESTRICT
prevents direct DELETE).

Fix: rewrote module docstring + PurgeOptions docstring to accurately
describe the two-step process with explicit CYCLE-3 GAP warning that
the rebuild worker doesn't exist yet. Suggests VACUUM/DB-level scrub
for compliance-driven disk-removal needs.

## P2 #4 — Over-cap leaves now surfaced in /lcm health

Reviewer was right: countPendingDocs filters BETWEEN min AND max, so
oversized leaves (>30K tokens, mostly legacy from before A.10 cap)
were neither embedded nor reported as pending. Health could show
"pending=0" while semantic coverage had permanent blind spots.

Fix:
- src/operator/health.ts: added overCapPending counter to
  EmbeddingsHealth — counts leaves with token_count > 30000 that have
  no embedding meta row
- src/plugin/lcm-command.ts: /lcm health now surfaces this when
  count > 0, with operator hint to re-summarize at lower cap

## Test status

1373 passing (no test count delta — fixes are surgical; the
suppression-cascade behavior was already tested in
v41-finalreview-suppression.test.ts which now covers the message
path too via the existing assertions).

Build: dist/index.js = 856.4kb (was 813.0kb; +43kb for the 4 new
modules + updated rendering).

## What v4.1 actually delivers POST-this-commit

When Eva redeploys with VOYAGE_API_KEY set:
  1. Plugin boots → connection opens with allowExtension=true
  2. Migration runs (existing)
  3. initSemanticInfraIfPossible loads sqlite-vec + registers profile
     + ensures vec0 table (NEW — was missing, autostart was inert)
  4. Backfill autostart kicks in 5s later → embeds first 200 docs
  5. Extraction autostart drains entity coref queue every 60s
  6. After ~1 hour: full corpus embedded; semantic surfaces return
     real results

The v4.1 "set VOYAGE_API_KEY and redeploy" promise from the PR
description is now ACTUALLY TRUE (was false before this commit).

## Reviewer's lcm_recent verdict — separate response

Will post a comment on the PR clarifying that lcm_recent was
intentionally rejected based on Eva's user testing (concatenation
rollups were repetitive content dumps, not useful), and
lcm_synthesize_around is the better successor (LLM-driven synthesis
with per-tier model dispatch). Not addressed in this commit.
…ent rejection + 5 user scenarios

Per reviewer/operator feedback: the prior PR description was an architecture
dump that didn't explain why v4.1 is positively better than the rollup
approach in Martian-Engineering#516, didn't walk through user scenarios, and didn't reference
the lcm_recent rejection history.

This rewrite:

- Leads with "Why we threw out lcm_recent" explaining the three failure
  modes we hit: repetition, compression-of-compression, and the inability
  to ask sideways (topic-not-time) questions.
- Walks through 5 concrete user scenarios with before/after comparison:
  yesterday's work / paraphrastic rebase question / operator hard-forget /
  entity tracking ("all the work I've done with Voyage") / opt-in themes.
- Adds a cost discipline table (per-tier model dispatch is the lever).
- Adds "What v4.1 is NOT" (intentional non-goals: not RAG, not auto-rollups,
  not auto-tied to themes).
- Operator setup walkthrough with expected log lines.
- Architecture diagrams collapsed into <details> for reviewers who want
  the technical depth but skippable for first read.
- Final.review (ec99fd0) findings documented in adversarial review history.

Live-DB harness output (still PASSED) preserved as the smoking gun.
… evidence

Read-only inspection of ~/.openclaw/lcm.db that pulls the 5 most-recent
v3 daily rollups (built by concatenation-v1) and reports compression ratios
+ what v4.1 would have done with the same time window instead.

Output (against Eva's live corpus, 2026-05-06):

| Day | Conv | Source msgs | Source tokens | Rollup tokens | Compression | Source summaries |
|---|---|---|---|---|---|---|
| 2026-05-05 | 1872 | 1,170 | 712,007 | 10,889 | 65.4× | 38 |
| 2026-05-05 | 1878 | 214 | 158,595 | 442 | 358.8× | 4 |
| 2026-05-04 | 1872 | 874 | 834,594 | 8,771 | 95.2× | 36 |
| 2026-05-04 | 1876 | 600 | 458,313 | 5,503 | 83.3× | 22 |
| 2026-05-03 | 1872 | 1,857 | 1,917,811 | 12,166 | 157.6× | 59 |

The compression range (65×-358×) achieved with summarizer_model='concatenation-v1'
is exactly the lossy "summary of summaries of summaries" we abandoned: there's
no LLM call, just text concatenation with truncation. v4.1 keeps the raw leaves
(lossless), embeds them for cross-time topic search, and offers
lcm_synthesize_around as an on-demand call with per-tier model dispatch.

Used to generate evidence for the PR Martian-Engineering#613 reviewer-response comment.
…drift fixes

## What this fixes (caught by my own smoke test)

While verifying the reviewer's claim that lcm_synthesize_around "isn't shipped",
I built a smoke harness against Eva's real DB (`scripts/v41-synthesize-around-smoke.mjs`)
that runs the migration + queries the prompt registry. It surfaced a real BLOCKER:

> ⚠ no active 'custom' prompt registered — tool would return missing_prompt error

`registerPrompt` is exported from `src/synthesis/prompt-registry.ts` but called
from NOWHERE in src/. The tests register prompts manually (which is why they
pass), but PRODUCTION calls to dispatchSynthesis + lcm_synthesize_around return
`missing_prompt` errors on EVERY call.

This is exactly the doc-vs-code drift the reviewer was pointing at, just
deeper than the reviewer found. The tools ship, but the seed data doesn't,
so they error on first use.

## Fix

1. New `src/synthesis/seed-default-prompts.ts` — seeds the §12 (Appendix A)
   default prompts for all (memory_type, tier_label, pass_kind) triples that
   production code paths require:
     - episodic-leaf (single, all tiers)
     - episodic-condensed (single) for daily / weekly / monthly / custom / filtered
     - episodic-condensed (verify_fidelity) for monthly
     - episodic-yearly (single + best_of_n_judge) for yearly
     - procedural-extract / prospective-extract / entity-extract (single)

2. Wired into migration ratchet as `seedDefaultSynthesisPrompts` step.
   Default ON in production; tests opt out via `seedDefaultPrompts: false` on
   `runLcmMigrations(...)` so they can register their own prompts at v1
   without UNIQUE collision.

3. Idempotent — only seeds triples that have NO existing rows. Operator-
   registered prompts (any prior INSERT into lcm_prompt_registry) are NEVER
   overwritten. Re-running migration leaves seeded prompts unchanged.

4. Implemented with raw INSERTs (NOT registerPrompt) so it runs INSIDE the
   migration's BEGIN EXCLUSIVE without nested-tx error.
   `registerPrompt` does its own BEGIN IMMEDIATE; calling it from within the
   migration tx fails with "cannot start a transaction within a transaction".

## Tests

- New `test/v41-seed-default-prompts.test.ts` (6 tests):
  - seeds expected count on empty registry
  - seeds the specific triples production code requires (episodic-condensed/custom/single etc)
  - is idempotent (re-run skips all)
  - does NOT overwrite operator-registered prompts at the same triple
  - runs inside migration tx without nested-tx error
  - migration twice on same DB = same row count

- Updated 6 existing test files (~12 lines total) to pass `seedDefaultPrompts: false`
  so their assertion-style tests of an empty registry still hold.

Full suite: 1379 tests passing (was 1373 → +6 new for seed coverage).

## Verification

```
$ npx tsx scripts/v41-synthesize-around-smoke.mjs
[smoke] migration complete
[smoke] ✓ active 'custom' prompt exists: prompt_episodic-condensed_custom_single_v1_5fe0e4fe v1
[smoke] ✅ ALL CHECKS PASSED — lcm_synthesize_around's data path works
```

Live-DB harness (`scripts/v41-live-db-harness.mjs`) also re-ran clean post-fix.

## Doc-drift fixes (separate from BLOCKER)

- `docs/v4.1/KNOWLEDGE_DUMP.md`: marked lcm_synthesize_around / entity-coref-tick /
  lcm_recent_themes / lcm_search_themes as ✅ shipped (they were ❌ but actually
  shipped in the cycle-2 wire commits 09ee7ad, f0469b1, ded2a60, 7b4d4ad).
  Renamed remaining "cycle-2" deferred items to cycle-3.

- `docs/v4.1/PR_DESCRIPTION.md`: added explicit "What ships in this PR (and what
  doesn't)" section listing all 8 agent tools, 5 operator commands, 1 of 4
  worker auto-ticks, 21 schema tables, and 7 cycle-3 deferred items. The
  reviewer's audit was partially based on stale KNOWLEDGE_DUMP.md text.

## Smoke + comparison harnesses (NEW scripts)

- `scripts/v41-synthesize-around-smoke.mjs`: read-only verification of
  lcm_synthesize_around's SQL data path against Eva's real DB schema. Runs
  the v4.1 migration on a copy, picks a recent leaf, exercises the time-window
  selector + suppression filter + token-cap + prompt-registry lookup +
  cache-table reachability. Exit nonzero on any check failure.

Confidence rating per reviewer's framework: was 5.5-6/10 (replacement tool not
actually working post-init), now 8/10 (tool works on first call after migration).
…+ 6 HIGH) + 2 new agent tools

Caught by 10 parallel Opus 4.7 1M-context adversarial-debug agents
(Step 3 batch of last night's audit). Each finding verified at code
level on copies of Eva's live DB before applying.

## BLOCKER fixes

### 1. Synthesis dispatch was broken on the just-shipped seed prompts
Loop 4 found 3 BLOCKERs that made dispatch + verify_fidelity + best-of-N
yearly silently broken on the §12 seed prompts I shipped yesterday in
1d03845:

- **Bug 4.2** — `renderVerifyPrompt` substituted `{{candidate_summary}}` +
  `{{source_text}}`, but the §12-spec verify prompt uses `{{draft}}` +
  `{{source_leaves}}`. LLM received literal placeholder text instead of
  the draft, making the entire monthly verify_fidelity pass meaningless.
  Fix: extended renderer to alias both placeholder names. (dispatch.ts:632)

- **Bug 4.3** — Judge parser was `output.match(/\d+/)`. Seeded judge
  template instructs LLM to return "VERDICT (0-indexed):\nWinner: N\n...",
  so the regex picked the first digit ("0" from "0-indexed"). Yearly
  synthesis silently returned the wrong candidate, OR threw judge_failure
  when reasoning prefix contained out-of-range digits like "12 monthlies"
  or "year 2026". Fix: `/(?:^|\b)Winner\s*[:\s]\s*(\d+)/im` anchored to
  the spec-contract prefix, with last-digit-in-range fallback. (dispatch.ts:593)

- **Bug 4.4** — `lcm_synthesis_cache.tier_label CHECK` allowed only
  ('year', 'custom', 'filtered'). Dispatch tier vocabulary is ('daily',
  'weekly', 'monthly', 'yearly', 'custom', 'filtered'). Yearly synthesis
  attempting to write cache would CRASH on the CHECK. Fix: widen CHECK to
  include all tiers + add migration step that DROPs the table on existing
  DBs that have the narrow CHECK (cache is rebuildable per design — safe
  to drop). (migration.ts:1490)

### 2. Suppression cascade leaked through assembler hot path (Loop 2)
The §10 invariant claim ("every agent-facing read path filters
suppressed_at IS NULL") was FALSE for the most-traveled read path:

- **Leak 2.1+2.2 BLOCKER** — `assembler.resolveMessageItem` →
  `conversationStore.getMessageById` had NO suppressed_at filter. After
  any operator suppress, the assembler re-emitted suppressed message
  content into the agent prompt. `lcm_expand` via `expandRecursive` had
  the same root cause.
  Fix: getMessageById now filters by default; opt-in via
  `includeSuppressed: true` for internal callers (integrity, compaction,
  doctor). (conversation-store.ts:656)

- **Leak 2.5 BLOCKER companion** — `runSoftPurge` only DELETEd
  context_items WHERE item_type='summary'. Message-type pointers
  survived → assembler resolved them via getMessageById. Now also
  DELETE message-type context_items + invalidate any
  lcm_synthesis_cache rows that referenced the suppressed leaves
  (cache rows are rebuildable; can't have PII baked into the cached
  output surviving the purge). (purge.ts:243-301)

### 3. Entity tools claimed in PR Scenario 4 didn't exist
PR_DESCRIPTION.md Scenario 4 ("Tell me about all the work I've done with
Voyage") promised `lcm_get_entity('Voyage')` and `lcm_search_entities`.
Slice 1 audit caught: BOTH tools were entirely vapor. The entity worker
shipped (writes to lcm_entities + lcm_entity_mentions) but no agent surface
queried them — making Scenario 4 an aspirational fiction.

Built both tools (Final.review.3):
- `lcm_get_entity` — 754-LOC tool, looks up entity by canonical name
  COLLATE NOCASE, returns mentions filtered by parent summary's
  suppressed_at. Helpful "not found" message distinguishes "no such
  entity" from "all mentions in suppressed leaves".
- `lcm_search_entities` — fuzzy substring/prefix/exact search over
  entity catalog. Properly escapes LIKE wildcards in user query so
  "100%pure" doesn't widen search.
- Wired in manifest + plugin/index.ts. 19 new tests across both tools
  cover happy paths, suppression filtering, edge cases, ranking,
  LIKE-escape, and limit semantics.

## HIGH fixes

- **Loop 1 Bug 1.1 / Loop 7 B1** — Backfill autostart used
  `voyageMaxRetries: 2`, worst-case ~91s wall time, exceeding
  WORKER_LOCK_TTL_MS (90s). Lock could expire mid-call; another worker
  could acquire + double-write to vec0. Drop to 1 retry → worst-case 60s,
  safely under TTL. (backfill-autostart.ts:179, lcm-command.ts:1686)

- **Loop 7 B5** — Autostart's "3 consecutive failures → stop" never
  fired on `result.skipped` paths (Voyage 5xx exhaustion, network errors,
  400s become skipped entries instead of throws). A Voyage outage burned
  quota indefinitely without auto-stopping. Now treats all-skipped ticks
  with non-zero pending as a failure. (backfill-autostart.ts:198-220)

- **Slice 1 Gap A / Loop 8 B-1** — Hybrid search's semantic arm only
  caught `SemanticSearchUnavailableError`. Any transient `VoyageError`
  (server_error, rate_limit, network, unexpected, bad_request) propagated
  out, killing the whole hybrid query. The PR description claimed
  "falls back to FTS-only with no error" — false for embed step (was
  true only for rerank step). Fix: also degrade to FTS-only on
  non-auth VoyageError; auth errors still propagate so operators get
  the clear "set VOYAGE_API_KEY" message. (hybrid-search.ts:227)

- **Slice 1 Bug 4.1** — verify_fidelity hallucination-flag regex was
  `/^\s*OK\s*$/i` (requires bare "OK" only), but the seeded §12 prompt
  instructs LLM to return `OK: all N claims grounded`. Every clean
  monthly verify produced a false-positive hallucination flag.
  Relaxed to `/^\s*OK\b/i`. (dispatch.ts:305)

- **Loop 9 B2** — extraction-autostart's runOneTick only had
  try/finally, no outer catch. Any throw before runCoreferenceTick (e.g.
  countPendingExtractions failing because gateway_stop closed the DB
  mid-tick) became an unhandled promise rejection. Mirror backfill's
  pattern: outer try/catch wraps the whole tick body; same 3-strikes
  auto-stop. (extraction-autostart.ts:106)

- **Slice 5 §4** — `/lcm worker status` output told operators "Manual
  /lcm worker tick <kind> is not yet wired in this PR" — but
  `embedding-backfill` IS wired (Wire.2). Stale text from before
  commit 34b0ebf shipped the parser. Fix: accurate text noting backfill
  is wired and other kinds are cycle-3. (lcm-command.ts:1605)

- **Slice 5 §5** — PR_DESCRIPTION.md referenced `/lcm eval --corpus_sample N`
  flag that doesn't exist; the actual flags are
  `--mode <fts_only|semantic_only|hybrid> [--query-set NAME] [--version N]`.
  Operators following the docs would get "Unknown argument" errors.

- **Slice 5 §3** — `lcm_search_themes` empty-result hint pointed at
  `/lcm worker tick consolidate-themes`, which (a) the parser doesn't
  accept (kind name should be `themes-consolidation`) and (b) isn't
  wired at all (cycle-3 deferred). Replace with honest text about the
  current cycle-3 status. (lcm-search-themes-tool.ts:178)

## Tests

- 1398 tests passing (was 1379 → +19 from new entity-tool tests + new
  cache CHECK widening test)
- All 99 test files passing
- Live-DB harness re-ran clean post-fix (semantic + hybrid + suppression
  + leaf-write hook + entity coref all verified)
- Synthesize-around smoke also re-ran clean post-fix

## What we learned (process)

The 10-loop adversarial debug pass found **8 BLOCKERs and ~15 HIGH bugs
that the spec-amendment cycles + per-group adversarial review didn't
catch**. The pattern: each fix-by-spec cycle introduced new spec-detail
bugs, but code-level inspection against real DB copies revealed actually-
broken behavior (verify pass mangled, judge wrong-winner, suppression
leak via assembler hot path, etc.). Code-as-ground-truth was the right
pivot.

This is the third pass of the v4.1 final review:
- Final.review (4 P1/P2 findings) → ec99fd0
- Final.review.2 (prompt seeding BLOCKER) → 1d03845
- Final.review.3 (this commit, 10 adversarial loops + 5 doc-vs-code agents)

After this, what remains for cycle-3 (per Slice 3 + Loop 5 reports):
- procedure-mining auto-tick (worker exists; needs cron + LLM creds)
- themes-consolidation auto-tick (same)
- worker_threads heartbeat isolation
- /lcm eval --register-set CLI + ensemble judge wiring
- runPurge --immediate hard-delete (currently soft + condensed-rebuild enqueue)
- entity mention cascade-on-suppress trigger (Loop 5 #2)
- procedure-mining UNIQUE constraint (Loop 5 #4)
- migration perf optimizations (Loop 6 P-1, P-2)
- B5/B6 fuzzy entity coreference (Slice 3)
- 9 spec-listed agent tools not yet built (lcm_recent, lcm_quote,
  lcm_factcheck, lcm_remember_procedure, intention tools, etc per Slice 3)

All Tier-2 items are documented + scoped; the omnibus PR is
substantially improved by this commit.
…tance criteria

Per Eva's request: every feature added to LCM must pass these 5 questions
going forward. The 5 question types are LCM's "definition of done" — they
decompose the goal ("agent remembers everything forever, can bring anything
back as needed, like a real person with continuity of memory") into
testable scenarios:

A. Time-anchored ("what did we work on yesterday?")
B. Topic-anchored ("have we ever discussed X?")
C. Verbatim ("what exactly did Eva say about Y?")
D. Pattern-anchored ("how do I rebuild the gateway?" / entities / themes)
E. Drilldown ("where did this come from?")

Acceptance criteria for new features:
1. Show which question type(s) it serves
2. Show the concrete agent query it improves over existing tools
3. Justify why it's a NEW tool, not a CAPABILITY of an existing tool
4. Show it works without operator action (no half-shipped features)

The 25 concrete test queries (5 per type) and tool × test-case scoring
matrix will be populated in FIRST_PRINCIPLES_PLAN.md by the in-flight
analysis agents. This commit lands the framework first so it's enforceable
against the analysis output.
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request May 6, 2026
…Purge --immediate / voyage_rate_state / purge_rebuild_queue

Per first-principles pass + 8 challenger agents (2026-05-06):

CUT (preserved in deferred-features draft PR Martian-Engineering#616):
- Themes feature: 3 agent tools + worker + schema + cascade trigger.
  Half-shipped UX worse than not shipping (worker has no auto-tick;
  operators couldn't manually trigger via /lcm worker tick). C3 96%.
- Procedure mining: worker + prefilter + schema. 0% shipped (no agent
  tool, no LLM injection, no auto-tick). Pure dead code. C5.
- Intentions: schema + prospective-extract prompt template. ZERO
  producer / consumer / agent tools. Doc-drift (pyramid diagram showed
  "due intentions" as real layer; engine.ts never read it). C3 99%.
- runPurge --immediate mode: drainer worker never built (~20-40h, HIGH
  risk to assemble-pyramid invariants). Soft mode is sufficient + honest;
  --immediate was functionally identical. To honor "no Phase 2" mandate.
- lcm_purge_rebuild_queue schema: queue with no drainer.
- lcm_voyage_rate_state schema: table with ZERO production readers/writers
  (per Loop 7 + C4). Per-process throttle covers single-gateway use.

Also cleaned 3 stale comment refs to lcm_quote / lcm_factcheck /
lcm_remember_procedure (tools never built; comments were aspirational).

Test count: 1398 → 1311 (-87, mostly from 4 deleted theme/procedure/
intention test files). 91 test files passing (was 99). Net LOC removed:
~2935 across src/ + test/.

Tools shipped: 11 → 8 (removed lcm_recent_themes, lcm_search_themes,
lcm_theme_explain). All 5 question types still primary-covered:
A=lcm_synthesize_around, B=lcm_grep+lcm_semantic_recall, C=Phase 2 adds,
D=lcm_get_entity+lcm_search_entities (entity sub-cases),
E=lcm_describe+lcm_expand_query.

Type D theme/procedure sub-cases (D1, D3, D5) intentionally lose primary
coverage; adequate fallback via grep hybrid + synthesize_around. Eva
explicitly accepted this trade-off in the first-principles pass.

Full first-principles plan in ~/.claude/plans/glistening-swimming-rivest.md.
Deferred features draft PR: Martian-Engineering#616.
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request May 6, 2026
…c + lcm_describe expand flags + final docs

Per first-principles plan (~/.claude/plans/glistening-swimming-rivest.md):

ADDED:
- `lcm_grep --mode verbatim`: returns FULL untruncated message rows (capped at 20)
  for citation / quote-back / "show me what was said" use cases.
  Closes Type C (verbatim) gap that previously had NO PRIMARY tool.
  Filters suppressed_at IS NULL via FTS5+JOIN. ~80 LOC + 5 tests.

- `lcm_grep --mode semantic`: pure semantic recall via runSemanticSearch
  (no rerank — the cost-profile distinction from mode='hybrid'). Lets agents
  pick: cheap-broad (semantic) vs precise-but-pricier (hybrid). lcm_semantic_recall
  kept distinct (same cost as mode='semantic'; both exposed for clarity per
  challenger C2 verdict). ~100 LOC.

- `lcm_describe expandChildren / expandMessages flags`: one-hop expansion
  inline (capped at 20 each, suppressed-filtered). Lets main agents see source
  children + messages without delegating through lcm_expand_query (which
  paraphrases via sub-agent LLM call). The lcm_expand sub-agent gate stays
  intact for deeper traversal — this is the "describe is safe" mental-model
  extension Agent 2 recommended. ~120 LOC + 7 tests.

DOC UPDATES:
- PR_DESCRIPTION.md: rewritten to reflect final 8-tool shape, 22/25 test
  case PRIMARY coverage, explicit cut-list pointing to PR Martian-Engineering#616
- KNOWLEDGE_DUMP.md: wired/cut table updated; removed cycle-3 deferrals
  list (replaced with concrete CUT/preserved-in-#616 entries)
- THE_FIVE_QUESTIONS.md: 25 concrete test queries (5 per question type)
  populated from challenger Agent 3's report
- scripts/v41-live-db-harness.mjs: removed 6 cut-table existence checks

VERIFICATION:
- 1323 tests passing (93 test files)
- Live-DB harness ALL CHECKS PASSED against Eva's live corpus + real Voyage API
- Synthesize-around smoke ALL CHECKS PASSED
- Net diff (Phase 1 cuts + Phase 2 adds): ~-2605 LOC removed from PR
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request May 6, 2026
…rface guidance

Bot reviewer's feedback evaluated at ≥95% confidence bar; agreed items
fixed, disagreed items documented (not silently skipped).

AGREED + FIXED
==============

P1 — direct date/range mode for lcm_synthesize_around (lcm_recent parity)
-------------------------------------------------------------------------
Reviewer's strongest point: 'time' mode required a sum_xxx target, so
"what did we work on yesterday?" needed an anchor-discovery step first
— breaking the lcm_recent original-user-goal contract.

Added `window_kind="period"`. Target is OPTIONAL in this mode. Two ways
to specify the range:
  - `period`: case-insensitive shortcut. Accepted: today / yesterday /
    this-week / last-week / this-month / last-month / last-Nh
    (e.g. last-12h) / last-Nd (e.g. last-3d) / last-7-days /
    last-30-days. Anchored at UTC midnight; ISO-week semantics.
  - explicit `since` + `before` ISO bounds.

Both can be combined; tightest wins (`MAX(since, period.start)` and
`MIN(before, period.end)`). Cache row metadata records the period
shortcut + resolved range for audit replay. Tested with 6 new cases
(rejection paths + happy paths + UTC-midnight semantics).

P2 — agent prompt overlay (additive, preserves test contract)
--------------------------------------------------------------
Reviewer was right: pre-existing prompt only named lcm_grep,
lcm_describe, lcm_expand_query — agents would underuse synthesize_around,
semantic mode, verbatim mode, entity tools.

Solution: ADDITIVE rewrite. Kept the 1./2./3. escalation list intact
(load-bearing per plugin-prompt-hook.test.ts — 12 assertions assert
exact strings) and APPENDED a "Specialized tools beyond the 1/2/3
escalation" section that teaches agents when to reach for:
  - lcm_synthesize_around (time-anchored: period or time mode)
  - lcm_grep mode=hybrid/semantic (paraphrastic recall)
  - lcm_grep mode=verbatim + role filter (citation)
  - lcm_get_entity / lcm_search_entities (entity catalog)
  - lcm_describe expandChildren/expandMessages (drilldown)

VACUUM/GDPR byte-deletion wording
---------------------------------
Reviewer was correct: existing docs said "for GDPR-compliant byte
deletion, run SQL VACUUM after suppression has cascaded." But VACUUM
does NOT byte-delete content from rows that still exist — it only
reclaims space from deleted rows. Soft purge marks suppressed_at; the
rows remain.

Rewrote both src/operator/purge.ts module docs and
docs/v4.1/PR_DESCRIPTION.md to be honest:
  - Soft purge = AGENT-VISIBLE SUPPRESSION ONLY (read-paths filter on
    suppressed_at). Bytes remain.
  - Byte-level erasure requires the cycle-3 hard-delete drainer
    (preserved in Martian-Engineering#616) OR an operator running raw DELETE + VACUUM
    out-of-band.
  - SQL VACUUM alone after soft-purge does NOT remove data.

Voyage timeout cap on lcm_grep mode='semantic'
-----------------------------------------------
Reviewer was correct: hybrid mode caps Voyage at 1×15s but pure semantic
mode used the default Voyage client (3×60s = 3 min worst case).

Added voyageMaxRetries=1 + voyageTimeoutMs=15_000 to the runSemanticSearch
call inside runSemanticLcmGrep. Parity with hybrid mode.

docs/agent-tools.md — full rewrite for v4.1 surface
----------------------------------------------------
Reviewer flagged this as stale. The pre-existing doc covered 4 tools
(lcm_grep, lcm_describe, lcm_expand_query, lcm_expand) — the v3 surface.
Rewrote to cover all 8 v4.1 tools with:
  - The 5-question routing decision tree
  - Per-tool reference tables with all parameters + return shape
  - Common patterns (Type A/B/C/D/E examples)
  - Performance + cost table per tool
  - Honest suppression / hard-purge section

Stale hard-purge references in PR_DESCRIPTION.md
-------------------------------------------------
Pre-existing language said `runPurge --immediate` was preserved in Martian-Engineering#616
"for GDPR-compliant byte-level removal" via VACUUM. Rewrote to clarify
that soft suppression is the shipping behavior, byte-deletion is
deferred to Martian-Engineering#616's hard-delete drainer, and VACUUM-after-soft-purge
is NOT equivalent to byte erasure.

Duplicate largeFilesDir in openclaw.plugin.json
-----------------------------------------------
Lines 91-94 and 95-98 both declared "largeFilesDir" with different
help text. JSON parsers take last-write-wins so behavior was fine,
but pre-existing lint-grade duplicate was confusing. Merged into one
canonical entry.

DISAGREED + SKIPPED (reviewer's claim wrong OR <95% in agreement)
=================================================================

Reviewer P2: "synthesis cache always generates new cache_id, plain
INSERT, repeated calls fail instead of reading existing row."
DISAGREED at ≥99% confidence. Reviewer was reading stale code from
the worktree they checked out. The current code (Wave-1 fix in commit
956b889) does INSERT OR IGNORE on the unique lookup index, then
re-SELECTs the winner row on `changes === 0`. Wave-2 commit 20f7633
fixed the loser-path SELECT to use column `content` (not `output`).
Wave-3 commit e000f72 added a regression test pinning this. The cache
readback contract is correct in the shipping code.

Reviewer suggestion: "merge lcm_semantic_recall into
lcm_grep mode=semantic, hide as alias unless separate-tool
discoverability tests prove it helps."
DISAGREED at ≥95% confidence. Eva and I previously discussed this
explicitly (see commit 1e09df9 first-principles pass) and chose to
KEEP both for cost-profile clarity. Semantic-recall is the cheap
embedding-only path with confidence band; hybrid grep adds rerank
($0.0005 vs $0.0002). The cost signal is meaningful to agents and
should not be collapsed.

Reviewer suggestion: "remove or de-emphasize lcm_search_entities
exact mode because exact lookup belongs to lcm_get_entity."
SKIPPED — confidence <95%. Haven't verified the claim that exact
mode in search-entities is redundant; could be a duplicated path
worth consolidating, could be intentional support for set-based
exact matching. Left for a future pass with explicit verification.

VERIFICATION
============
- 1345/1345 tests passing (1339 + 6 new period-mode tests)
- QA runner full suite: 30/30 pass
- QA runner adversarial suite: 10/10 pass
- Total cost ~$0.11 per full QA run
- The plugin-prompt-hook test (12 assertions on prompt structure)
  still passes — prompt overlay was extended additively, not replaced
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request May 6, 2026
… + 15 P1 closed

After Eva's correct push for full-PR re-audits (Waves 5-6 were focused
on diffs only and missed regressions in untouched surfaces), Wave-7 ran
22 parallel Opus 1M-context agents at ~1k LOC each across the full
~22K LOC production codebase. Surfaced 7 actionable P0s + ~30 P1s +
~25 P2s + ~15 P3s. (1 P0 from Auditor #17 was confused — was reading
a stale clone path; ignored.)

P0 — DATA / SECURITY / CORRECTNESS (7 closed)
=============================================

Auditor #14 P0-1 (CRITICAL — security): /lcm purge --apply lacked any
operator-session gate. The purge.ts module docstring explicitly
required "callers MUST gate via deps.isOperatorSession() or equivalent"
but the lcm-command.ts dispatch site at line 2626 wired runPurge with
ZERO check. Any agent that could issue /lcm slash commands could
purge another session's data — including Eva's primary thread via
--allow-main-session. Fix: gate the entire `case "purge":` dispatch
on `ctx.senderIsOwner` (the OpenClaw plugin SDK owner-only flag).
Both dry-run preview AND --apply require owner; preview is gated
because it leaks which leaves match the criteria.

Auditor #14 P0-2 (data loss): Purge cascade orphaned shared messages.
The UPDATE messages SET suppressed_at WHERE message_id IN (SELECT
... FROM summary_messages WHERE summary_id IN (...)) silently
suppressed messages even when they were referenced by NON-purged
leaves. assemble() filters on suppressed_at IS NULL → those
non-purged leaves lost their underlying message content invisibly.
Fix: added NOT EXISTS predicate that requires every other
referencing summary to ALSO be in the purge set OR already suppressed
before suppressing the message.

Auditor #6 P0 (cache pollution): sessionKeyForCache fell back to "" in
period mode when targetSummary was null AND input.sessionKey was
empty. The cache UNIQUE constraint then collapsed multiple users'
caches together — caller A's synthesis would surface in caller B's
loser-path SELECT. Fix: 4-tier fallback chain — targetSummary's key
→ input.sessionKey → conversationIds[0]'s session_key (looked up
from conversations table) → "agent:main:main" as last-resort default.

Auditor #9 P0-2: expandMessages did not honor the W4 budget=0
expansion-block; only expandChildren did. A delegated caller with
grant=0 calling expandMessages=true got full message content despite
the documented "expansion is blocked" assertion. Fix: identical
budgetExhausted gate added to the expandMessages branch.

Auditor #12 P0-A: Per-row SAVEPOINT MISSING in entity-coreference
batch tx. A single bad surface (FK violation, encoding issue, CHECK
failure) ROLLBACKed the WHOLE LEAF — discarding all valid mentions
already inserted AND failing to bump attempts (the dead-letter gate),
producing an infinite-retry loop on poison surfaces. Fix: each entity
surface now gets its own SAVEPOINT inside the batch tx. Per-row
failure rolls back JUST that surface; siblings + queue UPDATE survive.
Failures recorded in itemDetail.error per-index for operator visibility.

Auditor #9 P0-1: describe()'s "raw count" header LIED. It labeled
`s.childIds.length` as "raw candidate(s) before suppression filter"
but childIds was already suppression-filtered upstream by
getSummaryChildren default. Agents reading the header believed they
were seeing pre-filter counts. Fix: re-query the actual raw count via
a cheap COUNT(*) on summary_parents and emit honest "X of Y raw"
phrasing. When all children suppressed, distinguishes from "no
children" (terminal node) — was previously indistinguishable.

Auditor #19 P0: scripts/v41-synthesize-around-smoke.mjs still used
copyFileSync against the live WAL DB (W4 fixed v41-live-db-harness.mjs
+ preflight but missed this third script). Mid-checkpoint copies
produce malformed snapshots. Fix: VACUUM INTO atomic snapshot.

P1 — HIGH IMPACT (15 closed)
=============================

- Auditor #1 P1: searchLikeCjk used `new Date()` instead of
  parseUtcTimestamp → CJK fallback timestamps offset by host's local
  TZ. Other 4 search paths used parseUtcTimestamp; CJK was the outlier.
- Auditor #2 P1: Voyage responseBody privacy. W4 fixed only the 400
  path; 401/403/429/5xx/4xx-other still attached raw bodyText to the
  exception. Same Sentry/log-capture vector. Fix: route ALL non-200
  responseBody through summarizeBody for parity.
- Auditor #4/13 P1: tickExtraction ignored result.lockLostMidTick. W4
  added the field but the wrapper returned `lockAcquired: true`
  regardless. Now flips to false when heartbeat reported lock-loss
  mid-tick → autostart can detect + back off.
- Auditor #5 P1.1: best-of-N used Promise.all → one failed candidate
  threw away successful peers' work. Fix: Promise.allSettled. Throw
  only if ALL fail; judge picks among survivors.
- Auditor #5 P1.2: best-of-N with N=1 still ran judge — judge prompt
  expects 0..N-1 indexed candidates; many models emit 1-indexed and
  trip judge_failure. Fix: skip judge when only 1 candidate survived.
- Auditor #6 P1: parsePeriodShortcut regex over-accepted undocumented
  variants (last-3day, last-3-d). Fix: tightened to /^last-(\d+)d$|
  ^last-(\d+)-days$/ matching only documented forms.
- Auditor #8 P1-3: sort silent override. Agent passing sort=relevance
  with mode=regex got recency without warning. Fix: details now
  surfaces sortIgnored: true + requestedSort/effectiveSort.
- Auditor #8 P1-2: kFts/kSemantic over-fetch was max(limit, 50). At
  limit=200, rerank had ZERO headroom. Fix: 3× limit, floored at 50,
  capped at 500 (Voyage rerank budget).
- Auditor #21 + #8 P1-6: hybrid confidenceBand thresholds reuse
  cosine calibration on rerank scores (different scale). Fix: emit
  confidenceBandSource: "cosine" | "rerank" so callers know which
  signal drove the band.
- Auditor #12 P1-A: extractor placeholder pre-scan (W4 promised but
  never implemented). Fix: refuse extraction if leaf content contains
  XML envelope-like patterns (defense-in-depth against injection).
- Auditor #12 P1-E: dead-letter UPDATE failure left attempts at 0 →
  infinite retry. Fix: try second simpler bump-only UPDATE if the
  first (with last_error) fails.
- Auditor #18 P1: promptAwareEviction violates "structural-only"
  invariant. Fix: documented as opt-in with WARNING comment in
  config.ts that flagging it on breaks deterministic replay.
- Auditor #20 P1-3: README synthesize_around description was
  anchor-required-only — period mode (the lcm_recent replacement)
  not mentioned. Fix: 3-mode breakdown.
- Auditor #20 P1-4: THE_FIVE_QUESTIONS stale prose declared
  "themes/procedures/entities" all live. Themes + procedures were
  CUT (preserved in Martian-Engineering#616). Fix: explicit coverage status note.

VERIFICATION
============
- 1345/1345 unit tests passing (no regressions)
- QA runner full: 30/30 pass
- QA runner adversarial: 10/10 pass (not re-run; W6 baseline)
- Total cost ~$0.11 per full QA run

DEFERRED (acknowledged)
========================
- A14 P1: lcm_purge_audit table — needs schema migration; defer to
  cycle-3. Workaround: purge_session_id is returned + suppress_reason
  is recorded per leaf row.
- A18 P1: summarizeWithEscalation silent over-cap truncation —
  separate from W4 fallback marker fix; cycle-3 ergonomics.
- A8 P1-5: details.hits[] shape drift across 5 grep modes — by-design
  difference (regex/full_text are aggregates; hybrid/semantic/verbatim
  are per-row). Documented in agent-tools.md.
- A8 P1-4: verbatim recency-only ordering — by-design (citation use
  case prioritizes "what was said most recently").
- A10 P1-01: lcm_expand 24-day legacy timeout — sub-agent-only path,
  bounded by grant TTL.
- A10 P1-06: runExpand `?? 0` fallthrough — multi-conv grant path
  not exercised by lcm_expand_query (always single-conv).
- Various P2/P3 cosmetic items.
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request May 6, 2026
…urate stats + maintainer-checklist

The PR description had drifted from the actual state through 10 audit
waves and ~140 closed bugs. This rewrites it for a 10/10 maintainer
read.

# What's new

## Five mermaid diagrams

1. **Storage pyramid** — the lossless bedrock + condensed views + on-demand
   synthesis cache + async sidecars (entities + vec0)
2. **Tool routing** — 5 question types → 8 tools mapping with cost annotations
3. **Suppression cascade** — 10+ read paths filtering `suppressed_at IS NULL`
   with cascade triggers
4. **Synthesis dispatch** — per-tier model selection (haiku/sonnet/opus/
   thinking) with verify-fidelity + best-of-N branches
5. **Concurrency model** — gateway hot-path vs worker autostart vs
   lock semantics, with the §0 invariant called out

## Updated stats (from drift)

- Test count: 1323 → **1502 passing**
- Commits: 60+ → **77**
- Source LOC: **15,279 production**
- Test files: **31 v4.1-tagged tests** out of 105 total
- Audits: documents all **10 waves** including Wave-9 (78 findings) and
  Wave-10 (12-for-12 reviewer + 4 sub-agent + 1 fixture-circularity)
- TS errors: 0 PR-introduced (677 baseline matches main)

## New sections

- **Why Voyage embeddings** — Phase A spike data including the +52.5pp
  paraphrastic recall lift that justified the choice; rationale for
  voyage-4-large + rerank-2.5 over alternatives
- **Audit history** table — wave-by-wave with finding counts; shows the
  convergence trend and the Wave-9 → Wave-10 pivot from "more audits"
  to "automated invariant detection"
- **Test infrastructure** — 8 of 9 antipattern classes mapped to
  automated detection layers; cost profile per layer
- **Reviewer checklist** — 6 sections to focus on for merge approval,
  in priority order

## Restructured for navigation

- Top-of-page TL;DR with headline numbers
- Table of contents linking 16 sections
- Each tool gets a row in the cost-profile table
- Each cut feature gets a row with reason + Martian-Engineering#616 link

# Verification

- npx vitest run: 1502/1502 tests passing
- PR body on GitHub updated to match (gh pr edit --body-file)
- All mermaid diagrams render correctly in GitHub markdown preview
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request May 6, 2026
…ed 12/12 real)

Fresh re-audit at 37e2b71 found 12 issues; 11 closed in this commit, 1
documented as known limitation. Reviewer was 12-for-12 real (Wave-10
was also 12-for-12; reviewer track record: 24-for-24).

# CI blockers

- **#1 (P1)** Auth invariant test hardcoded `/tmp/lossless-claw-upstream`
  path. CI failed because that path doesn't exist on GitHub runners;
  local runs accidentally succeeded by reading whatever stale checkout
  was at that path. Now resolves via `import.meta.url` →
  `__dirname/../src/plugin/lcm-command.ts`. Works in any worktree.

- **#10 (P2)** `pnpm-lock.yaml` was stale after the Wave-10
  `optionalDependencies` addition. Regenerated via `pnpm install
  --lockfile-only`; verified `pnpm install --frozen-lockfile` succeeds.

# Security parity

- **#2 (P1)** `/lcm doctor apply` and `/lcm doctor clean apply` lacked
  `senderIsOwner` gate. Wave-9 Agent #10 had classified the doctor
  cases as READ_ONLY, but the `apply` flag inside dispatches to the
  summarizer (cost) AND mutates summaries (state) for `doctor apply`,
  and DELETEs cleaner matches for `doctor clean apply`. Mirror the
  purge / reconcile / worker-tick / eval gate pattern. Read-only
  variants (no `--apply`) stay open.

  Plus updated `test/lcm-command.test.ts`'s `createCommandContext`
  helper to default `senderIsOwner: true` so existing tests for the
  doctor mutating paths continue passing — Wave-9 negative tests
  still explicitly pass `senderIsOwner: false` via overrides.

  Plus added 4 new tests to `v41-authorization-invariants.test.ts`
  pinning the Wave-11 doctor-apply gate behavior (apply-rejected,
  read-only-allowed for both `doctor` and `doctor clean`).

- **#5 (P1)** `lcm_describe` early-budget-gate. The Wave-10 fix charged
  base summary tokens against the grant AFTER emitting `s.content`.
  For a sub-agent at zero remaining budget, the content was already
  disclosed before accounting could prevent it. Added an EARLY gate:
  if delegated session AND base summary tokens > remaining grant,
  redact `s.content` with a clear "[REDACTED — base summary content
  is N tokens but grant has only M remaining]" message and skip the
  charge. Closes the disclosure-before-accounting path.

# Correctness

- **#3 (P1)** Timezone fractional offsets + DST. Wave-10's "sample
  offset at noon" approach broke on:
    - Half-hour zones: Asia/Kolkata (UTC+5:30) → showed +5 not +5:30
    - Quarter-hour zones: Asia/Kathmandu (UTC+5:45)
    - DST transition days: LA spring-forward 2026-03-08 → noon is in
      PDT (-7) but local midnight was in PST (-8); my function used
      the noon offset for the whole day → wrong by 1 hour
  Replaced with iterative converge-to-midnight algorithm:
    1. Format `at` in target tz to get y/m/d
    2. Probe = naive `Date.UTC(y, m-1, d, 0, 0, 0)`
    3. Format probe in target tz; compute delta from target midnight
    4. Adjust probe; repeat until delta=0 (typically 1-2 iters)
  Handles all IANA timezones, DST transitions, and arbitrary offsets.

  Added 3 new regression tests:
    - Asia/Kolkata 'yesterday' (UTC+5:30) — half-hour offset
    - Asia/Kathmandu 'today' (UTC+5:45) — quarter-hour offset
    - America/Los_Angeles 2026-03-08 — spring-forward day, asserting
      'today' duration is exactly 23h

- **#6 (P1)** Hybrid rerank now skips individually oversized
  candidates instead of bailing. Pre-fix: when the FIRST candidate
  exceeded the 510K-token (85% of 600K) rerank budget, the packer
  set `rerankPacked=[]` and broke out, disabling rerank for the
  whole result set. Now: oversized candidates are individually
  skipped (counted in `rerankPackSkippedOversized`) and packing
  continues with later candidates that fit. Result: a single huge
  FTS hit no longer takes down the whole rerank.

- **#7 (P1)** Voyage `output_dimension` not forwarded. Configurable
  embedding dimensions (`LCM_EMBEDDING_DIM=2048` registers a 2048-dim
  profile in `lcm_embedding_profile`) but `embedTexts()` never sent
  `output_dimension` to Voyage, so Voyage returned its default (1024).
  vec0 INSERT then failed with dim mismatch on the per-model table.
  Added `outputDimension?: number` to `VoyageEmbedOptions`; forwarded
  via backfill (`opts.voyageOutputDimension`) and semantic-search
  query embed (`active.dim`). Default unchanged (omit → Voyage 1024).

# Documentation accuracy

- **#4 (P1)** Synthesis dispatch model claim. Tool description said
  "per-tier dispatch (haiku/sonnet/opus/thinking)" but actual LLM call
  routes through the configured summarizer chain (which ignores
  `args.model`). Source code already had honest comment in
  `buildLlmCallFromSummarizer` ("the summarizer wrapper ignores the
  dispatch-supplied model"); the tool description and PR description
  overclaimed. Updated tool description to be accurate: dispatch
  records the per-tier model name in the audit table, but the
  actual LLM call uses the operator's configured summarizer chain.

# Polish

- **#9 (P2)** Health archive filter. `readActiveProfile` selected on
  `active = 1` alone, ignoring `archive_after IS NOT NULL`. Semantic
  retrieval correctly filters archived; health was reporting a
  profile semantic search would not actually use during model cutover.
  Now matches: `WHERE active = 1 AND archive_after IS NULL`.

- **#11 (P2)** Changeset rewritten. Old changeset only mentioned
  session-family recall. New changeset documents the full v4.1
  release surface: 8 agent tools (with new modes), 2 worker autostarts,
  9 operator commands (with owner-gating), schema changes, sqlite-vec
  optionalDependency, configuration env vars, and what was cut to Martian-Engineering#616.

- **#12 (P3)** Stale entity-search docblock. The header comment said
  "entities with all-suppressed mentions can still appear here";
  Wave-10 added the EXISTS guard so they no longer can. Updated
  comment to reflect the actual filter behavior.

# Known limitation (deferred)

- **#8 (P2)** Cache key still ignores resolved model. Adding `model_used`
  to the UNIQUE index doesn't help because model resolution is dynamic
  (the summarizer chain picks at call time, not before INSERT). The
  proper fix is invalidate-on-mismatch at cache-hit time, which is a
  larger refactor. Documented in the entry above + tracked for follow-up.

# Verification

- `npx vitest run`: **1513 / 1513 tests passing** (1502 → 1513;
  +11 new regression tests for Wave-11 fixes)
- `npx tsc --noEmit`: **677 errors** (still below 739 main baseline;
  no PR-introduced TS errors)
- `pnpm install --frozen-lockfile --ignore-scripts --lockfile-only`:
  **succeeds** (was failing pre-fix with ERR_PNPM_OUTDATED_LOCKFILE)
- Authorization invariant test: now resolves the source path relative
  to test file via `__dirname` — works in any checkout location
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant