[bd-kzw] Centralize SQLite storage to ~/.agent-relay + add session/summary tracking#7
Conversation
…mmary tracking - Move SQLite storage from /tmp to ~/.agent-relay (fixes data loss on reboot) - Support XDG_DATA_HOME and AGENT_RELAY_DATA_DIR env overrides - Add sessions table to track agent connection history - Add agent_summaries table for running agent context - Record session start/end in daemon on connect/disconnect - Add [[SUMMARY]] block parsing in wrapper for agent summaries - Update dashboard API to include sessions and summaries data - Dashboard now shows recent sessions and agent summaries Closes: agent-relay-kzw (SQLite in /tmp cleared on reboot)
- Add [[SESSION_END]]...[[/SESSION_END]] parsing in parser.ts
- Add closedBy field to StoredSession ('agent' | 'disconnect' | 'error')
- Update endSession() to accept closedBy option
- Add parseSessionEndAndClose() to tmux-wrapper
- Update daemon to pass closedBy on disconnect/error
- Include SESSION_END in startup instructions for agents
- Move StoredSession, SessionQuery, AgentSummary types to base adapter.ts - Add optional session/summary methods to StorageAdapter interface - This enables future MySQL/Postgres implementations - Add 12 tests for session management (start, end, query, closedBy) - Add 4 tests for agent summaries (save, get, getAll, update) - Add 3 tests for getMessageById - Add 11 tests for parseSummaryFromOutput and parseSessionEndFromOutput - Coverage: sqlite-adapter 43%→86%, parser 90%→97%
- Add currentSessionId getter to RelayClient (sessionId was private) - Update TmuxWrapper to use client.currentSessionId - Remove unused ParsedSummary import
There was a problem hiding this comment.
Pull request overview
This PR centralizes SQLite storage from /tmp to ~/.agent-relay to prevent data loss on system reboots, while adding session and summary tracking capabilities for agent connections.
Key Changes:
- Relocated SQLite storage with XDG_DATA_HOME and AGENT_RELAY_DATA_DIR environment variable support
- Introduced sessions and agent_summaries tables for tracking agent connection history and context
- Added [[SUMMARY]] and [[SESSION_END]] block parsing for agents to maintain running context
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/utils/project-namespace.ts | Implements getBaseDir() to support environment-based storage directory configuration |
| src/storage/adapter.ts | Defines StoredSession, SessionQuery, and AgentSummary interfaces for new tracking features |
| src/storage/sqlite-adapter.ts | Adds sessions and agent_summaries tables with CRUD operations |
| src/storage/sqlite-adapter.test.ts | Provides comprehensive test coverage for session management and agent summary features |
| src/wrapper/parser.ts | Implements parseSummaryFromOutput() and parseSessionEndFromOutput() functions |
| src/wrapper/parser.test.ts | Tests for the new summary and session-end parsing functions |
| src/wrapper/client.ts | Exposes currentSessionId getter for session tracking |
| src/wrapper/tmux-wrapper.ts | Integrates storage adapter, parses output blocks, and saves summaries/sessions |
| src/daemon/server.ts | Records session start/end events on agent connect/disconnect |
| src/dashboard/server.ts | Extends API to include sessions and summaries data with duration formatting |
| src/cli/index.ts | Removes unused DEFAULT_SOCKET_PATH import |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| this.storage.init().catch(err => { | ||
| this.logStderr(`Failed to initialize storage: ${err.message}`, true); | ||
| this.storage = undefined; | ||
| }); |
There was a problem hiding this comment.
Storage initialization is asynchronous but not awaited. Methods like parseSummaryAndSave() (line 632) may attempt to use this.storage before initialization completes, leading to race conditions. Consider using a promise/flag pattern to ensure storage is ready before use, or await initialization in an async setup method.
|
|
||
| private sessionEndProcessed = false; // Track if we've already processed session end | ||
|
|
||
| /** |
There was a problem hiding this comment.
The sessionEndProcessed flag is never reset. If the wrapper instance is reused or lives across multiple sessions, this flag would prevent processing [[SESSION_END]] blocks in subsequent sessions. Consider resetting this flag when starting a new session or tying it to the session ID.
| /** | |
| /** | |
| * Reset per-session state, allowing this wrapper instance to process | |
| * [[SESSION_END]] blocks for a new logical session. | |
| * | |
| * Call this when starting a new session on a reused TmuxWrapper instance. | |
| */ | |
| public resetSessionState(): void { | |
| this.sessionEndProcessed = false; | |
| } | |
| /** |
| const sessionId = this.client.currentSessionId; | ||
| if (!sessionId) { | ||
| this.logStderr('Cannot close session: no session ID'); | ||
| return; | ||
| } |
There was a problem hiding this comment.
The code logs "Cannot close session: no session ID" but doesn't set sessionEndProcessed = false to allow retry. If currentSessionId becomes available later, the session end will never be processed. Consider handling this case differently or resetting the flag.
| if (!this.db) { | ||
| throw new Error('SqliteStorageAdapter not initialized'); | ||
| } | ||
|
|
There was a problem hiding this comment.
The COALESCE(?, summary) pattern means if a summary was set during startSession(), a null summary in endSession() won't clear it. This may be intentional, but could cause confusion if an explicit null is meant to clear the summary. Consider documenting this behavior or using a different pattern if clearing should be supported.
| // Note: The COALESCE(?, summary) pattern intentionally preserves any existing | |
| // session summary that was set earlier (for example, in startSession()) when | |
| // endSession() is called without a new summary. Because options?.summary is | |
| // coerced to null when undefined, passing no summary (or null) here will not | |
| // clear an existing summary; it will simply leave the current value unchanged. |
| duration: formatDuration(s.startedAt, s.endedAt), | ||
| messageCount: s.messageCount, | ||
| summary: s.summary, | ||
| isActive: !s.endedAt, // Active if no end time |
There was a problem hiding this comment.
The comment "Active if no end time" describes the implementation but doesn't explain that this doesn't account for closedBy values. A session could have endedAt set but still be considered inactive if it hasn't been explicitly closed. Consider clarifying the semantics of "active" in relation to both fields.
| isActive: !s.endedAt, // Active if no end time | |
| isActive: !s.endedAt, // Considered "active" solely when there is no end time; closedBy is not taken into account here |
- Fix storage initialization race condition: add storageReady promise that methods await before using storage - Add resetSessionState() method for wrapper reuse across sessions - Don't set sessionEndProcessed flag if sessionId unavailable (allows retry) - Add documentation for COALESCE pattern in endSession() explaining that null summary preserves existing value - Clarify isActive field documentation: determined by endedAt only, independent of closedBy field - Add closedBy field to SessionInfo interface and API response
…sqlite-centralized-summaries-Da0SY
… fix 1. Fix Gemini shell keyword execution by wrapping injected messages in backticks. 2. Deduplicate SUMMARY parsing to prevent error log spam for invalid JSON. 3. Increase tmux status-left-length to prevent agent name truncation.
Prevents messages from being injected when Gemini is at a shell prompt ($) to avoid unintended command execution.
…tart Three bug fixes reported during MCP testing: 1. **Spawn race condition fix** (Bug #10): Added spawningAgents mutex to prevent concurrent spawn requests for the same agent from both passing the activeWorkers.has() check before either completes. 2. **SIGKILL diagnostics** (Bug #7, #10): Added gatherSigkillDiagnostics() to capture memory usage, process count, and OOM killer messages when exit code 137 or SIGKILL is detected. This helps diagnose resource exhaustion issues. 3. **Orphan cleanup** (Bug #8, #9): Added cleanupOrphanedWorkers() that runs on spawner startup to kill relay-pty processes from a previous daemon run. This ensures a clean slate after daemon restarts. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
template-resolver.ts: shell-escape interpolated variables (CRITICAL #1) broker_tests.rs: uncomment and wire up 5 real tests (CRITICAL #2) worker_tests.rs: uncomment and wire up 5 real tests (CRITICAL #3) worker.rs: log bypass-flag injection, add .. path traversal rejection (CRITICAL #4, #7) verification.ts: export stripInjectedTaskEcho, add path traversal guard (CRITICAL #5) runner.ts: remove duplicate stripInjectedTaskEcho, add ENV_ALLOWLIST filtering (HIGH #17) channel-messenger.ts: add secret scrubbing, hoist regex constants (MEDIUM #27, #28) process-spawner.ts: add settled guard for race condition (MEDIUM #23) step-executor.ts: add sideEffects to callback type, deprecate alias (HIGH #15, #16) index.ts: export StepExecutor directly (MEDIUM #29) workflows/refactor/*.ts: replace hardcoded paths, remove --no-verify (HIGH #8-11) broker.rs: move is_pid_alive to canonical location (HIGH #14) cost/tracker.ts: add restrictive file permissions (MEDIUM #30) cost/pricing.ts: add last-verified date (MEDIUM #31) verification.test.ts: 9 new tests for exported helpers (MEDIUM #32) Co-Authored-By: My Senior Dev <dev@myseniordev.com>
…#675) * refactor: TDD decomposition of runner.ts + main.rs with extracted modules Extracted 5 modules from runner.ts (6,878 lines): - verification.ts (143 lines) - template-resolver.ts (87 lines) - channel-messenger.ts (151 lines) - step-executor.ts (571 lines) - process-spawner.ts (96 lines) Added characterization tests for all extracted modules. Extracted broker.rs and worker.rs from main.rs. Bug fixes: - Restore stripInjectedTaskEcho in verification.ts - Guard agent.release() against broker 400 race condition - Fix run-summary-table test for new table format - Export normalizeModel for correct pricing resolution - Fix --wave argument parsing in run-refactor.ts - ESM imports in all workflow files * fix: address 10 review finding(s) tracker.ts: resolveModel now uses normalizeModel for alias resolution (pre-existing fix verified) run-refactor.ts: --wave parsing with proper validation (pre-existing fix verified) step-executor.ts: signal-killed processes now correctly treated as failures channel-messenger.ts: replaced ReDoS-vulnerable regex with iterative indexOf stripping runner.ts: eliminated shell injection by using direct git spawn with argument arrays process-spawner.ts: fixed SIGKILL fallback timer leak by storing and clearing reference Co-Authored-By: My Senior Dev <dev@myseniordev.com> * Revert "chore: gitignore .trajectories/ (automated run artifacts) (#676)" (#677) This reverts commit 07a8dc0. * refactor: TDD decomposition of runner.ts + main.rs with extracted modules Extracted 5 modules from runner.ts (6,878 lines): - verification.ts (143 lines) - template-resolver.ts (87 lines) - channel-messenger.ts (151 lines) - step-executor.ts (571 lines) - process-spawner.ts (96 lines) Added characterization tests for all extracted modules. Extracted broker.rs and worker.rs from main.rs. Bug fixes: - Restore stripInjectedTaskEcho in verification.ts - Guard agent.release() against broker 400 race condition - Fix run-summary-table test for new table format - Export normalizeModel for correct pricing resolution - Fix --wave argument parsing in run-refactor.ts - ESM imports in all workflow files * trajectories correction again * pre commit is executable * remove tracked workflows * fix: address 36 review findings across Rust and TypeScript modules template-resolver.ts: shell-escape interpolated variables (CRITICAL #1) broker_tests.rs: uncomment and wire up 5 real tests (CRITICAL #2) worker_tests.rs: uncomment and wire up 5 real tests (CRITICAL #3) worker.rs: log bypass-flag injection, add .. path traversal rejection (CRITICAL #4, #7) verification.ts: export stripInjectedTaskEcho, add path traversal guard (CRITICAL #5) runner.ts: remove duplicate stripInjectedTaskEcho, add ENV_ALLOWLIST filtering (HIGH #17) channel-messenger.ts: add secret scrubbing, hoist regex constants (MEDIUM #27, #28) process-spawner.ts: add settled guard for race condition (MEDIUM #23) step-executor.ts: add sideEffects to callback type, deprecate alias (HIGH #15, #16) index.ts: export StepExecutor directly (MEDIUM #29) workflows/refactor/*.ts: replace hardcoded paths, remove --no-verify (HIGH #8-11) broker.rs: move is_pid_alive to canonical location (HIGH #14) cost/tracker.ts: add restrictive file permissions (MEDIUM #30) cost/pricing.ts: add last-verified date (MEDIUM #31) verification.test.ts: 9 new tests for exported helpers (MEDIUM #32) Co-Authored-By: My Senior Dev <dev@myseniordev.com> * style: auto-format Rust code with cargo fmt * minor clean * fix: reinstate deleted workflow files into workflows/ci/ Moved fix-mcp-spawn.yaml, add-swift-sdk.ts, and cli-observability.ts into workflows/ci/ to clearly distinguish them as CI test suite workflows. Updated .gitignore to allow workflows/ci/ and workflows/refactor/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address remaining Devin review findings and fix failing test - Fix tracker test: expect mode: 0o700 in mkdirSync assertion - Use Object.hasOwn() instead of `in` operator to avoid prototype chain false positives - Use Promise.allSettled to preserve partial output on process timeout - Apply path containment check for absolute paths in checkFileExists Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address new Devin review findings — StepExecutor name collision and cwd trailing slash - Rename StepExecutor interface in runner.ts to RunnerStepExecutor to avoid shadowing the StepExecutor class export in the barrel index - Normalize cwd with path.resolve() in checkFileExists to handle trailing slashes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes: agent-relay-kzw (SQLite in /tmp cleared on reboot)