
[NEW AGENT] claude-code-audit-stack — 3 adversarial verification subagents (bot-deploy-verifier, claim-auditor, remote-agent-dispatcher) #519

@LaterKidsXD

Description


Preliminary Checks

  • Read Code of Conduct
  • Reviewed existing subagents — this collection is not a duplicate (no equivalent post-deploy verifier, quantitative claim auditor, or PID-correct remote agent dispatcher in the current set)
  • Proposal is for legitimate, constructive use cases only
  • I have domain expertise (3+ years building & operating production trading bots and quant research pipelines, where these specific failure modes occur)

Proposal scope

This is a proposal for a collection of three tightly-coupled subagents that together form an adversarial-verification stack, plus one PostToolUse hook. They are submitted as one collection because they share a unifying design principle (assume the agent above lied unless every check independently passes) and because at least one — claim-auditor — is auto-fired by a hook in the collection.

If maintainers prefer, this could be filed as 3 separate proposals; let me know which fits your queue better. The collection is already public, MIT-licensed, schema-aligned, and production-tested at https://github.com/LaterKidsXD/claude-code-audit-stack.

Proposed Agent Names

  1. bot-deploy-verifier
  2. claim-auditor
  3. remote-agent-dispatcher

Plus a PostToolUse hook (audit-on-report-write) that auto-fires claim-auditor on every *.report.md write.
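A minimal sketch of the hook's file filter. It assumes the hook receives the PostToolUse event as a dict with `tool_name` and `tool_input.file_path` fields, and that both Write and Edit count as "writes". Both of those are assumptions. `should_audit` is a hypothetical name, and how the real hook then dispatches claim-auditor is not shown.

```python
# Tool names treated as file writes (assumption about the event payload).
WRITE_TOOLS = {"Write", "Edit"}


def should_audit(event: dict) -> bool:
    """Return True when a PostToolUse event wrote a *.report.md file."""
    if event.get("tool_name") not in WRITE_TOOLS:
        return False
    path = event.get("tool_input", {}).get("file_path", "")
    return path.endswith(".report.md")
```

Keeping the predicate separate from stdin handling makes it easy to test without a live hook.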

Domain Expertise

The collection specializes in catching silent failures that other agents miss — failures where the upstream agent reports DONE or PASS but the actual end-state of the system is wrong:

  1. bot-deploy-verifier — systemd / journalctl / git-driven verification. Catches the silent-skip pattern where an agent edits a config file but never restarts the service (the file is correct on disk, the running process is on old in-memory config), plus accidental cascade restarts of BindsTo=/Requires= siblings.

  2. claim-auditor — adversarial quantitative review of Markdown reports. Catches probability stacking (1−(1−p)^N vs N×p), conditional vs marginal pass-rate confusion, percentage vs percentage-points mixups, best-of-N selection bias, bootstrap-with-replacement implications, sample-size red flags. Read-only (tools: Read).

  3. remote-agent-dispatcher — mechanical scp+spawn for autonomous Claude Code agents on remote hosts via SSH. Captures the actual claude binary PID via pgrep (not the bash-wrapper PID via $! — the common trap when wrapping over SSH).
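Two of the confusions claim-auditor looks for are pure arithmetic. The following illustration uses made-up rates, and the function names are hypothetical. A pass rate that moves from 20% to 25% is a 5 percentage-point increase but a 25% relative increase. A 60% pass rate, conditional on reaching a stage that is reached only half the time, is a 30% marginal pass rate.

```python
def pp_change(old_rate: float, new_rate: float) -> float:
    """Absolute change in percentage points; rates are fractions in [0, 1]."""
    return (new_rate - old_rate) * 100


def relative_change_pct(old_rate: float, new_rate: float) -> float:
    """Relative change, in percent of the old rate."""
    return (new_rate - old_rate) / old_rate * 100


def marginal_pass_rate(p_reach: float, p_pass_given_reach: float) -> float:
    """Marginal P(pass) = P(reach stage) * P(pass | stage reached)."""
    return p_reach * p_pass_given_reach


print(round(pp_change(0.20, 0.25), 2), round(relative_change_pct(0.20, 0.25), 2))  # 5.0 25.0
print(round(marginal_pass_rate(0.5, 0.6), 2))  # 0.3
```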

Unique Value Proposition

Unlike the existing breadth-focused subagents in this repository (which excel at domain expertise: Go, security, frontend, DBA, etc.), this stack is narrow and adversarial by design. Each agent assumes the agent above it may have skipped a step, and it re-checks the actual end state from scratch via systemctl, journalctl, git, and grep. When any gate fails, it returns an explicit BLOCKED rather than DONE. Agents are trained to drive toward DONE, and that is exactly the failure mode a separate verification process needs to catch.
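One of those from-scratch checks, for the silent-skip pattern, reduces to a timestamp comparison. If the service's current process started before the config file was last modified, the edit cannot be live. The following sketch assumes both times are Unix timestamps taken from the same host clock, for example `os.stat(path).st_mtime` and the unit's `ActiveEnterTimestamp` property. Parsing the systemctl output is not shown. `GateResult` and `restart_after_edit_gate` are hypothetical names, not code from the repo.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str


def restart_after_edit_gate(config_mtime: float, service_started_at: float) -> GateResult:
    """Pass only if the running process started strictly after the last config edit."""
    if service_started_at > config_mtime:
        return GateResult("restart-after-edit", True, "process started after last config edit")
    # Equal or earlier start time: the process may still hold the old in-memory config.
    return GateResult(
        "restart-after-edit",
        False,
        f"process started {config_mtime - service_started_at:.0f}s before the config edit",
    )
```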

Specifically, no agent in the current set covers:

  • Post-deploy state verification of running systemd services (the deployment-engineer/devops-engineer agents do deploy planning, not adversarial post-deploy verification)
  • Quantitative auditing of Markdown reports with structured P1/P2/P3 severity
  • Remote agent spawning with PID-correct wrapper-aware capture

The "blocked, not done" distinction is the load-bearing design property: it lets these agents compose with other agents without false DONE/PASS signals propagating downstream.
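The following sketch expresses that composition rule. It assumes each gate reports a boolean, and the names are hypothetical. One design choice is made explicit here: an empty gate set yields BLOCKED, so a verifier that silently ran nothing cannot report DONE.

```python
from enum import Enum


class Verdict(Enum):
    DONE = "DONE"
    BLOCKED = "BLOCKED"


def aggregate(gates: dict[str, bool]) -> tuple[Verdict, list[str]]:
    """DONE only if at least one gate ran and every gate passed; otherwise BLOCKED."""
    failed = [name for name, ok in gates.items() if not ok]
    if not gates or failed:
        return Verdict.BLOCKED, failed or ["no gates ran"]
    return Verdict.DONE, []
```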

Primary Use Cases

bot-deploy-verifier:

  1. Verify any systemctl restart actually landed (config edit went through, service reloaded, running journal shows new values)
  2. Catch accidental cascade restarts of unrelated services (e.g., BindsTo=/Requires= dependencies firing as side effects)
  3. Auto-rollback to a known-good config when verification fails (when caller provides backup_path)
  4. Refuse to declare a deploy DONE until all gates green; explicit BLOCKED otherwise
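Use cases 2 and 3 can be sketched as two small decisions. The sketch assumes sibling start times are collected per unit as Unix timestamps (for example, from each unit's `ActiveEnterTimestamp`; collection and parsing are not shown). It also assumes rollback is attempted only when the caller supplied a backup path. Function and unit names are hypothetical.

```python
from typing import Optional


def restarted_siblings(deploy_started_at: float, sibling_start_times: dict[str, float]) -> list[str]:
    """Sibling units that (re)started during or after the deploy window.

    An empty list means no cascade restart was observed.
    """
    return sorted(u for u, started in sibling_start_times.items() if started >= deploy_started_at)


def next_action(all_gates_passed: bool, backup_path: Optional[str]) -> str:
    """Map the gate outcome and the optional backup to the verifier's final answer."""
    if all_gates_passed:
        return "DONE"
    if backup_path:
        return f"BLOCKED, ROLLBACK from {backup_path}"
    return "BLOCKED, NO ROLLBACK"
```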

claim-auditor:

  1. Audit autonomous research output (eval reports, MC results, backtest summaries) before acting on probability/EV claims
  2. Surface decision-shaping math errors that human reviewers glide past on the 14th report of the day
  3. Run as a post-processing step on any agent-generated quantitative analysis
  4. Return structured JSON for CI integration / pipe-into-other-agents
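For CI integration, the following sketch shows what a structured result could look like. The field names are illustrative and are not the repo's actual output schema. The CI rule shown (fail on any P1) is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field

SEVERITIES = ("P1", "P2", "P3")


@dataclass
class Finding:
    severity: str  # one of SEVERITIES
    quote: str
    why_wrong: str
    correct: str


@dataclass
class AuditResult:
    report: str
    findings: list[Finding] = field(default_factory=list)

    def counts(self) -> dict[str, int]:
        return {s: sum(f.severity == s for f in self.findings) for s in SEVERITIES}

    def to_json(self) -> str:
        return json.dumps(
            {"report": self.report, "counts": self.counts(),
             "findings": [asdict(f) for f in self.findings]},
            indent=2,
        )


def ci_exit_code(result: AuditResult) -> int:
    """Non-zero exit (fail the CI step) when any decision-shaping P1 finding exists."""
    return 1 if result.counts()["P1"] else 0
```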

remote-agent-dispatcher:

  1. Delegate tasks that exceed Claude Code's interactive timeout to a remote VPS
  2. Run long-running autonomous agents on a host different from the human
  3. Capture PID correctly across SSH+sudo+nohup+bash wrapping (where $! returns the wrapper, not the binary)
  4. Survive SSH disconnect via nohup + verified PID file
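The PID trap in use case 3 arises because `pgrep -f` matches the wrapper processes too: their command lines also contain the target string. The following sketch filters `pgrep -a`-style lines ("PID full-command-line") down to processes whose argv[0] is the target binary. It assumes the binary appears as argv[0]. If the CLI runs under an interpreter (e.g. `node .../claude`), the filter would need adjusting. `pick_binary_pids` is a hypothetical name.

```python
import os


def pick_binary_pids(pgrep_output: str, binary: str = "claude") -> list[int]:
    """Keep only PIDs whose argv[0] basename is `binary`, dropping sudo/nohup/bash wrappers."""
    pids = []
    for line in pgrep_output.splitlines():
        pid_str, _, cmdline = line.strip().partition(" ")
        if not pid_str.isdigit() or not cmdline.strip():
            continue  # skip blank or malformed lines
        argv0 = cmdline.split()[0]
        if os.path.basename(argv0) == binary:
            pids.append(int(pid_str))
    return pids
```

A dispatcher would typically treat anything other than exactly one match as BLOCKED rather than guessing.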

Required Tools & Capabilities

  • bot-deploy-verifier (tools: Bash): executes systemctl, journalctl, and git over SSH; reads journal output
  • claim-auditor (tools: Read): read-only, never modifies the report it audits; this is a security property
  • remote-agent-dispatcher (tools: Bash, Read): Bash for scp/ssh/install; Read for spec-header validation only

No external network calls beyond what Bash allows the user to invoke. No hardcoded credentials, paths, or hostnames (all user-facing examples use <your-ssh-host> placeholders). No filesystem mutations by claim-auditor.

Example Interactions

Example 1 — bot-deploy-verifier catches a silent skip:

User: "Deploy the new trail-stop config to trading-bot.service"
Caller agent: "Edited config/live.yaml: trailing_stop_min_dist: 0. Restarted service. DONE."

bot-deploy-verifier (invoked after restart):
- service active: ✅
- config-drift gate: ❌ FAILED — journal shows trailing_stop_min_dist: 4 (old value)
- expected keys verified: 0/4
- untouched siblings: ✅

Returns: BLOCKED, NO ROLLBACK
Cause: the service was restarted at 14:42 UTC, but the journal shows it loaded the OLD config from before the edit (either the agent edited the wrong file or the systemd unit references a different path).
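The config-drift gate in this example can be sketched as follows. The sketch assumes the service logs its effective values as `key: value` lines, as the journal output above suggests. It also assumes the text comes from something like `journalctl -u <unit> --since <restart time> -o cat`. Values are compared as strings, and the latest logged value per key wins. `config_drift` is a hypothetical name.

```python
import re
from typing import Optional


def config_drift(journal_text: str, expected: dict[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """Return {key: (expected, seen)} for keys whose latest logged value differs or is missing."""
    seen: dict[str, str] = {}
    for line in journal_text.splitlines():
        for key in expected:
            m = re.search(rf"\b{re.escape(key)}:\s*(\S+)", line)
            if m:
                seen[key] = m.group(1)  # later lines overwrite earlier ones
    return {k: (v, seen.get(k)) for k, v in expected.items() if seen.get(k) != v}
```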

Example 2 — claim-auditor catches probability stacking:

User: "Audit eval_results.md before I commit to the strategy"

claim-auditor (severity_floor: P2):

# Claim Audit — eval_results.md

## P1 (decision-shaping, must address)
| Quote | Why wrong | Correct number |
|---|---|---|
| "3 evals at 35.7% per eval = ~90% chance one passes" | P(≥1 of N) = 1−(1−p)^N, not N×p | 73.4%, not 90% |

## Summary
- 1 P1 error / 0 P2 / 0 P3
- Recommended action: Recompute eval-cost economics before committing capital
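The arithmetic behind the P1 finding is shown below. It assumes the three evals are independent attempts with the same 35.7% pass rate. Note that N×p is not even a valid probability here, because it exceeds 1.

```python
def p_at_least_one(p: float, n: int) -> float:
    """P(at least one success in n independent trials) = 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n


naive = 3 * 0.357  # N x p: overcounts overlapping successes
correct = p_at_least_one(0.357, 3)
print(f"{naive:.3f} vs {correct:.3f}")  # 1.071 vs 0.734
```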

Example 3 — remote-agent-dispatcher captures correct PID:

User: "Dispatch HANDOFF_LONG_BACKTEST.md to my-vps"

remote-agent-dispatcher:
- spec validated locally (HANDOFF header present)
- scp + install on host: ✅
- spawn pattern executed (nohup + sudo -u daemon + sleep 18s)
- pgrep -f basename: 1 claude binary PID found, PID 18432
- PID file written: /opt/agent-work/HANDOFF_LONG_BACKTEST_agent.pid
- done-conditions verified (file exists, PID owned by daemon, alive ≥5s)

Returns: DISPATCHED as PID 18432 on my-vps
Caller schedules first heartbeat check in 15-30 min.
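The done-conditions in this example can be sketched on Linux using `/proc`. Each `/proc/<pid>` directory is owned by the process's effective UID. This sketch interprets "alive ≥5s" as "still running after a 5 s observation window", which is an assumption. `pid_done_conditions` is a hypothetical name.

```python
import os
import pwd
import time


def pid_done_conditions(pid_file: str, expected_user: str, min_alive_s: float = 5.0) -> list[str]:
    """Return the failed done-conditions; an empty list means all passed (Linux only)."""
    if not os.path.isfile(pid_file):
        return ["pid file missing"]
    with open(pid_file) as fh:
        pid = int(fh.read().strip())
    try:
        owner_uid = os.stat(f"/proc/{pid}").st_uid
    except FileNotFoundError:
        return [f"PID {pid} not running"]
    failures = []
    if owner_uid != pwd.getpwnam(expected_user).pw_uid:
        failures.append(f"PID {pid} not owned by {expected_user}")
    time.sleep(min_alive_s)  # the process must survive the observation window
    if not os.path.exists(f"/proc/{pid}"):
        failures.append(f"PID {pid} exited within {min_alive_s}s")
    return failures
```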


Additional Context

If maintainers green-light this proposal, I'm happy to submit it as a polished PR following the plugins/<plugin-name>/ layout used by recent community plugins (e.g., signed-audit-trails PR #496, protect-mcp PR #503). All schema requirements (subagent frontmatter, tool restrictions, MIT license, no hardcoded paths) are already met in the upstream repo.
