Preliminary Checks
Proposal scope
This is a proposal for a collection of three tightly-coupled subagents that together form an adversarial-verification stack, plus one PostToolUse hook. They are submitted as one collection because they share a unifying design principle (assume the agent above lied unless every check independently passes) and because at least one — claim-auditor — is auto-fired by a hook in the collection.
If maintainers prefer, this could be filed as three separate proposals; let me know which fits your queue better. The collection is already public, MIT-licensed, schema-aligned, and production-tested at https://github.com/LaterKidsXD/claude-code-audit-stack.
Proposed Agent Names
bot-deploy-verifier
claim-auditor
remote-agent-dispatcher
Plus a PostToolUse hook (audit-on-report-write) that auto-fires claim-auditor on every *.report.md write.
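As a rough illustration of the hook's trigger condition, here is a minimal sketch of the path filter. It assumes the hook receives a JSON payload with `tool_name` and `tool_input.file_path` fields and that `Write` and `Edit` are the relevant tool names; those field names and the `should_audit` helper are assumptions for illustration, not taken from the upstream repo.

```python
import fnmatch
import json
import os

REPORT_PATTERN = "*.report.md"  # the pattern named in this proposal


def should_audit(payload: dict) -> bool:
    """Return True when a write-like tool call touched a *.report.md file."""
    # Tool names are an assumption about the hook payload.
    if payload.get("tool_name") not in {"Write", "Edit"}:
        return False
    file_path = (payload.get("tool_input") or {}).get("file_path") or ""
    # Match on the file name only, so the directory does not matter.
    return fnmatch.fnmatch(os.path.basename(file_path), REPORT_PATTERN)


example = json.loads(
    '{"tool_name": "Write", "tool_input": {"file_path": "out/eval.report.md"}}'
)
print(should_audit(example))
```

Anything that does not match is ignored, so the auditor only runs on report writes.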
Domain Expertise
The collection specializes in catching silent failures that other agents miss — failures where the upstream agent reports DONE or PASS but the actual end-state of the system is wrong:
- bot-deploy-verifier: systemd / journalctl / git-driven verification. Catches the silent-skip pattern where an agent edits a config file but never restarts the service (the file is correct on disk, the running process is on old in-memory config), plus accidental cascade restarts of BindsTo=/Requires= siblings.
- claim-auditor: adversarial quantitative review of Markdown reports. Catches probability stacking (1−(1−p)^N vs N×p), conditional vs marginal pass-rate confusion, percentage vs percentage-points mixups, best-of-N selection bias, bootstrap-with-replacement implications, sample-size red flags. Read-only (tools: Read).
- remote-agent-dispatcher: mechanical scp+spawn for autonomous Claude Code agents on remote hosts via SSH. Captures the actual claude binary PID via pgrep (not the bash-wrapper PID via $!, the common trap when wrapping over SSH).
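To make one of the claim-auditor checks concrete, here is a small sketch of the percentage vs percentage-points distinction. The function names and the 20% to 25% sample numbers are illustrative only.

```python
def percentage_point_change(old_rate: float, new_rate: float) -> float:
    """Absolute difference between two rates, expressed in percentage points."""
    return (new_rate - old_rate) * 100.0


def relative_percent_change(old_rate: float, new_rate: float) -> float:
    """Change relative to the old rate, expressed in percent."""
    return (new_rate - old_rate) / old_rate * 100.0


# A pass rate moving from 20% to 25% is +5 percentage points,
# but a +25% relative improvement. Reports often mix the two.
pp = percentage_point_change(0.20, 0.25)
rel = relative_percent_change(0.20, 0.25)
print(f"{pp:.1f} percentage points, {rel:.1f}% relative")
```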
Unique Value Proposition
Unlike the existing breadth-focused subagents in this collection (which excel at domain expertise: Go, security, frontend, DBA, etc.), this stack is narrow and adversarial-by-design. Each agent assumes the agent above it might have skipped a step, and re-checks from scratch via systemctl/journalctl/git/grep. They surface explicit BLOCKED (not DONE) when any gate fails — agents are trained to drive toward DONE, and that's exactly the failure mode you want a separate verification process to catch.
Specifically, no agent in the current set covers:
- Post-deploy state verification of running systemd services (the deployment-engineer/devops-engineer agents do deploy planning, not adversarial post-deploy verification)
- Quantitative auditing of Markdown reports with structured P1/P2/P3 severity
- Remote agent spawning with PID-correct wrapper-aware capture
The "blocked, not done" distinction is the load-bearing design property: it lets these agents compose with other agents without a false DONE propagating downstream.
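One way to picture the "blocked, not done" rule is as a strict aggregation over gate results. This is a minimal sketch; `Verdict`, `GateResult`, and `overall_verdict` are hypothetical names, not the agents' actual internals.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    DONE = "DONE"
    BLOCKED = "BLOCKED"


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    detail: str = ""


def overall_verdict(gates: list[GateResult]) -> Verdict:
    """DONE only if at least one gate ran and every gate passed."""
    if not gates:
        # No evidence is treated the same as a failed check.
        return Verdict.BLOCKED
    return Verdict.DONE if all(g.passed for g in gates) else Verdict.BLOCKED


gates = [
    GateResult("service active", True),
    GateResult("config-drift", False, "journal shows old value"),
    GateResult("untouched siblings", True),
]
print(overall_verdict(gates).value)
```

A single failing gate is enough to block, which is the property that keeps an optimistic upstream report from leaking through.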
Primary Use Cases
bot-deploy-verifier:
- Verify that any systemctl restart actually landed (config edit went through, service reloaded, running journal shows new values)
- Catch accidental cascade restarts of unrelated services (e.g., BindsTo=/Requires= dependencies firing as side effects)
- Auto-rollback to a known-good config when verification fails (when caller provides backup_path)
- Refuse to declare a deploy DONE until all gates are green; explicit BLOCKED otherwise
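The cascade-restart check can be sketched as a before/after comparison of per-unit activation timestamps. The sketch assumes the caller has already collected one timestamp per unit (for instance from a systemd property such as `ActiveEnterTimestampMonotonic`); the helper name and the sibling unit names are hypothetical.

```python
def unexpected_restarts(
    before: dict[str, int],
    after: dict[str, int],
    intended: set[str],
) -> list[str]:
    """Units outside `intended` whose activation timestamp changed."""
    changed = []
    for unit, ts_before in before.items():
        if unit in intended:
            continue  # this unit was supposed to restart
        if after.get(unit) != ts_before:
            changed.append(unit)  # restarted, or vanished from the snapshot
    return sorted(changed)


before = {"trading-bot.service": 100, "metrics.service": 50, "feed.service": 60}
after = {"trading-bot.service": 900, "metrics.service": 50, "feed.service": 905}
print(unexpected_restarts(before, after, intended={"trading-bot.service"}))
```

Here `feed.service` would be flagged because its timestamp moved even though only `trading-bot.service` was meant to restart.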
claim-auditor:
- Audit autonomous research output (eval reports, MC results, backtest summaries) before acting on probability/EV claims
- Surface decision-shaping math errors that human reviewers glide past on the 14th report of the day
- Run as a post-processing step on any agent-generated quantitative analysis
- Return structured JSON for CI integration / pipe-into-other-agents
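A possible shape for the structured JSON output, sketched under assumptions: the field names mirror the columns of the Markdown table in Example 2, and `severity_floor` is interpreted as "the least severe level still reported". Neither is confirmed by the upstream schema.

```python
import json
from dataclasses import asdict, dataclass

SEVERITY_RANK = {"P1": 1, "P2": 2, "P3": 3}  # lower rank = more severe


@dataclass
class Finding:
    severity: str
    quote: str
    why_wrong: str
    correct_number: str


def to_report(findings: list[Finding], severity_floor: str = "P3") -> str:
    """Serialize findings at or above the severity floor as JSON."""
    floor = SEVERITY_RANK[severity_floor]
    kept = [f for f in findings if SEVERITY_RANK[f.severity] <= floor]
    counts = {sev: sum(f.severity == sev for f in kept) for sev in SEVERITY_RANK}
    return json.dumps({"counts": counts, "findings": [asdict(f) for f in kept]}, indent=2)


findings = [
    Finding("P1", "3 evals at 35.7% = ~90%", "uses N*p instead of 1-(1-p)^N", "73.4%"),
    Finding("P3", "n=12 backtests", "small sample", "n/a"),
]
print(to_report(findings, severity_floor="P2"))
```

A CI step could then fail the build whenever `counts["P1"]` is non-zero.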
remote-agent-dispatcher:
- Delegate tasks that exceed Claude Code's interactive timeout to a remote VPS
- Run long-running autonomous agents on a host other than the user's own machine
- Capture PID correctly across SSH+sudo+nohup+bash wrapping (where $! returns the wrapper, not the binary)
- Survive SSH disconnect via nohup + verified PID file
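The wrapper-vs-binary distinction can be sketched as a filter over process listings of the form "PID command line" (the shape `pgrep -a` prints). The helper name and the sample command lines below are illustrative and are not the dispatcher's actual spawn command.

```python
import os


def binary_pids(listing: list[str], binary: str = "claude") -> list[int]:
    """PIDs whose executable (argv[0]) is the binary itself, not a wrapper."""
    pids = []
    for line in listing:
        pid_text, _, cmdline = line.strip().partition(" ")
        argv = cmdline.split()
        if not pid_text.isdigit() or not argv:
            continue  # skip malformed lines
        # Wrappers (sudo, bash, nohup) mention the binary later in argv,
        # so only argv[0] is compared.
        if os.path.basename(argv[0]) == binary:
            pids.append(int(pid_text))
    return pids


listing = [
    "18420 sudo -u daemon bash -c nohup claude HANDOFF_LONG_BACKTEST.md",
    "18425 bash -c nohup claude HANDOFF_LONG_BACKTEST.md",
    "18432 /usr/local/bin/claude HANDOFF_LONG_BACKTEST.md",
]
print(binary_pids(listing))
```

A caller would treat anything other than exactly one match as a failed dispatch, since zero means the agent never started and several means the PID is ambiguous.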
Required Tools & Capabilities
bot-deploy-verifier — tools: Bash (executes systemctl, journalctl, git over SSH; reads journal output)
claim-auditor — tools: Read (read-only — never modifies the report it audits; this is a security property)
remote-agent-dispatcher — tools: Bash, Read (Bash for scp/ssh/install; Read for spec-header validation only)
No external network calls beyond what Bash allows the user to invoke. No hardcoded credentials, paths, or hostnames (all user-facing examples use <your-ssh-host> placeholders). No filesystem mutations by claim-auditor.
Example Interactions
Example 1 — bot-deploy-verifier catches a silent skip:
User: "Deploy the new trail-stop config to trading-bot.service"
Caller agent: "Edited config/live.yaml: trailing_stop_min_dist: 0. Restarted service. DONE."
bot-deploy-verifier (invoked after restart):
- service active: ✅
- config-drift gate: ❌ FAILED — journal shows trailing_stop_min_dist: 4 (old value)
- expected keys verified: 0/4
- untouched siblings: ✅
Returns: BLOCKED, NO ROLLBACK
Cause: service was restarted at 14:42 UTC but the journal shows it loaded the OLD config from before the edit (the agent edited the wrong file or the systemd unit references a different path).
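The config-drift gate in this example can be sketched as a comparison between expected keys and the last value of each key seen in the journal. The `key: value` log format and the `config_drift` helper are assumptions; real journal lines would need a parser matched to the service's own logging.

```python
import re
from typing import Optional

KEY_VALUE = re.compile(r"(\w+):\s*(\S+)\s*$")  # assumed "key: value" at line end


def config_drift(
    expected: dict[str, str], journal_text: str
) -> dict[str, tuple[str, Optional[str]]]:
    """Map each mismatched key to (expected value, last value seen or None)."""
    seen: dict[str, str] = {}
    for line in journal_text.splitlines():
        match = KEY_VALUE.search(line)
        if match:
            seen[match.group(1)] = match.group(2)  # later lines win
    return {k: (v, seen.get(k)) for k, v in expected.items() if seen.get(k) != v}


journal = "config loaded trailing_stop_min_dist: 4\nconfig loaded max_positions: 3"
expected = {"trailing_stop_min_dist": "0", "max_positions": "3"}
print(config_drift(expected, journal))
```

An empty result means the running process reports every expected value; any entry, including a key that never appears in the journal, keeps the gate red.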
Example 2 — claim-auditor catches probability stacking:
User: "Audit eval_results.md before I commit to the strategy"
claim-auditor (severity_floor: P2):
# Claim Audit — eval_results.md
## P1 (decision-shaping, must address)
| Quote | Why wrong | Correct number |
|---|---|---|
| "3 evals at 35.7% per eval = ~90% chance one passes" | P(≥1 of N) = 1−(1−p)^N, not N×p | 73.4%, not 90% |
## Summary
- 1 P1 error / 0 P2 / 0 P3
- Recommended action: Recompute eval-cost economics before committing capital
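The corrected number in the table can be reproduced with a few lines of arithmetic. The sketch assumes the three evals are independent with the same per-eval pass rate; the function name is illustrative.

```python
def p_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent trials succeeds."""
    return 1.0 - (1.0 - p) ** n


p, n = 0.357, 3
print(f"at least one pass: {p_at_least_one(p, n):.1%}")  # at least one pass: 73.4%
print(f"naive N*p:         {n * p:.1%}")                 # naive N*p:         107.1%
```

The naive sum exceeds 100%, which is a quick signal that adding per-eval rates does not yield a probability.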
Example 3 — remote-agent-dispatcher captures correct PID:
User: "Dispatch HANDOFF_LONG_BACKTEST.md to my-vps"
remote-agent-dispatcher:
- spec validated locally (HANDOFF header present)
- scp + install on host: ✅
- spawn pattern executed (nohup + sudo -u daemon + sleep 18s)
- pgrep -f basename: 1 claude binary PID found, PID 18432
- PID file written: /opt/agent-work/HANDOFF_LONG_BACKTEST_agent.pid
- done-conditions verified (file exists, PID owned by daemon, alive ≥5s)
Returns: DISPATCHED as PID 18432 on my-vps
Caller schedules first heartbeat check in 15-30 min.
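The "alive ≥5s" done-condition can be sketched as a repeated liveness probe. This version runs on the host where the process lives and relies on the POSIX convention that signal 0 checks for existence without delivering a signal; the function names are hypothetical, and the ownership check is left out.

```python
import os
import time


def is_alive(pid: int) -> bool:
    """True if a process with this PID exists (POSIX signal-0 probe)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stayed_alive(pid: int, seconds: float, interval: float = 0.5) -> bool:
    """True if the PID is alive at every probe for the whole window."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return False
        time.sleep(interval)
    return is_alive(pid)


# Probing the current process as a stand-in for the spawned agent.
print(stayed_alive(os.getpid(), seconds=0.2, interval=0.05))
```

Probing over a window rather than once is what catches an agent that starts and then exits within the first few seconds.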
Your Expertise
- bot-deploy-verifier: https://github.com/LaterKidsXD/claude-code-audit-stack/blob/main/docs/silent-skip-incident.md
- claim-auditor catching probability-stacking errors in autonomous research output: https://github.com/LaterKidsXD/claude-code-audit-stack/blob/main/reports/sample-findings.md
Additional Context
- claim-auditor: https://github.com/marketplace/actions/claim-auditor
If maintainers green-light this proposal, I'm happy to submit it as a polished PR following the plugins/<plugin-name>/ layout used by recent community plugins (e.g., signed-audit-trails PR #496, protect-mcp PR #503). All schema requirements (subagent frontmatter, tool restrictions, MIT license, no hardcoded paths) are already met in the upstream repo.