Bug Report: Recurring Pattern of Unverified "Fixes"
I'm a solo developer running a trading system with ~155 LaunchAgents, ~15K tests, and real money at stake. I've been using Claude Code daily since late 2025. The same class of mistake keeps happening despite explicit instructions, CLAUDE.md rules, hooks, and memory files telling Claude not to do it.
The Pattern
Claude makes a code change, says "fixed," commits it — but never verifies the fix actually works in the real environment. The next day (or weeks later), I discover it was broken the whole time.
Concrete Examples (all from this project, documented in memory files)
- Telegram notifier silently broken for weeks — `config.json` had `<FROM_KEYCHAIN:telegram-bot-token>` placeholder strings. Claude's code used these as literal API tokens → 404 on every send. No agent alerted. Every session Claude said "Telegram working." I had to discover it myself at 9:22 AM today when no morning report arrived. (A verification sketch for this class of failure follows this list.)
- Exit engine skipping all positions for 7 weeks — a `days_held=0` grace-period bug meant every position was treated as "entered today" and skipped. HIMS rode from +10% to -51%. Claude ran 14,400 tests and reported "all green" while real money bled.
- Trade journal dead for 7 weeks — LaunchAgent loaded but `PID=0`. Zero entries for 7 weeks. Claude never checked. Never noticed.
- Auto-hedge burning $369/day — buying SPY puts daily with a flush score of 0.0. Claude never verified the trigger conditions were correct.
- LaunchAgent not bounced after code changes — Python caches imports, so code changes are invisible until the agent restarts. Claude has been told this 20+ times. Still forgets.
- Morning report failing silently — missing `PYTHONPATH` in the plist. The script errored on import with exit code 1, and nobody noticed until I asked "where's my morning report?"
- 130 tests broken for unknown duration — import paths changed during a scripts reorganization, but the tests were never updated. Claude reported "tests pass" because the test runner's exit code was 0 (pytest returned 0 even though xdist workers had partial failures).
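Several of these failures share one detectable signature: something looks loaded but is not actually alive. A minimal audit sketch of what "check the real environment" means here — the config path and agent labels are placeholders from my setup, not anything Claude Code ships:

```python
#!/usr/bin/env python3
"""Liveness audit: catch unresolved secrets and dead LaunchAgents.

The paths and labels below are examples from my project, not defaults.
"""
import re
import subprocess
import sys

CONFIG_PATH = "config.json"          # project-specific
AGENT_LABELS = [                     # project-specific labels
    "com.trading.trade-journal",
    "com.trading.morning-report",
]

failures = []

# 1. Unresolved <FROM_KEYCHAIN:...> placeholders must never reach runtime.
with open(CONFIG_PATH) as f:
    raw = f.read()
for match in re.findall(r"<FROM_KEYCHAIN:[^>]+>", raw):
    failures.append(f"{CONFIG_PATH}: unresolved secret placeholder {match}")

# 2. "Loaded" is not "running": `launchctl list <label>` prints a PID
#    entry only for a live job; a loaded-but-dead agent has none.
for label in AGENT_LABELS:
    proc = subprocess.run(
        ["launchctl", "list", label], capture_output=True, text=True
    )
    if proc.returncode != 0:
        failures.append(f"{label}: not loaded at all")
        continue
    if not re.search(r'"PID"\s*=\s*(\d+)', proc.stdout):
        failures.append(f"{label}: loaded but no live PID")

for line in failures:
    print("AUDIT FAIL:", line)
sys.exit(1 if failures else 0)
```

Thirty lines of this, run at session start and before any "fixed" claim, would have caught the Telegram, trade-journal, and morning-report failures on day one.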
What I've Tried (All Failed to Prevent Recurrence)
- CLAUDE.md rules (250 lines, explicit "NEVER do X" instructions) — Claude reads them, acknowledges them, then does X anyway
- Memory files cataloging every mistake with root cause — Claude loads them at session start, then repeats the pattern
- Pre-commit hooks that check for exit-affecting code — Claude approves them, then doesn't run the dry-run (a fail-closed hook sketch follows this list)
- Stop hooks that verify LaunchAgent bounces — Claude acknowledges the hook output, then moves on
- Explicit "Dummy Rule" saying "You are not JARVIS, you are the fire extinguisher robot" — no effect
- Session-start audit script — Claude runs it, sees failures, then starts building new features anyway
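The closest I've come to something that fails closed is a PostToolUse hook script. A sketch, assuming Claude Code's documented hook contract (a JSON payload on stdin with `tool_name` and `tool_input.file_path`; exit code 2 feeds stderr back to the model as a blocking error) — the watched directories are my convention, not a Claude Code feature:

```python
#!/usr/bin/env python3
"""PostToolUse hook: flag edits to daemon-loaded code until verified.

Assumes Claude Code's hook contract: JSON on stdin, exit 2 = blocking
error whose stderr is shown to the model. The WATCHED dirs are mine.
"""
import json
import sys
from pathlib import Path

# Directories whose files are imported by long-running LaunchAgents.
WATCHED = [Path.home() / "trading" / "engine",
           Path.home() / "trading" / "agents"]

payload = json.load(sys.stdin)
edited = payload.get("tool_input", {}).get("file_path", "")
if not edited:
    sys.exit(0)

edited_path = Path(edited).resolve()
if any(edited_path.is_relative_to(d) for d in WATCHED):
    print(
        f"{edited_path} is loaded by a running daemon. "
        "Bounce the owning LaunchAgent (launchctl kickstart -k) and "
        "verify the new PID before continuing.",
        file=sys.stderr,
    )
    sys.exit(2)  # blocking: feed the instruction back to Claude
sys.exit(0)
```

Even this only nags: Claude can acknowledge the message and move on, which is why I think verification needs to be first-class rather than bolted on.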
The Core Issue
Claude optimizes for appearing to complete tasks rather than verifying tasks actually work. It:
- Writes code and says "done" without running it against real data
- Reports test results without checking if the tests actually cover the changed behavior
- Commits config changes without verifying the config is loaded by the running process (see the staleness check after this list)
- Says "fixed" based on code analysis rather than execution proof
- Builds new features while existing features are silently broken
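"Loaded by the running process" is checkable with execution proof rather than code analysis. A sketch of the staleness check I mean, using standard `launchctl list` and `ps` output (the label and module path are placeholders from my project):

```python
#!/usr/bin/env python3
"""Stale-import check: a daemon older than its own code is running old code."""
import re
import subprocess
import sys
from datetime import datetime
from pathlib import Path

LABEL = "com.trading.exit-engine"    # placeholder label
SOURCE = Path("engine/exits.py")     # placeholder module path

out = subprocess.run(["launchctl", "list", LABEL],
                     capture_output=True, text=True)
pid = re.search(r'"PID"\s*=\s*(\d+)', out.stdout)
if not pid:
    sys.exit(f"{LABEL}: not running, nothing to verify")

# lstart prints the process start time, e.g. "Mon Jan  6 09:00:00 2025"
ps = subprocess.run(["ps", "-p", pid.group(1), "-o", "lstart="],
                    capture_output=True, text=True)
started = datetime.strptime(ps.stdout.strip(), "%a %b %d %H:%M:%S %Y")
changed = datetime.fromtimestamp(SOURCE.stat().st_mtime)

if changed > started:
    sys.exit(f"{LABEL}: code changed at {changed} but process started "
             f"{started} -- bounce the agent, the edit is not live")
print(f"{LABEL}: running process is newer than the last edit, OK")
```

If the module's mtime is newer than the process start time, the daemon is provably running stale code, no matter how correct the diff looks.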
Impact
- Account went from $192K peak to ~$104K
- ~$40K in losses directly attributable to unverified "fixes"
- Daily frustration discovering things that were "fixed" but aren't working
What Would Help
- Post-edit verification should be a first-class concept — not just hooks (which Claude can acknowledge and ignore), but built into the tool's behavior so that "edit a file" automatically triggers "verify the edit works"
- LaunchAgent/service awareness — if Claude edits a file that's loaded by a running daemon, it should know to restart that daemon and verify the restart succeeded
- "Verify" as a distinct step from "test" —
pytest passing is not verification. Verification means: the actual production process, with real data, produces the expected output
- Resistance to "it looks right" reasoning — Claude should be trained to distrust its own analysis and insist on execution proof
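For the Telegram example above, verification would look something like this: take the token the production config actually resolves and require a live `ok` from the real Bot API (`getMe` is Telegram's documented no-op auth check; the config key name here is a placeholder):

```python
#!/usr/bin/env python3
"""Verification, not testing: prove the real token works against the real API."""
import json
import sys
import urllib.request

# Placeholder path and key; stands in for however production resolves secrets.
with open("config.json") as f:
    token = json.load(f)["telegram_bot_token"]

if token.startswith("<FROM_KEYCHAIN:"):
    sys.exit("token is still a placeholder -- keychain resolution never ran")

# getMe succeeds iff the token is valid; a bad token 404s, exactly the
# failure mode that stayed invisible for weeks.
url = f"https://api.telegram.org/bot{token}/getMe"
try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
except Exception as exc:
    sys.exit(f"Telegram rejected the token: {exc}")

if not body.get("ok"):
    sys.exit(f"Telegram API returned not-ok: {body}")
print("Telegram token verified against the live API")
```

One live call like this, made in the same session as the "fix," would have surfaced the 404 immediately.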
Environment
- Claude Code CLI, Claude Opus 4 (now 4.6)
- macOS, Python 3.13, ~155 LaunchAgents
- Project: options trading system with real Schwab brokerage integration