Preflight Checklist
What's Wrong?
Summary
`--print --resume` sessions stopped caching conversation turns between API calls starting around v2.1.69. Only Claude Code's internal system prompt (~14.5k tokens) is cached; all conversation history is `cache_create`d from scratch on every message. This causes a ~20x cost increase per message compared to v2.1.68.
Environment
- Platform: Ubuntu (Hetzner VPS)
- Use case: Discord bot using `claude --print --model <model> --resume <session-id> --output-format stream-json --verbose` with prompts piped via stdin
- Tested models: `claude-opus-4-6[1m]`, `opus`, `claude-opus-4-5-20251101`
The regression is version-dependent, not model-dependent.
Suspect
Something in the updates after 2.1.68 may have inadvertently broken cache breakpoint placement for `--print --resume` sessions.
Workaround
Pinned to v2.1.68 (`npm install -g @anthropic-ai/claude-code@2.1.68`).
What Should Happen?
Expected behavior (v2.1.68)
`cache_read` grows as the conversation accumulates, and `cache_create` drops to a small delta (~800 tokens):
```
Message 1: cache_read=13,997 cache_create=22,946 cost=$0.15 (cold start)
Message 2: cache_read=32,849 cache_create=4,636  cost=$0.05
Message 3: cache_read=36,846 cache_create=879    cost=$0.03
Message 4: cache_read=37,295 cache_create=802    cost=$0.02
```
Actual behavior (v2.1.76 and likely earlier versions after v2.1.68)
`cache_read` is stuck at ~14.5k (Claude Code's system prompt only), while `cache_create` equals the full conversation size and grows with every message:
```
Message 1: cache_read=14,569 cache_create=54,437 cost=$0.35
Message 2: cache_read=14,569 cache_create=55,084 cost=$0.35
Message 3: cache_read=14,569 cache_create=55,512 cost=$0.35
Message 4: cache_read=14,569 cache_create=55,733 cost=$0.36
Message 5: cache_read=14,569 cache_create=55,954 cost=$0.36
```
The conversation turns are never reused from cache between calls. Only Claude Code's internal system prompt (~14.5k tokens) caches successfully.
Error Messages/Logs
## Testing matrix
All tests used fresh session UUIDs and back-to-back messages (well within the 5-minute cache TTL):
| Version | Model | Context | cache_read grows? | Steady-state cost/msg |
|---------|-------|---------|-------------------|----------------------|
| 2.1.68 | `opus` | 200k | **Yes** | ~$0.02 |
| 2.1.68 | `claude-opus-4-6[1m]` | 1M | **Yes** | ~$0.02 |
| 2.1.76 | `opus` | 200k | **No (stuck at 14.5k)** | ~$0.04-0.40 (grows) |
| 2.1.76 | `claude-opus-4-6[1m]` | 1M | **No (stuck at 14.5k)** | ~$0.35-0.40 |
| 2.1.76 | `claude-opus-4-5-20251101` | 200k | **No (stuck at 14.5k)** | ~$0.04-0.40 (grows) |
Steps to Reproduce
Reproduction
- Run `claude --print --resume <session-id> --output-format stream-json --verbose` with a prompt via stdin
- Send 3+ messages to the same session
- Observe `cache_read_input_tokens` and `cache_creation_input_tokens` in the stream-json `result` output
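The steps above can be sketched as a small shell helper. This is a hypothetical script, not the exact one used for the report: it assumes `claude` and `jq` are on PATH, a fresh session UUID is passed as `$1`, and the usage counters appear on the stream-json `result` event as shown in the logs above.

```shell
#!/bin/sh
# Pull the two cache counters out of the stream-json "result" event.
extract_cache_counters() {
  jq -r 'select(.type == "result")
         | "cache_read=\(.usage.cache_read_input_tokens) cache_create=\(.usage.cache_creation_input_tokens)"'
}

# Send a few back-to-back messages to one session (well within the
# 5-minute cache TTL) and print the counters after each one.
repro() {
  session="$1"   # a fresh session UUID
  i=1
  while [ "$i" -le 4 ]; do
    printf 'ping %s\n' "$i" |
      claude --print --resume "$session" \
        --output-format stream-json --verbose |
      extract_cache_counters
    i=$((i + 1))
  done
}

# Usage: repro "$(uuidgen)"
if [ -n "${1:-}" ]; then repro "$1"; fi
```

On v2.1.68 the printed `cache_read` figure should climb after message 1; on v2.1.76 it stays pinned at ~14.5k while `cache_create` grows.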
Claude Model
Opus
Is this a regression?
Yes, this worked in a previous version
Last Working Version
2.1.68
Claude Code Version
2.1.76
Platform
Other
Operating System
Ubuntu/Debian Linux
Terminal/Shell
Other
Additional Information
This report (including the testing matrix) was written by Claude Code during a debugging session.