
[BUG] Prompt cache regression in --print --resume since v2.1.69(?): cache_read never grows, ~20x cost increase #34629

@cinniezra

Description


Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

Summary

Starting around v2.1.69, --print --resume sessions no longer cache conversation turns between API calls. Only Claude Code's internal system prompt (~14.5k tokens) is cached; all conversation history is cache_created from scratch on every message. This causes a ~20x cost increase per message compared to v2.1.68.

Environment

  • Platform: Ubuntu (Hetzner VPS)
  • Use case: Discord bot using claude --print --model <model> --resume <session-id> --output-format stream-json --verbose with prompts piped via stdin
  • Tested models: claude-opus-4-6[1m], opus, claude-opus-4-5-20251101

The regression is version-dependent, not model-dependent.

Suspect

A change introduced after v2.1.68 appears to have inadvertently broken cache breakpoint placement for --print --resume sessions.

Workaround

Pinned to v2.1.68 (npm install -g @anthropic-ai/claude-code@2.1.68).

What Should Happen?

Expected behavior (v2.1.68)

cache_read grows as the conversation accumulates, and cache_create drops to a small per-turn delta (~800 tokens):

Message 1: cache_read=13,997  cache_create=22,946  cost=$0.15  (cold start)
Message 2: cache_read=32,849  cache_create=4,636   cost=$0.05
Message 3: cache_read=36,846  cache_create=879     cost=$0.03
Message 4: cache_read=37,295  cache_create=802     cost=$0.02

Actual behavior (v2.1.76; likely all versions after v2.1.68)

cache_read is stuck at ~14.5k (Claude Code's system prompt only), while cache_create covers the full conversation and grows with every message:

Message 1: cache_read=14,569  cache_create=54,437  cost=$0.35
Message 2: cache_read=14,569  cache_create=55,084  cost=$0.35
Message 3: cache_read=14,569  cache_create=55,512  cost=$0.35
Message 4: cache_read=14,569  cache_create=55,733  cost=$0.36
Message 5: cache_read=14,569  cache_create=55,954  cost=$0.36

The conversation turns are never reused from cache between calls. Only Claude Code's internal system prompt (~14.5k tokens) caches successfully.
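The difference between the two behaviors can be checked mechanically. The sketch below (an illustration of the symptom, not Claude Code internals) flags a session as unhealthy when cache_read stops growing across messages, using the token counts from the logs above:

```python
def caching_healthy(per_message_usage):
    """per_message_usage: list of (cache_read, cache_create) tuples, in order.

    When prompt caching works, cache_read grows every message as prior
    turns are reused; when it is broken, cache_read stays flat while
    cache_create keeps growing with the full conversation size.
    """
    reads = [read for read, _ in per_message_usage]
    return all(later > earlier for earlier, later in zip(reads, reads[1:]))

# Token counts taken from the v2.1.68 and v2.1.76 logs above:
good = [(13997, 22946), (32849, 4636), (36846, 879), (37295, 802)]   # v2.1.68
bad  = [(14569, 54437), (14569, 55084), (14569, 55512)]              # v2.1.76

print(caching_healthy(good))  # True
print(caching_healthy(bad))   # False
```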

Error Messages/Logs

Testing matrix

All tests used fresh session UUIDs and back-to-back messages (well within the 5-minute cache TTL):

| Version | Model | Context | cache_read grows? | Steady-state cost/msg |
|---------|-------|---------|-------------------|----------------------|
| 2.1.68 | `opus` | 200k | **Yes** | ~$0.02 |
| 2.1.68 | `claude-opus-4-6[1m]` | 1M | **Yes** | ~$0.02 |
| 2.1.76 | `opus` | 200k | **No (stuck at 14.5k)** | ~$0.04-0.40 (grows) |
| 2.1.76 | `claude-opus-4-6[1m]` | 1M | **No (stuck at 14.5k)** | ~$0.35-0.40 |
| 2.1.76 | `claude-opus-4-5-20251101` | 200k | **No (stuck at 14.5k)** | ~$0.04-0.40 (grows) |

Steps to Reproduce

Reproduction

  1. Run claude --print --resume <session-id> --output-format stream-json --verbose with a prompt via stdin
  2. Send 3+ messages to the same session
  3. Observe cache_read_input_tokens and cache_creation_input_tokens in the stream-json result output
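To automate step 3, the usage counters can be pulled out of the stream-json output. A minimal sketch, assuming the final result event is a JSON line with a "usage" object containing the two fields named above (the exact event shape may differ between versions):

```python
import json

def cache_usage(stream_json_lines):
    """Return (cache_read, cache_create) from Claude Code stream-json output.

    Scans JSON lines for a "result" event and reads its "usage" object;
    field names follow the --verbose --output-format stream-json logs above.
    """
    for line in stream_json_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") == "result":
            usage = event.get("usage", {})
            return (usage.get("cache_read_input_tokens", 0),
                    usage.get("cache_creation_input_tokens", 0))
    return None

# Fabricated result event shaped like the logged v2.1.76 numbers:
sample = ['{"type":"result","usage":'
          '{"cache_read_input_tokens":14569,"cache_creation_input_tokens":54437}}']
print(cache_usage(sample))  # → (14569, 54437)
```

Running this per message and comparing cache_read across calls reproduces the table in the Error Messages/Logs section.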

Claude Model

Opus

Is this a regression?

Yes, this worked in a previous version

Last Working Version

2.1.68

Claude Code Version

2.1.76

Platform

Other

Operating System

Ubuntu/Debian Linux

Terminal/Shell

Other

Additional Information

This report (including the testing matrix) was written by Claude Code during a debugging session.
