
Upgrade llama.cpp from b9022 to b9049 #104

Merged
bernardladenthin merged 1 commit into master from claude/update-b9049-compatibility-b5HeJ on May 7, 2026
Conversation

@bernardladenthin (Owner)

Summary

This PR upgrades the pinned llama.cpp dependency from version b9022 to b9049, incorporating upstream improvements and new features for KV cache state management, FWHT support, and backend initialization.

Key Changes

  • KV Cache State Management: New LLAMA_STATE_SEQ_FLAGS_ON_DEVICE flag enables on-device KV cache state save/restore without host round-trips. State data format now includes 4-byte magic header and seq_id, making saved state from b9022 incompatible with b9049+.

  • FWHT Support: New ggml_op_hint enum and ggml_mul_mat_set_hint() function added for Fast Walsh-Hadamard Transform support in graph operations.

  • Backend Initialization: llama_backend_init() now automatically calls ggml_backend_load_all() if no backends are registered, simplifying initialization flow.

  • Error Handling: Unsupported model architectures now throw std::runtime_error instead of calling GGML_ABORT, allowing graceful error handling by callers.

  • Speculative Decoding: Server context checkpoints now support on-device state flags for improved speculative decoding performance.

  • Dependencies: GGML version bumped from 0.10.2 to 0.11.0; cpp-httplib updated from 0.43.2 to 0.43.3, bringing recursion-elimination and OOM-safety improvements.
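The error-handling change means callers can recover from an unsupported model architecture at load time instead of losing the process. A minimal caller-side sketch, using a hypothetical stub in place of the real loader (only std::runtime_error and GGML_ABORT are names from the PR; everything else is illustrative):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stub standing in for model loading. With b9049, an
// unsupported architecture surfaces as std::runtime_error rather than
// terminating the process via GGML_ABORT.
static void load_model_or_throw(const std::string &arch) {
    if (arch != "llama") {
        throw std::runtime_error("unsupported model architecture: " + arch);
    }
}

// Caller-side pattern: turn the exception into a recoverable failure.
static bool try_load(const std::string &arch, std::string &err) {
    try {
        load_model_or_throw(arch);
        return true;
    } catch (const std::runtime_error &e) {
        err = e.what();  // report the error instead of crashing
        return false;
    }
}
```

For the JNI layer this is the difference between being able to surface a Java exception and taking down the JVM.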
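The backend-initialization change can be illustrated with a toy registry. Only llama_backend_init() and ggml_backend_load_all() are real llama.cpp/ggml symbols; the data structures and function bodies below are stand-ins for illustration, not ggml's actual implementation:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy registry sketching the new flow: llama_backend_init() now calls
// ggml_backend_load_all() automatically when no backend has been
// registered yet, so callers no longer need an explicit load step.
static std::vector<std::string> g_registered_backends;

static void load_all_backends() {    // stand-in for ggml_backend_load_all()
    g_registered_backends = {"cpu", "gpu"};
}

static void backend_init() {         // stand-in for llama_backend_init()
    if (g_registered_backends.empty()) {
        load_all_backends();         // auto-load when none registered
    }
}
```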

Notes

No JNI layer call-site changes are required. The new on-device state features and FWHT hints are not currently used by the Java bindings but are available for future optimization.
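For context on the FWHT hint mentioned above: the Fast Walsh-Hadamard Transform itself is a standard O(n log n) butterfly. A self-contained textbook implementation of the transform the new hint targets (this is not llama.cpp's code, just a reference for the operation):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place, unnormalized Fast Walsh-Hadamard Transform.
// The input length must be a power of two; applying the transform
// twice yields the original vector scaled by n.
static void fwht(std::vector<float> &a) {
    const std::size_t n = a.size();
    for (std::size_t len = 1; len < n; len <<= 1) {
        for (std::size_t i = 0; i < n; i += len << 1) {
            for (std::size_t j = i; j < i + len; ++j) {
                const float u = a[j];
                const float v = a[j + len];
                a[j]       = u + v;  // butterfly: sum
                a[j + len] = u - v;  // butterfly: difference
            }
        }
    }
}
```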

https://claude.ai/code/session_01TZ2Gvm2dyeRVoy5dCzMXNn

Key changes in this range:
- New LLAMA_STATE_SEQ_FLAGS_ON_DEVICE flag for on-device KV cache save/restore
- State seq data format now prepends 4-byte magic + seq_id header (b9022 state data incompatible)
- ggml_op_hint enum + ggml_mul_mat_set_hint() for FWHT support
- llama_backend_init() auto-loads backends if none registered
- server_prompt_checkpoint_update() gained on_device parameter
- GGML version 0.10.2 → 0.11.0, cpp-httplib 0.43.2 → 0.43.3
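Because saved sequence-state blobs are now versioned by a leading magic, code that persists KV cache state can reject stale b9022 blobs up front. A sketch of such a check; the magic value here is a made-up placeholder (the real constant lives in llama.cpp's state serialization code), and only "4-byte magic, then seq_id" is taken from the PR:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical magic value for illustration only; llama.cpp's actual
// constant may differ.
constexpr std::uint32_t kAssumedStateMagic = 0x4C535351u;

// Returns true if the blob starts with the expected 4-byte magic.
// Headerless b9022-era blobs will fail this check, so they can be
// discarded before being handed to llama.cpp at all.
static bool has_state_magic(const std::vector<std::uint8_t> &blob) {
    std::uint32_t magic = 0;
    if (blob.size() < sizeof(magic)) {
        return false;
    }
    std::memcpy(&magic, blob.data(), sizeof(magic));
    return magic == kAssumedStateMagic;
}
```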

@bernardladenthin bernardladenthin merged commit c84ea9c into master May 7, 2026
16 checks passed
@bernardladenthin bernardladenthin deleted the claude/update-b9049-compatibility-b5HeJ branch May 7, 2026 08:32