
Upgrade llama.cpp from b9049 to b9071 #112

Merged
bernardladenthin merged 1 commit into main from claude/update-b9071-compatibility-DZMHc on May 8, 2026

Conversation

@bernardladenthin
Owner

Summary

This PR upgrades the pinned llama.cpp version from b9049 to b9071, incorporating upstream improvements in multimodal support, attention mechanisms, backend initialization, KV state management, and hardware acceleration.

Key Changes

  • Multimodal & Chat: Added contains_media() method to chat messages and improved JSON serialization for media-containing messages
  • Attention Mechanisms: Introduced the LLM_KV_ATTENTION_VALUE_SCALE KV key and a corresponding hparam field for MiMo-V2 attention value scaling
  • Backend Initialization: Fixed llama_supports_gpu_offload() and llama_supports_rpc() to auto-initialize backends if none are registered (see the sketch after this list)
  • KV State Management: Improved state_seq_set_data by removing an overly strict seq_id match and reallocating the staging tensor only when its shape no longer fits the incoming state
  • MiMo-V2 Model: Extended with Multi-Token Prediction (MTP) layer support, fused wqkv projections, and attention value scaling
  • Hardware Acceleration:
    • Added SYCL implementations for CUMSUM, DIAG, FILL, SSM_SCAN, SOLVE_TRI operations
    • Optimized CUDA outer-product to use cublasSgemmStridedBatched for batched operations
  • Multimodal Tools: Added MiniCPM-V 4.6 support with new projector type and ViT merger graph
  • Server UI: Integrated LLM-based conversation title generation and CSS animation fixes
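
As a rough illustration of the backend-initialization fix above, here is a minimal C++ sketch, not the verbatim upstream patch: a capability probe lazily registers backends on first use before answering. The helper name supports_gpu_offload_sketch() is made up for this example; ggml_backend_load_all() and the ggml_backend_dev_* queries are assumed to be the relevant registry entry points.

```cpp
#include "ggml-backend.h"

// Sketch of the reported fix: probes such as llama_supports_gpu_offload()
// previously answered before any backend had been registered. Auto-loading
// backends first lets the device query below see GPU backends that exist.
static bool supports_gpu_offload_sketch(void) {
    if (ggml_backend_dev_count() == 0) {
        // Nothing registered yet: load the dynamically discoverable backends.
        ggml_backend_load_all();
    }
    // True if any registered device identifies itself as a GPU.
    return ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_GPU) != nullptr;
}
```

The same lazy-load guard would apply to llama_supports_rpc(), which likewise needs the backend registry populated before it can report anything meaningful.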

Notes

All changes are additive or internal improvements with no breaking changes to the Java bindings API.

https://claude.ai/code/session_01X1BGPBMMuKRUcvRs8Su9rL

No project code changes were required. All b9049→b9071 changes are either additive (new KV keys, hparam fields, SYCL/OpenCL ops, MiniCPM-V 4.6 multimodal support) or bug fixes (removal of the state_seq_set_data seq_id guard, a shape check in the KV slot restorer, backend auto-loading in llama_supports_gpu_offload/llama_supports_rpc). The upstream server files compiled into jllama are unchanged.
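
To make the KV-state fix concrete, here is a hedged sketch of the "reallocate only on shape mismatch" idea; the helper name and parameters are invented for illustration and are not upstream identifiers.

```cpp
#include "ggml.h"

// Illustrative only: the staging tensor used when restoring sequence state is
// reused as long as its shape still matches the incoming data, instead of
// being rejected or rebuilt on every call based on a strict seq_id match.
static bool state_buffer_needs_realloc(const ggml_tensor * dst,
                                       int64_t n_embd,
                                       int64_t n_tokens) {
    // Reallocate when there is no buffer yet, or when either dimension of the
    // existing tensor no longer fits the state being restored.
    return dst == nullptr || dst->ne[0] != n_embd || dst->ne[1] != n_tokens;
}
```

A caller would allocate a fresh tensor only when this predicate returns true and otherwise write into the existing buffer.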

https://claude.ai/code/session_01X1BGPBMMuKRUcvRs8Su9rL
@bernardladenthin merged commit 3525a73 into main on May 8, 2026
10 checks passed
@bernardladenthin deleted the claude/update-b9071-compatibility-DZMHc branch on May 8, 2026 at 09:15
