Apple Silicon Metal + BLAS: segfault for I2_S when ubatch >= 32 routes MUL_MAT to the generic BLAS path #512

@ckreager

Environment

  • Platform: Apple Silicon Mac
  • Host: Apple M4 Max
  • OS: macOS
  • Compiler: Homebrew clang 18.1.8
  • BitNet / submodule state: BitNet with the vendored 3rdparty/llama.cpp at Eddie-Wang1120/llama.cpp commit 1f86f058de0c3f4098dedae2ae8653c335c868a1
  • Model: microsoft/BitNet-b1.58-2B-4T-gguf / ggml-model-i2_s.gguf
  • Build flags:
    • GGML_METAL=ON
    • GGML_ACCELERATE=ON
    • GGML_BLAS=ON
    • GGML_BLAS_VENDOR=Apple
    • BITNET_ARM_TL1=OFF

Problem

On Apple Silicon with Metal enabled, i2_s inference can segfault when BLAS is enabled and the physical micro-batch reaches the BLAS routing threshold of 32 tokens.

The crash is tied to the physical ubatch size (-ub), not the logical batch size (-b):

  • -b 2048 -ub 31 -> stable
  • -b 32 -ub 31 -> stable
  • -b 2048 -ub 32 -> segfault
  • -b 2048 -ub 512 -> segfault

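Because llama.cpp evaluates the graph one physical micro-batch at a time, the token dimension seen by each MUL_MAT is bounded by -ub, not -b. The failure therefore starts exactly when the BLAS backend begins claiming the generic MUL_MAT path for larger micro-batches.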
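The size clause of the routing check can be modeled in isolation. In a ggml MUL_MAT, the output's ne[1] (ne1 in the check) is the number of src1 columns, which here is the number of tokens in the current micro-batch; with large weight matrices, only -ub decides whether BLAS claims the op. This is a minimal sketch, not ggml code: blas_size_ok, blas_claims_for_ubatch and the 2560 weight width are hypothetical, and kMinBatch = 32 is assumed from the ubatch >= 32 behaviour reported here.

```cpp
#include <cassert>
#include <cstdint>

// Threshold assumed from the observed ubatch >= 32 boundary.
constexpr int64_t kMinBatch = 32;

// Hypothetical helper mirroring only the size clause of the BLAS MUL_MAT check.
// For dst = MUL_MAT(src0, src1):
//   ne0  = dst->ne[0]  (rows of the weight matrix src0)
//   ne1  = dst->ne[1]  (columns of src1, i.e. tokens in the physical ubatch)
//   ne10 = src1->ne[0] (shared inner dimension)
bool blas_size_ok(int64_t ne0, int64_t ne1, int64_t ne10) {
    return ne0 >= kMinBatch && ne1 >= kMinBatch && ne10 >= kMinBatch;
}

// Illustrative weight width (an assumption); any dimensions >= 32 behave the same.
constexpr int64_t kWeightDim = 2560;

// With both weight dimensions well above the threshold, only the ubatch
// size (ne1) decides whether the BLAS backend claims the op.
bool blas_claims_for_ubatch(int64_t n_ubatch) {
    assert(n_ubatch > 0);
    return blas_size_ok(kWeightDim, n_ubatch, kWeightDim);
}
```

Calling blas_claims_for_ubatch(31) returns false and blas_claims_for_ubatch(32) returns true, matching the stable/segfault boundary listed above.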

Control Experiment

The same Metal runtime is stable when BLAS is disabled:

  • BLAS ON + -b 2048 -ub 512 -> segfault
  • BLAS OFF + -b 2048 -ub 512 -> stable

This strongly suggests the crash is in the BLAS-side handling of GGML_TYPE_I2_S, not in Metal itself and not in the outer chat request schema.

Root Cause

ggml-blas.cpp allows the generic BLAS MUL_MAT path to accept quantized source tensors when ggml_get_type_traits(src0->type)->to_float != NULL.

For GGML_TYPE_I2_S, that is not safe:

  • I2_S stores its scale outside the per-row payload
  • the generic BLAS dequantize-to-float path assumes each row's data is self-contained
  • once ubatch >= 32, BLAS starts claiming MUL_MAT, so I2_S weights are sent through that path
  • the crash then occurs in the I2_S dequantize / BLAS matmul path
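The layout mismatch can be illustrated with a toy format. This is a hedged sketch, not the real I2_S encoding: the 2-bit packing, the code mapping and every function name here (pack_toy, to_float_self_contained, dequant_tensor, demo_row0) are hypothetical. The only property borrowed from the report is that one scale lives outside the row payloads (in the toy, after the last row), while the toy's generic loop hands a converter one row at a time and that converter expects each row to be self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// ASSUMED toy layout (not the real I2_S bit layout): each row packs k ternary
// values as 2-bit codes, 4 per byte (code = value + 1), and a single float
// scale for the WHOLE tensor is appended after the last row.
std::vector<uint8_t> pack_toy(const std::vector<std::vector<int>> &rows, float scale) {
    std::vector<uint8_t> buf;
    for (const auto &row : rows) {
        assert(row.size() % 4 == 0);
        for (size_t i = 0; i < row.size(); i += 4) {
            uint8_t byte = 0;
            for (size_t j = 0; j < 4; ++j)
                byte |= static_cast<uint8_t>((row[i + j] + 1) << (2 * j));
            buf.push_back(byte);
        }
    }
    uint8_t s[sizeof(float)];
    std::memcpy(s, &scale, sizeof(float));
    buf.insert(buf.end(), s, s + sizeof(float));
    return buf;
}

// Decodes k packed values starting at `row`, multiplying by `scale`.
static void decode_row(const uint8_t *row, float scale, float *out, int64_t k) {
    for (int64_t i = 0; i < k; ++i) {
        const int code = (row[i / 4] >> (2 * (i % 4))) & 3;
        out[i] = static_cast<float>(code - 1) * scale;
    }
}

// Per-row converter that assumes each row is self-contained, i.e. that the
// scale sits right after the row's k/4 payload bytes.
void to_float_self_contained(const uint8_t *row, float *out, int64_t k) {
    float scale;
    std::memcpy(&scale, row + k / 4, sizeof(float));
    decode_row(row, scale, out, k);
}

// Layout-aware decoder: reads the single tensor-level scale from the end.
void dequant_tensor(const uint8_t *data, int64_t n_rows, int64_t k, float *out) {
    float scale;
    std::memcpy(&scale, data + n_rows * k / 4, sizeof(float));
    for (int64_t r = 0; r < n_rows; ++r)
        decode_row(data + r * k / 4, scale, out + r * k, k);
}

// 2 x 16 tensor: row 0 all -1, row 1 all +1, scale 0.5.
// Returns element 0 of row 0 decoded {layout-aware, generic per-row loop}.
std::pair<float, float> demo_row0() {
    const int64_t n_rows = 2, k = 16;
    const std::vector<uint8_t> buf =
        pack_toy({std::vector<int>(k, -1), std::vector<int>(k, 1)}, 0.5f);

    std::vector<float> ok(n_rows * k), generic(n_rows * k);
    dequant_tensor(buf.data(), n_rows, k, ok.data());

    // Generic pattern: call the per-row converter once per row.
    for (int64_t r = 0; r < n_rows; ++r)
        to_float_self_contained(buf.data() + r * k / 4, generic.data() + r * k, k);

    return {ok[0], generic[0]};
}
```

demo_row0() returns -0.5 for the layout-aware decode, while the per-row loop reads row 1's payload bytes as row 0's scale and yields an unrelated tiny value. In this toy the misread stays inside the buffer; whether the real path reads out of bounds depends on the actual I2_S layout.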

In crash reports, the top frames consistently land in:

  • dequantize_row_i2_s
  • ggml_backend_blas_mul_mat

Proposed Fix

Reject GGML_TYPE_I2_S in the generic BLAS MUL_MAT support check so that I2_S continues using its specialized non-BLAS path:

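```cpp
// MUL_MAT case of the BLAS support check. ne1 is the token dimension of the
// product, so it tracks the physical ubatch size.
// I2_S is rejected first: its scale lives outside the per-row payload, so the
// generic to_float dequantization used by the BLAS path is not safe for it.
return src0->type != GGML_TYPE_I2_S &&
       ggml_is_contiguous(src0) &&
       ggml_is_contiguous(src1) &&
       src1->type == GGML_TYPE_F32 &&
       (ne0 >= min_batch && ne1 >= min_batch && ne10 >= min_batch) &&
       (src0->type == GGML_TYPE_F32 || ggml_get_type_traits(src0->type)->to_float != NULL);
```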
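The guard only changes the type clause of the check. A toy model with hypothetical stand-in types (ToyType, toy_to_float, type_ok_original and type_ok_patched are illustrative, not ggml API) shows why the original clause admits I2_S: it accepts any type with a to_float converter, and the dequantize_row_i2_s frames in the crash reports imply that I2_S has one. Other quantized types with a converter keep their BLAS eligibility.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a few ggml tensor types.
enum class ToyType { F32, Q8_0, I2_S };

using ToFloatFn = void (*)(const void *src, float *dst, int64_t n);

// Placeholder converter; only its presence matters for the support check.
static void placeholder_to_float(const void *, float *, int64_t) {}

// Toy trait lookup. Assumption: I2_S registers a to_float converter, as the
// dequantize_row_i2_s frames in the crash reports imply.
static ToFloatFn toy_to_float(ToyType t) {
    switch (t) {
        case ToyType::Q8_0:
        case ToyType::I2_S:
            return placeholder_to_float;
        case ToyType::F32:
            break;
    }
    return nullptr;
}

// Type clause of the original check: F32, or anything with a to_float.
bool type_ok_original(ToyType t) {
    return t == ToyType::F32 || toy_to_float(t) != nullptr;
}

// Type clause with the proposed guard: I2_S is rejected up front.
bool type_ok_patched(ToyType t) {
    return t != ToyType::I2_S && type_ok_original(t);
}
```

type_ok_original(ToyType::I2_S) is true and type_ok_patched(ToyType::I2_S) is false, while Q8_0 and F32 pass both.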

Result After Patch

After applying the BLAS guard above:

  • BLAS ON + Metal + -b 2048 -ub 512 is stable
  • managed broker end-to-end requests no longer segfault under the same settings

This does not solve all i2_s quality issues, but it does remove the native crash path.
