Apple Silicon Metal + BLAS segfault for I2_S when ubatch >= 32 routes generic MUL_MAT

## Environment

- **Platform**: Apple Silicon Mac
- **Host**: Apple M4 Max
- **OS**: macOS
- **Compiler**: Homebrew clang 18.1.8
- **BitNet / submodule state**: BitNet using vendored `3rdparty/llama.cpp` at Eddie-Wang1120/llama.cpp commit `1f86f058de0c3f4098dedae2ae8653c335c868a1`
- **Model**: `microsoft/BitNet-b1.58-2B-4T-gguf` / `ggml-model-i2_s.gguf`
- **Build flags**:
  - `GGML_METAL=ON`
  - `GGML_ACCELERATE=ON`
  - `GGML_BLAS=ON`
  - `GGML_BLAS_VENDOR=Apple`
  - `BITNET_ARM_TL1=OFF`

## Problem

On Apple Silicon with Metal enabled, `i2_s` inference can segfault when BLAS is enabled and the physical micro-batch crosses the BLAS routing threshold.

The crash is tied to **physical `ubatch`**, not logical `batch`:

- `-b 2048 -ub 31` -> stable
- `-b 32 -ub 31` -> stable
- `-b 2048 -ub 32` -> segfault
- `-b 2048 -ub 512` -> segfault

The failure therefore starts at exactly the micro-batch size (32) where the BLAS backend begins claiming the generic `MUL_MAT` path.
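The boundary between `-ub 31` and `-ub 32` can be modeled with a small sketch of the size gate. This is a toy model, not the ggml source: `blas_size_gate`, `kMinBatch`, `kRows` and `kInner` are hypothetical names, the threshold of 32 is inferred from the runs above, and `ne1` is assumed to equal the number of tokens in one physical micro-batch.

```cpp
#include <cstdint>

// Toy model of the size gate in the generic BLAS MUL_MAT support check.
// kMinBatch is inferred from the -ub 31 / -ub 32 boundary reported above.
constexpr std::int64_t kMinBatch = 32;

// ne0:  rows of the result (rows of the weight matrix)
// ne1:  columns of the result; ASSUMED to be the token count of one ubatch
// ne10: shared inner dimension
constexpr bool blas_size_gate(std::int64_t ne0, std::int64_t ne1, std::int64_t ne10) {
    return ne0 >= kMinBatch && ne1 >= kMinBatch && ne10 >= kMinBatch;
}

// Hypothetical layer dimensions, chosen only to sit well above the threshold.
constexpr std::int64_t kRows  = 2560;
constexpr std::int64_t kInner = 2560;

// Checked at compile time: 31 tokens stay off the BLAS path, 32 are claimed.
static_assert(!blas_size_gate(kRows, 31, kInner), "-ub 31 is not claimed by BLAS");
static_assert( blas_size_gate(kRows, 32, kInner), "-ub 32 is claimed by BLAS");
```

Under this model the logical batch size never appears in the gate, which matches the observation that `-b 32 -ub 31` is stable: every micro-batch handed to the backend holds at most 31 tokens.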

## Control Experiment

The same Metal runtime is stable when BLAS is disabled:

- BLAS **ON** + `-b 2048 -ub 512` -> segfault
- BLAS **OFF** + `-b 2048 -ub 512` -> stable

This strongly suggests the crash is in the BLAS-side handling of `GGML_TYPE_I2_S`, not in Metal itself and not in the outer chat request schema.

## Root Cause

`ggml-blas.cpp` allows the generic BLAS `MUL_MAT` path to accept quantized source tensors when `ggml_get_type_traits(src0->type)->to_float != NULL`.

For `GGML_TYPE_I2_S`, that is not safe:

- `I2_S` stores an external scale outside the per-row payload
- the generic BLAS dequantize-to-float path assumes self-contained per-row data
- once `ubatch >= 32`, BLAS starts claiming `MUL_MAT` for `I2_S` weights as well
- the run then crashes in the `i2_s` dequant / BLAS matmul path

In crash reports, the top frames consistently land in:

- `dequantize_row_i2_s`
- `ggml_backend_blas_mul_mat`
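To illustrate why an external scale and a row-local dequantizer do not mix, here is a deliberately simplified sketch. The layout is a toy and is not claimed to be the real `I2_S` format: it assumes 2-bit codes packed four per byte for every row, followed by a single scale for the whole tensor. All names are hypothetical. The sketch only compares byte offsets and never dereferences memory.

```cpp
#include <cstddef>

// TOY layout (an assumption, not the real I2_S format):
// [row 0 codes][row 1 codes]...[row N-1 codes][one scale for the tensor]
constexpr std::size_t kCodesPerByte = 4;  // four 2-bit codes per byte

// Bytes occupied by one row of packed codes.
constexpr std::size_t row_payload_bytes(std::size_t cols) {
    return cols / kCodesPerByte;
}

// Byte offset of row r from the start of the tensor buffer.
constexpr std::size_t row_offset(std::size_t r, std::size_t cols) {
    return r * row_payload_bytes(cols);
}

// Where the single scale lives: locating it needs the tensor-level row count.
constexpr std::size_t true_scale_offset(std::size_t rows, std::size_t cols) {
    return rows * row_payload_bytes(cols);
}

// Where a row-local routine would look if it assumed the row is self-contained
// and its scale sits directly after its own payload.
constexpr std::size_t assumed_scale_offset(std::size_t r, std::size_t cols) {
    return row_offset(r, cols) + row_payload_bytes(cols);
}

// 4 rows x 128 columns -> 32 payload bytes per row, scale at byte 128.
static_assert(true_scale_offset(4, 128) == 128, "scale follows all rows");
static_assert(assumed_scale_offset(0, 128) == 32, "row 0 lands on row 1's codes");
static_assert(assumed_scale_offset(3, 128) == 128, "only the last row matches");
```

In this toy, a routine that receives only a row pointer and an element count cannot find the scale without tensor-level information. That is one way the mismatch described above could produce a bad read; the actual failure mechanism inside `dequantize_row_i2_s` may differ.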

## Proposed Fix

Reject `GGML_TYPE_I2_S` in the generic BLAS `MUL_MAT` support check so that `I2_S` continues using its specialized non-BLAS path:

```cpp
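// I2_S keeps its scale outside the per-row payload, so keep it off the
// generic dequantize-to-float BLAS path even though it exposes to_float.
return src0->type != GGML_TYPE_I2_S &&
       ggml_is_contiguous(src0) &&
       ggml_is_contiguous(src1) &&
       src1->type == GGML_TYPE_F32 &&
       (ne0 >= min_batch && ne1 >= min_batch && ne10 >= min_batch) &&
       (src0->type == GGML_TYPE_F32 || ggml_get_type_traits(src0->type)->to_float != NULL);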
```
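The effect of the extra type check can be sketched in isolation. This is a toy model with stand-in names (`ToyType`, `has_to_float`, `type_ok_before_patch`, `type_ok_after_patch`), not the ggml API; it assumes, following the root-cause analysis, that `I2_S` exposes a `to_float` callback and that a plain integer type does not.

```cpp
// Toy stand-ins for ggml tensor types; the real enum lives in ggml.
enum class ToyType { F32, Q4_0, I2_S, I32 };

// Stand-in for `ggml_get_type_traits(type)->to_float != NULL`.
// ASSUMPTION: Q4_0 and I2_S provide to_float, I32 does not.
constexpr bool has_to_float(ToyType t) {
    return t == ToyType::Q4_0 || t == ToyType::I2_S;
}

// Type condition of the support check before the patch.
constexpr bool type_ok_before_patch(ToyType t) {
    return t == ToyType::F32 || has_to_float(t);
}

// Type condition after the patch: I2_S is rejected first, the rest is unchanged.
constexpr bool type_ok_after_patch(ToyType t) {
    return t != ToyType::I2_S && type_ok_before_patch(t);
}

// I2_S was accepted before the patch and is rejected after it.
static_assert( type_ok_before_patch(ToyType::I2_S), "I2_S slipped through");
static_assert(!type_ok_after_patch(ToyType::I2_S),  "I2_S is now rejected");

// Other types behave exactly as before.
static_assert(type_ok_after_patch(ToyType::F32)  == type_ok_before_patch(ToyType::F32),  "");
static_assert(type_ok_after_patch(ToyType::Q4_0) == type_ok_before_patch(ToyType::Q4_0), "");
static_assert(type_ok_after_patch(ToyType::I32)  == type_ok_before_patch(ToyType::I32),  "");
```

The guard is intentionally narrow: it changes the answer for one type only, so other quantized formats that rely on the generic BLAS path should be unaffected.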

## Result After Patch

After applying the BLAS guard above:

- BLAS **ON** + Metal + `-b 2048 -ub 512` is stable
- managed broker end-to-end requests no longer segfault under the same settings

This does **not** solve all `i2_s` quality issues, but it does remove the native crash path.

## Related Issues

- #468
- #470
- #195
- #411

