Add INT8 GEMM support to the GEMM operator by albiol2004 · Pull Request #94 · amd/IRON

albiol2004 · 2026-04-09T10:04:57Z

Wire up existing INT8 matmul kernels (i8→i8, i8→i16, i8→i32) through the
Python GEMM operator layer. The C++ kernels already had the templates and
compile flags; this connects them to the Python API.

Also fixes a pre-existing bug in get_arg_spec() where AIERuntimeArgSpec
defaulted all buffers to bfloat16, causing silent data corruption for any
non-bf16 output type.

Closes #93

Added

INT8 input support (dtype_in="i8") with i8, i16, i32 output types
INT8 MAC dimensions (8,8,8) for npu1/npu2 in microkernel_mac_dim_map
INT8 kernel compilation flags (-Di8_i32_ONLY, etc.)
INT8 golden reference with int32 accumulation in reference.py
5 INT8 test configurations (4/8 columns, all output types, row/col-major B)
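A minimal sketch of what an INT8 golden reference with int32 accumulation computes, in pure Python for clarity. The function name and the saturating conversion to i8/i16 outputs are assumptions; the actual reference.py may use NumPy and may wrap or shift instead of saturating.

```python
# Saturation bounds for the output types exercised in this PR.
OUT_RANGE = {
    "i8": (-(2**7), 2**7 - 1),
    "i16": (-(2**15), 2**15 - 1),
    "i32": (-(2**31), 2**31 - 1),
}

def reference_gemm_i8(a, b, dtype_out="i32"):
    """Hypothetical golden GEMM: a is M x K, b is K x N, both int8-valued lists of lists."""
    lo, hi = OUT_RANGE[dtype_out]
    M, K, N = len(a), len(b), len(b[0])
    assert all(len(row) == K for row in a), "inner dimensions must match"
    assert all(-128 <= v <= 127 for row in a + b for v in row), "inputs must be int8"
    c = []
    for i in range(M):
        out_row = []
        for j in range(N):
            acc = 0  # models an int32 accumulator
            for k in range(K):
                acc += a[i][k] * b[k][j]
            assert -(2**31) <= acc <= 2**31 - 1, "int32 accumulator overflow"
            # Assumption: narrow outputs saturate to their range.
            out_row.append(min(max(acc, lo), hi))
        c.append(out_row)
    return c
```

Each int8 x int8 product is at most 16384 in magnitude, so an int32 accumulator cannot overflow for any K up to 131071, whereas an i16 accumulator can overflow after as few as three terms.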

Changed

get_arg_spec() now passes correct dtype to AIERuntimeArgSpec
Test params include dtype_in/dtype_out (existing bf16 tests unchanged)
bf16-specific flags (prio_accuracy, bfp16 emulation) skipped for INT8
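A sketch of the gating described under Changed: the dtype-specific define always applies, while bf16-only options are dropped for INT8. The -D{dtype_in}_{dtype_out}_ONLY pattern is extrapolated from the -Di8_i32_ONLY example, and the bf16 macro names are placeholders, so treat both as assumptions.

```python
def kernel_compile_flags(dtype_in, dtype_out, prio_accuracy=False, emulate_bfp16=False):
    """Hypothetical helper that assembles kernel -D flags for one GEMM variant."""
    # Assumed naming pattern, extrapolated from -Di8_i32_ONLY.
    flags = [f"-D{dtype_in}_{dtype_out}_ONLY"]
    if dtype_in == "bf16":
        # Placeholder macro names; the real bf16 options are not shown in this PR.
        if prio_accuracy:
            flags.append("-DHYPOTHETICAL_PRIO_ACCURACY")
        if emulate_bfp16:
            flags.append("-DHYPOTHETICAL_BFP16_EMULATION")
    # For INT8 inputs the bf16-only options are silently ignored.
    return flags
```

Ignoring the bf16 options (instead of raising) keeps shared test parametrizations usable across dtypes.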

Removed

Wire up existing INT8 matmul kernels (i8→i8, i8→i16, i8→i32) through the Python GEMM operator layer. Fix get_arg_spec() to pass correct dtype to AIERuntimeArgSpec (was defaulting to bfloat16 for all types). Closes issue amd#93

dtype_in and dtype_out have repr=False, so they are excluded from the auto-generated operator name. When a bf16 and an int8 GEMM share the same dimensions (M, K, N, tiles, columns), they produce identical xclbin filenames. The first to compile wins; the second silently reuses the wrong binary, producing garbage output. Override the name property to append the dtype suffix (e.g. _i8_i32) when dtype_in is not the default bf16. bf16 names are unchanged for backward compatibility.
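A self-contained sketch of the collision and the fix. OperatorBase and its naming scheme are hypothetical stand-ins for IRON's base operator, assumed here to build the name from every dataclass field with repr=True; the GemmOp.name override follows the approach in this commit.

```python
from dataclasses import dataclass, field, fields

@dataclass
class OperatorBase:
    @property
    def name(self) -> str:
        # Hypothetical: join the class name with every field whose repr flag is True.
        parts = [type(self).__name__.lower()]
        for f in fields(self):
            if f.repr:
                parts.append(f"{f.name}{getattr(self, f.name)}")
        return "_".join(parts)

@dataclass
class GemmOp(OperatorBase):
    M: int
    K: int
    N: int
    # repr=False keeps these out of the auto-generated name.
    dtype_in: str = field(default="bf16", repr=False)
    dtype_out: str = field(default="bf16", repr=False)

    @property
    def name(self) -> str:
        base = super().name
        if self.dtype_in != "bf16":
            # Append the dtype suffix so INT8 builds get their own xclbin filename.
            base += f"_{self.dtype_in}_{self.dtype_out}"
        return base
```

With this, GemmOp(256, 256, 256).name keeps its old bf16 form, while the i8/i32 variant of the same shape gets an _i8_i32 suffix and can no longer reuse the bf16 binary.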

hunhoffe · 2026-04-09T14:47:38Z

iron/operators/gemm/op.py

+        identical dimensions."""
+        base = super().name
+        if self.dtype_in != "bf16":
+            base += f"_{self.dtype_in}_{self.dtype_out}"


I think it might make sense to always include dtype in/out in the name, even for bf16. Thoughts, @andrej?

iron/operators/gemm/op.py

From review feedback: replace GEMM's private _np_dtype_map with a shared np_dtype_map in test_utils.py, derived from the existing torch_dtype_map to stay in sync.

design.py specified MAC dims (8,8,8) for NPU1 i8, but aie_kernels/aie2/mm.cc only provides matmul_vectorized_4x8x8_i8_* wrappers. The mismatch caused DMA stride/tile patterns to be shaped for an 8x8x8 MAC while the kernel consumed them as 4x8x8. The link still succeeded (the symbol is just matmul_i8_i32), but the kernel produced wrong results, surfacing as an AssertionError in CI on Phoenix hardware. Also drop the duplicated "npu1" key in microkernel_mac_dim_map (the second entry silently overrode the first with identical values).
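A sketch of a cheap consistency check that could catch this class of bug before hardware CI, plus the duplicate-key behavior mentioned above. The map layout, AVAILABLE_WRAPPERS and find_mac_mismatches are hypothetical; only the (8,8,8) vs 4x8x8 facts for npu1 and the 8x8x8 wrapper on aie2p come from this thread.

```python
# A dict literal with a repeated key is legal Python: the last entry silently wins.
dup = {"npu1": {"i8": (8, 8, 8)}, "npu1": {"i8": (8, 8, 8)}}
assert len(dup) == 1

# Hypothetical inventory of i8 MAC shapes that each architecture's mm.cc wrappers provide.
AVAILABLE_WRAPPERS = {
    "npu1": {"i8": {(4, 8, 8)}},  # aie2: matmul_vectorized_4x8x8_i8_*
    "npu2": {"i8": {(8, 8, 8)}},  # aie2p: matching 8x8x8 wrapper
}

def find_mac_mismatches(mac_dim_map, available):
    """Return (device, dtype, dims) entries that no kernel wrapper implements."""
    mismatches = []
    for device, per_dtype in mac_dim_map.items():
        for dtype, dims in per_dtype.items():
            if dims not in available.get(device, {}).get(dtype, set()):
                mismatches.append((device, dtype, dims))
    return mismatches

# Illustrative i8-only maps before and after this commit.
buggy = {"npu1": {"i8": (8, 8, 8)}, "npu2": {"i8": (8, 8, 8)}}
fixed = {"npu1": {"i8": (4, 8, 8)}, "npu2": {"i8": (8, 8, 8)}}
```

Running find_mac_mismatches(buggy, AVAILABLE_WRAPPERS) reports only the npu1 i8 entry, while the fixed map reports nothing.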

albiol2004 · 2026-04-13T10:10:02Z

Found the i8 GEMM failure: microkernel_mac_dim_map["npu1"]["i8"] was (8,8,8),
but aie_kernels/aie2/mm.cc only ships a 4x8x8 wrapper. The symbol name is just
matmul_i8_i32 regardless of MAC dims, so the link succeeds, but design.py shapes
DMAs for (8,8,8) while the kernel reads them as (4,8,8), producing wrong
results. It doesn't reproduce on Strix because aie2p has the matching 8x8x8
wrapper. Fixed in the last commit; also dropped a duplicate "npu1" key.

Long-term suggestion: encode MAC dims in the kernel symbol
(matmul_i8_i32_4x8x8 instead of matmul_i8_i32). A Python/C++ drift would
then become a link error instead of silent numerical garbage. Happy to do this
in a follow-up.
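A toy sketch of the proposed naming scheme. kernel_symbol, link and AIE2_EXPORTS are hypothetical; real resolution happens in the AIE toolchain's linker, but the effect is the same: a MAC-dims mismatch fails loudly instead of yielding wrong numbers.

```python
def kernel_symbol(dtype_in, dtype_out, mac_dims):
    """Hypothetical symbol that encodes the MAC shape, e.g. matmul_i8_i32_4x8x8."""
    r, s, t = mac_dims
    return f"matmul_{dtype_in}_{dtype_out}_{r}x{s}x{t}"

# Stand-in for what an aie2 (npu1) kernel object would export under this scheme.
AIE2_EXPORTS = {"matmul_i8_i32_4x8x8"}

def link(symbol, exported):
    """Toy linker step: fail loudly when the requested symbol is missing."""
    if symbol not in exported:
        raise LookupError(f"undefined reference to '{symbol}'")
    return symbol
```

With the old (8,8,8) entry for npu1, link(kernel_symbol("i8", "i32", (8, 8, 8)), AIE2_EXPORTS) raises LookupError instead of silently binding to the 4x8x8 kernel.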

Add INT8 GEMM support to the GEMM operator

24e6204

albiol2004 requested review from andrej, hunhoffe and jgmelber as code owners April 9, 2026 10:04

hunhoffe reviewed Apr 9, 2026

iron/operators/gemm/op.py (outdated, resolved)

albiol2004 added 2 commits April 9, 2026 17:13

Move dtype map to shared np_dtype_map in test_utils

a2cb969

albiol2004 wants to merge 4 commits into amd:devel from albiol2004:int8-gemm
