[MLAS][Kleidiai] Sve Gemm and IMatmul Integration #27643
Open
JonathanC-ARM wants to merge 5 commits into microsoft:main
Conversation
Signed-off-by: Cathal Lawlor cathal.lawlor@arm.com
- mlas: Correct checks for early-exit fast path in sgemm_kleidiai.cpp
- mlas: update ApplyAlphaBeta2D comment to reflect new control flow
- mlas: add test case for batched K==0 in sgemm_kleidiai.cpp
Signed-off-by: Jonathan Clohessy <Jonathan.Clohessy@arm.com>
Member
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
Contributor
Pull request overview
Adds initial Arm SVE enablement for the KleidiAI MLAS backend, wiring in SVE SGEMM packing/dispatch and an SVE IMATMUL-based convolution path with runtime selection (prefer SME/SME2 when available, otherwise SVE). Also bumps the KleidiAI dependency to a version that provides the required SVE kernels.
Changes:
- Extend MLAS platform dispatch to select KleidiAI overrides on SVE-only CPUs (and include SME2 in SME selection).
- Add SVE variants for KleidiAI SGEMM (pack + batched GEMM dispatch) and convolution (SVE IMATMUL indirection path).
- Expand FGEMM unit test coverage for batched short-path shapes and non-trivial alpha/beta cases.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| onnxruntime/test/mlas/unittest/test_fgemm_fixture.h | Adds small batched FGEMM short-path coverage with varied alpha/beta. |
| onnxruntime/core/mlas/lib/sgemm.cpp | Removes TransA gating so KleidiAI override can decide support; minor cleanup. |
| onnxruntime/core/mlas/lib/platform.cpp | Enables KleidiAI override selection for SME2 and SVE-only runtime. |
| onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp | Adds SVE SGEMM packer + batched GEMM path; refactors alpha/beta application helpers. |
| onnxruntime/core/mlas/lib/kleidiai/mlasi_kleidiai.h | Introduces UseSVE runtime feature flag. |
| onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp | Adds SVE IMATMUL convolution path and runtime selection between SME/SME2 vs SVE. |
| onnxruntime/core/mlas/lib/kai_ukernel_interface.h | Adds SVE matmul/imatmul wrapper typedefs + getter declarations; small comment edits. |
| onnxruntime/core/mlas/lib/kai_ukernel_interface.cpp | Registers SVE kernels and implements SVE getter functions. |
| cmake/deps.txt | Updates KleidiAI dependency to v1.22.0. |
Comment on lines +646 to +656:

```cpp
// Match SME alpha/beta behavior: apply beta per-batch when alpha==0 or K==0.
if (Data->alpha == 0.0f || K == 0) {
    if (BatchSize == 1) {
        ApplyBetaToC(Data->C, Data->ldc, M, N, Data->beta);
    } else {
        for (size_t batch = 0; batch < BatchSize; ++batch) {
            ApplyBetaToC(Data[batch].C, Data[batch].ldc, M, N, Data[batch].beta);
        }
    }
    return true;
}
```
Comment on lines +736 to +739:

```cpp
const size_t tile_elems = TileSizeM * TileSizeN;
g_kai_tls.output_tile.resize(tile_elems);
out_tile = g_kai_tls.output_tile.data();
out_row_stride_bytes = TileSizeN * sizeof(float);
```
Comment on lines 23 to +24:

```cpp
#include "kai/ukernels/matmul/pack/kai_rhs_pack_nxk_f32p2vlx1biasf32_f32_f32_sme.h"
#include "kai/ukernels/matmul/pack/kai_rhs_pack_kxn_x32p4vlx1b_x32_x32_sve.h"
```
Comment on lines +71 to +73:

```cpp
const KaiF32SveIMatmulKernel GetKleidiAISveImatmulUKernel();

const KaiF32SveKernel GetKleidiAISveSGemmUKernel();

#include "kai/ukernels/matmul/matmul_clamp_f32_qai8dxp_qsi4c32p/kai_matmul_clamp_f32_qai8dxp_qsi4c32p_interface.h"

// matmul interfaces
```
Description
Adds initial Arm SVE enablement for the KleidiAI MLAS backend, including SVE
ukernel wiring, SGEMM dispatch/packing support, and an SVE convolution path
with runtime selection (prefer SME/SME2 when available, otherwise use SVE).
Also updates the KleidiAI dependency to v1.22.0 to pick up the required SVE
kernels.
Motivation and Context
Enables KleidiAI acceleration on Arm systems that expose SVE but not SME/SME2,
reducing fallback to the generic MLAS implementations and broadening hardware
coverage. This is an initial bring-up focused on correctness and integration,
with some configuration limitations (e.g., SVE SGEMM currently targets
non-transposed inputs; SVE conv has capability constraints).