metal : add NVFP4 quantization support by richarddd · Pull Request #20456 · ggml-org/llama.cpp

richarddd · 2026-03-12T13:44:27Z

Adds basic Metal GPU support for the NVFP4 quantization type (E2M1 values with UE4M3 block scales). It includes optimized mul_mv, mul_mm, and get_rows kernels and precomputed scale LUTs. On Apple Silicon this gives roughly an 11x prompt-processing and 25x token-generation speedup over CPU-only inference.
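To illustrate what "E2M1 + UE4M3 scales" and a precomputed scale LUT mean, here is a minimal C sketch of dequantizing one NVFP4 block on the CPU. It is not the PR's Metal kernel. It assumes several things not stated in the PR: a micro-block of 16 FP4 values sharing one scale, two values packed per byte with the low nibble first, "UE4M3" read as E4M3FN without a sign bit (bias 7, exponent 0 subnormal, code 0x7F = NaN), and no per-tensor FP32 scale. The names `ue4m3_lut`, `init_ue4m3_lut`, `nvfp4_block_sketch` and `dequant_block_sketch` are hypothetical. ggml's real `block_nvfp4` layout may differ.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* E2M1 (FP4) values indexed by the 4-bit code: bit 3 is the sign,
 * bits 2..1 the exponent (bias 1), bit 0 the mantissa. */
static const float kE2M1[16] = {
     0.0f,  0.5f,  1.0f,  1.5f,  2.0f,  3.0f,  4.0f,  6.0f,
    -0.0f, -0.5f, -1.0f, -1.5f, -2.0f, -3.0f, -4.0f, -6.0f,
};

/* Precomputed scale LUT indexed by a 7-bit UE4M3 code
 * (4 exponent bits with bias 7, 3 mantissa bits, no sign bit).
 * ASSUMPTION: E4M3FN conventions, exponent 0 is subnormal and 0x7F is NaN. */
static float ue4m3_lut[128];

static void init_ue4m3_lut(void) {
    for (int code = 0; code < 128; ++code) {
        const int e = code >> 3;   /* exponent field, 0..15 */
        const int m = code & 0x7;  /* mantissa field, 0..7  */
        if (e == 0) {
            ue4m3_lut[code] = (float) m / 512.0f;            /* subnormal: m/8 * 2^-6 */
        } else if (e == 15 && m == 7) {
            ue4m3_lut[code] = NAN;                           /* E4M3FN NaN encoding */
        } else {
            const int sh = e - 7;                            /* unbiased exponent, -6..8 */
            const float p2 = sh >= 0 ? (float) (1u << sh) : 1.0f / (float) (1u << -sh);
            ue4m3_lut[code] = (1.0f + (float) m / 8.0f) * p2;
        }
    }
}

/* HYPOTHETICAL block layout: 16 FP4 values, two per byte (low nibble first),
 * sharing one UE4M3 scale byte. */
typedef struct {
    uint8_t scale;
    uint8_t qs[8];
} nvfp4_block_sketch;

static void dequant_block_sketch(const nvfp4_block_sketch *b, float out[16]) {
    assert(b && out);
    const float d = ue4m3_lut[b->scale & 0x7F];  /* one table lookup per block */
    for (int i = 0; i < 8; ++i) {
        out[2 * i + 0] = d * kE2M1[b->qs[i] & 0x0F];
        out[2 * i + 1] = d * kE2M1[b->qs[i] >> 4];
    }
}
```

With scale code 0x40 (exponent 8, mantissa 0, so a scale of 2.0) and a first byte of 0x21, the first two outputs are 1.0 and 2.0. Precomputing the table replaces per-element bit manipulation of the scale with a single lookup, which is the general idea behind a scale LUT.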

| test | CPU-only (t/s) | Metal (t/s) | speedup |
|---|---|---|---|
| pp512 | 44.75 | 489 | ~11x |
| tg128 | 1.03 | 25.3 | ~25x |
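The speedup column is the ratio of Metal to CPU-only throughput. pp512 and tg128 are the standard llama-bench tests for processing a 512-token prompt and generating 128 tokens. Recomputed from the table values:

$$
\frac{489}{44.75} \approx 10.9 \;(\approx 11\times), \qquad \frac{25.3}{1.03} \approx 24.6 \;(\approx 25\times)
$$

In per-token terms, generation time drops from about $1/1.03 \approx 0.97$ s to $1/25.3 \approx 0.040$ s per token.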

ggerganov · 2026-03-12T14:09:27Z

ggml/src/ggml-metal/ggml-metal-impl.h

+#define N_R0_NVFP4 2
+#define N_SG_NVFP4 2


Want to double check this on my Mac Studio to confirm no register spill (#20399)
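As we understand the ggml Metal backend, `N_R0_*` is the number of src0 rows each simdgroup computes in the mat-vec kernel and `N_SG_*` is the number of simdgroups per threadgroup. This is an assumption, not something stated in this PR. The hypothetical helper below sketches the resulting dispatch arithmetic, assuming a simdgroup width of 32 (as on current Apple GPUs) and one threadgroup per block of rows. The real dispatch code in ggml-metal may differ.

```c
#include <assert.h>
#include <stdint.h>

#define N_R0_NVFP4 2   /* from the diff; assumed: src0 rows per simdgroup      */
#define N_SG_NVFP4 2   /* from the diff; assumed: simdgroups per threadgroup   */
#define SIMD_WIDTH 32  /* assumed simdgroup width on Apple GPUs                */

/* HYPOTHETICAL summary of a mat-vec launch for an ne01-row weight matrix. */
typedef struct {
    int rows_per_tg;     /* rows handled by one threadgroup        */
    int tg_x;            /* threadgroups needed to cover all rows  */
    int threads_per_tg;  /* threads launched per threadgroup       */
} mv_dispatch_sketch;

static mv_dispatch_sketch nvfp4_mv_dispatch_sketch(int64_t ne01) {
    assert(ne01 > 0);
    mv_dispatch_sketch d;
    d.rows_per_tg    = N_R0_NVFP4 * N_SG_NVFP4;
    d.tg_x           = (int) ((ne01 + d.rows_per_tg - 1) / d.rows_per_tg);  /* ceil division */
    d.threads_per_tg = SIMD_WIDTH * N_SG_NVFP4;
    return d;
}
```

For a 4096-row matrix this gives 4 rows per threadgroup, 1024 threadgroups, and 64 threads each. Raising `N_R0` lets each thread reuse its loaded src1 values across more rows, but it also keeps more accumulators and dequantized values live per thread. That higher register pressure is what can cause the spill the reviewer wants to rule out.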

richarddd added 2 commits March 12, 2026 13:21

NVFP4 metal (211ab16)

computed LUT (6b37f2e)

richarddd requested a review from ggerganov as a code owner March 12, 2026 13:44

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels Mar 12, 2026

richarddd added 2 commits March 12, 2026 14:47

format (d86bd9a)

remove stale comment for old approach (2688a96)

ggerganov approved these changes Mar 12, 2026


fix scale bug (3ed7d79)

ggerganov requested a review from a team March 13, 2026 12:44


richarddd wants to merge 5 commits into ggml-org:master from richarddd:feat/nvfp4-metal
