
metal : add NVFP4 quantization support#20456

Open
richarddd wants to merge 5 commits into ggml-org:master from richarddd:feat/nvfp4-metal

@richarddd (Contributor)

Adds basic Metal GPU support for the NVFP4 quantization type (E2M1 values with UE4M3 scales), including optimized mul_mv, mul_mm, and get_rows kernels and precomputed scale LUTs. On Apple Silicon this gives a ~11x prompt-processing and ~25x token-generation speedup over CPU.

test     CPU-only     Metal       speedup
pp512    44.75 t/s    489 t/s     ~11x
tg128    1.03 t/s     25.3 t/s    ~25x

richarddd requested a review from ggerganov as a code owner on March 12, 2026 13:44
github-actions bot added the ggml and Apple Metal labels on Mar 12, 2026
Comment on lines +29 to +30
#define N_R0_NVFP4 2
#define N_SG_NVFP4 2
Member

I want to double-check this on my Mac Studio to confirm there is no register spill (see #20399).

ggerganov requested a review from a team on March 13, 2026 12:44

Labels: Apple Metal, ggml

Projects: None yet

2 participants