ggml: Improve NVFP4 vecdot error by michaelw9999 · Pull Request #20435 · ggml-org/llama.cpp

michaelw9999 · 2026-03-12T00:05:29Z

This update changes how the sub-block scale is chosen. The current code is:
const uint8_t ue = ggml_fp32_to_ue4m3(amax / 6.0f);

That may not be the best scale once all 16 weights in the sub-block are taken into account.
The new check evaluates a few neighboring UE4M3 codes, computes the reconstruction error for each, and keeps the code with the lowest error.

This reduces the vecdot error from:

absolute quantization error: 0.002337
dot product error: 0.019774

to:

absolute quantization error: 0.002029
dot product error: 0.002411

This should help keep the error down when using e2m1 x e2m1 on the GPU side, or if a future quantizer is implemented.

Copilot

Pull request overview

Improves NVFP4 scale selection in quantize_row_nvfp4_ref by searching nearby UE4M3 encodings to minimize sub-block quantization error, reducing downstream vecdot error.

Changes:

Replaces single-shot UE4M3 scale (amax / 6) with a small neighborhood search around the initial encoding.
Chooses the candidate scale that minimizes summed squared reconstruction error over the sub-block.
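Read against the diff, the two changes above amount to the following selection rule. This is only a restatement in notation: the symbols $a_{\max}$, $x_j$, $c_0$, $s_c$, $k$, $q_j$, and $\hat{c}$ are introduced here for illustration and do not appear in the PR.

```latex
c_0 = \texttt{ggml\_fp32\_to\_ue4m3}(a_{\max} / 6), \qquad
\mathcal{C} = \{\, c_0 + d : d \in \{-2, \dots, 2\},\ 0 \le c_0 + d \le \texttt{0x7E} \,\}

\hat{c} = \arg\min_{c \in \mathcal{C}}
          \sum_{j=0}^{\texttt{qk\_sub}-1} \bigl( x_j - k_{q_j(c)}\, s_c \bigr)^2,
\qquad s_c = \texttt{ggml\_ue4m3\_to\_fp32}(c)
```

Here $k$ stands for the `kvalues_mxfp4` table and $q_j(c)$ for the index returned by `best_index_mxfp4(x_j, s_c)`. Since the loop visits candidates in ascending order and updates only on a strictly smaller error, ties go to the lowest code.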


Copilot · 2026-03-12T00:10:51Z

ggml/src/ggml-quants.c

+            float lowest_err = INFINITY;
+            for (int difference = -2; difference <= 2; ++difference) {
+                const int candidate = (int) first_ue + difference;
+                if (candidate < 0 || candidate > 0x7E) {
+                    continue;
+                }
+                const float test_scale = ggml_ue4m3_to_fp32((uint8_t) candidate);
+                float test_scale_error = 0.0f;
+                for (int j = 0; j < qk_sub; ++j) {
+                    const int qi = best_index_mxfp4(xb[j], test_scale);
+                    const float err = xb[j] - kvalues_mxfp4[qi] * test_scale;
+                    test_scale_error += err * err;
+                }
+                if (test_scale_error < lowest_err) {
+                    lowest_err = test_scale_error;
+                    ue = (uint8_t) candidate;
+                }
+            }


This adds up to 5 full passes over qk_sub per sub-block (re-running best_index_mxfp4 and error accumulation each time), which can noticeably increase quantization CPU cost. Consider adding an early-exit when test_scale_error reaches 0 (or below a tiny epsilon), and/or tightening the candidate search adaptively (e.g., evaluate difference=0 first, then only expand outward while error improves) to reduce worst-case work while keeping the accuracy benefit.
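One way the suggestion above could look is sketched below. It is a heuristic illustration, not code from the PR: `search_center_out`, `scale_err_fn`, and `quadratic_err` are hypothetical names, and the error callback stands in for the per-candidate pass over the sub-block. The sketch evaluates the initial code first, walks outward in each direction only while the error keeps improving, and returns immediately on an exact match. It assumes `center` already lies within `[lo, hi]`.

```c
#include <math.h>

/* Hypothetical callback: returns the sub-block error for a given scale code. */
typedef float (*scale_err_fn)(int code, const void * ctx);

/* Toy error used for demonstration only: squared distance to a target code. */
static float quadratic_err(int code, const void * ctx) {
    const int target = *(const int *) ctx;
    const float d = (float) (code - target);
    return d * d;
}

/* Evaluate `center` first, then expand outward while the error strictly improves.
 * `evals` receives the number of error evaluations that were performed. */
static int search_center_out(int center, int radius, int lo, int hi,
                             scale_err_fn err_of, const void * ctx, int * evals) {
    int   best     = center;
    float best_err = err_of(center, ctx);
    *evals = 1;
    if (best_err == 0.0f) { return best; }         /* exact fit: nothing to improve */

    const float center_err = best_err;
    for (int dir = -1; dir <= 1; dir += 2) {
        float prev_err = center_err;
        for (int step = 1; step <= radius; ++step) {
            const int c = center + dir * step;
            if (c < lo || c > hi) { break; }       /* outside the valid code range */
            const float e = err_of(c, ctx);
            ++*evals;
            if (e >= prev_err) { break; }          /* stopped improving in this direction */
            prev_err = e;
            if (e < best_err) { best_err = e; best = c; }
            if (best_err == 0.0f) { return best; } /* early exit on an exact fit */
        }
    }
    return best;
}
```

The trade-off is that stopping at the first non-improving step can miss a better code two steps away when the error is not unimodal over the window, so the result may differ from the exhaustive five-candidate loop in the diff.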

Copilot · 2026-03-12T00:10:51Z

ggml/src/ggml-quants.c

+            for (int difference = -2; difference <= 2; ++difference) {
+                const int candidate = (int) first_ue + difference;
+                if (candidate < 0 || candidate > 0x7E) {


The search window (-2..2) and the upper bound (0x7E) are magic constants here. Please add a short comment or named constants explaining (1) why a ±2 neighborhood is sufficient, and (2) why 0x7E is the maximum valid finite UE4M3 code (and what 0x7F represents). This will make the intent easier to maintain and less error-prone if the encoding rules change.
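A possible shape for the named constants is sketched below. The identifiers are hypothetical, and the comments assume that UE4M3 follows the usual E4M3 layout without a sign bit (4 exponent bits, 3 mantissa bits, bias 7), in which the all-ones pattern encodes NaN. The sketch only documents the bounds; why a radius of 2 is sufficient is not stated in the PR and is left for the author to explain.

```c
#include <stdint.h>

/* Assumed layout: UE4M3 = E4M3 without a sign bit, exponent bias 7. */
enum {
    UE4M3_MAX_FINITE_CODE     = 0x7E, /* exp=15, man=6: 1.75 * 2^8 = 448 */
    UE4M3_NAN_CODE            = 0x7F, /* exp=15, man=7: NaN under this assumption */
    NVFP4_SCALE_SEARCH_RADIUS = 2     /* codes tried on each side of the initial guess */
};

/* Fill `out` with the valid candidate codes around `first`, in ascending order.
 * Returns the number of candidates written (at most 2 * radius + 1). */
static int nvfp4_scale_candidates(uint8_t first,
                                  uint8_t out[2 * NVFP4_SCALE_SEARCH_RADIUS + 1]) {
    int n = 0;
    for (int d = -NVFP4_SCALE_SEARCH_RADIUS; d <= NVFP4_SCALE_SEARCH_RADIUS; ++d) {
        const int c = (int) first + d;
        if (c < 0 || c > UE4M3_MAX_FINITE_CODE) {
            continue;                              /* below zero or past the last finite code */
        }
        out[n++] = (uint8_t) c;
    }
    return n;
}
```

Separating candidate generation from error evaluation also makes the boundary behavior easy to check: an initial code of 0 or `0x7E` yields three candidates instead of five.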

Improve NVFP4 scale choice

c963a4f

michaelw9999 requested a review from ggerganov as a code owner March 12, 2026 00:05

Copilot AI review requested due to automatic review settings March 12, 2026 00:05

github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Mar 12, 2026

Copilot AI reviewed Mar 12, 2026

loci-dev mentioned this pull request Mar 12, 2026

UPSTREAM PR #20435: ggml: Improve NVFP4 vecdot error auroralabs-loci/llama.cpp#1249

Open

Copilot started reviewing on behalf of michaelw9999 March 13, 2026 05:47

michaelw9999 wants to merge 1 commit into ggml-org:master from michaelw9999:nvfp4-improve-vec-dot
