
Guard against sumq2 being 0 in IQ4_NL resulting in nan values #20460

Open

bartowski1182 wants to merge 1 commit into ggml-org:master from bartowski1182:iq4_nl_fix

Conversation

@bartowski1182
Contributor

With IQ4_NL, several recent models have hit NaN blocks during quantization, which crashes the quant.

This seems to stem from a scenario where sumq2 is 0 for a given block, likely because there is no imatrix data for some obscure expert, or because the weights themselves are 0, as we've seen with some recent Qwen models.

This change guards against dividing by 0 by setting d to 0 instead, which in turn sets the whole block of weights to 0. That seems appropriate.

Most recently this was observed in nvidia's Nemotron-3-Super-120B-A12B: blk.3.ffn_down_exps has a block that never sees imatrix data, which results in sumq2 being 0.

I added some debug output to inspect those weights (to make sure that setting them to 0 wouldn't introduce a new issue); their amax is on the order of 1e-11, so they are likely rounding down to 0 during imatrix calculation anyway.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Mar 12, 2026
@bartowski1182 bartowski1182 marked this pull request as ready for review March 12, 2026 15:04
