
ggml-cpu: Add get_rows support for Q6_K REPACK #20396

Draft
Alcpz wants to merge 1 commit into ggml-org:master from Alcpz:Alcpz/fix/duplicated_embedding

Conversation


@Alcpz Alcpz commented Mar 11, 2026

Tests with CPU_REPACK and models with tied embeddings result in duplicated tensors, since CPU_REPACK currently doesn't support GET_ROWS.

This has a high impact when trying to run small models on low-memory devices like phones, e.g.:

| Model | Mem footprint | Duplicated tensor size |
|---|---|---|
| LFM2-350M-Q4_K_M | 430 MiB | 52 MiB |
| LFM2-1.2B-Q4_K_M | 1380 MiB | 105 MiB |
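To illustrate why tied embeddings trigger the duplication, here is a simplified NumPy sketch (not ggml code; names like `get_rows` and `tied_weight` are illustrative): with tied embeddings, the token-embedding lookup (GET_ROWS) and the output projection (matmul) read the same weight tensor, so a backend that repacks the tensor into a matmul-only layout must keep a second unpacked copy just for the lookup.

```python
import numpy as np

# Toy model with tied embeddings: the token-embedding matrix is the
# same tensor as the output-projection weight.
vocab, dim = 8, 4
rng = np.random.default_rng(0)
tied_weight = rng.standard_normal((vocab, dim)).astype(np.float32)

def get_rows(weight, token_ids):
    # Embedding lookup: select one row per token id (what GET_ROWS does).
    return weight[token_ids]

def output_logits(weight, hidden):
    # Output projection reuses the very same weight matrix.
    return hidden @ weight.T

tokens = np.array([1, 5, 2])
hidden = get_rows(tied_weight, tokens)       # shape (3, dim)
logits = output_logits(tied_weight, hidden)  # shape (3, vocab)

# If a backend repacks `tied_weight` into a matmul-only layout that
# cannot serve row lookups, it must retain a second, unpacked copy of
# the tensor for GET_ROWS, duplicating the whole embedding/output
# weight in memory -- the overhead shown in the table above.
```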

The PR addresses this by adding generic GET_ROWS support (only for Q6_K, as that's the standard quantization for the output layer).

Creating as a draft since #16743 covers the same work (this PR is essentially based on it). I'll try to see whether the other PR can be merged; if not, I'll add proper validation (perplexity, llama-bench).

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Mar 11, 2026
