
ggml-cpu: Add get_rows support for Q6_K REPACK #20396

Draft
Alcpz wants to merge 1 commit into ggml-org:master from Alcpz:Alcpz/fix/duplicated_embedding

Conversation


@Alcpz Alcpz commented Mar 11, 2026

Tests with CPU_REPACK and models with tied embeddings result in duplicated tensors, since CPU_REPACK currently doesn't support GET_ROWS.

This has a high impact when trying to run small models on low-memory devices like phones, e.g.:

| Model | Mem footprint | Duplicated tensor size |
|---|---|---|
| LFM2-350M-Q4_K_M | 430 MiB | 52 MiB |
| LFM2-1.2B-Q4_K_M | 1380 MiB | 105 MiB |
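To illustrate why tied embeddings trigger the duplication, here is a simplified NumPy sketch (not ggml code; names like `get_rows` and `tied_weight` are illustrative): with tied embeddings, the token-embedding lookup (GET_ROWS) and the output projection (matmul) read the same weight tensor, so a backend that repacks the tensor into a matmul-only layout must keep a second unpacked copy just for the lookup.

```python
import numpy as np

# Toy model with tied embeddings: the token-embedding matrix is the
# same tensor as the output-projection weight.
vocab, dim = 8, 4
rng = np.random.default_rng(0)
tied_weight = rng.standard_normal((vocab, dim)).astype(np.float32)

def get_rows(weight, token_ids):
    # Embedding lookup: select one row per token id (what GET_ROWS does).
    return weight[token_ids]

def output_logits(weight, hidden):
    # Output projection reuses the very same weight matrix.
    return hidden @ weight.T

tokens = np.array([1, 5, 2])
hidden = get_rows(tied_weight, tokens)       # shape (3, dim)
logits = output_logits(tied_weight, hidden)  # shape (3, vocab)

# If a backend repacks `tied_weight` into a matmul-only layout that
# cannot serve row lookups, it must retain a second, unpacked copy of
# the tensor for GET_ROWS, duplicating the whole embedding/output
# weight in memory -- the overhead shown in the table above.
```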

The PR addresses this by adding generic GET_ROWS support (only for Q6_K, as that's the standard quantization for the output layer).

Creating as a draft since #16743 covers the same work (this PR is essentially based on it). I'll try to see whether the other PR can be merged; if not, I'll add proper validation (perplexity, llama-bench).

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Mar 11, 2026
