
llama: fix llama-model-saver #20503

Draft
JohannesGaessler wants to merge 2 commits into ggml-org:master from JohannesGaessler:llama-fix-model-saver

Conversation

@JohannesGaessler
Contributor

This PR fixes llama-model-saver and makes the --output argument of test-llama-archs functional (the models themselves are still broken though because they lack tokenizers).

The first issue fixed in this PR is that llama-model-saver is simply unmaintained: many new KV values have been added since I implemented it, and those were not being saved correctly. I went through the KV values again, added the missing ones, and checked where the corresponding information can be extracted.

The second issue fixed in this PR is that on master several archs have broken tensor names. Typically what happens is that in llama_model::load_tensors tensors are created without a corresponding entry in llm_get_tensor_names. As a consequence, LLM_TN_IMPL::str doesn't use the provided arguments to format the tensor name with e.g. the layer index, so you end up with multiple different tensors that all have names like blk.%d.attn_q. Since a GGUF context is populated by tensor name, this leads to conflicts and the model cannot be saved correctly. It is not clear to me why we have llm_get_tensor_names in the first place. I think it would make more sense to check in LLM_TN_IMPL::str() whether suffix, bid, and/or xid are set and to use them in those cases, and to add a warning in cases where the tensor name template and the provided arguments don't match. I would implement this refactor in this PR.

@github-actions bot added the "testing (Everything test related)" label on Mar 13, 2026
@CISC
Member

CISC commented Mar 13, 2026

It would be useful to have a simple little CI that checks that KV values in llama-arch.h are handled in llama-model-saver whenever updated. Perhaps also check gguf-py to ensure everything is in sync.

@JohannesGaessler
Contributor Author

I agree. I'm thinking it would make sense to implement a roundtrip like manual GGUF context -> llama_model -> tmpfile -> llama_model in test-llama-archs. #20402 could be related, I haven't reviewed it yet.

