llama: fix llama-model-saver #20503
Draft
JohannesGaessler wants to merge 2 commits into ggml-org:master from
Conversation
Member
It would be useful to have a simple little CI that checks that KV values in
Contributor
Author
I agree. I'm thinking it would make sense to implement a roundtrip like manual GGUF context ->
This PR fixes `llama-model-saver` and makes the `--output` argument of `test-llama-archs` functional (the models themselves are still broken though, because they lack tokenizers).

The first issue fixed in this PR is that `llama-model-saver` is simply unmaintained: many new KV values have been added since I implemented it, and those were not being saved correctly. I went through the KV values again, added the missing ones, and checked where the corresponding information could be extracted from.

The second issue fixed in this PR is that on master several archs have broken tensor names. Typically what happens is that in `llama_model::load_tensors` tensors are created without a corresponding entry in `llm_get_tensor_names`. As a consequence, `LLM_TN_IMPL::str` then doesn't use the provided arguments to format the tensor name with e.g. the layer index, so you end up with multiple, different tensors that all have names like `blk.%d.attn_q`. Since a GGUF context is populated by tensor name, this leads to conflicts and the model cannot be saved correctly. To me it is now clear why we have `llm_get_tensor_names` in the first place. I think it would make more sense to just check in `LLM_TN_IMPL::str()` whether `suffix`, `bid`, and/or `xid` are set and to use them in those cases, and to also add a warning in cases where the tensor name template and the provided arguments don't match. I would implement this refactor in this PR.