common/parser: add proper reasoning tag prefill reading#20424
common/parser: add proper reasoning tag prefill reading#20424pwilkin wants to merge 5 commits intoggml-org:masterfrom
Conversation
|
Dumb question, why not find the start of the assistant message and prepend that? I agree it would be easier to parse if we had a "prefill" of some sort that normalizes the input, such that we can handle the logic in the grammar and not through flags. However, if we're going this route I would look into prepending the start of the entire assistant message. This will also open the door for parsing output from requests with an assistant prefill. |
|
Yeah, that would be the logical conclusion, but for now it's easier for me just to extract the reasoning markers since finding the actual start of the assistant message is nontrivial. |
|
Qwen3.5 uses however, It probably doesn't matter for this model, but it is technically not adhering to the template. |
Maybe set |
Run the template once with |
|
That usually works, yeah 😀 I can try that and see what the results are (this is what |
|
Nice patch! With model https://huggingface.co/mradermacher/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking-GGUF the patches fix webui getting confused on /think and not splitting correctly reasoning and generation part. Build llama.cpp-cuda-git-b8334.r9.710878a7dd-1. |
3bfb08f to
4083259
Compare
|
@aldehir changed the prefill extraction behavior to the differential one you mentioned. |
| std::string grammar; | ||
| bool grammar_lazy = false; | ||
| bool thinking_forced_open = false; | ||
| std::string prefill; |
There was a problem hiding this comment.
Think we name this generation_prompt? It lines up with the add_generation_prompt flag.
This changes the erroneous behavior of the autoparser that ascribed thinking behavior to templates. As people rightly mentioned, some models have dynamic or hybrid reasoning - they can reason or not depending on some switches and even the template behavior can change due to this (i.e. inserting
<think>in assistant prefill after a "no_think" appears in a user message).Therefore, the
FORCED_OPENandFORCED_CLOSEDformats are gone. The parser will now just detect models with tagged reasoning, i.e. an opening and closing reasoning marker (deletedDELIMITERalso since it's a special case with the opening marker being empty). However, it will check the assistant prefill for those markers and will append them to the input for the grammar and the parser, so that they are taken into account, therefore just simplifying the parsing mechanism since it doesn't now have to differentiate whether the<think>' /` was added by the template or generated by the model.