fix(stainless): handle [DONE] SSE terminator in streaming responses #5012
Open
dtmeadows wants to merge 2 commits into llamastack:main
Conversation
Adds a top-level streaming.on_event config that gracefully handles the OpenAI-standard data: [DONE] stream terminator. Without this, SDKs crash when consuming vLLM or other OpenAI-compatible backends that emit [DONE] at the end of SSE streams, since clients try to JSON-parse the sentinel as a regular event chunk. Fixes: llamastack#4744
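For context, the failure mode is easy to reproduce outside the SDK: the [DONE] sentinel is not JSON, so any consumer that feeds every data: payload straight into a JSON parser fails on the final event. A minimal illustration in plain Python (not SDK code; the stream tail in the comment is representative, not an exact capture):

import json

# Typical tail of an OpenAI-compatible SSE stream (e.g. from vLLM):
#
#   data: {"id": "chatcmpl-123", "object": "chat.completion.chunk", ...}
#   data: [DONE]
#
# The final payload is a plain sentinel, not JSON, so parsing it blindly fails:
try:
    json.loads("[DONE]")
except json.JSONDecodeError as e:
    print(e)  # Expecting value: line 1 column 2 (char 1)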
Contributor
✱ Stainless preview builds
This PR will update the … Edit this comment to update it. It will appear in the SDK's changelogs.
✅ llama-stack-client-node studio · code · diff
⏳ llama-stack-client-go studio · code · diff
✅ llama-stack-client-python studio · code · diff
⏳ These are partial results; builds are still running.
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
mattf
approved these changes
Feb 27, 2026
Collaborator
mattf
left a comment
before -
$ uv run --with llama-stack-client python -c 'from llama_stack_client import LlamaStackClient; client = LlamaStackClient(base_url="http://localhost:8000"); print([chunk for chunk in client.chat.completions.create(model="Qwen/Qwen3-0.6B", messages=[{"role": "user", "content": "hello"}], max_tokens=5, stream=True)])'
INFO:httpx:HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 200 OK"
...
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
after -
$ uv run --with https://pkg.stainless.com/s/llama-stack-client-python/e8ba87ec863e1b417327c14b112fbd8622505dbc/llama_stack_client-0.5.0a2-py3-none-any.whl python -c 'from llama_stack_client import LlamaStackClient; client = LlamaStackClient(base_url="http://localhost:8000"); print([chunk for chunk in client.chat.completions.create(model="Qwen/Qwen3-0.6B", messages=[{"role": "user", "content": "hello"}], max_tokens=5, stream=True)])'
...
[ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), index=0, finish_reason=None, logprobs=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None, prompt_token_ids=None), ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content=None, reasoning_content=None, refusal=None, role=None, tool_calls=None, reasoning='\n'), index=0, finish_reason=None, logprobs=None, token_ids=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None), ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content=None, reasoning_content=None, refusal=None, role=None, tool_calls=None, reasoning='Okay'), index=0, finish_reason=None, logprobs=None, token_ids=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None), ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content=None, reasoning_content=None, refusal=None, role=None, tool_calls=None, reasoning=','), index=0, finish_reason=None, logprobs=None, token_ids=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None), ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content=None, reasoning_content=None, refusal=None, role=None, tool_calls=None, reasoning=' the'), index=0, finish_reason='length', logprobs=None, stop_reason=None, token_ids=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None)]
The config.yml is auto-generated from generate_config.py. Rather than editing config.yml directly, add STREAMING to the generator and wire it through StainlessConfig so the hook regenerates it correctly. Regenerating also fixes the YAML formatting from the previous commit (yaml.safe_dump emits single-quoted '[DONE]' and 2-space-indented list items).
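For illustration only, a rough sketch of the suggested direction. The constant name, the rule keys, and how it would hang off StainlessConfig are assumptions, not the repo's actual generate_config.py; only handle: done and the fallthrough behavior come from the PR description:

import yaml

# Hypothetical module-level constant in generate_config.py (name and rule
# schema are assumed); StainlessConfig wiring is omitted here.
STREAMING = {
    "on_event": [
        # First (highest-priority) rule: treat the OpenAI [DONE] sentinel as a
        # clean end-of-stream signal instead of a JSON event chunk.
        {"if": {"data": "[DONE]"}, "handle": "done"},
        # Fallthrough rule: every other event is yielded to the caller as usual.
        {"handle": "yield"},
    ]
}

config = {"streaming": STREAMING}  # merged into the rest of the generated config

# yaml.safe_dump single-quotes '[DONE]' because a bare leading '[' would
# otherwise be read as YAML flow-sequence syntax.
print(yaml.safe_dump(config, sort_keys=False))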
Changes
Adds a top-level streaming.on_event config to client-sdks/stainless/config.yml that gracefully handles the OpenAI-standard data: [DONE] SSE stream terminator. Without this, the generated SDKs crash when consuming vLLM or any other OpenAI-compatible backend, because they attempt to JSON-parse the [DONE] sentinel as if it were a regular event chunk, producing errors like Could not parse message into JSON: [DONE].
The fix adds a priority-ordered event handler table: the first rule matches data: [DONE] and signals a clean stream end (handle: done); the fallthrough rule yields all other events normally.
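Conceptually, the handler table amounts to something like the following consumer-side logic. This is a hand-written Python sketch of the semantics, not the generated SDK code:

import json

DONE_SENTINEL = "[DONE]"

def handle_sse_data(data: str):
    """Mimic the priority-ordered rules: [DONE] ends the stream, everything
    else is parsed and yielded as a normal event chunk."""
    if data.strip() == DONE_SENTINEL:   # first rule: handle: done
        return None                     # signal a clean end of stream
    return json.loads(data)             # fallthrough: yield the parsed event

# Example: the last two payloads of a chat.completion stream
for payload in ['{"object": "chat.completion.chunk", "choices": []}', "[DONE]"]:
    event = handle_sse_data(payload)
    if event is None:
        break  # stream finished cleanly instead of raising JSONDecodeError
    print(event["object"])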
Notes
Fixes #4744.