fix(stainless): handle [DONE] SSE terminator in streaming responses#5012

Open
dtmeadows wants to merge 2 commits into llamastack:main from stainless-sdks:fix/streaming-done-terminator

Conversation


@dtmeadows dtmeadows commented Feb 27, 2026

Changes

Adds a top-level streaming.on_event config to client-sdks/stainless/config.yml that gracefully handles the OpenAI-standard data: [DONE] SSE stream terminator. Without this, the generated SDKs crash when consuming vLLM or any other OpenAI-compatible backend, because they attempt to JSON-parse the [DONE] sentinel as if it were a regular event chunk — producing errors like Could not parse message into JSON: [DONE].

The fix adds a priority-ordered event handler table. The first rule matches data: [DONE] and signals a clean stream end (handle: done); the fallthrough rule yields all other events normally.

streaming:
  on_event:
    - data_starts_with: "[DONE]"
      handle: done
    - kind: fallthrough
      handle: yield
      error_property: error
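
For illustration only, here is a minimal sketch of what these two rules mean on the client side when iterating raw SSE data payloads. This is not the generated SDK code; the function name and inputs are made up.

import json

# Illustrative sketch, not the generated SDK code: apply the two on_event
# rules above to the payloads of SSE "data:" lines.
def iter_chunks(sse_data_lines):
    for data in sse_data_lines:
        if data.startswith("[DONE]"):   # data_starts_with: "[DONE]" -> handle: done
            return                      # clean stream end, no JSON parsing attempted
        yield json.loads(data)          # fallthrough rule -> handle: yield

# Without the first rule, the final payload below would raise
# json.decoder.JSONDecodeError instead of ending the stream cleanly.
chunks = list(iter_chunks([
    '{"choices": [{"delta": {"content": "hello"}}]}',
    "[DONE]",
]))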

Notes

Fixes #4744.

Adds a top-level streaming.on_event config that gracefully handles the
OpenAI-standard data: [DONE] stream terminator. Without this, SDKs crash
when consuming vLLM or other OpenAI-compatible backends that emit [DONE]
at the end of SSE streams, since clients try to JSON-parse the sentinel
as a regular event chunk.

Fixes: llamastack#4744
@meta-cla meta-cla bot added the CLA Signed label on Feb 27, 2026
Contributor

github-actions bot commented Feb 27, 2026

✱ Stainless preview builds

This PR will update the llama-stack-client SDKs with the following commit message.

fix(stainless): handle [DONE] SSE terminator in streaming responses

Edit this comment to update it. It will appear in the SDK's changelogs.

llama-stack-client-node studio · code · diff

Your SDK built successfully.
generate ⚠️ · build ✅ · lint ✅ · test ❗

npm install https://pkg.stainless.com/s/llama-stack-client-node/3eca3a1cc6115e23766991ad8775670a0386e942/dist.tar.gz
llama-stack-client-go studio · code · diff

generate ❗ · build ⏳ · lint ❗ · test ❗

go get github.com/stainless-sdks/llama-stack-client-go@67a4b2d546608ce3408df614e6113cdb131a448b
llama-stack-client-python studio · code · diff

Your SDK built successfully.
generate ⚠️ · build ✅ · lint ✅ · test ✅

pip install https://pkg.stainless.com/s/llama-stack-client-python/5ec4518716be86ac2444d480665068fb3b4b1118/llama_stack_client-0.5.0a2-py3-none-any.whl
New diagnostics (2 notes)
💡 ReadmeExample/MissingParam: Example is missing required parameter `messages`.
💡 ReadmeExample/MissingParam: Example is missing required parameter `model`.
llama-stack-client-openapi studio · code · diff

Your SDK built successfully.
generate ⚠️

⏳ These are partial results; builds are still running.


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-02-27 19:25:11 UTC

Collaborator

@mattf mattf left a comment


before -

$ uv run --with llama-stack-client python -c 'from llama_stack_client import LlamaStackClient; client = LlamaStackClient(base_url="http://localhost:8000"); print([chunk for chunk in client.chat.completions.create(model="Qwen/Qwen3-0.6B", messages=[{"role": "user", "content": "hello"}], max_tokens=5, stream=True)])'
INFO:httpx:HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 200 OK"
...
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)

after -

$ uv run --with https://pkg.stainless.com/s/llama-stack-client-python/e8ba87ec863e1b417327c14b112fbd8622505dbc/llama_stack_client-0.5.0a2-py3-none-any.whl python -c 'from llama_stack_client import LlamaStackClient; client = LlamaStackClient(base_url="http://localhost:8000"); print([chunk for chunk in client.chat.completions.create(model="Qwen/Qwen3-0.6B", messages=[{"role": "user", "content": "hello"}], max_tokens=5, stream=True)])'       
...
[ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), index=0, finish_reason=None, logprobs=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None, prompt_token_ids=None), ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content=None, reasoning_content=None, refusal=None, role=None, tool_calls=None, reasoning='\n'), index=0, finish_reason=None, logprobs=None, token_ids=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None), ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content=None, reasoning_content=None, refusal=None, role=None, tool_calls=None, reasoning='Okay'), index=0, finish_reason=None, logprobs=None, token_ids=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None), ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content=None, reasoning_content=None, refusal=None, role=None, tool_calls=None, reasoning=','), index=0, finish_reason=None, logprobs=None, token_ids=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None), ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content=None, reasoning_content=None, refusal=None, role=None, tool_calls=None, reasoning=' the'), index=0, finish_reason='length', logprobs=None, stop_reason=None, token_ids=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None)]

The config.yml is auto-generated from generate_config.py. Rather than
editing config.yml directly, add STREAMING to the generator and wire it
through StainlessConfig so the hook regenerates it correctly.

This also fixes the yaml formatting from the previous commit (yaml.safe_dump
uses single-quoted '[DONE]' and 2-space-indented list items).
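
A possible shape for that generator change is sketched below. This is illustrative only; the STREAMING constant, build_config helper, and the way it is wired through StainlessConfig are assumptions, not the actual generate_config.py code.

import yaml

# Illustrative sketch -- mirrors the YAML shown above; the real
# generate_config.py may structure this differently.
STREAMING = {
    "on_event": [
        {"data_starts_with": "[DONE]", "handle": "done"},
        {"kind": "fallthrough", "handle": "yield", "error_property": "error"},
    ],
}

def build_config() -> dict:
    config = {}                      # other top-level sections omitted here
    config["streaming"] = STREAMING  # wired through the config builder
    return config

# yaml.safe_dump then emits the quoting and indentation style described above.
print(yaml.safe_dump(build_config(), sort_keys=False))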

Labels

CLA Signed: This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SDK streaming fails on OpenAI-style data:[DONE] terminator for chat completions (TypeScript/Go/Python)

2 participants