fix(stainless): handle [DONE] SSE terminator in streaming responses#5012

Open
dtmeadows wants to merge 2 commits into llamastack:main from stainless-sdks:fix/streaming-done-terminator

Conversation


@dtmeadows dtmeadows commented Feb 27, 2026

Changes

Adds a top-level streaming.on_event config to client-sdks/stainless/config.yml that gracefully handles the OpenAI-standard data: [DONE] SSE stream terminator. Without this, the generated SDKs crash when consuming vLLM or any other OpenAI-compatible backend, because they attempt to JSON-parse the [DONE] sentinel as if it were a regular event chunk — producing errors like Could not parse message into JSON: [DONE].

The fix adds a priority-ordered event handler table. The first rule matches data: [DONE] and signals a clean stream end (handle: done); the fallthrough rule yields all other events normally.

streaming:
  on_event:
    - data_starts_with: "[DONE]"
      handle: done
    - kind: fallthrough
      handle: yield
      error_property: error
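
For illustration only, here is a minimal sketch of what these two rules mean on the client side when iterating raw SSE data payloads. This is not the generated SDK code; the function name and inputs are made up.

import json

# Illustrative sketch, not the generated SDK code: apply the two on_event
# rules above to the payloads of SSE "data:" lines.
def iter_chunks(sse_data_lines):
    for data in sse_data_lines:
        if data.startswith("[DONE]"):   # data_starts_with: "[DONE]" -> handle: done
            return                      # clean stream end, no JSON parsing attempted
        yield json.loads(data)          # fallthrough rule -> handle: yield

# Without the first rule, the final payload below would raise
# json.decoder.JSONDecodeError instead of ending the stream cleanly.
chunks = list(iter_chunks([
    '{"choices": [{"delta": {"content": "hello"}}]}',
    "[DONE]",
]))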

Notes

Fixes #4744.

Adds a top-level streaming.on_event config that gracefully handles the
OpenAI-standard data: [DONE] stream terminator. Without this, SDKs crash
when consuming vLLM or other OpenAI-compatible backends that emit [DONE]
at the end of SSE streams, since clients try to JSON-parse the sentinel
as a regular event chunk.

Fixes: llamastack#4744
@meta-cla meta-cla bot added the CLA Signed label on Feb 27, 2026
Contributor

github-actions bot commented Feb 27, 2026

✱ Stainless preview builds

This PR will update the llama-stack-client SDKs with the following commit message.

fix(stainless): handle [DONE] SSE terminator in streaming responses

Edit this comment to update it. It will appear in the SDK's changelogs.

llama-stack-client-node studio · code · diff

Your SDK built successfully.
generate ⚠️ · build ✅ · lint ✅ · test ❗

npm install https://pkg.stainless.com/s/llama-stack-client-node/3eca3a1cc6115e23766991ad8775670a0386e942/dist.tar.gz
llama-stack-client-go studio · code · diff

generate ❗ · build ⏳ · lint ❗ · test ❗

go get github.com/stainless-sdks/llama-stack-client-go@67a4b2d546608ce3408df614e6113cdb131a448b
llama-stack-client-python studio · code · diff

Your SDK built successfully.
generate ⚠️ · build ✅ · lint ✅ · test ✅

pip install https://pkg.stainless.com/s/llama-stack-client-python/5ec4518716be86ac2444d480665068fb3b4b1118/llama_stack_client-0.5.0a2-py3-none-any.whl
New diagnostics (2 notes)
💡 ReadmeExample/MissingParam: Example is missing required parameter `messages`.
💡 ReadmeExample/MissingParam: Example is missing required parameter `model`.
llama-stack-client-openapi studio · code · diff

Your SDK built successfully.
generate ⚠️

⏳ These are partial results; builds are still running.


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-02-27 19:25:11 UTC

Collaborator

@mattf mattf left a comment


before -

$ uv run --with llama-stack-client python -c 'from llama_stack_client import LlamaStackClient; client = LlamaStackClient(base_url="http://localhost:8000"); print([chunk for chunk in client.chat.completions.create(model="Qwen/Qwen3-0.6B", messages=[{"role": "user", "content": "hello"}], max_tokens=5, stream=True)])'
INFO:httpx:HTTP Request: POST http://localhost:8000/v1/chat/completions "HTTP/1.1 200 OK"
...
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)

after -

$ uv run --with https://pkg.stainless.com/s/llama-stack-client-python/e8ba87ec863e1b417327c14b112fbd8622505dbc/llama_stack_client-0.5.0a2-py3-none-any.whl python -c 'from llama_stack_client import LlamaStackClient; client = LlamaStackClient(base_url="http://localhost:8000"); print([chunk for chunk in client.chat.completions.create(model="Qwen/Qwen3-0.6B", messages=[{"role": "user", "content": "hello"}], max_tokens=5, stream=True)])'       
...
[ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), index=0, finish_reason=None, logprobs=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None, prompt_token_ids=None), ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content=None, reasoning_content=None, refusal=None, role=None, tool_calls=None, reasoning='\n'), index=0, finish_reason=None, logprobs=None, token_ids=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None), ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content=None, reasoning_content=None, refusal=None, role=None, tool_calls=None, reasoning='Okay'), index=0, finish_reason=None, logprobs=None, token_ids=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None), ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content=None, reasoning_content=None, refusal=None, role=None, tool_calls=None, reasoning=','), index=0, finish_reason=None, logprobs=None, token_ids=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None), ChatCompletionChunk(id='chatcmpl-a9c1fa7389f8f8f9', choices=[Choice(delta=ChoiceDelta(content=None, reasoning_content=None, refusal=None, role=None, tool_calls=None, reasoning=' the'), index=0, finish_reason='length', logprobs=None, stop_reason=None, token_ids=None)], created=1772205573, model='Qwen/Qwen3-0.6B', object='chat.completion.chunk', service_tier=None, usage=None)]

The config.yml is auto-generated from generate_config.py. Rather than
editing config.yml directly, add STREAMING to the generator and wire it
through StainlessConfig so the hook regenerates it correctly.

This also fixes the yaml formatting from the previous commit (yaml.safe_dump
uses single-quoted '[DONE]' and 2-space-indented list items).
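
A possible shape for that generator change is sketched below. This is illustrative only; the STREAMING constant, build_config helper, and the way it is wired through StainlessConfig are assumptions, not the actual generate_config.py code.

import yaml

# Illustrative sketch -- mirrors the YAML shown above; the real
# generate_config.py may structure this differently.
STREAMING = {
    "on_event": [
        {"data_starts_with": "[DONE]", "handle": "done"},
        {"kind": "fallthrough", "handle": "yield", "error_property": "error"},
    ],
}

def build_config() -> dict:
    config = {}                      # other top-level sections omitted here
    config["streaming"] = STREAMING  # wired through the config builder
    return config

# yaml.safe_dump then emits the quoting and indentation style described above.
print(yaml.safe_dump(build_config(), sort_keys=False))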

Labels

CLA Signed: This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SDK streaming fails on OpenAI-style data:[DONE] terminator for chat completions (TypeScript/Go/Python)

2 participants