Open
Description
Name and Version
$ ./llama-server --version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 122502 MiB):
Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes, VRAM: 122502 MiB (7625 MiB free)
version: 8269 (0cd4f4720)
built with GNU 13.3.0 for Linux aarch64
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
# Server
./llama-server --gpt-oss-120b-default --host 0.0.0.0
# Client
curl http://10.126.190.221:8013/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: 123456" \
  -d '{
  "model": "ggml-org/gpt-oss-120b-GGUF",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant that generates JSON according to a given JSON Schema."
    },
    {
      "role": "user",
      "content": "Generate a JSON object that conforms to the following JSON Schema: {\"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}, \"age\": {\"type\": \"integer\"}}, \"required\": [\"name\", \"age\"]}"
    }
  ],
  "response_format": {
    "type": "json_object",
    "json_schema": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
      },
      "required": ["name", "age"]
    }
  }
}'
Problem description & steps to reproduce
When I request structured output by passing a JSON Schema in response_format, llama-server returns an HTTP 500 server error. This happens on a DGX Spark running Ubuntu.
To reproduce, build the llama.cpp version listed above and run the two commands above, substituting your server's IP address.
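For anyone who prefers scripting the reproduction, here is a minimal Python sketch of the same client request using only the standard library. It mirrors the curl command as written, including the `json_object` type with a `json_schema` key. The function names `build_request_body` and `post_chat_completion` are hypothetical helpers for illustration, not part of llama.cpp, and the sketch assumes the server replies with a JSON body on both success and error.

```python
import json
import urllib.error
import urllib.request

# Schema copied from the curl command in this report.
SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}


def build_request_body(schema=SCHEMA):
    """Build the same JSON body that the curl command sends (hypothetical helper)."""
    return {
        "model": "ggml-org/gpt-oss-120b-GGUF",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant that generates JSON "
                           "according to a given JSON Schema.",
            },
            {
                "role": "user",
                "content": "Generate a JSON object that conforms to the "
                           "following JSON Schema: " + json.dumps(schema),
            },
        ],
        # Kept exactly as in the report: type "json_object" plus "json_schema".
        "response_format": {"type": "json_object", "json_schema": schema},
    }


def post_chat_completion(base_url, body, api_key="123456"):
    """POST the body and return (status, parsed JSON), also for HTTP errors.

    Assumes the response body is JSON; json.loads raises otherwise.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # A 500 response lands here; its body carries the error object.
        return err.code, json.loads(err.read().decode("utf-8"))


# Usage (requires the server from this report to be running):
# status, reply = post_chat_completion("http://10.126.190.221:8013", build_request_body())
# print(status, json.dumps(reply))
```

The network call is left commented out so the script can be imported or inspected without a running server.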
The output on the client side is:
{"error":{"code":500,"message":"Failed to parse input at pos 416: <|start|>assistant<|channel|>final<|message|>{\n \"name\": \"John Doe\",\n \"age\": 30\n}","type":"server_error"}}
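As a quick sanity check on that output, the sketch below pulls the text after the last `<|message|>` marker out of the error message and parses it as JSON. It only shows that the generated object itself is well-formed and matches the requested schema; it does not establish why the server-side parse fails at pos 416. The helper names `extract_final_payload` and `matches_schema` are hypothetical, and the marker string is taken from the error text above rather than from any llama.cpp API.

```python
import json

# Client output copied verbatim from this report (raw string keeps the escapes).
CLIENT_OUTPUT = r'{"error":{"code":500,"message":"Failed to parse input at pos 416: <|start|>assistant<|channel|>final<|message|>{\n \"name\": \"John Doe\",\n \"age\": 30\n}","type":"server_error"}}'

# Marker as it appears in the error text; treated here as a plain substring.
MESSAGE_MARKER = "<|message|>"


def extract_final_payload(client_output):
    """Return the JSON object that follows the last <|message|> marker."""
    message = json.loads(client_output)["error"]["message"]
    _, marker, tail = message.rpartition(MESSAGE_MARKER)
    if not marker:
        raise ValueError("no <|message|> marker in error message")
    return json.loads(tail)


def matches_schema(payload):
    """Hand-rolled check against the schema from the request (not a full validator)."""
    age = payload.get("age") if isinstance(payload, dict) else None
    return (
        isinstance(payload, dict)
        and isinstance(payload.get("name"), str)
        and isinstance(age, int)
        and not isinstance(age, bool)  # bool is a subclass of int in Python
    )


payload = extract_final_payload(CLIENT_OUTPUT)
print(payload, matches_schema(payload))
```

If the check passes, the text following the marker is valid JSON with a string `name` and an integer `age`, which suggests (as an assumption, not a confirmed diagnosis) that the failure concerns the surrounding `<|start|>assistant<|channel|>final<|message|>` wrapper rather than the JSON content.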
First Bad Commit
Not clear
Relevant log output
Logs
srv operator(): got exception: {"error":{"code":500,"message":"Failed to parse input at pos 416: <|start|>assistant<|channel|>final<|message|>{\n \"name\": \"John Doe\",\n \"age\": 30\n}","type":"server_error"}}