Misc. bug: Generating structured output according to a given JSON schema fails with a 500 server error using gpt-oss-120b #20344

### Name and Version

```
$ ./llama-server --version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 122502 MiB):
  Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes, VRAM: 122502 MiB (7625 MiB free)
version: 8269 (0cd4f4720)
built with GNU 13.3.0 for Linux aarch64
```

### Operating systems

Linux

### Which llama.cpp modules do you know to be affected?

llama-server

### Command line

```shell
# Server
./llama-server --gpt-oss-120b-default --host 0.0.0.0


# Client
curl http://10.126.190.221:8013/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "x-api-key: 123456" \
    -d '{
        "model": "ggml-org/gpt-oss-120b-GGUF",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant that generates JSON according to a given JSON Schema."
            },
            {
                "role": "user",
                "content": "Generate a JSON object that conforms to the following JSON Schema: {\"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}, \"age\": {\"type\": \"integer\"}}, \"required\": [\"name\", \"age\"]}"
            }
        ],
        "response_format": {
            "type": "json_object",
            "json_schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"]
            }
        }
    }'
```

### Problem description & steps to reproduce

When I try to generate structured output by specifying a JSON schema in `response_format`, llama-server returns a 500 server error. This happens on a DGX Spark running Ubuntu.

To reproduce, build the llama.cpp version listed above and run the two commands from the "Command line" section, substituting your server's IP address.

The output on the client side is:

```
{"error":{"code":500,"message":"Failed to parse input at pos 416: <|start|>assistant<|channel|>final<|message|>{\n  \"name\": \"John Doe\",\n  \"age\": 30\n}","type":"server_error"}}
```
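As a sanity check, here is a minimal Python sketch that takes the text quoted in the error message and tries to parse whatever follows the `<|channel|>final<|message|>` marker. It is not part of llama.cpp; the helper name `extract_final_json` is hypothetical and the marker string is simply copied from the error above. If it runs cleanly, that suggests the model's payload itself is valid JSON matching the schema, and that the failure is more likely in how the server parses the surrounding response.

```python
import json

# Marker copied verbatim from the error message above (assumption: the
# final-channel payload is everything that follows it).
FINAL_MARKER = "<|channel|>final<|message|>"


def extract_final_json(raw: str) -> dict:
    """Return the JSON object that follows the final-channel marker."""
    _, found, payload = raw.partition(FINAL_MARKER)
    if not found:
        raise ValueError("final-channel marker not found in output")
    return json.loads(payload)


# Raw text quoted in the 500 error, with the JSON escapes decoded.
raw_output = (
    "<|start|>assistant<|channel|>final<|message|>"
    '{\n  "name": "John Doe",\n  "age": 30\n}'
)

result = extract_final_json(raw_output)

# Hand-written check against the schema from the request:
# both keys are required, "name" is a string, "age" is an integer.
assert isinstance(result["name"], str)
assert isinstance(result["age"], int)
print(result)
```
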

### First Bad Commit

Not clear

### Relevant log output

<details>
<summary>Logs</summary>


```console
srv    operator(): got exception: {"error":{"code":500,"message":"Failed to parse input at pos 416: <|start|>assistant<|channel|>final<|message|>{\n  \"name\": \"John Doe\",\n  \"age\": 30\n}","type":"server_error"}}
```
</details>
