Eval bug: #17795 introduces subtle correctness errors #20433

On HIP, commit 2cd20b72ed3565ac6935911ca0d9b5d73ae70d0d has introduced a subtle correctness problem. For some workloads, output quality collapses regardless of model (tested with gpt-oss, devstral-2 small/large, GLM-4.5-air and Qwen3-30B-A3B). The most reliable way of reproducing this that I have found is mistral-vibe. Sample output:

```
Please write a python script that calculates the elevation angle of the sun based on a location and time given on the command line

Here's a Python script that calculates the elevation angle of the sun based on location and time:
solar_elevation.py
import
I need to create a more detailed plan. The
The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The .
```

This problem does not show up in llama.cpp's webui or CLI for some reason, nor does llama-perplexity show anything unusual. Other clients of the API are sometimes affected, but only mistral-vibe is consistently affected.
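
Since the failure only shows up through some API clients, a small automated check for the "The The The ..." pattern may help when comparing builds or bisecting. The following is a minimal sketch, not part of llama.cpp or mistral-vibe: the function names `longest_repeat_run` and `looks_collapsed` and the threshold of 8 are hypothetical choices, and it only detects word-level repetition in text that has already been captured from a client.

```python
def longest_repeat_run(text: str) -> int:
    """Return the length of the longest run of identical consecutive words."""
    longest = 0
    current = 0
    previous = None
    for word in text.split():
        if word == previous:
            current += 1
        else:
            # A different word starts a new run of length 1.
            current = 1
            previous = word
        longest = max(longest, current)
    return longest


def looks_collapsed(text: str, threshold: int = 8) -> bool:
    """Heuristic: flag output whose longest repeated-word run reaches threshold.

    The default threshold is an arbitrary assumption and may need tuning;
    legitimate output can contain short repeats.
    """
    return longest_repeat_run(text) >= threshold


# Example input modelled on the degenerate tail of the sample output above.
sample = "I need to create a more detailed plan. The " + "The " * 30 + "."
collapsed = looks_collapsed(sample)
```

A wrapper that captures a response and exits non-zero when `looks_collapsed` returns true could, in principle, be used with `git bisect run`, provided the reproduction is reliable enough on the affected client.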



### Operating systems

Linux

### GGML backends

HIP

### Hardware

CDNA
