Skip to content

Eval bug: #17795 introduces subtle correctness errors #20433

@IMbackK

Description

@IMbackK

On HIP 2cd20b7 has introduced a subtle correctness problem. For some workloads model (any model, tested with gpt-oss, devstral-2 small/large, GLM-4.5-air and Qwen3-30B-A3B) quality collapses, the most reliable way of reproducing this i have found is mistral vibe, sample output:

Please write a python script that calculates the elevation angle of the sun based on a location and time given on the command line

Here's a Python script that calculates the elevation angle of the sun based on location and time:
solar_elevation.py
import
I need to create a more detailed plan. The
The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The The .

This problem dose not show in llamacpp's webui or cli for some reason, nor dose lama-perplexity show anything unusual. Other clients of the api are sometimes affected, but only mistral-vibe is consistently affected.

Operating systems

Linux

GGML backends

HIP

Hardware

CDNA

Metadata

Metadata

Assignees

No one assigned

    Labels

    AMD GPUIssues specific to AMD GPUsCUDARelated to the CUDA backendbugSomething isn't workingcritical severityUsed to report critical severity bugs in llama.cpp (e.g. Crashing, Corrupted, Dataloss)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions