Eval bug: Regression when trying to load a big model #20439

@RipleyTom

Description

Name and Version

./build_vulkan/bin/llama-cli --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV STRIX_HALO) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
version: 8183 (66d65ec) (note: this is the last good commit from my bisect; I bisected from HEAD, which was d63aa39 at the time)
built with GNU 15.2.1 for Linux x86_64

Operating systems

Linux

GGML backends

Vulkan

Hardware

Ryzen AI Max 395+

Models

Issue encountered with big models (specifically, in my case, Qwen3.5-122B-A10B-UD-Q6_K_XL and NVIDIA-Nemotron-3-Super-120B-A12B-UD-Q6_K).

Problem description & steps to reproduce

Command line used:

./build_vulkan/bin/llama-server --temp 0.6 --top-p 0.95 --min-p 0.0 --top-k 20 -fitt 5120 -ngl 999 --no-mmap --host 192.168.1.105 --port 3334 -m ./models/Qwen3.5-122B-A10B-UD-Q6_K_XL-00001-of-00004.gguf

After the commit specified below, models fail to load and the whole system nearly hangs; I have to run killall from a console terminal, which takes a while to take effect. I've experienced this behavior in the past when memory usage was too high (RAM and VRAM are shared on this mini PC).

First Bad Commit

Bisected to precisely:
# first bad commit: [3191462] vulkan: improve partial offloading performance on AMD (#19976)

Relevant log output

Nothing relevant in dmesg. journalctl only shows complaints about memory pressure. The llama-server log simply stops about midway through loading the model (during the progress dots).
