Name and Version
./build_vulkan/bin/llama-cli --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV STRIX_HALO) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
version: 8183 (66d65ec) (note: this is the last good commit from my bisect; I bisected starting from HEAD, which was d63aa39 at the time)
built with GNU 15.2.1 for Linux x86_64
Operating systems
Linux
GGML backends
Vulkan
Hardware
Ryzen AI Max 395+
Models
Issue encountered with large models (in my case, specifically Qwen3.5-122B-A10B-UD-Q6_K_XL and NVIDIA-Nemotron-3-Super-120B-A12B-UD-Q6_K).
Problem description & steps to reproduce
Command line used:
./build_vulkan/bin/llama-server --temp 0.6 --top-p 0.95 --min-p 0.0 --top-k 20 -fitt 5120 -ngl 999 --no-mmap --host 192.168.1.105 --port 3334 -m ./models/Qwen3.5-122B-A10B-UD-Q6_K_XL-00001-of-00004.gguf
After the commit identified below, these models fail to load and the whole system nearly hangs; I have to run killall from a console terminal, and even that takes a while to take effect. I've seen this behavior in the past when memory usage was too high (RAM and VRAM are shared on this mini PC).
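Since RAM and VRAM are shared on this machine, one way to confirm that the near-hang is memory pressure (an assumption, not something the logs prove) is to sample available memory in a second terminal while the model loads. A minimal sketch using the standard Linux /proc/meminfo interface:

```shell
# Print currently available system memory (standard Linux /proc interface).
# Run in a second terminal while llama-server is loading the model;
# wrap in `watch -n1 ...` for a live view and watch the value collapse.
awk '/^MemAvailable:/ {printf "available: %d MiB\n", $2 / 1024}' /proc/meminfo
```

If the value drops to near zero around the point where loading stalls, that supports the shared-memory-exhaustion explanation.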
First Bad Commit
Bisected precisely to:
# first bad commit: [3191462] vulkan: improve partial offloading performance on AMD (#19976)
Relevant log output
Nothing relevant in dmesg, and journalctl only shows complaints about memory pressure. The llama-server log is no more helpful: it just stops around midway through loading the model, at the progress dots.