
CUDA: GDN hide memory latency #20537

Merged

am17an merged 1 commit into ggml-org:master from am17an:cuda_gdn_load2 on Mar 16, 2026
Conversation

@am17an (Contributor) commented Mar 14, 2026

#20448 got closed because #20443 got merged. @IMbackK, could you please check that this does not cause regressions on HIP?

@github-actions bot added the labels Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Mar 14, 2026
@IMbackK (Collaborator) commented Mar 15, 2026

This PR makes no measurable difference in performance on CDNA:

master

| model | size | params | backend | ngl | n_ubatch | fa | test | t/s |
| ----- | ---- | ------ | ------- | --- | -------- | -- | ---- | --- |
| qwen35moe 35B.A3B Q8_0 | 28.21 GiB | 34.66 B | ROCm | 99 | 1 | 1 | pp2048 | 76.96 ± 0.22 |
| qwen35moe 35B.A3B Q8_0 | 28.21 GiB | 34.66 B | ROCm | 99 | 64 | 1 | pp2048 | 347.21 ± 2.68 |
| qwen35moe 35B.A3B Q8_0 | 28.21 GiB | 34.66 B | ROCm | 99 | 512 | 1 | pp2048 | 953.69 ± 2.77 |
| qwen35moe 35B.A3B Q8_0 | 28.21 GiB | 34.66 B | ROCm | 99 | 2048 | 1 | pp2048 | 1528.38 ± 4.24 |

PR

| model | size | params | backend | ngl | n_ubatch | fa | test | t/s |
| ----- | ---- | ------ | ------- | --- | -------- | -- | ---- | --- |
| qwen35moe 35B.A3B Q8_0 | 28.21 GiB | 34.66 B | ROCm | 99 | 1 | 1 | pp2048 | 77.38 ± 0.50 |
| qwen35moe 35B.A3B Q8_0 | 28.21 GiB | 34.66 B | ROCm | 99 | 64 | 1 | pp2048 | 348.25 ± 2.66 |
| qwen35moe 35B.A3B Q8_0 | 28.21 GiB | 34.66 B | ROCm | 99 | 512 | 1 | pp2048 | 954.84 ± 2.60 |
| qwen35moe 35B.A3B Q8_0 | 28.21 GiB | 34.66 B | ROCm | 99 | 2048 | 1 | pp2048 | 1528.30 ± 4.71 |

It passes the op tests too.
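[Editor's note] The "no measurable difference" claim can be sanity-checked directly from the numbers quoted above. This is a small illustrative script (not part of the PR) that compares the master and PR throughput figures and tests whether each delta falls within the combined run-to-run noise, assuming the `±` values are independent standard deviations:

```python
# Throughput (t/s) and reported noise from the two llama-bench tables above,
# keyed by n_ubatch. Values copied verbatim from the PR conversation.
master = {1: (76.96, 0.22), 64: (347.21, 2.68), 512: (953.69, 2.77), 2048: (1528.38, 4.24)}
pr     = {1: (77.38, 0.50), 64: (348.25, 2.66), 512: (954.84, 2.60), 2048: (1528.30, 4.71)}

for ubatch in master:
    m, m_sd = master[ubatch]
    p, p_sd = pr[ubatch]
    delta_pct = 100.0 * (p - m) / m
    # Standard deviation of the difference of two independent measurements.
    noise = (m_sd**2 + p_sd**2) ** 0.5
    within = abs(p - m) <= 2 * noise
    print(f"n_ubatch={ubatch:4d}: {delta_pct:+.2f}% (within 2 sigma: {within})")
```

Every row comes out within two sigma, consistent with IMbackK's conclusion that the PR is performance-neutral on CDNA.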

@am17an am17an merged commit 34818ea into ggml-org:master Mar 16, 2026
81 of 82 checks passed