vulkan: f16 mixed-precision state for GATED_DELTA_NET #20376
Open
ProgenyAlpha wants to merge 4 commits into ggml-org:master from
Conversation
Implements the fused gated delta net recurrence as a Vulkan compute shader with full support for scalar gate, KDA vector gate, GQA broadcast, multi-token sequences, and permuted (non-contiguous) q/k inputs. Specialization constants select head size (32/64/128) and KDA mode at pipeline creation time. Passes all 13 test-backend-ops cases on AMD Radeon 890M (RADV GFX1150). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- vec4 dot products on all inner loops (dp4 hardware intrinsic)
- Cache exp(g) in shared memory for the KDA path, eliminating ~32K redundant global reads and ~16K redundant exp() calls per token
- vec4 fused decay + rank-1 update (3 vec4 ops vs 12 scalar ops)
- Add perf benchmark cases for GATED_DELTA_NET to test-backend-ops

KDA TG: +5.4% throughput. Non-KDA: no regressions. 13/13 test-backend-ops passing on AMD Radeon 890M (RADV GFX1150). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pipeline array refactor [3][2], A_TYPE/D_TYPE/FLOAT_TYPE shader macros, scale in push constants, supports_op fix, dispatch restructuring. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Store state in float16_t registers to halve register pressure and bandwidth. Accumulation stays in float32 for accuracy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Follow-up to #20334. Splits out the f16 mixed-precision state optimization into its own PR per @0cc4m's feedback.
Stores the 128-element state array in float16_t, keeps all arithmetic in float32. No precision loss (13/13 backend-ops tests passing). Lower register pressure gives a measurable PP boost.

Depends on #20334.
890M benchmarks (Qwen3-Coder-Next REAM Q4_K_M):
The f16 pipeline auto-selects when the device supports shaderFloat16, and falls back to f32 otherwise.