CI: add hip quality check #20430

Draft
IMbackK wants to merge 10 commits into ggml-org:master from IMbackK:hipcheck

IMbackK (Collaborator) commented Mar 11, 2026

Add a "hip-check" CI workflow. This workflow is intended to solve the following problems for us:

  1. The hip backend has had repeated problems with people adding #pragma unroll loops that can't be unrolled; see ssm-conv, softmax and CUDA: use shared mem for ssm_conv #20128 (comment)
    • This workflow builds the hip backend with -Werror to make it obvious when these occur
    • I did not make the hip builds -Werror generally, as we cannot guarantee that there will be no warnings when built against every rocm version we support
  2. Currently we want to support rocm >=6.1, so we build for 6.1 in the build workflow to check that no functions from newer rocm sneak in. However, we build the release against the latest version of rocm, so we don't notice if a pr doesn't build against the latest version until the release fails
    • This workflow builds against the same version of rocm as the release to check for this
  3. GCN/CDNA have a rather small 64 KiB vector register file and are wave 64, which means that only 256 registers are available. Quite often kernels get added that include significant register spill.
    • This pr adds a small script that is used in the workflow to check for kernels that spill significant registers
    • The script ignores kernels with significant spills currently in the code base (although several of these should be investigated at some point)
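
The register budget in point 3 follows from simple arithmetic, sketched here in Python. The 4-byte register width (32-bit VGPRs) is an assumption, and both function names are hypothetical helpers for illustration only; the spill estimate is deliberately naive and ignores how the compiler actually allocates registers.

```python
def vgprs_per_lane(file_bytes: int = 64 * 1024,
                   wave_size: int = 64,
                   reg_bytes: int = 4) -> int:
    """Vector registers available to each lane of a wavefront.

    Assumes every register is 32 bits (4 bytes) wide and that the
    whole register file is split evenly across the lanes of one wave.
    """
    return file_bytes // (wave_size * reg_bytes)


def naive_spill(vgprs_needed: int, budget: int = 256) -> int:
    """Registers that cannot fit in the budget and would have to spill.

    A rough illustration only: real compilers make more subtle choices.
    """
    return max(0, vgprs_needed - budget)


if __name__ == "__main__":
    print(vgprs_per_lane())   # 64 KiB / (64 lanes * 4 bytes)
    print(naive_spill(300))   # registers over the budget
```

With the defaults this gives 256 registers, matching the figure above; a kernel that wants 300 live values per lane would be 44 registers over budget.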

The idea of this workflow is that its failure should NOT necessarily block a pr, but that it should prompt investigation and possibly an addition to the workflow's whitelist.
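
As a rough illustration of what a spill check with a whitelist could look like, here is a minimal sketch. This is not the script from this pr. It assumes the build log contains resource usage remarks with lines of the form `Function Name: <kernel>` followed by `VGPRs Spill: <n>`; the function name `find_spilling_kernels`, the threshold, and the whitelist handling are all hypothetical.

```python
import re

# Assumed log format: one "Function Name:" line per kernel, followed by
# a "VGPRs Spill:" line carrying the spill count for that kernel.
FUNC_RE = re.compile(r"Function Name:\s*(\S+)")
SPILL_RE = re.compile(r"VGPRs Spill:\s*(\d+)")


def find_spilling_kernels(log_text, threshold=0, whitelist=frozenset()):
    """Return {kernel: spilled VGPRs} for kernels above the threshold.

    Kernels named in `whitelist` are skipped, so known offenders do not
    fail the check.
    """
    offenders = {}
    current = None
    for line in log_text.splitlines():
        match = FUNC_RE.search(line)
        if match:
            current = match.group(1)
            continue
        match = SPILL_RE.search(line)
        if match and current is not None:
            spill = int(match.group(1))
            if spill > threshold and current not in whitelist:
                offenders[current] = spill
    return offenders


if __name__ == "__main__":
    sample = (
        "remark: Function Name: kernel_a\n"
        "remark:     VGPRs Spill: 120\n"
        "remark: Function Name: kernel_b\n"
        "remark:     VGPRs Spill: 0\n"
    )
    bad = find_spilling_kernels(sample, threshold=32)
    for name, spill in bad.items():
        print(f"{name}: {spill} VGPRs spilled")
    # A CI job would exit non-zero here when `bad` is not empty.
```

Keeping the whitelist as an explicit set of kernel names means a failure can be resolved either by fixing the kernel or by a reviewed one-line addition to the list.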

github-actions bot added the script, python, devops, and ggml labels Mar 11, 2026
IMbackK (Collaborator, Author) commented Mar 11, 2026

There's a run here https://github.com/IMbackK/llama.cpp/actions/runs/22975551980/job/66703103556 to show how this works (the failure in the VGPR check is on purpose, for illustration).

IMbackK and others added 8 commits March 11, 2026 23:56
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
IMbackK (Collaborator, Author) commented Mar 11, 2026

@CISC the hip compiler will ignore #pragma clang diagnostic push when --save-temps is active. For this reason a build with -DGGML_HIP_EXPORT_METRICS=On and -Werror cannot succeed, as warnings will be emitted here:

#pragma clang diagnostic push
among other places.

CISC (Collaborator) commented Mar 11, 2026

Better split the jobs again then.

IMbackK (Collaborator, Author) commented Mar 11, 2026

Right

CISC (Collaborator) commented Mar 12, 2026

Check CI failures.
