Build and publish cuda enabled docker image #291

han0110 · 2026-02-10T12:53:03Z

Rename env CUDA_ARCH to CUDA_ARCHS to accept the comma-separated numeric parts of CUDA compute capabilities, e.g. CUDA_ARCHS=89,120. If the zkVM supports multiple CUDA compute capabilities, all of them are forwarded (Airbender, OpenVM, Risc0); if not, the largest is used (ZisK).
Update the Dockerfiles to use the env var that each zkVM recognizes, and translate CUDA_ARCHS to it when building the image.
Update CI:
- For each PR, it additionally tests building CUDA-enabled images using CUDA_ARCHS=89,120 (L40S, RTX 40 series, RTX 50 series). This works fine for most zkVMs but takes a while for Risc0 (~1hr); it will be cached for future PRs if the Dockerfile is untouched.
- For a push to master, it builds and pushes images with the git sha tag (without CUDA, and with CUDA enabled if supported), and also builds and pushes the cluster image if supported. After the build and push succeed, the test zkVM workflow is triggered to run the same tests as for a PR, but using the published images.
- For a push to a tag, it adds the semver tag to all images pushed under the git sha tag.
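The forwarding rule described above (forward every compute capability when the zkVM supports several, otherwise use the largest) can be sketched roughly as follows. This is only an illustration: the `Zkvm` enum and the `parse_cuda_archs` and `select_archs` helpers are hypothetical names, not the code in this PR.

```rust
/// Hypothetical enum for illustration; the real crate may model zkVMs differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Zkvm {
    Airbender,
    OpenVm,
    Risc0,
    Zisk,
}

/// Parses a value such as "89,120" into sorted, deduplicated compute capabilities.
/// Entries that are not valid numbers are skipped (an assumption of this sketch).
pub fn parse_cuda_archs(raw: &str) -> Vec<u32> {
    let mut archs: Vec<u32> = raw
        .split(',')
        .filter_map(|s| s.trim().parse().ok())
        .collect();
    archs.sort_unstable();
    archs.dedup();
    archs
}

/// Returns the compute capabilities to pass on to the given zkVM's image build.
pub fn select_archs(zkvm: Zkvm, archs: &[u32]) -> Vec<u32> {
    match zkvm {
        // These zkVMs accept several compute capabilities, so forward all of them.
        Zkvm::Airbender | Zkvm::OpenVm | Zkvm::Risc0 => archs.to_vec(),
        // ZisK takes a single compute capability, so keep only the largest (if any).
        Zkvm::Zisk => archs.iter().copied().max().into_iter().collect(),
    }
}
```

With `CUDA_ARCHS=89,120`, this sketch would forward both values for Risc0 and only 120 for ZisK.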

jsign

LGTM, left some comments for your consideration.

jsign · 2026-02-11T12:52:34Z

crates/dockerized/src/zkvm.rs


 pub use error::Error;

+/// Applies per-zkVM CUDA architecture build args to a Docker build command.


Maybe an obvious question, to understand the intention of the PR: if a machine signals support for 89 and 120, why would it care about 89 rather than always selecting the highest (as ZisK apparently does)?
I was thinking of multi-GPU machines with different kinds of GPUs, but that would mean ZisK would work there.

jsign · 2026-02-11T12:55:22Z

crates/dockerized/src/zkvm.rs

+                .map(|arch| format!("--generate-code arch=compute_{arch},code=sm_{arch} "))
+                .collect::<String>();
+            cmd.build_arg("NVCC_APPEND_FLAGS", flags.trim_end())


I think we could avoid the trailing space and the trim_end() by collecting into a Vec<String> and then calling join(" "). But NBD.
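A minimal sketch of that suggestion, assuming the compute capabilities have already been parsed into a slice of `u32`; `nvcc_append_flags` is a hypothetical helper name, and the flag format is copied from the diff above.

```rust
/// Builds the NVCC_APPEND_FLAGS value with one `--generate-code` flag per
/// compute capability, separated by single spaces and without a trailing space.
pub fn nvcc_append_flags(archs: &[u32]) -> String {
    archs
        .iter()
        .map(|arch| format!("--generate-code arch=compute_{arch},code=sm_{arch}"))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Because `join` only places the separator between elements, there is nothing left to trim, and an empty slice yields an empty string.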

jsign · 2026-02-11T12:56:09Z

crates/dockerized/src/zkvm.rs

+                .split(',')
+                .filter_map(|s| s.parse::<u32>().ok())
+                .max()
+                .unwrap_or(120);


Should we put this default in a constant and apply it to the other zkVMs too? Or maybe there is a reason for this exception in ZisK? (Or maybe return an error, since we might expect a non-empty value from the other method that parses the env var or detects GPU capabilities?)
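To make the two options concrete, here is a rough sketch of both: a named constant for the fallback, and a variant that returns an error instead of silently defaulting. `DEFAULT_CUDA_ARCH`, `CudaArchError`, `max_cuda_arch`, and `max_cuda_arch_or_default` are hypothetical names for illustration, not part of the PR.

```rust
/// Hypothetical shared fallback, replacing the inline `unwrap_or(120)`.
pub const DEFAULT_CUDA_ARCH: u32 = 120;

/// Hypothetical error type for the strict variant.
#[derive(Debug, PartialEq, Eq)]
pub enum CudaArchError {
    /// No compute capability was provided at all.
    Empty,
    /// An entry could not be parsed as a number.
    Invalid(String),
}

/// Strict variant: returns the largest compute capability, or an error if the
/// input is empty or contains an entry that is not a number.
pub fn max_cuda_arch(raw: &str) -> Result<u32, CudaArchError> {
    let mut max: Option<u32> = None;
    for part in raw.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        let arch: u32 = part
            .parse()
            .map_err(|_| CudaArchError::Invalid(part.to_string()))?;
        max = Some(max.map_or(arch, |m| m.max(arch)));
    }
    max.ok_or(CudaArchError::Empty)
}

/// Lenient variant: falls back to the shared constant on any error.
pub fn max_cuda_arch_or_default(raw: &str) -> u32 {
    max_cuda_arch(raw).unwrap_or(DEFAULT_CUDA_ARCH)
}
```

Unlike the `filter_map` in the diff, the strict variant surfaces a malformed entry such as `89,abc` instead of dropping it.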

jsign · 2026-02-11T14:14:39Z

.github/scripts/build-image.sh

+
+# Per-zkVM CUDA architecture translation
+if [ "$CUDA" = true ] && [ -n "$CUDA_ARCHS" ]; then
+    case "$ZKVM" in


Should SP1 be here?

jsign · 2026-02-11T14:16:02Z

.github/workflows/build-and-push-images.yml

+            --registry ${{ needs.image_meta.outputs.registry }} \
+            --tag ${{ needs.image_meta.outputs.sha_tag }}-cuda \
+            --base \
+            --cuda-archs '89,120'


We have this 89,120 in several places in this file; is it worth defining a const/env?

han0110 added 3 commits February 10, 2026 13:00

feat: read env CUDA_ARCHS instead of CUDA_ARCH to accept comma-se…

8357931

…parated compute cap (e.g. `CUDA_ARCHS=89,90,120`)

feat: update dockerfiles to use the env var that the zkVM recognize, …

f28fa26

…and allow multi compute caps

ci: update CI to build cuda enabled images for cuda compute cap 89,90…

c5cb7ee

…,120

han0110 force-pushed the han/feature/cuda-enabled-docker-image branch from 27715ca to c5cb7ee Compare February 10, 2026 13:01

ci: split test and image publishing

e2ea550

han0110 force-pushed the han/feature/cuda-enabled-docker-image branch from c934a73 to c4fcd31 Compare February 11, 2026 01:50

ci: skip pushing ere-base; build and push ere-cluster

39cd4e2

han0110 force-pushed the han/feature/cuda-enabled-docker-image branch from c4fcd31 to 39cd4e2 Compare February 11, 2026 05:28

han0110 added 3 commits February 11, 2026 08:10

ci: triggered by workflow_run to reuse the published images

2f520cc

ci: separate semver tagging into independent workflow

8fb091c

ci: cancel in progress only for pr

c8be867

han0110 marked this pull request as ready for review February 11, 2026 12:00

han0110 requested a review from jsign February 11, 2026 14:06

jsign approved these changes Feb 11, 2026
