-
Notifications
You must be signed in to change notification settings - Fork 18
Build and publish cuda enabled docker image #291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…parated compute cap (e.g. `CUDA_ARCHS=89,90,120`)
…and allow multi compute caps
27715ca to
c5cb7ee
Compare
c934a73 to
c4fcd31
Compare
c4fcd31 to
39cd4e2
Compare
jsign
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT, left some comments for your consideration
|
|
||
| pub use error::Error; | ||
|
|
||
| /// Applies per-zkVM CUDA architecture build args to a Docker build command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe obvious question to understand the intention of the PR: if a machine signals support for 89 and 120, why would they care about 89 and not always select the highest? (as Zisk does apparently)
I was thinking of multi-GPU machines with different kinds of GPUs, but that would mean Zisk would work there.
| .map(|arch| format!("--generate-code arch=compute_{arch},code=sm_{arch} ")) | ||
| .collect::<String>(); | ||
| cmd.build_arg("NVCC_APPEND_FLAGS", flags.trim_end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could avoid the trailing space with trim_end(), by collecting to String and then join(" "). But NBD.
| .split(',') | ||
| .filter_map(|s| s.parse::<u32>().ok()) | ||
| .max() | ||
| .unwrap_or(120); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have this default in some constant and applied to the other zkVMs too? Or mabye there is a reason for this exception in Zisk? (or maybe error, since we might have non-empty expectation from the other method that parses ENV or detect GPU capabilities?)
|
|
||
| # Per-zkVM CUDA architecture translation | ||
| if [ "$CUDA" = true ] && [ -n "$CUDA_ARCHS" ]; then | ||
| case "$ZKVM" in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should SP1 be here?
| --registry ${{ needs.image_meta.outputs.registry }} \ | ||
| --tag ${{ needs.image_meta.outputs.sha_tag }}-cuda \ | ||
| --base \ | ||
| --cuda-archs '89,120' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have these 89,120 in some places in this files, worth defining a const/env?
CUDA_ARCHtoCUDA_ARCHSto accept comma-separated numeric part of CUDA compute capabilites e.g.CUDA_ARCHS=89,120, if the zkVM supports multiple CUDA compute capabilites it'll be forwarded (Airbender, OpenVM, Risc0), if not the largest will be used (ZisK)CUDA_ARCHSto that when building imageCUDA_ARCHS=89,120(L40S, RTX 40 series, RTX 50 series), this works fine for most zkVMs but takes a bit long for Risc0 (~1hr), but it'd be cached in future PR if the dockerfile is untouched.