[Build] Fix clang build issues for CPU and CUDA builds#27669
Open
[Build] Fix clang build issues for CPU and CUDA builds#27669
Conversation
809a174 to
48435ce
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
This PR appears focused on improving cross-compiler/CUDA build cleanliness by reducing/avoiding 64-to-32 narrowing warnings (especially under Clang/NVCC), tightening some C++ type usage to match downstream struct field types, and doing minor test refactoring.
Changes:
- Refactors allocation planner multi-stream tests to use a shared helper for execution step type checks.
- Adjusts several CUDA/contrib call sites to use explicit
intcasts/overrideto satisfy stricter compiler diagnostics. - Updates CUDA-related CMake warning handling (including Clang-specific
-Wno-error=...additions and filtering for-Wshorten-64-to-32).
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| onnxruntime/test/framework/allocation_planner_test.cc | Adds a helper to reduce repetition when asserting execution-step types in multistream planner tests. |
| onnxruntime/core/providers/cuda/cuda_mempool_arena.h | Removes an unused arena member field. |
| onnxruntime/core/common/cpuid_info.cc | Simplifies Intel TPAUSE detection to use CPUID bit check consistently. |
| onnxruntime/contrib_ops/cuda/utils/dump_cuda_tensor.h | Adds override for interface-matching typed Print methods; separates CUDA-only type overload declarations. |
| onnxruntime/contrib_ops/cuda/transformers/generation_device_helper.cc | Aligns BeamScorerState field assignments to int-typed struct members. |
| onnxruntime/contrib_ops/cuda/quantization/moe_quantization.cc | Avoids applying GCC-specific diagnostic pragmas under Clang. |
| onnxruntime/contrib_ops/cuda/quantization/matmul_nbits.cc | Casts nbits_ to int to match downstream API expectations and reduce narrowing warnings. |
| onnxruntime/contrib_ops/cuda/bert/group_query_attention.cc | Uses template keyword for dependent template member call. |
| onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_api.cc | Adds explicit size_t→int casts when populating FlashAttention params and window sizes. |
| cmake/onnxruntime_providers_cuda.cmake | Filters out -Wshorten-64-to-32 for CUDA compilation and adds Clang-specific -Wno-error=... suppression for host/CUDA builds. |
| cmake/CMakeLists.txt | Attempts to filter out -Wshorten-64-to-32 when forwarding warning flags through NVCC host-compiler options. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This PR fixes clang-specific build failures that show up in both the standalone clang build and the CUDA clang build. It keeps the build-system changes targeted, prefers source fixes where the warnings indicate real type or declaration issues, and avoids broader warning suppression than necessary for the CUDA provider target.
Summary of Changes
Build System
cmake/CMakeLists.txt-Wshorten-64-to-32through CUDA host compilation where the GNU host compiler does not recognize it.cmake/onnxruntime_providers_cuda.cmake-Wno-errorhandling for warning classes that are currently triggered by CUDA provider code and third-party CUDA headers under clang.CPU / Common clang fixes
onnxruntime/core/common/cpuid_info.cc__builtin_cpu_supports("waitpkg")path with the CPUID-bit check for TPAUSE detection.onnxruntime/test/framework/allocation_planner_test.cctypeidassertions to avoid clang's potentially-evaluated-expression warning while keeping test coverage unchanged.CUDA provider and contrib fixes
onnxruntime/contrib_ops/cuda/utils/dump_cuda_tensor.hIConsoleDumperoverrides explicitly while leaving CUDA-only overloads unchanged.onnxruntime/contrib_ops/cuda/bert/group_query_attention.cctemplateon the dependentGetAttrOrDefaultcall so clang parses it correctly.onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_api.cconnxruntime/contrib_ops/cuda/quantization/matmul_nbits.ccnbits_conversion explicit when calling the CUDA helper.onnxruntime/contrib_ops/cuda/quantization/moe_quantization.cconnxruntime/contrib_ops/cuda/transformers/generation_device_helper.ccintfield type.onnxruntime/core/providers/cuda/cuda_mempool_arena.hTesting
Tested CPU and CUDA 12.8 builds in Azure Linux with
Example for CPU build:
Motivation and Context
Clang is stricter than GCC/MSVC in a few areas that affect this tree: CUDA host flag forwarding, explicit narrowing, dependent template parsing, warnings emitted from third-party CUDA headers, and RTTI/typeid expressions in tests. The goal here is to keep the staged fix minimal and maintainable by correcting real source issues where practical and confining warning downgrades to the CUDA provider target where third-party header noise is currently unavoidable.