[Build] Fix clang build issues for CPU and CUDA builds #27669

Open
tianleiwu wants to merge 2 commits into main from tlwu/fix_clang_build

tianleiwu commented Mar 15, 2026

Description

This PR fixes clang-specific build failures that show up in both the standalone clang build and the CUDA clang build. It keeps the build-system changes targeted, prefers source fixes where the warnings indicate real type or declaration issues, and avoids broader warning suppression than necessary for the CUDA provider target.

Summary of Changes

Build System

| File | Change |
| --- | --- |
| `cmake/CMakeLists.txt` | Stop forwarding `-Wshorten-64-to-32` through CUDA host compilation where the GNU host compiler does not recognize it. |
| `cmake/onnxruntime_providers_cuda.cmake` | Add targeted clang `-Wno-error` handling for warning classes that are currently triggered by CUDA provider code and third-party CUDA headers under clang. |

CPU / Common clang fixes

| File | Change |
| --- | --- |
| `onnxruntime/core/common/cpuid_info.cc` | Replace the clang-incompatible `__builtin_cpu_supports("waitpkg")` path with the CPUID-bit check for TPAUSE detection. |
| `onnxruntime/test/framework/allocation_planner_test.cc` | Refactor `typeid` assertions to avoid clang's potentially-evaluated-expression warning while keeping test coverage unchanged. |
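
As an illustration of the `typeid` refactor, here is a minimal sketch of the general pattern. The types `Step`, `LaunchKernelStep`, and `BarrierStep` and the helper `StepIs` are hypothetical stand-ins, not the names used in `allocation_planner_test.cc`. The idea is that clang's potentially-evaluated-expression warning typically fires when the operand of `typeid` contains a call (such as dereferencing a smart pointer) on a polymorphic type, so binding the object to a named reference first keeps the operand trivial.

```cpp
#include <cstddef>
#include <memory>
#include <typeinfo>
#include <vector>

// Hypothetical stand-ins for polymorphic execution-step types (assumption:
// these are NOT the real onnxruntime classes).
struct Step {
  virtual ~Step() = default;
};
struct LaunchKernelStep : Step {};
struct BarrierStep : Step {};

// Returns true when the step at `index` has dynamic type `Expected`.
// The dereference happens in its own statement, so the operand of typeid is a
// plain named reference instead of an expression containing a function call.
template <typename Expected>
bool StepIs(const std::vector<std::unique_ptr<Step>>& steps, std::size_t index) {
  const Step& step = *steps.at(index);
  return typeid(step) == typeid(Expected);
}
```

A test can then assert `StepIs<BarrierStep>(steps, 1)` once per step instead of repeating a `typeid(*steps[i])` comparison at every call site.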

CUDA provider and contrib fixes

| File | Change |
| --- | --- |
| `onnxruntime/contrib_ops/cuda/utils/dump_cuda_tensor.h` | Mark the `IConsoleDumper` overrides explicitly while leaving CUDA-only overloads unchanged. |
| `onnxruntime/contrib_ops/cuda/bert/group_query_attention.cc` | Use `template` on the dependent `GetAttrOrDefault` call so clang parses it correctly. |
| `onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_api.cc` | Make narrowing conversions to flash-attention parameter fields explicit. |
| `onnxruntime/contrib_ops/cuda/quantization/matmul_nbits.cc` | Make the `nbits_` conversion explicit when calling the CUDA helper. |
| `onnxruntime/contrib_ops/cuda/quantization/moe_quantization.cc` | Restrict the GCC-only warning pragma so clang does not treat it as an unknown warning option. |
| `onnxruntime/contrib_ops/cuda/transformers/generation_device_helper.cc` | Fix explicit state-field assignments to use the actual `int` field type. |
| `onnxruntime/core/providers/cuda/cuda_mempool_arena.h` | Remove an unused private field that clang flagged in the CUDA provider build. |
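
For readers unfamiliar with the dependent-call fix, the sketch below shows the language rule involved. `FakeKernelInfo` and `ReadNumHeads` are hypothetical names made up for this example, and only the member name `GetAttrOrDefault` is taken from the table above; its signature here is an assumption. When the object's type depends on a template parameter, the `template` keyword is needed so that the `<` after the member name is parsed as the start of a template argument list.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a kernel-info object (assumption: not the real
// onnxruntime class, and the signature is illustrative only).
struct FakeKernelInfo {
  template <typename T>
  T GetAttrOrDefault(const std::string& /*name*/, const T& default_value) const {
    return default_value;  // this sketch has no attributes, so it always falls back
  }
};

template <typename Info>
std::int64_t ReadNumHeads(const Info& info) {
  // `info` has a dependent type, so `template` tells the parser that
  // GetAttrOrDefault names a member template and `<` opens its arguments.
  return info.template GetAttrOrDefault<std::int64_t>("num_heads", std::int64_t{1});
}
```

Without the keyword, a conforming compiler is entitled to read `info.GetAttrOrDefault < std::int64_t` as a comparison, which is why stricter front ends reject the call.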

Testing

Tested CPU and CUDA 12.8 builds on Azure Linux with:

  • clang 18.1.8
  • gcc 13.2
  • cmake 4.2.3

Example for CPU build:

export CC=clang
export CXX=clang++
bash build.sh --config RelWithDebInfo --parallel --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=ON

Motivation and Context

Clang is stricter than GCC/MSVC in a few areas that affect this tree: CUDA host flag forwarding, explicit narrowing, dependent template parsing, warnings emitted from third-party CUDA headers, and RTTI/typeid expressions in tests. The goal here is to keep the staged fix minimal and maintainable by correcting real source issues where practical and confining warning downgrades to the CUDA provider target where third-party header noise is currently unavoidable.
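
To make the "explicit narrowing" point concrete, here is a minimal sketch of converting `size_t` values into `int` parameter fields. `FakeParams`, `ToInt`, and `MakeParams` are hypothetical names invented for this example, and the range check is an addition for illustration; the PR itself may simply use a plain cast at each call site.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical parameter struct with int-typed fields (assumption: not the
// actual flash-attention parameter type).
struct FakeParams {
  int batch_size = 0;
  int seqlen = 0;
};

// Narrows size_t to int explicitly. The overflow check is an illustrative
// extra, not something the PR description claims to add.
inline int ToInt(std::size_t value) {
  if (value > static_cast<std::size_t>(std::numeric_limits<int>::max())) {
    throw std::overflow_error("value does not fit in int");
  }
  return static_cast<int>(value);
}

inline FakeParams MakeParams(std::size_t batch_size, std::size_t seqlen) {
  FakeParams params;
  params.batch_size = ToInt(batch_size);  // explicit, so -Wshorten-64-to-32 stays quiet
  params.seqlen = ToInt(seqlen);
  return params;
}
```

Writing the conversion out documents that the narrowing is intentional, which is the property clang's `-Wshorten-64-to-32` is asking for.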

Copilot AI left a comment

Pull request overview

This PR appears focused on improving cross-compiler/CUDA build cleanliness by reducing/avoiding 64-to-32 narrowing warnings (especially under Clang/NVCC), tightening some C++ type usage to match downstream struct field types, and doing minor test refactoring.

Changes:

  • Refactors allocation planner multi-stream tests to use a shared helper for execution step type checks.
  • Adjusts several CUDA/contrib call sites to use explicit int casts/override to satisfy stricter compiler diagnostics.
  • Updates CUDA-related CMake warning handling (including Clang-specific -Wno-error=... additions and filtering for -Wshorten-64-to-32).

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `onnxruntime/test/framework/allocation_planner_test.cc` | Adds a helper to reduce repetition when asserting execution-step types in multistream planner tests. |
| `onnxruntime/core/providers/cuda/cuda_mempool_arena.h` | Removes an unused arena member field. |
| `onnxruntime/core/common/cpuid_info.cc` | Simplifies Intel TPAUSE detection to use CPUID bit check consistently. |
| `onnxruntime/contrib_ops/cuda/utils/dump_cuda_tensor.h` | Adds `override` for interface-matching typed `Print` methods; separates CUDA-only type overload declarations. |
| `onnxruntime/contrib_ops/cuda/transformers/generation_device_helper.cc` | Aligns `BeamScorerState` field assignments to `int`-typed struct members. |
| `onnxruntime/contrib_ops/cuda/quantization/moe_quantization.cc` | Avoids applying GCC-specific diagnostic pragmas under Clang. |
| `onnxruntime/contrib_ops/cuda/quantization/matmul_nbits.cc` | Casts `nbits_` to `int` to match downstream API expectations and reduce narrowing warnings. |
| `onnxruntime/contrib_ops/cuda/bert/group_query_attention.cc` | Uses `template` keyword for dependent template member call. |
| `onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_api.cc` | Adds explicit `size_t`-to-`int` casts when populating FlashAttention params and window sizes. |
| `cmake/onnxruntime_providers_cuda.cmake` | Filters out `-Wshorten-64-to-32` for CUDA compilation and adds Clang-specific `-Wno-error=...` suppression for host/CUDA builds. |
| `cmake/CMakeLists.txt` | Attempts to filter out `-Wshorten-64-to-32` when forwarding warning flags through NVCC host-compiler options. |
