[Build] Fix clang build issues for CPU and CUDA builds by tianleiwu · Pull Request #27669 · microsoft/onnxruntime

tianleiwu · 2026-03-15T22:49:26Z

Description

This PR fixes clang-specific build failures that show up in both the standalone clang build and the CUDA clang build. It keeps the build-system changes targeted, prefers source fixes where the warnings indicate real type or declaration issues, and avoids broader warning suppression than necessary for the CUDA provider target.

Summary of Changes

Build System

| File | Change |
| --- | --- |
| `cmake/CMakeLists.txt` | Stop forwarding `-Wshorten-64-to-32` through CUDA host compilation where the GNU host compiler does not recognize it. |
| `cmake/onnxruntime_providers_cuda.cmake` | Add targeted clang `-Wno-error` handling for warning classes that are currently triggered by CUDA provider code and third-party CUDA headers under clang. |

CPU / Common clang fixes

| File | Change |
| --- | --- |
| `onnxruntime/core/common/cpuid_info.cc` | Replace the clang-incompatible `__builtin_cpu_supports("waitpkg")` path with the CPUID-bit check for TPAUSE detection. |
| `onnxruntime/test/framework/allocation_planner_test.cc` | Refactor `typeid` assertions to avoid clang's potentially-evaluated-expression warning while keeping test coverage unchanged. |
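As an illustration of the CPUID-bit check described for `cpuid_info.cc`, here is a minimal sketch. It is not the code from this PR: `HasWaitPkg` is a hypothetical name, and the sketch assumes a GCC or clang toolchain that ships `<cpuid.h>`. The WAITPKG feature (which covers TPAUSE) is reported in CPUID leaf 7, subleaf 0, ECX bit 5.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/clang wrapper around the CPUID instruction
#endif

// Hypothetical helper for illustration; not the onnxruntime implementation.
// WAITPKG (TPAUSE, UMONITOR, UMWAIT) is CPUID.(EAX=7, ECX=0):ECX bit 5.
inline bool HasWaitPkg() {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  // __get_cpuid_count returns 0 when the requested leaf is not supported.
  if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) == 0) {
    return false;
  }
  constexpr unsigned int kWaitPkgBit = 1u << 5;
  return (ecx & kWaitPkgBit) != 0;
#else
  return false;  // Non-x86 targets have no TPAUSE instruction.
#endif
}
```

Reading the bit directly avoids depending on which feature-name strings a given compiler accepts in `__builtin_cpu_supports`.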

CUDA provider and contrib fixes

| File | Change |
| --- | --- |
| `onnxruntime/contrib_ops/cuda/utils/dump_cuda_tensor.h` | Mark the `IConsoleDumper` overrides explicitly while leaving CUDA-only overloads unchanged. |
| `onnxruntime/contrib_ops/cuda/bert/group_query_attention.cc` | Use `template` on the dependent `GetAttrOrDefault` call so clang parses it correctly. |
| `onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_api.cc` | Make narrowing conversions to flash-attention parameter fields explicit. |
| `onnxruntime/contrib_ops/cuda/quantization/matmul_nbits.cc` | Make the `nbits_` conversion explicit when calling the CUDA helper. |
| `onnxruntime/contrib_ops/cuda/quantization/moe_quantization.cc` | Restrict the GCC-only warning pragma so clang does not treat it as an unknown warning option. |
| `onnxruntime/contrib_ops/cuda/transformers/generation_device_helper.cc` | Fix explicit state-field assignments to use the actual `int` field type. |
| `onnxruntime/core/providers/cuda/cuda_mempool_arena.h` | Remove an unused private field that clang flagged in the CUDA provider build. |
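Two of the patterns listed above, the `template` disambiguator on a dependent member-template call and an explicit narrowing conversion, can be sketched as follows. All names here (`FakeInfo`, `FakeParams`, `ReadWindow`, `SetSeqLen`, the `"window"` attribute) are hypothetical stand-ins for illustration and are not taken from the onnxruntime sources.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an object exposing a member template.
struct FakeInfo {
  template <typename T>
  T GetAttrOrDefault(const std::string& /*name*/, const T& default_value) const {
    return default_value;  // Always falls back to the default in this sketch.
  }
};

// Hypothetical parameter struct whose fields are plain int.
struct FakeParams {
  int seqlen_q = 0;
};

template <typename Info>
int ReadWindow(const Info& info) {
  // `info` has a dependent type, so the `template` keyword tells the parser
  // that `<` starts a template argument list rather than a comparison.
  const std::int64_t value =
      info.template GetAttrOrDefault<std::int64_t>("window", -1);
  // The 64-bit to 32-bit narrowing is spelled out instead of left implicit.
  return static_cast<int>(value);
}

inline void SetSeqLen(FakeParams& params, std::size_t seqlen_q) {
  // Explicit size_t -> int conversion to match the field type.
  params.seqlen_q = static_cast<int>(seqlen_q);
}
```

The casts assume the values fit in `int`; a production version might add a range check before narrowing.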

Testing

Tested CPU and CUDA 12.8 builds on Azure Linux with:

clang 18.1.8
gcc 13.2
cmake 4.2.3

Example for CPU build:

export CC=clang
export CXX=clang++
bash build.sh --config RelWithDebInfo --parallel --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=ON

Motivation and Context

Clang is stricter than GCC/MSVC in a few areas that affect this tree: CUDA host flag forwarding, explicit narrowing, dependent template parsing, warnings emitted from third-party CUDA headers, and RTTI/typeid expressions in tests. The goal here is to keep the staged fix minimal and maintainable by correcting real source issues where practical and confining warning downgrades to the CUDA provider target where third-party header noise is currently unavoidable.
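To make the `typeid` point concrete: clang warns when the operand of `typeid` is a polymorphic glvalue whose evaluation may have side effects, such as a dereferenced call result. One common way to avoid the warning is to bind the object to a reference first. The sketch below uses hypothetical types (`Step`, `LaunchStep`, `BarrierStep`) and a hypothetical helper `StepIs`; it is not the helper added to `allocation_planner_test.cc`.

```cpp
#include <cstddef>
#include <memory>
#include <typeinfo>
#include <vector>

// Hypothetical polymorphic hierarchy standing in for execution-step types.
struct Step {
  virtual ~Step() = default;
};
struct LaunchStep : Step {};
struct BarrierStep : Step {};

// Returns true when the dynamic type of steps[index] is exactly Expected.
template <typename Expected>
bool StepIs(const std::vector<std::unique_ptr<Step>>& steps, std::size_t index) {
  // Bind to a named reference first: typeid(step) has a side-effect-free
  // operand, whereas typeid(*steps[index]) evaluates operator[] and
  // operator* inside the typeid operand.
  const Step& step = *steps[index];
  return typeid(step) == typeid(Expected);
}
```

A test can then call `StepIs<LaunchStep>(steps, 0)` inside its assertion macro, which keeps the same check while moving the evaluation out of `typeid`.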

Copilot

Pull request overview

This PR appears focused on improving cross-compiler/CUDA build cleanliness by reducing/avoiding 64-to-32 narrowing warnings (especially under Clang/NVCC), tightening some C++ type usage to match downstream struct field types, and doing minor test refactoring.

Changes:

Refactors allocation planner multi-stream tests to use a shared helper for execution step type checks.
Adjusts several CUDA/contrib call sites to use explicit int casts/override to satisfy stricter compiler diagnostics.
Updates CUDA-related CMake warning handling (including Clang-specific -Wno-error=... additions and filtering for -Wshorten-64-to-32).

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
| --- | --- |
| onnxruntime/test/framework/allocation_planner_test.cc | Adds a helper to reduce repetition when asserting execution-step types in multistream planner tests. |
| onnxruntime/core/providers/cuda/cuda_mempool_arena.h | Removes an unused arena member field. |
| onnxruntime/core/common/cpuid_info.cc | Simplifies Intel TPAUSE detection to use CPUID bit check consistently. |
| onnxruntime/contrib_ops/cuda/utils/dump_cuda_tensor.h | Adds `override` for interface-matching typed `Print` methods; separates CUDA-only type overload declarations. |
| onnxruntime/contrib_ops/cuda/transformers/generation_device_helper.cc | Aligns BeamScorerState field assignments to `int`-typed struct members. |
| onnxruntime/contrib_ops/cuda/quantization/moe_quantization.cc | Avoids applying GCC-specific diagnostic pragmas under Clang. |
| onnxruntime/contrib_ops/cuda/quantization/matmul_nbits.cc | Casts `nbits_` to `int` to match downstream API expectations and reduce narrowing warnings. |
| onnxruntime/contrib_ops/cuda/bert/group_query_attention.cc | Uses `template` keyword for dependent template member call. |
| onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_api.cc | Adds explicit `size_t`→`int` casts when populating FlashAttention params and window sizes. |
| cmake/onnxruntime_providers_cuda.cmake | Filters out `-Wshorten-64-to-32` for CUDA compilation and adds Clang-specific `-Wno-error=...` suppression for host/CUDA builds. |
| cmake/CMakeLists.txt | Attempts to filter out `-Wshorten-64-to-32` when forwarding warning flags through NVCC host-compiler options. |
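The `override` and GCC-only pragma rows above can be illustrated together. The sketch below is an assumption-laden example, not the PR's code: `IDumper` and `CudaDumper` are hypothetical types, and `-Wmaybe-uninitialized` is only an example of a warning that GCC recognizes and clang does not; the source does not say which warning the real pragma names.

```cpp
#include <string>

// Hypothetical interface standing in for a dumper base class.
struct IDumper {
  virtual ~IDumper() = default;
  virtual std::string Print(const char* name, int value) const = 0;
};

// clang also defines __GNUC__, so exclude it explicitly; otherwise clang
// reports the GCC-only warning name as an unknown warning option.
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#endif

struct CudaDumper final : IDumper {
  // Matches the interface, so it is marked `override`.
  std::string Print(const char* name, int /*value*/) const override {
    return std::string(name) + ":int";
  }
  // Extra overload with no counterpart in the base class, so no `override`.
  std::string Print(const char* name, double /*value*/) const {
    return std::string(name) + ":double";
  }
};

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic pop
#endif
```

Marking only the interface-matching methods with `override` lets the compiler verify them against the base class while leaving the additional overloads as ordinary members.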



Fix clang build issues for CPU and CUDA builds

48435ce

tianleiwu force-pushed the tlwu/fix_clang_build branch from 809a174 to 48435ce on March 16, 2026 05:20

tianleiwu requested a review from Copilot March 16, 2026 05:21

Copilot started reviewing on behalf of tianleiwu March 16, 2026 05:22

Copilot AI reviewed Mar 16, 2026


review feedback

5827b1a

tianleiwu wants to merge 2 commits into main from tlwu/fix_clang_build
