Fix NvTensorRTRTXExecutionProvider capability claims #27631

Merged
tianleiwu merged 6 commits into main from copilot/fix-nvtensor-rtrtx-claim-issue on Mar 16, 2026

Conversation

Copilot AI (Contributor) commented Mar 12, 2026

  • Analyze CI failure: the EPContextNode_ForeignSourceSkipped assertion expects "OpenVINOExecutionProvider" in the error message, but the actual error from TransformerMemcpyImpl doesn't include it
  • Fix tensorrt_basic_test.cc: remove the overly specific "OpenVINOExecutionProvider" assertion from EPContextNode_ForeignSourceSkipped
  • Fix nv_ep_context_test.cc: remove the same overly specific assertion from EPContextNode_ForeignSourceSkipped (proactively)
  • Run code review (no actionable findings)
  • Run CodeQL security check (no findings)
Original prompt

This section details the original issue to resolve.

<issue_title>NvTensorRTRTXExecutionProvider::GetCapability claims EPContext nodes belonging to other EPs, causing crash on multi-GPU systems</issue_title>
<issue_description>### Describe the issue

On multi-GPU systems where both OpenVINOExecutionProvider and NvTensorRTRTXExecutionProvider are registered,
loading an EPContext model produced by OpenVINO causes an access violation (0xC0000005) or
"Could not find an implementation for EPContext(1)" error.

The root cause is that NvExecutionProvider::GetCapability() in nv_execution_provider.cc claims all
EPContext nodes without checking the source attribute:

// nv_execution_provider.cc ~line 2019
const bool is_context_node = node && !node->OpType().empty() && node->OpType() == EPCONTEXT_OP;
if (is_context_node) {
    // Claims any EPContext node — even those produced by OpenVINO, QNN, etc.
    result.push_back(ComputeCapability::Create(std::move(sub_graph)));
}

The EPContext contrib op schema defines an optional source attribute specifically for EP identification
(contrib_defs.cc). Other EPs already check this attribute:

  • OpenVINO EP checks source == kOpenVINOExecutionProvider in EPCtxHandler::CheckForOVEPCtxNode()
  • QNN EP checks cache_source == "qnnexecutionprovider" || cache_source == "qnn" in PartitionCtxModel()

The NvTensorRTRTX EP neither checks source when claiming EPContext nodes in GetCapability(),
nor writes source when creating EPContext nodes in CreateCtxNode().

Proposed fix

Add a source attribute check to NvExecutionProvider::GetCapability() before claiming EPContext nodes:

const bool is_context_node = node && !node->OpType().empty() && node->OpType() == EPCONTEXT_OP;
if (is_context_node) {
    // Only claim EPContext nodes that belong to this EP.
    // If the SOURCE attribute is present and doesn't match, skip the node.
    const auto& attrs = node->GetAttributes();
    if (attrs.count(SOURCE) > 0 &&
        attrs.at(SOURCE).s() != kNvTensorRTRTXExecutionProvider) {
        continue;
    }
    // ... claim the node
}

This requires adding static const std::string SOURCE = "source"; to onnx_ctx_model_helper.h
(matching the existing constant in QNN EP's builder/onnx_ctx_model_helper.h and OpenVINO EP's
onnx_ctx_model_helper.h).

Additionally, CreateCtxNode() in onnx_ctx_model_helper.cc should be updated to write the
source attribute (set to kNvTensorRTRTXExecutionProvider) when producing EPContext models,
following the same pattern as OpenVINO EP's AddOVEPCtxNodeToGraph(). This ensures NvTensorRTRTX
EPContext models are properly tagged for the future.

Urgency

This is a P1 blocker for MLCommons MLPerf Client v1.6 testing on multi-GPU laptop systems
(Intel iGPU + NVIDIA dGPU). See: https://github.com/mlcommons/mlperf_client_dev/issues/976

To reproduce

System: Any system with both an Intel GPU (OpenVINO EP) and NVIDIA GPU (NvTensorRTRTX EP)

  1. Register both OpenVINO EP and NvTensorRTRTX EP with ORT
  2. Load an EPContext model with source=OpenVINOExecutionProvider (e.g., Phi-3.5 compiled by OpenVINO)
  3. Create a session with auto EP selection (PREFER_GPU) or manual multi-EP ordering

Expected: OpenVINO EP claims its own EPContext node; NvTensorRTRTX EP skips it
Actual: NvTensorRTRTX EP claims the OpenVINO EPContext node → access violation crash (0xC0000005)

A self-contained min-repro (C++ source + Python model generator) is available. The repro uses
RegisterExecutionProviderLibrary to register both EP DLLs, creates a synthetic EPContext model
with source=OpenVINOExecutionProvider, and demonstrates the crash.

Tested fix

We have built and tested the fix described above against:

  • Synthetic EPContext model with source=OpenVINOExecutionProvider
  • Real Phi-3.5 EPContext model from MLPerf (openvino_model_dy.onnx)

Both pass with the fix applied: NvTensorRTRTX EP correctly skips the OpenVINO EPContext node
("[NvTensorRTRTX EP] No graph will run on Nv execution provider"), and OpenVINO EP successfully
creates the session.

The classic TensorRTExecutionProvider has the same issue (tensorrt_execution_provider.cc line 2593)
but is less commonly deployed alongside other EPs.

Environment

  • ORT version: 1.24.3 (rel-1.24.3 tag)
  • OS: Windows 11
  • Hardware: Intel Core i9-11950H (iGPU) + NVIDIA GeForce RTX 3080 Laptop GPU (dGPU)
  • EP packages: OpenVINO EP (MicrosoftCorporationII.WinML.Intel.OpenVINO.EP.1.8 1.8.63.0), NvTensorRTRTX EP (MicrosoftCorporationII.WinML.NVIDIA.TRT-RTX.EP.2 0.0.26.0)
    </issue_description>

<agent_instructions>Please provide a fix and add unit test.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits March 12, 2026 05:21
… EPs

- Add SOURCE constant to both EPs' onnx_ctx_model_helper.h
- Add source attribute check in NvTensorRTRTX GetCapability() and GraphHasCtxNode()
- Add source attribute check in classic TensorRT GraphHasCtxNode()
- Write source attribute when creating EPContext nodes in both EPs

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
- NvTensorRTRTX EP: Test that foreign EPContext nodes (OpenVINO, classic TRT source) are skipped
- Classic TensorRT EP: Test that foreign EPContext nodes (OpenVINO, NvTensorRTRTX source) are skipped

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
…ut source attribute

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
tianleiwu (Contributor) left a comment:


LGTM with minor suggestions:

  1. Consider deduplicating CreateSyntheticEPContextModel.
    The same helper was added twice with minor path-type differences, once in onnxruntime/test/providers/nv_tensorrt_rtx/nv_ep_context_test.cc:207 and once in onnxruntime/test/providers/tensorrt/tensorrt_basic_test.cc:1415. This is test-only duplication, so low priority.

  2. Keep the classic TensorRT scope note in mind.
    The source filter there is wired through GraphHasCtxNode(), and the main GetCapability() entry point only uses that helper for the single-node EPContext fast path at onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:2593. That matches current EPContext usage, so I do not view it as a blocker for this PR, but if multi-node or mixed-form context graphs become a supported scenario later, the filter will need to move deeper into the general claim path as well.

@tianleiwu tianleiwu requested a review from chilo-ms March 12, 2026 06:21
@tianleiwu tianleiwu marked this pull request as ready for review March 12, 2026 07:00
…essage

The error when a foreign-source EPContext node is not claimed comes from
TransformerMemcpyImpl, which mentions the EPContext node name but not the
foreign source EP name. Relax the assertions to only check for 'EPContext'
in the error message, which is sufficient to confirm the node was skipped.

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
@tianleiwu tianleiwu changed the title [WIP] [ISSUE] Fix NvTensorRTRTXExecutionProvider capability claims Fix NvTensorRTRTXExecutionProvider capability claims Mar 12, 2026
@tianleiwu tianleiwu enabled auto-merge (squash) March 13, 2026 17:48
@chilo-ms (Contributor) commented:
@ishwar-raut1 @umangb-09, FYI: with this change, the EPContext node should include a "source" attribute for the provider-bridge TRT RTX EP to run the EPContext node/model.
This could mean that EPContext models generated earlier by the Compile API (without a "source" node attribute) may no longer be compatible.

@chilo-ms commented:

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@tianleiwu tianleiwu merged commit 98a5495 into main Mar 16, 2026
89 checks passed
@tianleiwu tianleiwu deleted the copilot/fix-nvtensor-rtrtx-claim-issue branch March 16, 2026 21:04


Development

Successfully merging this pull request may close these issues.

NvTensorRTRTXExecutionProvider::GetCapability claims EPContext nodes belonging to other EPs, causing crash on multi-GPU systems

4 participants