Fix NvTensorRTRTXExecutionProvider capability claims #27631
Conversation
… EPs
- Add SOURCE constant to both EPs' onnx_ctx_model_helper.h
- Add source attribute check in NvTensorRTRTX GetCapability() and GraphHasCtxNode()
- Add source attribute check in classic TensorRT GraphHasCtxNode()
- Write source attribute when creating EPContext nodes in both EPs

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
- NvTensorRTRTX EP: Test that foreign EPContext nodes (OpenVINO, classic TRT source) are skipped
- Classic TensorRT EP: Test that foreign EPContext nodes (OpenVINO, NvTensorRTRTX source) are skipped

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
…ut source attribute Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
tianleiwu
left a comment
LGTM with minor suggestions:
- Consider deduplicating `CreateSyntheticEPContextModel`. The same helper was added twice with minor path-type differences, once in `onnxruntime/test/providers/nv_tensorrt_rtx/nv_ep_context_test.cc:207` and once in `onnxruntime/test/providers/tensorrt/tensorrt_basic_test.cc:1415`. This is test-only duplication, so low priority.
- Keep the classic TensorRT scope note in mind. The `source` filter there is wired through `GraphHasCtxNode()`, and the main `GetCapability()` entry point only uses that helper for the single-node `EPContext` fast path at `onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:2593`. That matches current EPContext usage, so I do not view it as a blocker for this PR, but if multi-node or mixed-form context graphs become a supported scenario later, the filter will need to move deeper into the general claim path as well.
…essage

The error when a foreign-source EPContext node is not claimed comes from `TransformerMemcpyImpl`, which mentions the EPContext node name but not the foreign source EP name. Relax the assertions to only check for 'EPContext' in the error message, which is sufficient to confirm the node was skipped.

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
@ishwar-raut1 @umangb-09, FYI, with this change, the EPContext node should include the "source" attribute for the provider-bridge TRT RTX EP to run this EPContext node/model.
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s). |
The `EPContextNode_ForeignSourceSkipped` assertion expects "OpenVINOExecutionProvider" in the error, but the actual error from `TransformerMemcpyImpl` doesn't include it.
- `tensorrt_basic_test.cc`: Remove the overly-specific "OpenVINOExecutionProvider" assertion from `EPContextNode_ForeignSourceSkipped`
- `nv_ep_context_test.cc`: Remove the same overly-specific assertion from `EPContextNode_ForeignSourceSkipped` (proactive)

Original prompt
This section details on the original issue you should resolve
<issue_title>NvTensorRTRTXExecutionProvider::GetCapability claims EPContext nodes belonging to other EPs, causing crash on multi-GPU systems</issue_title>
<issue_description>### Describe the issue
On multi-GPU systems where both `OpenVINOExecutionProvider` and `NvTensorRTRTXExecutionProvider` are registered, loading an EPContext model produced by OpenVINO causes an access violation (0xC0000005) or a
"Could not find an implementation for EPContext(1)" error.
The root cause is that `NvExecutionProvider::GetCapability()` in `nv_execution_provider.cc` claims all `EPContext` nodes without checking the `source` attribute. The `EPContext` contrib op schema defines an optional `source` attribute specifically for EP identification (`contrib_defs.cc`). Other EPs already check this attribute:
- `source == kOpenVINOExecutionProvider` in `EPCtxHandler::CheckForOVEPCtxNode()`
- `cache_source == "qnnexecutionprovider" || cache_source == "qnn"` in `PartitionCtxModel()`

The NvTensorRTRTX EP neither checks `source` when claiming EPContext nodes in `GetCapability()`, nor writes `source` when creating EPContext nodes in `CreateCtxNode()`.

Proposed fix
Add a `source` attribute check to `NvExecutionProvider::GetCapability()` before claiming EPContext nodes. This requires adding `static const std::string SOURCE = "source";` to `onnx_ctx_model_helper.h` (matching the existing constant in QNN EP's `builder/onnx_ctx_model_helper.h` and OpenVINO EP's `onnx_ctx_model_helper.h`).

Additionally, `CreateCtxNode()` in `onnx_ctx_model_helper.cc` should be updated to write the `source` attribute (set to `kNvTensorRTRTXExecutionProvider`) when producing EPContext models, following the same pattern as OpenVINO EP's `AddOVEPCtxNodeToGraph()`. This ensures NvTensorRTRTX EPContext models are properly tagged for the future.
### Urgency
This is a P1 blocker for MLCommons MLPerf Client v1.6 testing on multi-GPU laptop systems
(Intel iGPU + NVIDIA dGPU). See: https://github.com/mlcommons/mlperf_client_dev/issues/976
### To reproduce
System: Any system with both an Intel GPU (OpenVINO EP) and an NVIDIA GPU (NvTensorRTRTX EP)
- `source=OpenVINOExecutionProvider` (e.g., Phi-3.5 compiled by OpenVINO)
- `PREFER_GPU`) or manual multi-EP ordering

Expected: OpenVINO EP claims its own EPContext node; NvTensorRTRTX EP skips it
Actual: NvTensorRTRTX EP claims the OpenVINO EPContext node → access violation crash (0xC0000005)
A self-contained min-repro (C++ source + Python model generator) is available. The repro uses `RegisterExecutionProviderLibrary` to register both EP DLLs, creates a synthetic EPContext model with `source=OpenVINOExecutionProvider`, and demonstrates the crash.

Tested fix
We have built and tested the fix described above against:
- `source=OpenVINOExecutionProvider` (`openvino_model_dy.onnx`)

Both pass with the fix applied: NvTensorRTRTX EP correctly skips the OpenVINO EPContext node (`"[NvTensorRTRTX EP] No graph will run on Nv execution provider"`), and the OpenVINO EP successfully creates the session.
The classic `TensorrtExecutionProvider` has the same issue (`tensorrt_execution_provider.cc` line 2593) but is less commonly deployed alongside other EPs.
### Environment
</issue_description>
<agent_instructions>Please provide a fix and add unit test.</agent_instructions>
Comments on the Issue (you are @copilot in this section)