Add QDQFloatActivationsTransformer to remove activation Q→DQ pairs and enable MatMulNBits fusion #27636
Conversation
Pull request overview
Adds a new “float activations” mode for QDQ models by introducing a dedicated Level2 transformer and a new session option to control behavior. This fits into the existing QDQ optimization pipeline by (a) preserving Q/DQ wrappers around data-movement ops and (b) removing remaining activation Q->DQ pairs after compute-op QDQ fusions.
Changes:
- Add `QDQFloatActivationsTransformer` to remove eligible activation Q->DQ pairs and re-attempt MatMulNBits fusion after activation dequant removal.
- Extend `QDQSelectorActionTransformer` with an option to skip data-movement QDQ rules when float-activations mode is enabled.
- Introduce the `session.qdq_float_activations` config key and add unit tests covering key scenarios (simple removal, chained pairs, graph outputs, Conv fusion interplay, the MatMulNBits enabling case).
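Conceptually, removing an activation Q→DQ pair just rewires the DQ's consumers back to the Q node's float input. The sketch below illustrates this on a toy node-list graph; it is a hypothetical helper for illustration only, not ONNX Runtime's actual `Graph` API, and it skips the eligibility checks (graph outputs, multiple consumers) the real transformer performs.

```python
def remove_qdq_pairs(nodes):
    """Toy sketch: drop QuantizeLinear->DequantizeLinear pairs and reconnect
    the DQ's consumers to the Q node's original float input. Real eligibility
    checks (graph outputs, shared Q outputs, etc.) are omitted."""
    by_output = {out: n for n in nodes for out in n["outputs"]}
    removed = set()
    for node in nodes:
        if node["op"] != "DequantizeLinear":
            continue
        producer = by_output.get(node["inputs"][0])
        if producer is None or producer["op"] != "QuantizeLinear":
            continue
        float_input = producer["inputs"][0]  # Q's original float tensor
        for consumer in nodes:  # rewire anything reading the DQ output
            consumer["inputs"] = [float_input if t in node["outputs"] else t
                                  for t in consumer["inputs"]]
        removed.add(id(producer))
        removed.add(id(node))
    return [n for n in nodes if id(n) not in removed]

nodes = [
    {"op": "QuantizeLinear", "inputs": ["x"], "outputs": ["x_q"]},
    {"op": "DequantizeLinear", "inputs": ["x_q"], "outputs": ["x_dq"]},
    {"op": "Relu", "inputs": ["x_dq"], "outputs": ["y"]},
]
print(remove_qdq_pairs(nodes))  # Relu now reads "x" directly in float
```

After the pass, the unfused Relu consumes the float tensor directly, which is the "float activations" behavior the PR describes.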
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| onnxruntime/test/optimizer/qdq_float_activations_transformer_test.cc | New unit tests for float-activations mode behavior and interactions with existing fusions. |
| onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selector_action_transformer.h | Add constructor parameter to optionally skip data-movement QDQ rules. |
| onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selector_action_transformer.cc | Gate Split/DropQDQNodes rules behind the new “skip data movement” flag. |
| onnxruntime/core/optimizer/qdq_transformer/qdq_float_activations_transformer.h | New transformer interface and high-level behavior documentation. |
| onnxruntime/core/optimizer/qdq_transformer/qdq_float_activations_transformer.cc | Implementation of activation Q->DQ removal and post-removal MatMulNBits fusion attempt. |
| onnxruntime/core/optimizer/graph_transformer_utils.cc | Wire new session option into Level2 pipeline; pass through to QDQSelectorActionTransformer; add new transformer when enabled. |
| include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h | Add public session option key and explanatory comment for float-activations mode. |
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Description

Adds a new Level 2 graph transformer (`QDQFloatActivationsTransformer`) that removes activation Q→DQ pairs from fully quantized (QDQ) models, allowing unfused ops to run in float precision. This is gated behind the `session.qdq_float_activations` session option.

Motivation
In fully QDQ models, after the Level 1 `QDQSelectorActionTransformer` fuses compute ops (e.g., Conv→QLinearConv), leftover activation Q→DQ pairs remain around ops that don't have QDQ fusions. These pairs add unnecessary quantize/dequantize overhead. Additionally, the `DQMatMulToMatMulNBits` fusion at Level 1 requires exactly one DQ input to the MatMul; when an activation Q→DQ pair is present, the MatMul sees two DQ inputs and the fusion is rejected.

Changes
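The rejection condition can be pictured as a simple input count, as in this toy check (hypothetical helper name, not the actual ORT selector code):

```python
def matmul_nbits_fusion_eligible(matmul_inputs, producers):
    """Toy sketch of the DQMatMulToMatMulNBits-style precondition: exactly one
    MatMul input (the weight) may come from a DequantizeLinear node. An
    activation Q->DQ pair adds a second DQ input and blocks the fusion."""
    dq_count = sum(1 for t in matmul_inputs
                   if producers.get(t) == "DequantizeLinear")
    return dq_count == 1

# Before float-activations mode: activation arrives through its own DQ.
both_dq = {"act_dq": "DequantizeLinear", "w_dq": "DequantizeLinear"}
print(matmul_nbits_fusion_eligible(["act_dq", "w_dq"], both_dq))  # False

# After removing the activation Q->DQ pair: only the weight DQ remains.
print(matmul_nbits_fusion_eligible(["act", "w_dq"],
                                   {"w_dq": "DequantizeLinear"}))  # True
```

This is why the transformer re-attempts the MatMulNBits fusions after removing activation Q→DQ pairs: patterns that previously had two DQ inputs become eligible.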
- New transformer (`qdq_float_activations_transformer.h/.cc`): removes eligible activation Q→DQ pairs, then re-attempts the `DQMatMulToMatMulNBits` and `DQCastMatMulToMatMulNBits` fusions on newly eligible patterns.
- Session option (`session.qdq_float_activations`): when set to `"1"`, enables the transformer and also skips `DropQDQNodesRules`/`SplitQDQRules` in `QDQSelectorActionTransformer` so data-movement ops keep their adjacent Q/DQ wrappers.
- Pipeline integration (`graph_transformer_utils.cc`): registered at Level 2, ordered after `MatMulNBitsFusion` and before `QDQFinalCleanupTransformer`.
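If the option ships as described, enabling it from Python would follow the standard session-config pattern. This is a configuration sketch assuming an ONNX Runtime build that includes this PR; the model filename is a placeholder.

```python
import onnxruntime as ort

# Enable float-activations mode for QDQ models.
# Requires an ONNX Runtime build containing this PR's session option.
so = ort.SessionOptions()
so.add_session_config_entry("session.qdq_float_activations", "1")

# sess = ort.InferenceSession("model_qdq.onnx", sess_options=so)
```

Unrecognized config keys are ignored by older builds, so verify the transformer actually ran (e.g., via optimizer logging or by inspecting the optimized graph) rather than assuming the option took effect.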