Add QDQFloatActivationsTransformer to remove activation Q→DQ pairs and enable MatMulNBits fusion #27636

Draft
jambayk wants to merge 5 commits into main from jambayk/qdq-opt

Conversation


jambayk commented Mar 12, 2026

Description

Adds a new Level 2 graph transformer (QDQFloatActivationsTransformer) that removes activation Q→DQ pairs from fully quantized (QDQ) models, allowing unfused ops to run in float precision. This is gated behind the session.qdq_float_activations session option.
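As a usage sketch, the option described above can be set through onnxruntime's standard session-config API. This assumes a build that includes this PR; the key string is taken from the PR description, and the model path is a placeholder:

```python
import onnxruntime as ort

so = ort.SessionOptions()
# Opt in to float-activation mode; "1" enables the new transformer
# (key added by this PR in onnxruntime_session_options_config_keys.h).
so.add_session_config_entry("session.qdq_float_activations", "1")

# "model_qdq.onnx" is a placeholder path to a fully QDQ-quantized model.
session = ort.InferenceSession("model_qdq.onnx", sess_options=so)
```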

Motivation

In fully QDQ models, after Level 1 QDQSelectorActionTransformer fuses compute ops (e.g., Conv→QLinearConv), leftover activation Q→DQ pairs remain around ops that don't have QDQ fusions. These pairs add unnecessary quantize/dequantize overhead. Additionally, the DQMatMulToMatMulNBits fusion at Level 1 requires exactly 1 DQ input to MatMul — when an activation Q→DQ pair is present, the MatMul sees 2 DQ inputs and the fusion is rejected.
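The fusion precondition above can be sketched as a count over a MatMul's input producers. This is a toy Python model of the condition, not the actual C++ selector code; the helper names are illustrative:

```python
def count_dq_inputs(input_producers):
    """Count MatMul inputs that are produced by DequantizeLinear nodes."""
    return sum(1 for p in input_producers if p == "DequantizeLinear")

def matmul_nbits_fusion_eligible(input_producers):
    # DQMatMulToMatMulNBits expects exactly one DQ input (the weight);
    # a leftover activation Q->DQ pair adds a second DQ and blocks the fusion.
    return count_dq_inputs(input_producers) == 1

# Weight-only DQ input: fusion can proceed.
assert matmul_nbits_fusion_eligible([None, "DequantizeLinear"])
# Activation DQ still present alongside the weight DQ: fusion is rejected.
assert not matmul_nbits_fusion_eligible(["DequantizeLinear", "DequantizeLinear"])
```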

Changes

  • New transformer (qdq_float_activations_transformer.h/cc):

    • Sub-pass A: Removes all adjacent Q→DQ pairs where Q and DQ share matching scale/zero-point. Handles multiple DQ consumers per Q node, and DQ nodes producing graph outputs (via Identity rewiring).
    • Sub-pass B: After Q→DQ removal, re-scans MatMul nodes and applies DQMatMulToMatMulNBits and DQCastMatMulToMatMulNBits fusions on newly eligible patterns.
  • Session option (session.qdq_float_activations): When set to "1", enables the transformer and also skips DropQDQNodesRules/SplitQDQRules in QDQSelectorActionTransformer so data-movement ops keep their Q/DQ wrappers adjacent.

  • Pipeline integration (graph_transformer_utils.cc): Registered at Level 2, ordered after MatMulNBitsFusion and before QDQFinalCleanupTransformer.
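Sub-pass A's eligibility check and rewiring can be sketched as follows. This is a toy Python model under the assumptions stated in the bullets above (the real transformer operates on the C++ Graph API, and the Identity rewiring for graph outputs is omitted here):

```python
def qdq_pair_removable(q_attrs, dq_attrs):
    """Sub-pass A condition: an adjacent Q->DQ pair is removable only when
    Q and DQ share the same scale and zero point, so dropping the pair
    preserves the float tensor value (up to quantization rounding)."""
    return (q_attrs["scale"] == dq_attrs["scale"]
            and q_attrs["zero_point"] == dq_attrs["zero_point"])

def remove_pairs(q_input, q_attrs, dq_consumers):
    """For each DQ consumer of a Q node, rewire that DQ's consumers directly
    to the Q node's float input when the pair is removable."""
    rewired = {}
    for dq_name, dq_attrs in dq_consumers.items():
        if qdq_pair_removable(q_attrs, dq_attrs):
            rewired[dq_name] = q_input  # consumers now read the float tensor
    return rewired

q_attrs = {"scale": 0.05, "zero_point": 128}
# Two DQ consumers of one Q node: both match, so both are rewired.
out = remove_pairs("act_f32", q_attrs,
                   {"dq0": {"scale": 0.05, "zero_point": 128},
                    "dq1": {"scale": 0.05, "zero_point": 128}})
assert out == {"dq0": "act_f32", "dq1": "act_f32"}
# Mismatched scale: the pair is kept.
assert remove_pairs("act_f32", q_attrs,
                    {"dq2": {"scale": 0.1, "zero_point": 128}}) == {}
```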

Copilot AI left a comment

Pull request overview

Adds a new “float activations” mode for QDQ models by introducing a dedicated Level 2 transformer and a new session option to control behavior. This fits into the existing QDQ optimization pipeline by (a) preserving Q/DQ wrappers around data-movement ops and (b) removing remaining activation Q→DQ pairs after compute-op QDQ fusions.

Changes:

  • Add QDQFloatActivationsTransformer to remove eligible activation Q→DQ pairs and re-attempt MatMulNBits fusion after activation dequantization removal.
  • Extend QDQSelectorActionTransformer with an option to skip data-movement QDQ rules when float-activations mode is enabled.
  • Introduce the session.qdq_float_activations config key and add comprehensive unit tests covering key scenarios (simple removal, chained pairs, graph outputs, Conv fusion interplay, MatMulNBits enabling case).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Summary per file:

  • onnxruntime/test/optimizer/qdq_float_activations_transformer_test.cc: New unit tests for float-activations mode behavior and interactions with existing fusions.
  • onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selector_action_transformer.h: Add constructor parameter to optionally skip data-movement QDQ rules.
  • onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selector_action_transformer.cc: Gate Split/DropQDQNodes rules behind the new “skip data movement” flag.
  • onnxruntime/core/optimizer/qdq_transformer/qdq_float_activations_transformer.h: New transformer interface and high-level behavior documentation.
  • onnxruntime/core/optimizer/qdq_transformer/qdq_float_activations_transformer.cc: Implementation of activation Q→DQ removal and post-removal MatMulNBits fusion attempt.
  • onnxruntime/core/optimizer/graph_transformer_utils.cc: Wire the new session option into the Level 2 pipeline; pass through to QDQSelectorActionTransformer; add the new transformer when enabled.
  • include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h: Add public session option key and explanatory comment for float-activations mode.


Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.



jambayk force-pushed the jambayk/qdq-opt branch 2 times, most recently from e827aa1 to 903db7d on March 13, 2026 at 23:11.