
COPY OF 2371#2378

Closed
xiaoyu-work wants to merge 35 commits into main from qwen-vl-dynamic-shape

Conversation

@xiaoyu-work
Collaborator

Describe your changes

Checklist before requesting a review

  • Add unit tests for this change.
  • Make sure all tests can pass.
  • Update documents if necessary.
  • Lint and apply fixes to your code by running lintrunner -a
  • Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

(Optional) Issue link

hanbitmyths and others added 30 commits February 26, 2026 11:19
- graph_surgeries.py: add QwenVL-specific graph surgery passes for
  vision embedding merge and positional encoding fixup
- rtn_quantization.py: extend RTN quantization for multimodal models,
  handle vision encoder exclusion patterns
- cast_chain_elimination.py: new pass to eliminate redundant Cast chains
  in Dynamo-exported models (fp32->fp16->fp32 patterns)
- olive_config.json: register new passes
…surgery passes

- rtn_quantization.py: Parameterize bits through quantization methods to support 8-bit Gather
- common.py: Fix ByteSize() crash for >2GB models, fix FOLDED_FROM_KEY import
- graph_surgeries.py: Add ReciprocalMulToDiv, DeduplicateSubgraphInitializers, DeduplicateNodes
- Apply ruff format to 4 files (cast_chain_elimination.py,
  rtn_quantization.py, test_graph_surgeries.py, test_rtn_quantization.py)
- Fix _pack_int8_to_int4 reshape bug: replace global flatten+pack with
  axis-aware _pack_int4_along_axis that correctly packs zero_point when
  k_blocks is small (e.g. 1), avoiding ValueError on reshape
- Fix test_rtn_quantization_pass_gather assertion: GatherBlockQuantized
  always uses quantize_axis=data_rank-1, not pass_config['axis']
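The `_pack_int8_to_int4` fix above replaces a global flatten-and-pack with axis-aware packing. A hypothetical sketch of such a packer (names and signature assumed, not the actual Olive helper): by padding and packing along one chosen axis, the layout stays valid even when that axis is short, e.g. `k_blocks == 1`.

```python
import numpy as np

def pack_int4_along_axis(values: np.ndarray, axis: int) -> np.ndarray:
    """Pack pairs of 4-bit values along `axis` into uint8 bytes.

    Packing along a chosen axis (instead of flattening the whole tensor)
    avoids the reshape ValueError when that axis has very few elements.
    """
    if values.shape[axis] % 2 != 0:
        # Pad the packing axis to an even length with zeros
        pad = [(0, 0)] * values.ndim
        pad[axis] = (0, 1)
        values = np.pad(values, pad)
    even = np.take(values, np.arange(0, values.shape[axis], 2), axis=axis)
    odd = np.take(values, np.arange(1, values.shape[axis], 2), axis=axis)
    # Low nibble holds the even-index element, high nibble the odd-index one
    return (even.astype(np.uint8) & 0x0F) | ((odd.astype(np.uint8) & 0x0F) << 4)
```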
The upstream tuning_strategies.md page no longer exists, causing the
Sphinx linkcheck to fail with -W (warnings-as-errors).
Address PR review feedback from @devang-ml and @justinchuby: use
onnxscript.optimizer.optimize() instead of ORT InferenceSession with
session.enable_cast_chain_elimination to eliminate redundant Cast chains.

- Remove onnxruntime dependency from cast_chain_elimination pass
- Use onnxscript.optimizer.optimize() with TypeInferenceError fallback
  (same pattern as OnnxPeepholeOptimizer)
- Update test comment to reflect onnxscript optimizer
- Verified: numerically identical outputs (0.00 max abs diff)
- Verified: no eval regression (69% on AI2D 100 samples)
Resolve conflict in olive/passes/onnx/common.py: take upstream fix
from PR #2355 (ByteSize EncodeError handling).
…n elimination

Use a custom CastCastRoundTrip rewrite rule instead of the full
onnxscript.optimizer.optimize() call. The rewrite rule specifically
targets round-trip Cast chains (e.g. fp32->fp16->fp32) by checking
that the final cast type matches the original input type, and replaces
them with Identity.

This is simpler, faster, and avoids the TypeInferenceError fallback
that was needed with the full optimizer. The onnxscript rewrite()
function also runs RemoveUnusedNodesPass and RemoveUnusedOpsetsPass
automatically.

Validated: weights identical, 0.00 max abs diff, eval 69% unchanged.
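To illustrate the round-trip check the rule performs, here is a toy sketch over a dict-based node list (this is a conceptual model, not the onnxscript rewriter API): a `Cast` whose target type matches the type its chain started from collapses to `Identity`.

```python
def eliminate_cast_round_trips(nodes, value_types):
    """Toy sketch: rewrite Cast(Cast(x, T1), T0) -> Identity(x) when x has type T0.

    `nodes` is a topologically ordered list of dicts like
    {"op": "Cast", "input": ..., "output": ..., "to": ...};
    `value_types` maps value names to their element type.
    A real pass would also prune the now-unused inner Cast, which
    onnxscript's rewrite() does via RemoveUnusedNodesPass.
    """
    producers = {n["output"]: n for n in nodes}
    rewritten = []
    for n in nodes:
        prev = producers.get(n.get("input"))
        if (n["op"] == "Cast" and prev is not None and prev["op"] == "Cast"
                and n["to"] == value_types.get(prev["input"])):
            # e.g. fp32 -> fp16 -> fp32: final type matches the original
            # input type, so the whole chain is a no-op
            rewritten.append(
                {"op": "Identity", "input": prev["input"], "output": n["output"]}
            )
        else:
            rewritten.append(n)
    return rewritten
```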
Move _ensure_com_microsoft_opset and eliminate_cast_chains into
ModelOptimizer class. Add fix_com_microsoft_opset and
cast_chain_elimination config flags to OnnxPeepholeOptimizer.

Remove standalone OnnxCastChainElimination pass, its olive_config
entry, and its test file. Move tests into test_peephole_optimizer.py.

Per devang-ml's review: consolidate into existing pass to avoid
introducing a new one.
Add onnxscript_optimize, onnxoptimizer_optimize, and
fuse_reshape_operations config flags (default True for backward
compatibility). This allows recipe configs to disable the default
optimizations and only run opset fixup + cast chain elimination,
producing byte-identical models to the old standalone pass.
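A recipe config exercising these flags might look like the following sketch (the flag names come from the commits above; the surrounding pass-config shape is an assumption about Olive's schema, not a verified example):

```json
{
  "type": "OnnxPeepholeOptimizer",
  "onnxscript_optimize": false,
  "onnxoptimizer_optimize": false,
  "fuse_reshape_operations": false,
  "fix_com_microsoft_opset": true,
  "cast_chain_elimination": true
}
```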
CI uses an ORT version that supports max IR version 11, but newer
ONNX packages default to IR version 13. Pin to 10 to match the
convention used by existing tests.
… assert

- GemmToMatMulAdd: create new transposed initializer instead of
  mutating shared one in-place; use base_name fallback for empty
  node.name to avoid duplicate tensor names.
- ReciprocalMulToDiv: build consumer map upfront to avoid O(N^2)
  graph scans; re-check actual inputs for stale consumer references.
- test_rtn_quantization: add found assertion in
  test_gather_quantize_axis_forced_to_last_dim.

Validated: 0.00 max abs diff, eval 69% unchanged.
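The consumer-map idea behind the ReciprocalMulToDiv fix can be sketched over a generic node list (dict-based toy IR, not Olive's actual graph representation): one linear pass builds the map, and a re-check against current inputs filters out references made stale by rewrites.

```python
from collections import defaultdict

def build_consumer_map(nodes):
    """One O(N) pass mapping each tensor name to the nodes that consume it.

    Avoids the O(N^2) pattern of rescanning the whole graph to find each
    tensor's consumers.
    """
    consumers = defaultdict(list)
    for node in nodes:
        for name in node["inputs"]:
            consumers[name].append(node)
    return consumers

def live_consumers(consumers, name):
    # Re-check actual inputs: rewrites may have rewired a node away from
    # `name`, leaving a stale entry in the precomputed map.
    return [n for n in consumers[name] if name in n["inputs"]]
```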
@xiaoyu-work closed this Apr 3, 2026