Add Qwen3 family recipes#259

Open
hanbitmyths wants to merge 2 commits into microsoft:main from hanbitmyths:sunghcho/qwen3-family

Conversation

@hanbitmyths

This PR adds recipes for the Qwen3 family: 0.6B, 1.7B, 4B, 8B, and 14B, each for CPU, CUDA, and WebGPU.

  • 0.6B–8B: KLD-gradient quantization.
  • 14B: k_quant_mixed quantization due to GPU memory limits.

… WebGPU

- 0.6B-8B: kld_gradient SelectiveMixedPrecision + GPTQ + RTN + ModelBuilder (int4)
- 14B: k_quant_mixed SelectiveMixedPrecision + GPTQ + RTN + ModelBuilder (int4)
- All models include cpu, cuda, and webgpu execution provider configs
- Standardized naming: {model}_{ep}_int4.json
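
For illustration, the pass pipeline named in these notes might look like the following recipe skeleton. The input model block mirrors the fragments quoted in the review below; the pass names, option keys, and the `algorithm` field are assumptions for this sketch, not copied from the actual recipe files:

```json
{
  "input_model": {
    "type": "HfModel",
    "model_path": "Qwen/Qwen3-0.6B",
    "load_kwargs": { "torch_dtype": "float16" }
  },
  "passes": {
    "mixed_precision": { "type": "SelectiveMixedPrecision", "algorithm": "kld_gradient" },
    "gptq": { "type": "GPTQ" },
    "builder": { "type": "ModelBuilder", "precision": "int4" }
  }
}
```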
Copilot AI review requested due to automatic review settings March 14, 2026 00:57
Contributor

Copilot AI left a comment

Pull request overview

Adds Olive recipe bundles for the Qwen3 model family across CPU, CUDA, and WebGPU execution providers, using INT4 quantization (KLD-gradient-based mixed precision for 0.6B–8B and k_quant_mixed for 14B due to memory constraints).

Changes:

  • Add per-model CPU/CUDA/WebGPU recipe configs (*.json) plus info.yaml, requirements.txt, and backend READMEs.
  • Introduce Qwen3 14B recipes using k_quant_mixed instead of kld_gradient.
  • Rename/standardize some CPU recipe references (e.g., removing _kld_gradient suffix for 0.6B/4B).

Reviewed changes

Copilot reviewed 60 out of 62 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| Qwen-Qwen3-0.6B/LICENSE | Add model license file. |
| Qwen-Qwen3-0.6B/cpu/info.yaml | Register CPU recipe metadata. |
| Qwen-Qwen3-0.6B/cpu/requirements.txt | Define CPU recipe Python deps. |
| Qwen-Qwen3-0.6B/cpu/README.md | Document CPU recipe usage. |
| Qwen-Qwen3-0.6B/cpu/Qwen-Qwen3-0.6B_cpu_int4.json | Add CPU INT4 recipe config. |
| Qwen-Qwen3-0.6B/cuda/info.yaml | Register CUDA recipe metadata. |
| Qwen-Qwen3-0.6B/cuda/requirements.txt | Define CUDA recipe Python deps. |
| Qwen-Qwen3-0.6B/cuda/README.md | Document CUDA recipe usage. |
| Qwen-Qwen3-0.6B/cuda/Qwen-Qwen3-0.6B_cuda_int4.json | Add CUDA INT4 recipe config. |
| Qwen-Qwen3-0.6B/webgpu/info.yaml | Register WebGPU recipe metadata. |
| Qwen-Qwen3-0.6B/webgpu/requirements.txt | Define WebGPU recipe Python deps. |
| Qwen-Qwen3-0.6B/webgpu/README.md | Document WebGPU recipe usage. |
| Qwen-Qwen3-0.6B/webgpu/Qwen-Qwen3-0.6B_webgpu_int4.json | Add WebGPU INT4 recipe config. |
| Qwen-Qwen3-1.7B/LICENSE | Add model license file. |
| Qwen-Qwen3-1.7B/cpu/info.yaml | Register CPU recipe metadata. |
| Qwen-Qwen3-1.7B/cpu/requirements.txt | Define CPU recipe Python deps. |
| Qwen-Qwen3-1.7B/cpu/README.md | Document CPU recipe usage. |
| Qwen-Qwen3-1.7B/cpu/Qwen-Qwen3-1.7B_cpu_int4.json | Add CPU INT4 recipe config. |
| Qwen-Qwen3-1.7B/cuda/info.yaml | Register CUDA recipe metadata. |
| Qwen-Qwen3-1.7B/cuda/requirements.txt | Define CUDA recipe Python deps. |
| Qwen-Qwen3-1.7B/cuda/README.md | Document CUDA recipe usage. |
| Qwen-Qwen3-1.7B/cuda/Qwen-Qwen3-1.7B_cuda_int4.json | Add CUDA INT4 recipe config. |
| Qwen-Qwen3-1.7B/webgpu/info.yaml | Register WebGPU recipe metadata. |
| Qwen-Qwen3-1.7B/webgpu/requirements.txt | Define WebGPU recipe Python deps. |
| Qwen-Qwen3-1.7B/webgpu/README.md | Document WebGPU recipe usage. |
| Qwen-Qwen3-1.7B/webgpu/Qwen-Qwen3-1.7B_webgpu_int4.json | Add WebGPU INT4 recipe config. |
| Qwen-Qwen3-4B/LICENSE | Add model license file. |
| Qwen-Qwen3-4B/cpu/info.yaml | Register CPU recipe metadata (rename/standardize). |
| Qwen-Qwen3-4B/cpu/README.md | Update CPU README to match recipe name/file. |
| Qwen-Qwen3-4B/cpu/Qwen-Qwen3-4B_cpu_int4.json | Add CPU INT4 recipe config. |
| Qwen-Qwen3-4B/webgpu/info.yaml | Register WebGPU recipe metadata. |
| Qwen-Qwen3-4B/webgpu/requirements.txt | Define WebGPU recipe Python deps. |
| Qwen-Qwen3-4B/webgpu/README.md | Document WebGPU recipe usage. |
| Qwen-Qwen3-4B/webgpu/Qwen-Qwen3-4B_webgpu_int4.json | Add WebGPU INT4 recipe config. |
| Qwen-Qwen3-4B/cuda/info.yaml | Register CUDA recipe metadata. |
| Qwen-Qwen3-4B/cuda/requirements.txt | Define CUDA recipe Python deps. |
| Qwen-Qwen3-4B/cuda/README.md | Document CUDA recipe usage. |
| Qwen-Qwen3-4B/cuda/Qwen-Qwen3-4B_cuda_int4.json | Add CUDA INT4 recipe config. |
| Qwen-Qwen3-8B/LICENSE | Add model license file. |
| Qwen-Qwen3-8B/cpu/info.yaml | Register CPU recipe metadata. |
| Qwen-Qwen3-8B/cpu/requirements.txt | Define CPU recipe Python deps. |
| Qwen-Qwen3-8B/cpu/README.md | Document CPU recipe usage. |
| Qwen-Qwen3-8B/cpu/Qwen-Qwen3-8B_cpu_int4.json | Add CPU INT4 recipe config. |
| Qwen-Qwen3-8B/cuda/info.yaml | Register CUDA recipe metadata. |
| Qwen-Qwen3-8B/cuda/requirements.txt | Define CUDA recipe Python deps. |
| Qwen-Qwen3-8B/cuda/README.md | Document CUDA recipe usage. |
| Qwen-Qwen3-8B/cuda/Qwen-Qwen3-8B_cuda_int4.json | Add CUDA INT4 recipe config. |
| Qwen-Qwen3-8B/webgpu/info.yaml | Register WebGPU recipe metadata. |
| Qwen-Qwen3-8B/webgpu/requirements.txt | Define WebGPU recipe Python deps. |
| Qwen-Qwen3-8B/webgpu/README.md | Document WebGPU recipe usage. |
| Qwen-Qwen3-8B/webgpu/Qwen-Qwen3-8B_webgpu_int4.json | Add WebGPU INT4 recipe config. |
| Qwen-Qwen3-14B/LICENSE | Add model license file. |
| Qwen-Qwen3-14B/cpu/info.yaml | Register CPU recipe metadata. |
| Qwen-Qwen3-14B/cpu/requirements.txt | Define CPU recipe Python deps. |
| Qwen-Qwen3-14B/cpu/README.md | Document CPU recipe usage and 14B quantization rationale. |
| Qwen-Qwen3-14B/cpu/Qwen-Qwen3-14B_cpu_int4.json | Add CPU INT4 recipe config (k_quant_mixed). |
| Qwen-Qwen3-14B/cuda/info.yaml | Register CUDA recipe metadata. |
| Qwen-Qwen3-14B/cuda/requirements.txt | Define CUDA recipe Python deps. |
| Qwen-Qwen3-14B/cuda/README.md | Document CUDA recipe usage and 14B quantization rationale. |
| Qwen-Qwen3-14B/cuda/Qwen-Qwen3-14B_cuda_int4.json | Add CUDA INT4 recipe config (k_quant_mixed). |
| Qwen-Qwen3-14B/webgpu/info.yaml | Register WebGPU recipe metadata. |
| Qwen-Qwen3-14B/webgpu/requirements.txt | Define WebGPU recipe Python deps. |
| Qwen-Qwen3-14B/webgpu/README.md | Document WebGPU recipe usage and 14B quantization rationale. |
| Qwen-Qwen3-14B/webgpu/Qwen-Qwen3-14B_webgpu_int4.json | Add WebGPU INT4 recipe config (k_quant_mixed). |


Comment on lines +1 to +9

```json
{
  "input_model": {
    "type": "HfModel",
    "model_path": "Qwen/Qwen3-14B",
    "load_kwargs": {
      "torch_dtype": "float16"
    }
  },
  "passes": {
```
Comment on lines +21 to +23

> - This model uses `k_quant_mixed` via the `SelectiveMixedPrecision` pass followed by
>   `GPTQ` and `ModelBuilder`, instead of the `kld_gradient` algorithm used by smaller
>   Qwen3 models (0.6B–8B). The `kld_gradient` algorithm requires loading the full model
Comment on lines +1 to +9

```json
{
  "input_model": {
    "type": "HfModel",
    "model_path": "Qwen/Qwen3-1.7B",
    "load_kwargs": {
      "torch_dtype": "float16"
    }
  },
  "passes": {
```
Comment on lines +1 to +9

```json
{
  "input_model": {
    "type": "HfModel",
    "model_path": "Qwen/Qwen3-8B",
    "load_kwargs": {
      "torch_dtype": "float16"
    }
  },
  "passes": {
```
Comment on lines +22 to +26

> `GPTQ` and `ModelBuilder`, instead of the `kld_gradient` algorithm used by smaller
> Qwen3 models (0.6B–8B). The `kld_gradient` algorithm requires loading the full model
> to GPU for gradient-based sensitivity estimation, which exceeds the 80 GB per-GPU
> memory limit for the 14B model. The `k_quant_mixed` algorithm uses a pre-defined
> quantization sensitivity map and does not require GPU memory for sensitivity estimation.
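
In practice, the contrast the README describes would amount to a one-field difference in the `SelectiveMixedPrecision` pass: `kld_gradient` for 0.6B–8B versus `k_quant_mixed` for 14B. A minimal sketch of that pass entry, assuming an `algorithm` key and pass name that are illustrative rather than taken from the actual recipe files:

```json
"mixed_precision": {
  "type": "SelectiveMixedPrecision",
  "algorithm": "k_quant_mixed"
}
```

Everything downstream (`GPTQ`, `RTN`, `ModelBuilder` with int4 output) stays the same across model sizes; only the sensitivity-estimation step changes.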
Comment on lines +21 to +23

> - This model uses `k_quant_mixed` via the `SelectiveMixedPrecision` pass followed by
>   `GPTQ` and `ModelBuilder`, instead of the `kld_gradient` algorithm used by smaller
>   Qwen3 models (0.6B–8B). The `kld_gradient` algorithm requires loading the full model