Add Qwen3 family recipes by hanbitmyths · Pull Request #259 · microsoft/olive-recipes

hanbitmyths · 2026-03-14T00:57:55Z

This PR adds recipes for the Qwen3 family (0.6B, 1.7B, 4B, 8B, and 14B) targeting CPU, CUDA, and WebGPU.

0.6B-8B: kld_gradient quantization.
14B: k_quant_mixed quantization, because kld_gradient exceeds the GPU memory limit at this size.

… WebGPU
- 0.6B-8B: kld_gradient SelectiveMixedPrecision + GPTQ + RTN + ModelBuilder (int4)
- 14B: k_quant_mixed SelectiveMixedPrecision + GPTQ + RTN + ModelBuilder (int4)
- All models include cpu, cuda, and webgpu execution provider configs
- Standardized naming: {model}_{ep}_int4.json
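
The naming convention and the per-size algorithm split in the commit message can be sketched in a few lines of Python. This is only an illustration: `recipe_path`, `mixed_precision_algorithm`, and `all_recipes` are hypothetical helper names, not part of olive-recipes or Olive, and the directory layout is inferred from the file paths listed in this PR.

```python
# Hypothetical helpers for illustration; not part of olive-recipes or Olive.
EXECUTION_PROVIDERS = ("cpu", "cuda", "webgpu")
MODEL_SIZES = ("0.6B", "1.7B", "4B", "8B", "14B")


def recipe_path(size: str, ep: str) -> str:
    """Build '<model>/<ep>/<model>_<ep>_int4.json' for one recipe."""
    model = f"Qwen-Qwen3-{size}"
    return f"{model}/{ep}/{model}_{ep}_int4.json"


def mixed_precision_algorithm(size: str) -> str:
    """Pick the SelectiveMixedPrecision algorithm named in the PR for a size."""
    # The PR uses k_quant_mixed only for 14B; 0.6B-8B use kld_gradient.
    return "k_quant_mixed" if size == "14B" else "kld_gradient"


def all_recipes() -> list[tuple[str, str]]:
    """List (config path, algorithm) for every size and execution provider."""
    return [
        (recipe_path(size, ep), mixed_precision_algorithm(size))
        for size in MODEL_SIZES
        for ep in EXECUTION_PROVIDERS
    ]


if __name__ == "__main__":
    for path, algorithm in all_recipes():
        print(f"{path}  ({algorithm})")
```

Five sizes across three execution providers gives fifteen recipe configs, which matches the `*_int4.json` files in the review summary.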

Copilot

Pull request overview

Adds Olive recipe bundles for the Qwen3 model family across CPU, CUDA, and WebGPU execution providers, using INT4 quantization (KLD-gradient-based mixed precision for 0.6B–8B and k_quant_mixed for 14B due to memory constraints).

Changes:

- Add per-model CPU/CUDA/WebGPU recipe configs (*.json) plus info.yaml, requirements.txt, and backend READMEs.
- Introduce Qwen3 14B recipes using k_quant_mixed instead of kld_gradient.
- Rename/standardize some CPU recipe references (e.g., removing _kld_gradient suffix for 0.6B/4B).

Reviewed changes

Copilot reviewed 60 out of 62 changed files in this pull request and generated 6 comments.


File	Description
Qwen-Qwen3-0.6B/LICENSE	Add model license file.
Qwen-Qwen3-0.6B/cpu/info.yaml	Register CPU recipe metadata.
Qwen-Qwen3-0.6B/cpu/requirements.txt	Define CPU recipe Python deps.
Qwen-Qwen3-0.6B/cpu/README.md	Document CPU recipe usage.
Qwen-Qwen3-0.6B/cpu/Qwen-Qwen3-0.6B_cpu_int4.json	Add CPU INT4 recipe config.
Qwen-Qwen3-0.6B/cuda/info.yaml	Register CUDA recipe metadata.
Qwen-Qwen3-0.6B/cuda/requirements.txt	Define CUDA recipe Python deps.
Qwen-Qwen3-0.6B/cuda/README.md	Document CUDA recipe usage.
Qwen-Qwen3-0.6B/cuda/Qwen-Qwen3-0.6B_cuda_int4.json	Add CUDA INT4 recipe config.
Qwen-Qwen3-0.6B/webgpu/info.yaml	Register WebGPU recipe metadata.
Qwen-Qwen3-0.6B/webgpu/requirements.txt	Define WebGPU recipe Python deps.
Qwen-Qwen3-0.6B/webgpu/README.md	Document WebGPU recipe usage.
Qwen-Qwen3-0.6B/webgpu/Qwen-Qwen3-0.6B_webgpu_int4.json	Add WebGPU INT4 recipe config.
Qwen-Qwen3-1.7B/LICENSE	Add model license file.
Qwen-Qwen3-1.7B/cpu/info.yaml	Register CPU recipe metadata.
Qwen-Qwen3-1.7B/cpu/requirements.txt	Define CPU recipe Python deps.
Qwen-Qwen3-1.7B/cpu/README.md	Document CPU recipe usage.
Qwen-Qwen3-1.7B/cpu/Qwen-Qwen3-1.7B_cpu_int4.json	Add CPU INT4 recipe config.
Qwen-Qwen3-1.7B/cuda/info.yaml	Register CUDA recipe metadata.
Qwen-Qwen3-1.7B/cuda/requirements.txt	Define CUDA recipe Python deps.
Qwen-Qwen3-1.7B/cuda/README.md	Document CUDA recipe usage.
Qwen-Qwen3-1.7B/cuda/Qwen-Qwen3-1.7B_cuda_int4.json	Add CUDA INT4 recipe config.
Qwen-Qwen3-1.7B/webgpu/info.yaml	Register WebGPU recipe metadata.
Qwen-Qwen3-1.7B/webgpu/requirements.txt	Define WebGPU recipe Python deps.
Qwen-Qwen3-1.7B/webgpu/README.md	Document WebGPU recipe usage.
Qwen-Qwen3-1.7B/webgpu/Qwen-Qwen3-1.7B_webgpu_int4.json	Add WebGPU INT4 recipe config.
Qwen-Qwen3-4B/LICENSE	Add model license file.
Qwen-Qwen3-4B/cpu/info.yaml	Register CPU recipe metadata (rename/standardize).
Qwen-Qwen3-4B/cpu/README.md	Update CPU README to match recipe name/file.
Qwen-Qwen3-4B/cpu/Qwen-Qwen3-4B_cpu_int4.json	Add CPU INT4 recipe config.
Qwen-Qwen3-4B/webgpu/info.yaml	Register WebGPU recipe metadata.
Qwen-Qwen3-4B/webgpu/requirements.txt	Define WebGPU recipe Python deps.
Qwen-Qwen3-4B/webgpu/README.md	Document WebGPU recipe usage.
Qwen-Qwen3-4B/webgpu/Qwen-Qwen3-4B_webgpu_int4.json	Add WebGPU INT4 recipe config.
Qwen-Qwen3-4B/cuda/info.yaml	Register CUDA recipe metadata.
Qwen-Qwen3-4B/cuda/requirements.txt	Define CUDA recipe Python deps.
Qwen-Qwen3-4B/cuda/README.md	Document CUDA recipe usage.
Qwen-Qwen3-4B/cuda/Qwen-Qwen3-4B_cuda_int4.json	Add CUDA INT4 recipe config.
Qwen-Qwen3-8B/LICENSE	Add model license file.
Qwen-Qwen3-8B/cpu/info.yaml	Register CPU recipe metadata.
Qwen-Qwen3-8B/cpu/requirements.txt	Define CPU recipe Python deps.
Qwen-Qwen3-8B/cpu/README.md	Document CPU recipe usage.
Qwen-Qwen3-8B/cpu/Qwen-Qwen3-8B_cpu_int4.json	Add CPU INT4 recipe config.
Qwen-Qwen3-8B/cuda/info.yaml	Register CUDA recipe metadata.
Qwen-Qwen3-8B/cuda/requirements.txt	Define CUDA recipe Python deps.
Qwen-Qwen3-8B/cuda/README.md	Document CUDA recipe usage.
Qwen-Qwen3-8B/cuda/Qwen-Qwen3-8B_cuda_int4.json	Add CUDA INT4 recipe config.
Qwen-Qwen3-8B/webgpu/info.yaml	Register WebGPU recipe metadata.
Qwen-Qwen3-8B/webgpu/requirements.txt	Define WebGPU recipe Python deps.
Qwen-Qwen3-8B/webgpu/README.md	Document WebGPU recipe usage.
Qwen-Qwen3-8B/webgpu/Qwen-Qwen3-8B_webgpu_int4.json	Add WebGPU INT4 recipe config.
Qwen-Qwen3-14B/LICENSE	Add model license file.
Qwen-Qwen3-14B/cpu/info.yaml	Register CPU recipe metadata.
Qwen-Qwen3-14B/cpu/requirements.txt	Define CPU recipe Python deps.
Qwen-Qwen3-14B/cpu/README.md	Document CPU recipe usage and 14B quantization rationale.
Qwen-Qwen3-14B/cpu/Qwen-Qwen3-14B_cpu_int4.json	Add CPU INT4 recipe config (`k_quant_mixed`).
Qwen-Qwen3-14B/cuda/info.yaml	Register CUDA recipe metadata.
Qwen-Qwen3-14B/cuda/requirements.txt	Define CUDA recipe Python deps.
Qwen-Qwen3-14B/cuda/README.md	Document CUDA recipe usage and 14B quantization rationale.
Qwen-Qwen3-14B/cuda/Qwen-Qwen3-14B_cuda_int4.json	Add CUDA INT4 recipe config (`k_quant_mixed`).
Qwen-Qwen3-14B/webgpu/info.yaml	Register WebGPU recipe metadata.
Qwen-Qwen3-14B/webgpu/requirements.txt	Define WebGPU recipe Python deps.
Qwen-Qwen3-14B/webgpu/README.md	Document WebGPU recipe usage and 14B quantization rationale.
Qwen-Qwen3-14B/webgpu/Qwen-Qwen3-14B_webgpu_int4.json	Add WebGPU INT4 recipe config (`k_quant_mixed`).


Qwen-Qwen3-14B/cpu/Qwen-Qwen3-14B_cpu_int4.json

+{
+    "input_model": {
+        "type": "HfModel",
+        "model_path": "Qwen/Qwen3-14B",
+        "load_kwargs": {
+            "torch_dtype": "float16"
+        }
+    },
+    "passes": {


Qwen-Qwen3-14B/cuda/README.md

+- This model uses `k_quant_mixed` via the `SelectiveMixedPrecision` pass followed by
+  `GPTQ` and `ModelBuilder`, instead of the `kld_gradient` algorithm used by smaller
+  Qwen3 models (0.6B–8B). The `kld_gradient` algorithm requires loading the full model


Qwen-Qwen3-1.7B/cpu/Qwen-Qwen3-1.7B_cpu_int4.json

+{
+    "input_model": {
+        "type": "HfModel",
+        "model_path": "Qwen/Qwen3-1.7B",
+        "load_kwargs": {
+            "torch_dtype": "float16"
+        }
+    },
+    "passes": {


Qwen-Qwen3-8B/cpu/Qwen-Qwen3-8B_cpu_int4.json

+{
+    "input_model": {
+        "type": "HfModel",
+        "model_path": "Qwen/Qwen3-8B",
+        "load_kwargs": {
+            "torch_dtype": "float16"
+        }
+    },
+    "passes": {


Qwen-Qwen3-14B/cpu/README.md

+  `GPTQ` and `ModelBuilder`, instead of the `kld_gradient` algorithm used by smaller
+  Qwen3 models (0.6B–8B). The `kld_gradient` algorithm requires loading the full model
+  to GPU for gradient-based sensitivity estimation, which exceeds the 80 GB per-GPU
+  memory limit for the 14B model. The `k_quant_mixed` algorithm uses a pre-defined
+  quantization sensitivity map and does not require GPU memory for sensitivity estimation.
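
The memory argument in this README excerpt can be made concrete with a rough estimate. The sketch below is an assumption-laden back-of-envelope calculation, not a measurement and not a description of how Olive allocates memory: it uses the nominal 14 billion parameter count, counts only weights and gradients, and ignores activations and calibration buffers. The function name and the byte widths are illustrative choices.

```python
def gradient_pass_memory_gb(
    n_params: float,
    weight_bytes: int = 2,  # assumed: float16 weights, as in the recipe's torch_dtype
    grad_bytes: int = 2,    # assumed: gradients stored at the same width
) -> float:
    """Lower bound on GPU memory (GB) to hold weights plus one gradient per weight."""
    return n_params * (weight_bytes + grad_bytes) / 1e9


if __name__ == "__main__":
    nominal_params = 14e9  # nominal size implied by the "14B" name
    print(gradient_pass_memory_gb(nominal_params))                # fp16 gradients
    print(gradient_pass_memory_gb(nominal_params, grad_bytes=4))  # fp32 gradients
```

Under these assumptions the weights and gradients alone take about 56 GB with float16 gradients and about 84 GB with float32 gradients, before any activation memory. That is consistent with the README's statement that gradient-based sensitivity estimation does not fit within 80 GB for the 14B model, while `k_quant_mixed` avoids the gradient pass entirely.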


Qwen-Qwen3-14B/webgpu/README.md

+- This model uses `k_quant_mixed` via the `SelectiveMixedPrecision` pass followed by
+  `GPTQ` and `ModelBuilder`, instead of the `kld_gradient` algorithm used by smaller
+  Qwen3 models (0.6B–8B). The `kld_gradient` algorithm requires loading the full model


Copilot AI review requested due to automatic review settings March 14, 2026 00:57

Merge branch 'main' into sunghcho/qwen3-family

7d7a204

Copilot started reviewing on behalf of hanbitmyths March 14, 2026 00:58

Copilot AI reviewed Mar 14, 2026

hanbitmyths wants to merge 2 commits into microsoft:main from hanbitmyths:sunghcho/qwen3-family
