common : add --sched-n-copies parameter for pipeline parallelism configuration #20395

Open
mxxm-t wants to merge 1 commit into ggml-org:master from mxxm-t:feat/runtime-sched-copies

mxxm-t commented Mar 11, 2026

Replace compile-time GGML_SCHED_MAX_COPIES with runtime configuration. Add --sched-n-copies parameter to control scheduler input copies (default: 4, max: 16). Implement ggml_backend_sched_set_n_copies() to override the number of copies used for parallel execution. Update llama-bench to support the new parameter.
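
To illustrate the idea of a runtime override with the stated default of 4 and maximum of 16, here is a minimal, self-contained sketch. It is not the code from this PR: `sched_sketch`, `clamp_n_copies`, `sched_sketch_set_n_copies` and the two constants are hypothetical names, the lower bound of 1 is an assumption, and the real `ggml_backend_sched_set_n_copies()` may have a different signature and may handle out-of-range values differently.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for illustration; not the PR's actual code.
constexpr int SKETCH_DEFAULT_COPIES = 4;  // default from the PR description
constexpr int SKETCH_MAX_COPIES     = 16; // maximum from the PR description

// Minimal model of a scheduler that stores its copy count at runtime
// instead of relying on a compile-time constant.
struct sched_sketch {
    int n_copies = SKETCH_DEFAULT_COPIES;
};

// Clamp a requested value into [1, SKETCH_MAX_COPIES].
// The lower bound of 1 is an assumption; the PR description only states the maximum.
static int clamp_n_copies(int requested) {
    return std::max(1, std::min(requested, SKETCH_MAX_COPIES));
}

// Runtime override, loosely modelled on what a setter such as
// ggml_backend_sched_set_n_copies() could do; the real signature may differ.
static void sched_sketch_set_n_copies(sched_sketch & sched, int requested) {
    sched.n_copies = clamp_n_copies(requested);
    assert(sched.n_copies >= 1 && sched.n_copies <= SKETCH_MAX_COPIES);
}
```

In this sketch a value parsed from `--sched-n-copies` would be passed to `sched_sketch_set_n_copies()` once, before any graph is scheduled. Clamping rather than rejecting out-of-range values is a design choice of the sketch only.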

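Changing this parameter can improve performance quite a bit: in the benchmarks below, raising it from 4 to 8 increases prompt-processing throughput at pp8096 and above, while 16 performs about the same as 8.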

Benchmarks
| model                          |       size |     params | backend    | ngl | sched_n_copies | n_ubatch | fa | mmap | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------------: | -------: | -: | ---: | --: | --------------: | -------------------: |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              4 |     1024 |  1 |    0 |   1 |          pp1024 |        641.02 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              4 |     1024 |  1 |    0 |   1 |          pp2048 |       1111.21 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              4 |     1024 |  1 |    0 |   1 |          pp8096 |       2053.24 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              4 |     1024 |  1 |    0 |   1 |         pp16384 |       2165.73 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              4 |     1024 |  1 |    0 |   1 |         pp32768 |       2112.82 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              4 |     1024 |  1 |    0 |   1 |         pp65536 |       1792.52 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              4 |     1024 |  1 |    0 |   1 |        pp131072 |       1316.19 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              8 |     1024 |  1 |    0 |   1 |          pp1024 |        636.88 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              8 |     1024 |  1 |    0 |   1 |          pp2048 |       1110.96 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              8 |     1024 |  1 |    0 |   1 |          pp8096 |       2336.29 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              8 |     1024 |  1 |    0 |   1 |         pp16384 |       2705.89 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              8 |     1024 |  1 |    0 |   1 |         pp32768 |       2705.33 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              8 |     1024 |  1 |    0 |   1 |         pp65536 |       2313.66 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |              8 |     1024 |  1 |    0 |   1 |        pp131072 |       1698.53 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |             16 |     1024 |  1 |    0 |   1 |          pp1024 |        641.25 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |             16 |     1024 |  1 |    0 |   1 |          pp2048 |       1100.32 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |             16 |     1024 |  1 |    0 |   1 |          pp8096 |       2355.55 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |             16 |     1024 |  1 |    0 |   1 |         pp16384 |       2699.75 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |             16 |     1024 |  1 |    0 |   1 |         pp32768 |       2706.86 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |             16 |     1024 |  1 |    0 |   1 |         pp65536 |       2313.20 ± 0.00 |
| gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | ROCm       |  99 |             16 |     1024 |  1 |    0 |   1 |        pp131072 |       1699.27 ± 0.00 |
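
To make the table easier to read, the relative change from `sched_n_copies = 4` to `sched_n_copies = 8` can be computed per test. The snippet below is a small sketch of that arithmetic; the throughput values are copied from the table above, while `bench_row`, `GPT_OSS_ROWS`, `speedup_percent` and `print_speedups` are hypothetical helper names, not part of llama-bench.

```cpp
#include <cassert>
#include <cstdio>

// One row per test: t/s at sched_n_copies = 4 and at sched_n_copies = 8,
// copied from the gpt-oss 120B table above.
struct bench_row {
    const char * test;
    double tps_4;
    double tps_8;
};

static const bench_row GPT_OSS_ROWS[] = {
    { "pp1024",    641.02,  636.88 },
    { "pp2048",   1111.21, 1110.96 },
    { "pp8096",   2053.24, 2336.29 },
    { "pp16384",  2165.73, 2705.89 },
    { "pp32768",  2112.82, 2705.33 },
    { "pp65536",  1792.52, 2313.66 },
    { "pp131072", 1316.19, 1698.53 },
};

// Relative throughput change in percent when going from baseline to candidate.
static double speedup_percent(double baseline_tps, double candidate_tps) {
    assert(baseline_tps > 0.0);
    return (candidate_tps / baseline_tps - 1.0) * 100.0;
}

// Print one line per test with the signed percentage change (4 -> 8 copies).
static void print_speedups() {
    for (const bench_row & row : GPT_OSS_ROWS) {
        std::printf("%-9s %+6.1f%%\n", row.test, speedup_percent(row.tps_4, row.tps_8));
    }
}
```

Calling `print_speedups()` from `main` shows pp1024 and pp2048 staying within about 1% of the baseline, roughly +25% at pp16384, and about +29% at pp65536 and pp131072.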

mxxm-t requested a review from ggerganov as a code owner March 11, 2026 12:30
github-actions bot added the examples and ggml (changes relating to the ggml tensor library for machine learning) labels Mar 11, 2026
mxxm-t force-pushed the feat/runtime-sched-copies branch from 5663db2 to ad97a40 March 12, 2026 08:36

mxxm-t commented Mar 12, 2026

More tests on 10x MI50 32GB:

Qwen3-4B-Instruct-2507-Q4_0

| model                          |       size |     params | backend    | ngl | sched_n_copies | n_ubatch | fa | mmap | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------------: | -------: | -: | ---: | --: | --------------: | -------------------: |
| qwen3 4B Q4_0                  |   2.21 GiB |     4.02 B | ROCm       |  99 |              4 |     1024 |  1 |    0 |   1 |         pp16384 |       4055.10 ± 0.00 |
| qwen3 4B Q4_0                  |   2.21 GiB |     4.02 B | ROCm       |  99 |              8 |     1024 |  1 |    0 |   1 |         pp16384 |       4956.52 ± 0.00 |
| qwen3 4B Q4_0                  |   2.21 GiB |     4.02 B | ROCm       |  99 |             16 |     1024 |  1 |    0 |   1 |         pp16384 |       4951.32 ± 0.00 |

Qwen3-14B-Q8_0

| model                          |       size |     params | backend    | ngl | sched_n_copies | n_ubatch | fa | mmap | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------------: | -------: | -: | ---: | --: | --------------: | -------------------: |
| qwen3 14B Q8_0                 |  14.61 GiB |    14.77 B | ROCm       |  99 |              4 |     1024 |  1 |    0 |   1 |         pp16384 |        952.61 ± 0.00 |
| qwen3 14B Q8_0                 |  14.61 GiB |    14.77 B | ROCm       |  99 |              8 |     1024 |  1 |    0 |   1 |         pp16384 |       1226.22 ± 0.00 |
| qwen3 14B Q8_0                 |  14.61 GiB |    14.77 B | ROCm       |  99 |             16 |     1024 |  1 |    0 |   1 |         pp16384 |       1224.99 ± 0.00 |

MiniMax-M2.1-GGUF_Q8_0

| model                          |       size |     params | backend    | ngl | sched_n_copies | n_ubatch | fa | mmap | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------------: | -------: | -: | ---: | --: | --------------: | -------------------: |
| minimax-m2 230B.A10B Q8_0      | 226.43 GiB |   228.69 B | ROCm       |  99 |              4 |     1024 |  1 |    0 |   1 |         pp16384 |        654.39 ± 0.00 |
| minimax-m2 230B.A10B Q8_0      | 226.43 GiB |   228.69 B | ROCm       |  99 |              8 |     1024 |  1 |    0 |   1 |         pp16384 |        881.94 ± 0.00 |
| minimax-m2 230B.A10B Q8_0      | 226.43 GiB |   228.69 B | ROCm       |  99 |             16 |     1024 |  1 |    0 |   1 |         pp16384 |        878.99 ± 0.00 |

Qwen3.5-35B-A3B-UD-Q8_K_XL

| model                          |       size |     params | backend    | ngl | sched_n_copies | n_ubatch | fa | mmap | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------------: | -------: | -: | ---: | --: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  36.03 GiB |    34.66 B | ROCm       |  99 |              4 |     1024 |  1 |    0 |   1 |         pp16384 |       1951.05 ± 0.00 |
| qwen35moe 35B.A3B Q8_0         |  36.03 GiB |    34.66 B | ROCm       |  99 |              8 |     1024 |  1 |    0 |   1 |         pp16384 |       2518.78 ± 0.00 |
| qwen35moe 35B.A3B Q8_0         |  36.03 GiB |    34.66 B | ROCm       |  99 |             16 |     1024 |  1 |    0 |   1 |         pp16384 |       2514.52 ± 0.00 |

Qwen3.5-122B-A10B-UD-Q8_K_XL

| model                          |       size |     params | backend    | ngl | sched_n_copies | n_ubatch | fa | mmap | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------------: | -------: | -: | ---: | --: | --------------: | -------------------: |
| qwen35moe 122B.A10B Q8_0       | 159.10 GiB |   122.11 B | ROCm       |  99 |              4 |     1024 |  1 |    0 |   1 |         pp16384 |        209.00 ± 0.00 |
| qwen35moe 122B.A10B Q8_0       | 159.10 GiB |   122.11 B | ROCm       |  99 |              8 |     1024 |  1 |    0 |   1 |         pp16384 |        209.61 ± 0.00 |
