llama-bench: fix case where mmap and direct-io are turned on together #20461

Open
taronaeo wants to merge 2 commits into ggml-org:master from taronaeo:fix/bench-mmap-dio-flags
@taronaeo
Collaborator

Ref: #20211 (comment)

As mentioned in the referenced thread, users may inadvertently enable --mmap and --direct-io together and then see no performance difference.

This PR adds safeguards during flag parsing so that --mmap and --direct-io cannot both be enabled at the same time. It also moves the ggml_backend_load_all call to just after the flag-parsing function, so that llama-bench fails early if any flag is invalid.
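The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `check_mmap_dio` and its signature are hypothetical, and the position-by-position comparison is an assumption inferred from the test cases below. Only the two error strings are taken from the actual output.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not the PR's actual code): returns an error message
// when the two value lists are inconsistent, or std::nullopt when they pass.
std::optional<std::string> check_mmap_dio(const std::vector<bool> & use_mmap,
                                          const std::vector<bool> & use_direct_io) {
    // Both flags are expected to carry the same number of comma-separated values.
    if (use_mmap.size() != use_direct_io.size()) {
        return std::string("--mmap and --direct-io must have the same number of values");
    }
    // Assumption: values are compared pairwise by position, and a pair is
    // rejected only when both are enabled. Both disabled is allowed.
    for (std::size_t i = 0; i < use_mmap.size(); ++i) {
        if (use_mmap[i] && use_direct_io[i]) {
            return std::string("--direct-io cannot be enabled with --mmap");
        }
    }
    return std::nullopt;
}
```

Running a check like this right after argument parsing, before any backend is loaded, is what gives the fail-early behavior described above.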

Test Cases

  1. --mmap is enabled by default but the user specifies --direct-io 1
$ build/bin/llama-bench -hf ibm-granite/granite-3.3-2b-instruct-GGUF:Q4_K_M --direct-io 1 

...truncated...
common_download_file_single_online: using cached file (same etag): /Users/taronaeo/Library/Caches/llama.cpp/ibm-granite_granite-3.3-2b-instruct-GGUF_granite-3.3-2b-instruct-Q4_K_M.gguf
| model                          |       size |     params | backend    | threads | mmap | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ---: | --: | --------------: | -------------------: |
| granite 3B Q4_K - Medium       |   1.44 GiB |     2.53 B | MTL,BLAS   |       8 |    0 |   1 |           pp512 |       633.47 ± 44.96 |
| granite 3B Q4_K - Medium       |   1.44 GiB |     2.53 B | MTL,BLAS   |       8 |    0 |   1 |           tg128 |         59.78 ± 0.94 |

build: a320bed33 (8323)
  2. --mmap and --direct-io arguments do not have the same number of values
$ build/bin/llama-bench -hf ibm-granite/granite-3.3-2b-instruct-GGUF:Q4_K_M --mmap 1,0 --direct-io 1

...truncated...
common_download_file_single_online: using cached file (same etag): /Users/taronaeo/Library/Caches/llama.cpp/ibm-granite_granite-3.3-2b-instruct-GGUF_granite-3.3-2b-instruct-Q4_K_M.gguf
error: --mmap and --direct-io must have the same number of values
  3. --mmap and --direct-io are turned on together
$ build/bin/llama-bench -hf ibm-granite/granite-3.3-2b-instruct-GGUF:Q4_K_M --mmap 1,0 --direct-io 1,0

...truncated...
common_download_file_single_online: using cached file (same etag): /Users/taronaeo/Library/Caches/llama.cpp/ibm-granite_granite-3.3-2b-instruct-GGUF_granite-3.3-2b-instruct-Q4_K_M.gguf
error: --direct-io cannot be enabled with --mmap
  4. Valid benchmark run
$ build/bin/llama-bench -hf ibm-granite/granite-3.3-2b-instruct-GGUF:Q4_K_M --mmap 1,0 --direct-io 0,1

...truncated...
common_download_file_single_online: using cached file (same etag): /Users/taronaeo/Library/Caches/llama.cpp/ibm-granite_granite-3.3-2b-instruct-GGUF_granite-3.3-2b-instruct-Q4_K_M.gguf
| model                          |       size |     params | backend    | threads | mmap | dio |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ---: | --: | --------------: | -------------------: |
| granite 3B Q4_K - Medium       |   1.44 GiB |     2.53 B | MTL,BLAS   |       8 |    1 |   0 |           pp512 |       652.69 ± 12.18 |
| granite 3B Q4_K - Medium       |   1.44 GiB |     2.53 B | MTL,BLAS   |       8 |    1 |   0 |           tg128 |         59.32 ± 0.24 |
| granite 3B Q4_K - Medium       |   1.44 GiB |     2.53 B | MTL,BLAS   |       8 |    0 |   0 |           pp512 |        659.92 ± 5.79 |
| granite 3B Q4_K - Medium       |   1.44 GiB |     2.53 B | MTL,BLAS   |       8 |    0 |   0 |           tg128 |         60.16 ± 0.13 |
| granite 3B Q4_K - Medium       |   1.44 GiB |     2.53 B | MTL,BLAS   |       8 |    0 |   1 |           pp512 |        663.38 ± 0.52 |
| granite 3B Q4_K - Medium       |   1.44 GiB |     2.53 B | MTL,BLAS   |       8 |    0 |   1 |           tg128 |         60.15 ± 0.10 |

build: a320bed33 (8323)
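
In the first test case only --direct-io 1 is passed, yet the table reports mmap as 0. One way to get that result is to derive the mmap value from the direct-io value whenever --mmap is not given explicitly. The sketch below illustrates that idea only. The names `io_flags` and `resolve_io_defaults`, the use of `std::optional` to mean "flag not passed", and the assumed direct-io default of off are all hypothetical and not taken from the PR.

```cpp
#include <optional>
#include <vector>

// Hypothetical container for the resolved flag values.
struct io_flags {
    std::vector<bool> use_mmap;
    std::vector<bool> use_direct_io;
};

// Hypothetical resolver: std::nullopt means the flag was not passed on the
// command line.
io_flags resolve_io_defaults(std::optional<std::vector<bool>> mmap_arg,
                             std::optional<std::vector<bool>> dio_arg) {
    io_flags out;
    // Assumption: direct I/O defaults to a single "off" value.
    out.use_direct_io = dio_arg.value_or(std::vector<bool>{false});
    if (mmap_arg) {
        // Explicit --mmap values are kept as given and validated elsewhere.
        out.use_mmap = *mmap_arg;
    } else {
        // Assumption: without an explicit --mmap, mmap stays on (its default)
        // except where direct I/O was requested.
        for (bool dio : out.use_direct_io) {
            out.use_mmap.push_back(!dio);
        }
    }
    return out;
}
```

Under these assumptions, passing only a direct-io value of 1 resolves to mmap 0 and dio 1, which matches the rows shown for the first test case.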

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>