Draft
19 commits
0fe0dab
Initial plan
Copilot Jan 13, 2026
bb5674c
Add KV Cache Q8_0 quantization support for reduced memory usage
Copilot Jan 13, 2026
b3d0e2d
Improve KV cache Q8_0 documentation and help text clarity
Copilot Jan 13, 2026
512a42b
Revert Q8_0 KV cache code changes; document block alignment issue and…
Copilot Jan 13, 2026
3429b9a
Add practical KV cache optimization strategies section with implement…
Copilot Jan 13, 2026
32d9709
Implement separate K/V cache type configuration for mixed precision
Copilot Jan 13, 2026
317b980
Implement KV cache quantization with Q8_0, Q4_0, Q5_0 support using g…
Copilot Jan 14, 2026
93ebaef
Fix comment about block size determination
Copilot Jan 14, 2026
f653dff
Add quantized KV cache performance analysis and optimization recommen…
Copilot Jan 14, 2026
db20ad3
Add validation: quantized KV cache types require flash attention enabled
Copilot Jan 14, 2026
294c6c5
Document three KV cache types (kv_self, kv_cross, kv_pad) with usage …
Copilot Jan 14, 2026
610f5f8
Implement separate KV cache types for kv_self, kv_cross, and kv_pad
Copilot Jan 14, 2026
96da180
Fix non-flash attention path V tensor strides to use n_ctx-based layout
Copilot Jan 14, 2026
d53bf50
Allow K cache quantization with non-flash attention, V quantization r…
Copilot Jan 14, 2026
a94c7a5
Improve comments explaining V quantization constraints in non-flash a…
Copilot Jan 14, 2026
d8b99c9
Implement ggml-level Q8_0 KV cache optimization: add ggml_vec_mad_q8_0
Copilot Jan 14, 2026
f62a80b
simd
lhpqaq Jan 14, 2026
9833290
Add ggml_vec_mad_q4_0 with SIMD optimization (NEON + AVX2) and integr…
Copilot Jan 14, 2026
9db0e6b
Code review fixes: use named constants instead of magic numbers in gg…
Copilot Jan 14, 2026