support batch embeddings and zero-copy numpy returns by kavorite · Pull Request #2077 · abetlen/llama-cpp-python

kavorite · 2025-10-12T05:30:58Z

- Add `n_seq_max` parameter to `Llama` class to enable batch embeddings (defaults to 1 for backward compatibility)
- Add `return_numpy` support to convert between numpy arrays and lists with zero copies
- Update `normalize_embedding()` to keep numpy arrays as numpy arrays for zero-copy efficiency
- Update `test_embed_numpy` to use `n_seq_max=16` for batch embedding tests
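For readers who want a concrete picture of the `normalize_embedding()` change, here is a minimal sketch of a type-preserving L2 normalization. It is not the code from this PR: the name `normalize_embedding_sketch` and the `Embedding` alias are made up for illustration, and the zero-norm handling is an assumption. Note that the division still allocates one result array; what is avoided is the round trip through Python lists.

```python
from typing import List, Union

import numpy as np

# Hypothetical alias for illustration; not a name from the library.
Embedding = Union[List[float], np.ndarray]


def normalize_embedding_sketch(embedding: Embedding) -> Embedding:
    """L2-normalize an embedding, returning the same container type it was given."""
    if isinstance(embedding, np.ndarray):
        # Stay in numpy: no conversion to a Python list and back.
        norm = np.linalg.norm(embedding)
        if norm == 0.0:
            return embedding  # assumption: leave all-zero vectors untouched
        return embedding / norm

    # Plain-list path, for callers that did not ask for numpy.
    norm = sum(x * x for x in embedding) ** 0.5
    if norm == 0.0:
        return embedding
    return [x / norm for x in embedding]


as_array = normalize_embedding_sketch(np.array([3.0, 4.0], dtype=np.float32))
as_list = normalize_embedding_sketch([3.0, 4.0])
```

Here `as_array` stays an `np.ndarray` with its `float32` dtype, while `as_list` stays a plain `list`.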

Enables batch embedding support, which was previously failing with `llama_decode` errors because of the `n_seq_max=1` limitation. This also fixes a bug in a repo I was working on that consumes this functionality to mass-index GitHub repos for semantic multivector search on the machine under my desk (luh mao).
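To illustrate why a sequence cap of 1 defeats batching, here is a rough sketch of packing inputs into decode batches under two limits: a token budget and a maximum number of sequences per batch. This is not the library's implementation; `pack_sequences` and its parameters are hypothetical names, and the greedy packing strategy is an assumption.

```python
from typing import List


def pack_sequences(token_counts: List[int], n_batch: int, n_seq_max: int) -> List[List[int]]:
    """Greedily group input indices so each batch holds at most
    n_seq_max sequences and at most n_batch tokens in total."""
    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, n_tokens in enumerate(token_counts):
        if n_tokens > n_batch:
            raise ValueError(f"input {idx} has {n_tokens} tokens, more than n_batch={n_batch}")
        # Start a new batch when either limit would be exceeded.
        if current and (used + n_tokens > n_batch or len(current) >= n_seq_max):
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += n_tokens
    if current:
        batches.append(current)
    return batches


one_at_a_time = pack_sequences([3, 4, 5], n_batch=8, n_seq_max=1)
batched = pack_sequences([3, 4, 5], n_batch=8, n_seq_max=16)
```

With `n_seq_max=1` every input ends up in its own batch regardless of the token budget; with a larger cap the first two inputs share a batch.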



kavorite · 2025-10-12T05:38:54Z

This is addressed in #2058, which is cleaner. I think I will keep this locally until that is merged; if this PR is reopened, it will address the numpy piece only, and hopefully with less bloat from spuriously altered formatting.

kavorite added 2 commits October 12, 2025 01:28

fix: proper numpy typing for embed

905c9ec

Replace 'Any' with proper Union types and add @overload signatures to provide precise type hints based on input type (str vs List[str]), return_numpy flag, and return_count flag. This enables better IDE autocomplete and type checking for callers.
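As a rough illustration of the `@overload` approach described in this commit message, the sketch below narrows the return type on two of the three axes mentioned (input type and `return_count`); a `return_numpy` flag would add a third axis in the same way. The function `embed_sketch` and its placeholder body are hypothetical and do not reproduce the signatures in the commit.

```python
from typing import List, Literal, Tuple, overload

Vector = List[float]  # hypothetical alias for illustration


@overload
def embed_sketch(text: str, return_count: Literal[False] = False) -> Vector: ...
@overload
def embed_sketch(text: str, return_count: Literal[True]) -> Tuple[Vector, int]: ...
@overload
def embed_sketch(text: List[str], return_count: Literal[False] = False) -> List[Vector]: ...
@overload
def embed_sketch(text: List[str], return_count: Literal[True]) -> Tuple[List[Vector], int]: ...


def embed_sketch(text, return_count=False):
    """Placeholder body: no model is involved, only the return shapes matter."""
    inputs = [text] if isinstance(text, str) else text
    # Stand-in "embedding": a one-element vector holding the character count.
    vectors = [[float(len(s))] for s in inputs]
    # Stand-in token count: whitespace-separated words.
    total = sum(len(s.split()) for s in inputs)
    result = vectors[0] if isinstance(text, str) else vectors
    return (result, total) if return_count else result


single = embed_sketch("hello")
many = embed_sketch(["a", "bc"])
with_count = embed_sketch("hi there", return_count=True)
```

A type checker would infer `List[float]` for `single`, `List[List[float]]` for `many`, and `Tuple[List[float], int]` for `with_count`, instead of collapsing all three to `Any`.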

kavorite closed this Oct 12, 2025

kavorite wants to merge 2 commits into abetlen:main from kavorite:batch-embed
