JVector comes with the following sample programs to try:
SiftSmall is a simple benchmark for the SIFT dataset located in the siftsmall directory in the project root.
mvn compile exec:exec@sift
Bench performs a grid search across the GraphIndexBuilder parameter space to find
the best tradeoffs between recall and throughput.
This benchmark requires datasets from https://github.com/erikbern/ann-benchmarks to be downloaded to the hdf5 or fvec
directory under the project root, depending on the dataset format.
You can use plot_output.py to graph the pareto-optimal points found by Bench.
mvn compile exec:exec@bench
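To make "pareto-optimal" concrete: a configuration is Pareto-optimal when no other configuration is at least as good on both recall and throughput and strictly better on one. The sketch below filters a list of (recall, throughput) pairs down to that set. It is only an illustration of the concept; the function name and sample numbers are made up, and this is not necessarily how plot_output.py or Bench computes it.

```python
def pareto_front(points):
    """Return the (recall, throughput) points not dominated by any other point.

    Higher is better on both axes. Illustrative helper, not part of JVector.
    """
    # Visit points from highest recall to lowest; ties go to higher throughput.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    front = []
    best_throughput = float("-inf")
    for recall, throughput in ordered:
        # Everything already visited has recall >= this point's recall, so the
        # point survives only if it beats all of them on throughput.
        if throughput > best_throughput:
            front.append((recall, throughput))
            best_throughput = throughput
    return front


if __name__ == "__main__":
    # Made-up measurements: (recall, queries per second).
    runs = [(0.90, 1000), (0.95, 800), (0.95, 700), (0.99, 300), (0.92, 600)]
    for recall, qps in pareto_front(runs):
        print(f"recall={recall:.2f} qps={qps}")
```

Sorting first keeps the filter to a single pass, which is convenient when a grid search produces many candidate configurations.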
Some sample KNN datasets, based on ada-002 embeddings generated on Wikipedia data, are available in ivec/fvec format for testing at:
aws s3 ls s3://astra-vector/wikipedia_squad/ --no-sign-request
PRE 100k/
PRE 1M/
PRE 4M/
Bench automatically downloads the 100k dataset to the ./fvec directory.
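If you want to inspect a downloaded file outside of Bench, the following sketch reads vectors from an fvec file. It assumes the conventional fvecs layout, where each record is a little-endian 32-bit integer giving the dimension followed by that many 32-bit floats; verify this against the actual files before relying on it. The function names are illustrative and not part of JVector.

```python
import struct


def parse_fvecs(data):
    """Decode bytes in the assumed fvecs layout into a list of float lists."""
    vectors = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated dimension header")
        # Each record starts with its dimension as a little-endian int32.
        (dim,) = struct.unpack_from("<i", data, offset)
        if dim <= 0:
            raise ValueError(f"invalid dimension {dim}")
        offset += 4
        end = offset + 4 * dim
        if end > len(data):
            raise ValueError("truncated vector record")
        # The dimension is followed by `dim` little-endian float32 values.
        vectors.append(list(struct.unpack_from(f"<{dim}f", data, offset)))
        offset = end
    return vectors


def read_fvecs(path):
    """Read a whole file into memory and decode it; fine for small datasets."""
    with open(path, "rb") as f:
        return parse_fvecs(f.read())
```

Reading the whole file at once keeps the sketch short; for the larger datasets a streaming or memory-mapped reader would be more appropriate.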
To run SiftSmall/Bench without the JVM vector module available, you can use the following invocations:
mvn -Pjdk11 compile exec:exec@bench
mvn -Pjdk11 compile exec:exec@sift
A simple service for adding / querying vectors over a unix socket.
Install socat using Homebrew on Mac or apt/rpm on Linux.
Mac:
brew install socat
Linux:
apt-get install socat
Start the service with:
mvn compile exec:exec@ipcserve
Now you can interact with the service:
socat - unix-client:/tmp/jvector.sock
CREATE 3 DOT_PRODUCT 2 20
OK
WRITE [0.1,0.15,0.3]
OK
WRITE [0.2,0.83,0.05]
OK
WRITE [0.5,0.5,0.5]
OK
OPTIMIZE
OK
SEARCH 20 3 [0.15,0.1,0.1]
RESULT [2,1,0]
All commands are terminated with a newline (\n).
No spaces are allowed inside vector brackets.
- CREATE {dimensions} {similarity-function} {M} {searchDepthConstruction} - Creates a new index for this session
- WRITE [N,N,N] ... [N,N,N] - Add one or more vectors to the index
- OPTIMIZE - Call when indexing is complete
- MEMORY - Get the in memory size of index
- SEARCH {overquerySearch} {top-k} [N,N,N] ... [N,N,N] - Search index for the top-k closest vectors (ordinals of indexed values returned per query)
- BULKLOAD {localpath} - Bulk loads a local file in numpy format Rows x Columns
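The commands above can also be sent from a script instead of socat. The following is a minimal Python sketch, not an official client: the helper names are hypothetical, it assumes each response arrives as a single newline-terminated line (as in the session shown earlier), and it only parses the single-query RESULT form seen there.

```python
import socket


def format_vector(values):
    # The protocol allows no spaces inside the brackets.
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def create_command(dimensions, similarity, m, search_depth):
    return f"CREATE {dimensions} {similarity} {m} {search_depth}\n"


def write_command(vectors):
    return "WRITE " + " ".join(format_vector(v) for v in vectors) + "\n"


def search_command(overquery, top_k, queries):
    body = " ".join(format_vector(q) for q in queries)
    return f"SEARCH {overquery} {top_k} {body}\n"


def parse_result(line):
    # Handles only the single-query form shown above, e.g. "RESULT [2,1,0]".
    prefix = "RESULT "
    if not line.startswith(prefix):
        raise ValueError(f"unexpected response: {line!r}")
    body = line[len(prefix):].strip().strip("[]")
    return [int(x) for x in body.split(",")] if body else []


def run_session(path="/tmp/jvector.sock"):
    """Replay the example session; requires the service to be running."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        with sock.makefile("rw", newline="\n") as f:
            def send(command):
                f.write(command)
                f.flush()
                # Assumption: one newline-terminated response per command.
                return f.readline().rstrip("\n")

            send(create_command(3, "DOT_PRODUCT", 2, 20))
            send(write_command([[0.1, 0.15, 0.3], [0.2, 0.83, 0.05], [0.5, 0.5, 0.5]]))
            send("OPTIMIZE\n")
            return parse_result(send(search_command(20, 3, [[0.15, 0.1, 0.1]])))


if __name__ == "__main__":
    print(run_session())
```

Keeping the command formatting separate from the socket I/O makes it possible to check the formatting without a running service.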