Open-source vector database designed for simplicity and speed, with flexible deployment options.
For embedded use as a library in Java projects, see:
https://github.com/tutikka/LibVectorDB
Features
- Create and manage indexes
- Create and manage vector embeddings in indexes
- Search for embeddings in indexes based on distance
  - Cosine distance
  - Euclidean distance
  - Manhattan distance
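As a rough illustration of the three distance measures listed above, the following sketch computes each one in plain Python. The function names are illustrative and not part of VectorDB. Cosine distance is assumed here to mean 1 minus cosine similarity, which is an assumption about the server rather than something this README states.

```python
import math


def cosine_distance(a, b):
    """Assumed definition: 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Cosine distance ignores vector length and only compares direction, while the other two measures are sensitive to magnitude.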
APIs
- RESTful HTTP API using JSON
Deployment
- Standalone Java application
- Docker
None
- Java 21 or later (tested using 21.0.10-zulu)
- Git client
- Maven
- Docker
Clone this repository:
git clone https://github.com/tutikka/VectorDB.git
Change to cloned folder:
cd VectorDB
Clean, compile and package using Maven:
mvnw package
Change to created target directory:
cd target
Start the application:
java -jar vectordb-0.0.1-SNAPSHOT.jar
Or build a Docker image and run it, for example:
docker build -t vectordb/vectordb .
docker run -p 8080:8080 vectordb/vectordb
The application will look for a configuration file named vectordb.properties in the root directory of the application during startup. If the file is not found, default values (shown in the example below) will be used.
#
# directory for data files (default = 'data')
#
data.directory = data
#
# maximum number of vectors per index (default = 65536)
#
data.max_vectors_per_index = 65536
- Create index with 3 dimensions and similarity based on manhattan distance
- Create entries into the index with random values as embeddings
- Search for the best matching entry based on given embedding
- Clean up and delete index
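The steps above could be sketched roughly as follows. This is a minimal outline under several assumptions: `send` is a hypothetical caller-supplied function that performs the HTTP call and returns the parsed JSON response, and `"manhattan"` is an assumed value for the `similarity` field (this README only shows `"cosine"`). The clean-up step is left out because the API table in this README does not list an endpoint for deleting an index.

```python
import random


def run_simple_example(send, count=10, dimensions=3):
    """Create an index, fill it with random entries, and search it.

    send(method, path, body) is a hypothetical transport supplied by the
    caller; it should return the parsed JSON response.
    """
    index = send("POST", "/api/indexes", {
        "name": "test",
        "dimensions": dimensions,
        "similarity": "manhattan",  # assumed value, not confirmed by the README
        "optimization": "none",
    })
    index_id = index["id"]  # the server populates the id field

    # Create entries with random values as embeddings.
    for entry_id in range(1, count + 1):
        embedding = [random.random() for _ in range(dimensions)]
        send("POST", f"/api/indexes/{index_id}/entries",
             {"id": entry_id, "embedding": embedding})

    # Search for the single best match for a random query embedding.
    query = [random.random() for _ in range(dimensions)]
    result = send("POST", f"/api/indexes/{index_id}/search",
                  {"embedding": query, "top": 1})
    return result["matches"]
```

Keeping the transport as a parameter makes the flow easy to try against a stub before pointing it at a running server.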
This example maps the positions of the planets in our solar system on 1 January 2025 to a 3D space using the sun as the origin, and then tests which planets are closest.
- Create index with 3 dimensions (X, Y and Z coordinates) and similarity based on Euclidean distance
- Create entries to the index for each planet based on its position on 1 January 2025
- Search for the 3 closest planets to the sun
- Clean up and delete index
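To show what "closest to the sun" means in this setup, here is a small local sketch that ranks 3D points by Euclidean distance to the origin. It does not call VectorDB, the function name is illustrative, and the coordinates in the usage note are placeholders rather than real planetary positions.

```python
import math


def closest_to_origin(points, top=3):
    """Return the names of the `top` points nearest to (0, 0, 0).

    points maps a name to an (x, y, z) tuple.
    """
    origin = (0.0, 0.0, 0.0)
    ranked = sorted(points, key=lambda name: math.dist(points[name], origin))
    return ranked[:top]
```

For example, with placeholder points `{"a": (1, 0, 0), "b": (0, 2, 0), "c": (0, 0, 3), "d": (4, 4, 4)}` the function returns `["a", "b", "c"]`. Against the server, the equivalent would presumably be a search with the embedding `[0, 0, 0]` and `top` set to 3.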
This example is closer to a real-world scenario, where we have documents that we want to index to perform queries based on similarity, and then summarize best results based on a user's question.
- Create index with 1536 dimensions (from OpenAI ada-002 text embedding model) and similarity based on cosine distance
- Create entries into the index by embedding each document using the OpenAI ada-002 text embedding model
- Search for the best matching entry based on the user's question (embedded with the same model)
- Retrieve the original document identifier from the search results
- Use a chat completion model (OpenAI gpt-5) to summarize the retrieved document based on the user's original question
Note! Make sure to add a .env file in the same directory with your OpenAI API Key
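The overall flow of this example might look like the sketch below. All four callables (`embed`, `add_entry`, `search`, `summarize`) are hypothetical placeholders supplied by the caller: in practice they would wrap the embedding model, the VectorDB entry and search endpoints, and the chat completion model. No OpenAI or VectorDB calls are made in the sketch itself.

```python
def answer_question(question, documents, embed, add_entry, search, summarize):
    """Index documents, find the best match for a question, summarize it.

    documents maps an integer identifier to the document text.
    embed(text) -> list of floats
    add_entry(doc_id, embedding) -> None
    search(embedding, top) -> list of {"id": ..., "distance": ...}
    summarize(document_text, question) -> str
    """
    # Embed and index every document under its own identifier.
    for doc_id, text in documents.items():
        add_entry(doc_id, embed(text))

    # Embed the question with the same model and fetch the best match.
    matches = search(embed(question), 1)
    best_id = matches[0]["id"]

    # Map the identifier back to the original text and summarize it.
    return summarize(documents[best_id], question)
```

The index only stores identifiers and embeddings, so the caller has to keep the mapping from identifier to original document text.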
| Method | URI | Description |
|---|---|---|
| POST | /api/indexes | Create new index |
| GET | /api/indexes | List indexes |
| GET | /api/indexes/{id} | Get index |
| POST | /api/indexes/{id}/entries | Create new entry into index |
| GET | /api/indexes/{id}/entries | List entries in index |
| POST | /api/indexes/{id}/search | Submit search for entries in index |
Method
POST
URI
/api/indexes
Query Parameters
None
Request Body
{
"name": "test",
"dimensions": 1536,
"similarity": "cosine",
"optimization": "none"
}
Response Status
HTTP 200: Ok
HTTP 400: Error creating index due to client input
HTTP 500: Error creating index due to server error
Response Body
{
"id": 1,
"name": "test",
"dimensions": 1536,
"similarity": "cosine",
"optimization": "none"
}
Note! The server will populate the id field, which is used to refer to the index in other API methods.
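A request like the one above could be prepared from Python along these lines. This is only a sketch: the helper name is made up, and the base URL `http://localhost:8080` is an assumption based on the port used in the Docker example.

```python
import json
import urllib.request


def build_create_index_request(name, dimensions, similarity="cosine",
                               optimization="none",
                               base_url="http://localhost:8080"):
    """Build (but do not send) a POST /api/indexes request.

    base_url is an assumption; adjust it to wherever the server runs.
    """
    body = {
        "name": name,
        "dimensions": dimensions,
        "similarity": similarity,
        "optimization": optimization,
    }
    return urllib.request.Request(
        base_url + "/api/indexes",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the JSON response should then carry the server-assigned `id`.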
Method
GET
URI
/api/indexes
Query Parameters
None
Request Body
None
Response Status
HTTP 200: Ok
Response Body
[
{
"id": 1,
"name": "test",
"dimensions": 1536,
"similarity": "cosine",
"optimization": "none"
}
]
Method
GET
URI
/api/indexes/{id}
Query Parameters
id: The index identifier
Request Body
None
Response Status
HTTP 200: Ok
HTTP 404: Index not found
Response Body
{
"id": 1,
"name": "test",
"dimensions": 1536,
"similarity": "cosine",
"optimization": "none",
"extras": {
"_max_vectors": 65536,
"_num_vectors": 1,
"_size_on_disk": 1310728
}
}
Method
POST
URI
/api/indexes/{id}/entries
Query Parameters
id: The index identifier
Request Body
{
"id": 1,
"embedding": [
0.1,
0.2,
0.3
]
}
Response Status
HTTP 200: Ok
HTTP 400: Error creating entry due to client input
HTTP 404: Index not found
HTTP 500: Error creating entry due to server error
Response Body
{
"id": 1,
"embedding": [
0.2672612,
0.5345224,
0.8017837
]
}
Method
GET
URI
/api/indexes/{id}/entries
Query Parameters
id: The index identifier
offset: The position in the index where to start retrieving entries
limit: Maximum number of entries to retrieve
Request Body
None
Response Status
HTTP 200: Ok
HTTP 400: Error listing entries due to client input
HTTP 404: Index not found
HTTP 500: Error listing entries due to server error
Response Body
[
{
"id": 1,
"embedding": [
0.2672612,
0.5345224,
0.8017837
]
},
{
"id": 2,
"embedding": [
0.37139064,
0.557086,
0.7427813
]
},
{
"id": 3,
"embedding": [
0.4242641,
0.56568545,
0.70710677
]
}
]
Method
POST
URI
/api/indexes/{id}/search
Query Parameters
id: The index identifier
Request Body
{
"embedding": [
0.1,
0.2,
0.3
],
"top": 3
}
Response Status
HTTP 200: Ok
HTTP 400: Error searching entries due to client input
HTTP 404: Index not found
HTTP 500: Error searching entries due to server error
Response Body
{
"matches": [
{
"id": 1,
"distance": 5.9604644775390625E-8
},
{
"id": 2,
"distance": 0.007416725158691406
},
{
"id": 3,
"distance": 0.017292380332946777
}
],
"duration": 0,
"scanned": 3,
"total": 3,
"similarity": "cosine"
}
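The sample values in this README appear consistent with two behaviors: the embedding `[0.1, 0.2, 0.3]` comes back scaled to unit length when an entry is created, and the search distances match 1 minus cosine similarity. The sketch below reproduces the sample search response locally under those assumptions. It is inferred from the sample numbers only and is not a description of the server's actual implementation; the function names are illustrative.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (L2 normalization)."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


def cosine_search(entries, query, top):
    """Brute-force search returning the `top` closest entries.

    entries maps an entry id to its embedding. Distance is assumed to be
    1 - cosine similarity.
    """
    q = normalize(query)
    matches = []
    for entry_id, embedding in entries.items():
        e = normalize(embedding)
        distance = 1.0 - sum(a * b for a, b in zip(q, e))
        matches.append({"id": entry_id, "distance": distance})
    matches.sort(key=lambda match: match["distance"])
    return matches[:top]


# Embeddings copied from the sample "list entries" response.
sample_entries = {
    1: [0.2672612, 0.5345224, 0.8017837],
    2: [0.37139064, 0.557086, 0.7427813],
    3: [0.4242641, 0.56568545, 0.70710677],
}
```

Calling `cosine_search(sample_entries, [0.1, 0.2, 0.3], 3)` ranks the entries as 1, 2, 3, with distances that agree with the sample response to within about 1e-6. The small differences are what one would expect if the server works in single precision.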