This guide covers configuring local LLM runners (Ollama, LM Studio, llama.cpp) to work as backends for tinyMem.
```
tinyMem (Proxy) -> Local LLM Runner
```
## Ollama

Ollama provides an OpenAI-compatible API by default.

- Run Ollama:

  ```shell
  ollama serve
  ```

- Pull a model:

  ```shell
  ollama pull llama3
  ```
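Before pointing tinyMem at Ollama, you can verify the OpenAI-compatible endpoint is reachable by listing the models it exposes (this assumes Ollama's default port, 11434, and a running server):

```shell
# List models exposed through Ollama's OpenAI-compatible API.
# Each entry's "id" is the value to use for 'model' in the [llm] section.
curl http://localhost:11434/v1/models
```

If this returns a JSON list of models, the `base_url` below will work as-is.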
```toml
[proxy]
port = 8080
base_url = "http://localhost:11434/v1" # Ollama's default OpenAI endpoint

[llm]
model = "llama3" # Must match a name from 'ollama list'
```

## LM Studio

LM Studio is a GUI for running local models.
- Load a model in LM Studio.
- Go to the Local Server tab (double-arrow icon).
- Click Start Server. The default port is 1234.
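Once the server is running, you can confirm it is listening and see the exact model identifier it exposes (this assumes LM Studio's default port, 1234):

```shell
# Returns the model identifier(s) to use in the [llm] section below
curl http://localhost:1234/v1/models
```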
```toml
[proxy]
port = 8080
base_url = "http://localhost:1234/v1"

[llm]
# LM Studio often uses "local-model" or the exact filename.
# Check the "Model Identifier" field in the server tab.
model = "llama-3-8b-instruct"
```

## llama.cpp

If running llama-server directly:
```shell
./llama-server -m models/7B/ggml-model-q4_0.gguf -c 2048 --port 8000
```

```toml
[proxy]
port = 8080
base_url = "http://localhost:8000/v1"

[llm]
model = "default" # llama-server usually ignores the model name when only one model is loaded
```

For full configuration options, see Configuration.md.
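llama-server also exposes a health endpoint, which is a quick way to confirm the server is up before starting tinyMem (assuming `--port 8000` as above):

```shell
# Returns a status object once the model has finished loading
curl http://localhost:8000/health
```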
## Troubleshooting

- "Connection refused": Ensure your local runner (Ollama/LM Studio) is actually running and listening on the expected port.
- Context limit errors: Local models often have smaller context windows. Decrease `max_items` in `[recall]` if you hit limits.
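Whichever runner you use, a quick end-to-end check is to send a chat completion through tinyMem itself. This sketch assumes the proxy forwards OpenAI-style `/v1/chat/completions` requests on the `port = 8080` configured above; the model name is illustrative and must match your `[llm]` setting:

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}]}'
```

A JSON completion here confirms both the proxy and the backend runner are wired up correctly.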