AI-Memory is a persistent context layer designed to give your agents institutional memory. By bridging LLMs with a high-performance vector database (Qdrant), this framework ensures your agents remember architectural decisions, project rules, and past interactions across every session.
Explore the Docs | Report a Bug | Request a Feature
- Cross-Session Memory: Claude remembers your last session automatically; no re-explaining needed.
- Semantic Decay: Memories age naturally; recent patterns rank higher than stale ones.
- 3-Layer Security: PII and secrets caught before storage via regex + detect-secrets + SpaCy NER.
- GitHub History Sync: PRs, issues, commits, CI results searchable by meaning.
- LLM Observability: Full pipeline tracing via Langfuse; every hook, span, and classification visible.
- Progressive Context Injection: Right memories, right time, within token budgets.
This isn't a database you configure. It's institutional memory that forms as you build. Traditional knowledge bases require upfront schema design and manual curation. AI-Memory takes a different approach: let the LLM and human decide what matters, and capture it as it happens.
Your agents don't just execute, they learn.
Parzival is your AI project manager embedded in Claude Code. Describe what needs doing, and Parzival orchestrates the work with verified precision, reading your architecture, PRD, and standards before creating prompts, never after.
Core capabilities:
- Agent team orchestration: /parzival-team builds 3-tier parallel team prompts (lead → workers → reviewers) with exact file paths, line numbers, acceptance criteria, and project-specific context, derived from your actual project files, not assumptions
- Quality gate enforcement: Mandatory review→fix→review cycles that continue until zero issues are found. Parzival never accepts "looks good enough"
- Verified instructions: Every recommendation is checked against project files first and rated with a confidence level (Verified/Informed/Inferred/Uncertain/Unknown), with source citations included
- False positive catching: When review agents flag issues, Parzival verifies findings against actual source code before acting, preventing wasted cycles on non-issues
- Decision support: Presents options with pros/cons, tradeoffs, source citations, and confidence levels, then waits for your approval before proceeding
- Risk and blocker tracking: Identifies risks proactively with severity levels and escalation paths; surfaces critical issues immediately
- Session continuity: Handoffs are dual-written to local oversight files and the Qdrant discussions collection, enabling automatic cross-session resume at every SessionStart
- Sprint and task management: Tracks sprints, tasks, blockers, and decisions across sessions via structured oversight files (task-tracker.md, decisions-log.md, SESSION_WORK_INDEX.md)
How the workflow works:
- You describe the work to Parzival
- Parzival reads your architecture, PRD, and standards before making any recommendation
- Parzival builds a precise agent team prompt β or individual dev prompt β with exact file paths and acceptance criteria
- You run the agents; Parzival reviews the results
- Reviewβfixβreview continues until zero issues are found, then you approve
The core principle: Parzival recommends. You decide. Parzival is the radar operator on the ship; you are the captain who steers. It monitors, navigates, and verifies. It never writes code, makes final decisions, or executes agents autonomously. A 5-layer constraint system prevents the behavioral drift that causes AI agents to forget their role over long conversations.
Parzival is optional; AI Memory's core features (semantic decay, GitHub sync, search skills, freshness detection) work independently without it. For teams managing complex projects across many sessions, Parzival is the orchestration layer that keeps everything on track.
See docs/PARZIVAL-SESSION-GUIDE.md for setup, commands, and the full skills reference.
AI-Memory combines capabilities that exist nowhere else into a single integrated system:
| Capability | What It Does |
|---|---|
| Semantic Decay Scoring | Memories age naturally via exponential decay; recent patterns rank higher than stale ones, automatically |
| Cross-Session Memory | Qdrant vector search resurfaces exactly the right context at session start, without you asking |
| 3-Layer Security Pipeline | PII and secrets screened via regex + detect-secrets + SpaCy NER before any content is stored |
| GitHub History → Semantic Search | PRs, issues, commits, CI results, code blobs, diffs, reviews, and releases searchable by meaning |
| Freshness Detection | Stale code-pattern memories flagged automatically by comparing stored patterns against current git state (3/10/25 commit thresholds) |
| Dual Embedding Routing | Code uses jina-v2-base-code; prose uses jina-v2-base-en, for 10-30% better retrieval accuracy |
| Progressive Context Injection | Token-budget-aware 3-tier delivery: session bootstrap, per-turn injection, confidence-filtered retrieval |
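To make the 3-layer screening order concrete, here is a minimal sketch, assuming regex runs first, detect-secrets second, and SpaCy NER last; the pattern list and blocking policy are illustrative, not the project's actual rules.

```python
import re

import spacy  # layer 3: NER-based PII detection (needs `en_core_web_sm` installed)

# Layer 1: fast regex screen (illustrative patterns only)
SECRET_PATTERNS = [
    re.compile(r"sk-ant-api03-[\w-]+"),                 # Anthropic-style key
    re.compile(r"sk-or-v1-[\w-]+"),                     # OpenRouter-style key
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key
]

_nlp = spacy.load("en_core_web_sm")

def safe_to_store(text: str) -> bool:
    """Return False if any layer flags the content."""
    if any(p.search(text) for p in SECRET_PATTERNS):
        return False
    # Layer 2 would run detect-secrets' plugin scan here (omitted for brevity;
    # see the detect-secrets docs for SecretsCollection usage).
    doc = _nlp(text)  # Layer 3: flag person names as PII (policy is illustrative)
    return not any(ent.label_ == "PERSON" for ent in doc.ents)
```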
- Four Specialized Collections: code-patterns (HOW), conventions (WHAT), discussions (WHY), jira-data (JIRA)
- 30 Memory Types: Precise categorization for implementation, errors, decisions, Jira issues, GitHub data, agent memory, and more
- 6 Automatic Triggers: Smart context injection when you need it most
- Intent Detection: Automatically routes queries to the right collection
- Conversation Memory: Turn-by-turn capture with post-compaction context continuity
- Cascading Search: Falls back across collections for comprehensive results
- Monitoring: Prometheus metrics + Grafana dashboards
- Graceful Degradation: Works even when services are temporarily unavailable
- Multi-Project Isolation: group_id filtering keeps projects separate
New here? Jump to Quick Start to get running in 5 minutes.
v2.0.6 adds the WHEN dimension: your memories now understand time, freshness, and relevance decay.
- Semantic Decay Scoring: Older memories naturally lose relevance via exponential decay with type-specific half-lives (code: 14d, discussions: 21d, conventions: 60d); a decay sketch follows this list
- GitHub History Sync: Ingest PRs, issues, commits, CI results, and code blobs from your GitHub repo into the memory system
- Security Scanning Pipeline: 3-layer PII and secrets detection (regex + detect-secrets + SpaCy NER) runs before any content is stored
- Progressive Context Injection: Smart 3-tier context delivery: session bootstrap, per-turn injection, and confidence-filtered retrieval
- Freshness Detection: Automatically identifies stale memories by comparing against current git state
- SOPS+age Encryption: Encrypt sensitive configuration with modern age encryption
- Dual Embedding Routing: Code content uses jina-v2-base-code, prose uses jina-v2-base-en, for 10-30% better retrieval
- Parzival Oversight Agent: Technical PM, quality gatekeeper, and agent team orchestrator with cross-session memory backed by Qdrant
- 8 New Skills: /aim-purge, /aim-github-search, /aim-github-sync, /aim-pause-updates, /aim-refresh, /parzival-save-handoff, /parzival-save-insight, /aim-freshness-report
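A minimal sketch of the decay math, assuming a simple half-life model where each memory's similarity score is multiplied by an exponential age factor; how the real scorer combines decay with relevance is not shown here.

```python
# Type-specific half-lives from the list above
HALF_LIFE_DAYS = {"code-patterns": 14, "discussions": 21, "conventions": 60}

def decayed_score(similarity: float, age_days: float, collection: str) -> float:
    """Weight a raw similarity score by exponential age decay."""
    half_life = HALF_LIFE_DAYS.get(collection, 21)
    return similarity * 0.5 ** (age_days / half_life)

# A 28-day-old code pattern has passed two half-lives, so it keeps 25%:
print(decayed_score(0.90, 28, "code-patterns"))  # 0.225
```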
v2.0.7 adds Langfuse LLM observability (opt-in, entirely optional): full pipeline tracing so you can see exactly what your memory system is doing.
- 9-Step Pipeline Tracing: Every hook span (1_capture through 9_classify) is traced and visible in Langfuse with latency, inputs, and outputs
- Session Grouping: Traces are grouped by Claude Code session ID, so you can follow an entire conversation's memory operations as a single thread
- Kill-Switch Control: LANGFUSE_ENABLED=true|false turns all tracing on/off with zero code changes; hooks remain <500ms even when tracing is active
- File-Based Buffer: Trace events are written to disk (~5ms overhead) and flushed to Langfuse asynchronously by a dedicated worker; no SDK dependency in hook scripts
- Custom Model Registration: Ollama, OpenRouter, and other LLM providers registered as custom models for cost tracking
- Multi-Project Isolation: Each project's traces are tagged with project_id (from group_id), keeping observability data separated
- Grafana Integration: Langfuse-specific Grafana panels for buffer depth, flush latency, and trace throughput alongside existing memory metrics
- 7 New Services: Langfuse Web UI, Worker, PostgreSQL, ClickHouse, Redis, MinIO, and Trace Flush Worker, all opt-in via --profile langfuse
See docs/LANGFUSE-INTEGRATION.md for setup and architecture guide.
Bring your work context into semantic memory with built-in Jira Cloud support:
- Semantic Search: Search Jira issues and comments by meaning, not just keywords
- Full & Incremental Sync: Initial backfill or fast daily updates via JQL
- ADF Conversion: Atlassian Document Format → plain text for accurate embeddings
- Rich Filtering: Search by project, issue type, status, priority, or author
- Issue Lookup: Retrieve complete issue context (issue + all comments, chronologically)
- Dedicated Collection: The jira-data collection keeps Jira content separate from code memory
- Tenant Isolation: group_id based on Jira instance hostname prevents cross-instance leakage
- Two Skills: /aim-jira-sync for synchronization, /aim-jira-search for semantic search
See docs/JIRA-INTEGRATION.md for setup and usage guide.
Bring your repository history into semantic memory with built-in GitHub support:
- Semantic Search: Search PRs, issues, commits, CI results, and code blobs by meaning, not keywords
- 9 Content Types: github_pr, github_issue, github_commit, github_ci_result, github_code_blob, github_pr_diff, github_pr_review, github_issue_comment, github_release
- Full & Incremental Sync: First run backfills full history; subsequent runs fetch only new or updated items
- AST-Aware Code Chunking: Code blobs are split at AST boundaries (functions, classes), not arbitrary character offsets; see the sketch below
- Freshness Feedback Loop: Merged PRs automatically flag stale code-pattern memories for review
- Adaptive Rate Limiting: Reads X-RateLimit-Remaining response headers and backs off automatically
- Two Skills: /aim-github-sync for synchronization, /aim-github-search for semantic search
See docs/GITHUB-INTEGRATION.md for setup and usage guide.
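A minimal sketch of AST-boundary chunking for Python source, using the standard library's ast module; the actual sync service's chunker and its handling of other languages may differ.

```python
import ast

def chunk_at_ast_boundaries(source: str) -> list[str]:
    """Return one chunk per top-level function or class definition."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # end_lineno is populated by ast.parse on Python 3.8+
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks

code = "def a():\n    return 1\n\nclass B:\n    pass\n"
print(chunk_at_ast_boundaries(code))  # ['def a():\n    return 1', 'class B:\n    pass']
```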
Note: Langfuse is entirely optional. AI Memory works fully without it. Enable only if you want LLM pipeline tracing. See docs/LANGFUSE-INTEGRATION.md for setup.
Opt-in LLM observability powered by Langfuse traces every memory operation from hook execution through classification:
- 9-Step Pipeline Spans: 1_capture, 2_log, 3_detect, 4_scan, 5_chunk, 6_embed, 7_store, 8_enqueue, 9_classify, each emitted as a Langfuse span with timing and payload data
- Session-Based Traces: Traces are grouped by Claude Code session ID via Langfuse sessions, so you can follow all memory operations for a single conversation
- File Buffer Architecture: Hook scripts write JSON events to trace_buffer/ (~5ms per event). A dedicated trace-flush-worker container reads and batches events to Langfuse every 5 seconds; see the sketch after this list
- Kill-Switch: LANGFUSE_ENABLED=false disables all trace emission globally. Per-hook control via LANGFUSE_TRACE_HOOKS=false
- Buffer Eviction: Oldest trace files are automatically evicted when the buffer exceeds LANGFUSE_TRACE_BUFFER_MAX_MB (default: 100 MB)
- Custom Model Tracking: ollama/*, openrouter/*, and openrouter/*:free registered as custom models for provider-aware cost analysis
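A minimal sketch of the file-buffer write path described above, assuming one JSON file per event; the file layout, field names, and buffer location are illustrative, not the real trace_buffer.py.

```python
import json
import time
import uuid
from pathlib import Path

BUFFER_DIR = Path("trace_buffer")  # illustrative location

def emit_trace_event(step: str, payload: dict) -> None:
    """Append one JSON event to disk (~5ms); no Langfuse SDK in the hook."""
    BUFFER_DIR.mkdir(exist_ok=True)
    event = {"id": str(uuid.uuid4()), "step": step, "ts": time.time(), **payload}
    (BUFFER_DIR / f"{event['id']}.json").write_text(json.dumps(event))

emit_trace_event("6_embed", {"latency_ms": 42})
# A separate flush worker globs BUFFER_DIR every few seconds, ships batches
# to Langfuse via the SDK, and deletes the files it has delivered.
```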
Quick Start:
# Run Langfuse setup (generates secrets, starts services, registers models)
./scripts/langfuse_setup.sh
# Enable tracing in your .env
echo "LANGFUSE_ENABLED=true" >> docker/.env
# Start all services (including Langfuse, since LANGFUSE_ENABLED=true)
./scripts/stack.sh start
# Open Langfuse UI
open http://localhost:23100

See docs/LANGFUSE-INTEGRATION.md for the complete setup, architecture, and troubleshooting guide.
When you ask "how should I..." or "what's the best way to...", AI-Memory's best-practices-researcher activates:
- Search Local Knowledge - Checks the conventions collection first
- Web Research - Searches 2024-2026 sources if needed
- Save Findings - Stores to oversight/knowledge/best-practices/BP-XXX.md
- Database Storage - Adds to Qdrant for future retrieval
- Skill Evaluation - Determines if findings warrant a reusable skill
When research reveals a repeatable process, the skill-creator agent can generate a Claude Code skill:
User: "Research best practices for writing commit messages"
→ Best Practices Researcher finds patterns
→ Evaluates: "This is a repeatable process with clear steps"
→ User confirms: "Yes, create a skill"
→ Skill Creator generates .claude/skills/writing-commits/SKILL.md
The Result: Your AI agents continuously discover and codify knowledge into reusable skills.
Claude Code Session
├── SessionStart Hooks (resume|compact) → Context injection on session resume and post-compaction
├── UserPromptSubmit Hooks → Unified keyword trigger (decisions/best practices/session history)
├── PreToolUse Hooks → Smart triggers (new file/first edit conventions)
├── PostToolUse Hooks → Capture code patterns + error detection
├── PreCompact Hook → Save conversation before compaction
└── Stop Hook → Capture agent responses
Python Core (src/memory/)
├── config.py → Environment configuration
├── storage.py → Qdrant CRUD operations
├── search.py → Semantic search + cascading
├── intent.py → Intent detection + routing
├── triggers.py → Automatic trigger configuration
├── embeddings.py → Jina AI embeddings: jina-v2-base-en (prose) + jina-v2-base-code (code)
└── deduplication.py → Hash + similarity dedup
Docker Services
├── Qdrant (port 26350)
├── Embedding Service (port 28080)
├── Classifier Worker (LLM reclassification)
├── Streamlit Dashboard (port 28501)
├── Monitoring Stack (--profile monitoring)
│   ├── Prometheus (port 29090)
│   ├── Pushgateway (port 29091)
│   └── Grafana (port 23000)
└── Langfuse Stack (--profile langfuse)
    ├── Langfuse Web UI (port 23100)
    ├── Langfuse Worker (port 23130)
    ├── PostgreSQL (port 25432)
    ├── ClickHouse (port 28123)
    ├── Redis (port 26379)
    ├── MinIO (port 29000)
    └── Trace Flush Worker
v2.0.6 additions: GitHub sync service ingests repository data (PRs, issues, commits, code blobs) into the discussions collection. A 3-layer security scanning pipeline (regex + detect-secrets + SpaCy NER) screens all content before storage. Semantic decay scoring applies time-weighted relevance to all search queries. The Parzival session agent stores cross-session memory in the discussions collection for project continuity.
v2.0.7 additions: Langfuse LLM observability stack provides full pipeline tracing. Hook scripts emit trace events to a file-based buffer (trace_buffer.py), and a dedicated flush worker (trace_flush_worker.py) batches events to Langfuse via the SDK. All 9 pipeline steps are instrumented as spans, grouped by session ID. The classifier worker emits 9_classify spans with provider, confidence, and reclassification outcome.
| Collection | Purpose | Example Types |
|---|---|---|
| code-patterns | HOW things are built | implementation, error_fix, refactor |
| conventions | WHAT rules to follow | rule, guideline, naming, structure |
| discussions | WHY things were decided | decision, session, preference, user_message, agent_response, blocker |
| jira-data | External work items from Jira Cloud | jira_issue, jira_comment |
Note: The jira-data collection is conditional; it is only created when Jira sync is enabled (JIRA_SYNC_ENABLED=true).
The memory system automatically retrieves relevant context:
- Error Detection: When a command fails, retrieves past error fixes
- New File Creation: Retrieves naming conventions and structure patterns
- First Edit: Retrieves file-specific patterns on first modification
- Decision Keywords: "Why did we..." triggers decision memory retrieval
- Best Practices Keywords: "How should I..." triggers convention retrieval
- Session History Keywords: "What have we done..." triggers session summary retrieval
The following keywords automatically activate memory retrieval when detected in your prompts:
Decision Keywords (20 patterns) - Searches discussions for past decisions
| Category | Keywords |
|---|---|
| Decision recall | why did we, why do we, what was decided, what did we decide |
| Memory recall | remember when, remember the decision, remember what, remember how, do you remember, recall when, recall the, recall how |
| Session references | last session, previous session, earlier we, before we, previously, last time we, what did we do, where did we leave off |
Session History Keywords (16 patterns) - Searches discussions for session summaries
| Category | Keywords |
|---|---|
| Project status | what have we done, what did we work on, project status, where were we, what's the status |
| Continuation | continue from, pick up where, continue where |
| Remaining work | what's left to do, remaining work, what's next for, what's next on, what's next in the, next steps, todo, tasks remaining |
Best Practices Keywords (27 patterns) - Searches conventions for guidelines
| Category | Keywords |
|---|---|
| Standards | best practice, best practices, coding standard, coding standards, convention, conventions for |
| Patterns | what's the pattern, what is the pattern, naming convention, style guide |
| Guidance | how should i, how do i, what's the right way, what is the right way |
| Research | research the pattern, research best practice, look up, find out about, what do the docs say |
| Recommendations | should i use, what's recommended, what is recommended, recommended approach, preferred approach, preferred way, industry standard, common pattern |
Note: Keywords are case-insensitive. Only structured patterns trigger retrieval to avoid false positives on casual conversation.
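A minimal sketch of how structured-keyword routing can work, using a few phrases from the tables above; the real trigger's full pattern set, matching logic, and thresholds live in the hook scripts.

```python
KEYWORD_ROUTES = {
    "discussions": ("why did we", "what was decided", "last session"),
    "conventions": ("best practice", "naming convention", "how should i"),
}

def route_prompt(prompt: str) -> str | None:
    """Return the collection to search, or None when no trigger fires."""
    lowered = prompt.lower()  # keywords are case-insensitive
    for collection, phrases in KEYWORD_ROUTES.items():
        if any(phrase in lowered for phrase in phrases):
            return collection
    return None

print(route_prompt("Why did we pick Qdrant over pgvector?"))  # discussions
print(route_prompt("Nice weather today"))                     # None
```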
The optional LLM Classifier automatically reclassifies captured memories into more precise types:
- Rule-based first: Fast pattern matching (free, <10ms)
- LLM fallback: AI classification when rules don't match
- Provider chain: Primary provider with automatic fallback
Quick Setup:
# Configure in docker/.env
MEMORY_CLASSIFIER_ENABLED=true
MEMORY_CLASSIFIER_PRIMARY_PROVIDER=ollama # or: openrouter, claude, openai
MEMORY_CLASSIFIER_FALLBACK_PROVIDERS=openrouter
# For Ollama (free, local)
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODEL=sam860/LFM2:2.6b
# For OpenRouter (free tier available)
OPENROUTER_API_KEY=sk-or-v1-your-key
OPENROUTER_MODEL=google/gemma-2-9b-it:free

See docs/llm-classifier.md for the complete setup guide, provider options, and troubleshooting.
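A minimal sketch of the rule-first, LLM-fallback flow with a provider chain; the rule set and the classify_with_provider helper are hypothetical stand-ins, not the project's API.

```python
import re

RULES = [  # illustrative rules; the real ones live in the classifier worker
    (re.compile(r"traceback|exception|fixed a bug", re.I), "error_fix"),
    (re.compile(r"we decided|decision:", re.I), "decision"),
]

def classify_with_provider(provider: str, text: str) -> str:
    """Stand-in for a real LLM call (Ollama, OpenRouter, ...)."""
    raise NotImplementedError(provider)

def classify(text: str, providers: list[str]) -> str:
    for pattern, memory_type in RULES:  # rule-based first: free, <10ms
        if pattern.search(text):
            return memory_type
    for provider in providers:  # LLM fallback, walking the provider chain
        try:
            return classify_with_provider(provider, text)
        except Exception:
            continue  # provider failed; try the next one
    return "unclassified"

print(classify("we decided to use Qdrant", ["ollama", "openrouter"]))  # decision
```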
# Recommended: use the unified stack manager
./scripts/stack.sh start # Start all services (reads .env for profile selection)
./scripts/stack.sh status # Check health
./scripts/stack.sh stop     # Graceful shutdown

Alternatively, using Docker Compose directly:
# Core services (Qdrant + Embedding)
docker compose -f docker/docker-compose.yml up -d
# With monitoring (adds Prometheus, Grafana, Pushgateway)
docker compose -f docker/docker-compose.yml --profile monitoring up -d

# Check Qdrant (port 26350)
curl -H "api-key: $QDRANT_API_KEY" http://localhost:26350/health
# Check Embedding Service (port 28080)
curl http://localhost:28080/health
# Check Grafana (port 23000) - if monitoring enabled
open http://localhost:23000  # credentials from installation

./scripts/install.sh /path/to/your-project
# With convention seeding (recommended)
SEED_BEST_PRACTICES=true ./scripts/install.sh /path/to/your-project

Expected Output:
═══════════════════════════════════════════════════════════
AI Memory Module Health Check
═══════════════════════════════════════════════════════════
[1/3] Checking Qdrant (localhost:26350)...
✅ Qdrant is healthy
[2/3] Checking Embedding Service (localhost:28080)...
✅ Embedding service is healthy
[3/3] Checking Monitoring API (localhost:28000)...
✅ Monitoring API is healthy
═══════════════════════════════════════════════════════════
All Services Healthy ✅
═══════════════════════════════════════════════════════════
- Python 3.10+ (3.11+ required for AsyncSDKWrapper)
- Docker 20.10+ (for Qdrant + embedding service)
- Claude Code (target project where memory will be installed)
AI Memory runs on 16 GiB RAM (4 cores minimum). Adding the optional Langfuse LLM observability module increases the requirement to 32 GiB RAM (8 cores recommended).
| Tier | Services | Minimum RAM | Recommended CPU |
|---|---|---|---|
| Core (default) | 8 services | 16 GiB | 4 cores |
| Core + Langfuse (opt-in) | 15 services | 32 GiB | 8 cores |
See INSTALL.md for detailed installation instructions including:
- System requirements with version compatibility
- Step-by-step installation for macOS, Linux, and Windows (WSL2)
- Automated installer and manual installation methods
- Post-installation verification
- Configuration options
- Uninstallation procedures
All services use the 2XXXX port prefix to avoid conflicts:
| Service | External | Internal | Access URL |
|---|---|---|---|
| Qdrant | 26350 | 6333 | localhost:26350 |
| Embedding | 28080 | 8080 | localhost:28080/embed |
| Monitoring API | 28000 | 8000 | localhost:28000/health |
| Streamlit | 28501 | 8501 | localhost:28501 |
| Grafana | 23000 | 3000 | localhost:23000 |
| Prometheus | 29090 | 9090 | localhost:29090 (--profile monitoring) |
| Pushgateway | 29091 | 9091 | localhost:29091 (--profile monitoring) |
Optional: Langfuse LLM Observability Ports (opt-in):
| Port | Service | Notes |
|---|---|---|
| 23100 | Langfuse Web UI | Optional (Langfuse) |
| 23130 | Langfuse Worker | Optional (Langfuse) |
| 25432 | Langfuse PostgreSQL | Optional (Langfuse) |
| 26379 | Langfuse Redis | Optional (Langfuse) |
| 28123 | Langfuse ClickHouse | Optional (Langfuse) |
| 29000 | Langfuse MinIO | Optional (Langfuse) |
| Variable | Default | Description |
|---|---|---|
| QDRANT_HOST | localhost | Qdrant server hostname |
| QDRANT_PORT | 26350 | Qdrant external port |
| EMBEDDING_HOST | localhost | Embedding service hostname |
| EMBEDDING_PORT | 28080 | Embedding service port |
| AI_MEMORY_INSTALL_DIR | ~/.ai-memory | Installation directory |
| MEMORY_LOG_LEVEL | INFO | Logging level (DEBUG/INFO/WARNING) |
Jira Cloud Integration (Optional):
| Variable | Default | Description |
|---|---|---|
| JIRA_INSTANCE_URL | (empty) | Jira Cloud URL (e.g., https://company.atlassian.net) |
| JIRA_EMAIL | (empty) | Jira account email for Basic Auth |
| JIRA_API_TOKEN | (empty) | API token from id.atlassian.com |
| JIRA_PROJECTS | (empty) | JSON array of project keys (e.g., ["PROJ","DEV","OPS"]). Comma-separated also accepted for backwards compatibility. |
| JIRA_SYNC_ENABLED | false | Enable Jira synchronization |
| JIRA_SYNC_DELAY_MS | 100 | Delay between API requests (ms) |
See docs/JIRA-INTEGRATION.md for complete Jira setup guide.
Override Example:
export QDRANT_PORT=16333 # Use custom port
export MEMORY_LOG_LEVEL=DEBUG    # Enable verbose logging

Memory capture happens automatically via Claude Code hooks:
- SessionStart (resume/compact only): Injects relevant memories when resuming a session or after context compaction
- PostToolUse: Captures code patterns (Write/Edit/NotebookEdit tools) in background (<500ms)
- PreCompact: Saves session summary before context compaction (auto or manual /compact)
- Stop: Optional per-response cleanup
No manual intervention required - hooks handle everything.
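A minimal sketch of the fire-and-forget pattern that keeps hook latency under 500ms: return immediately after detaching the real capture into a background process. Passing the payload via argv is an assumption for illustration.

```python
import subprocess
import sys

def capture_in_background(payload_json: str) -> None:
    """Detach the capture so the hook can exit within its latency budget."""
    subprocess.Popen(
        [sys.executable, ".claude/hooks/scripts/post_tool_capture.py", payload_json],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # child survives after the hook process exits
    )
```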
The "Aha Moment": Claude remembers your previous sessions automatically. Start a new session and Claude will say "Welcome back! Last session we worked on..." without you reminding it.
Use slash commands for manual control:
# Check system status
/aim-status
# Manually save current session
/aim-save
# Search across all memories
/aim-search <query>
# Jira Cloud Integration (requires JIRA_SYNC_ENABLED=true)
/aim-jira-sync # Incremental sync from Jira
/aim-jira-sync --full # Full sync (all issues and comments)
/aim-jira-search "query" # Semantic search across Jira content
/aim-jira-search --issue PROJ-42   # Lookup issue + all comments

| Command | Description |
|---|---|
| /aim-purge | Purge old memories with dry-run safety (e.g., --older-than 90d) |
| /aim-github-search | Semantic search of GitHub data (PRs, issues, commits, code) |
| /aim-github-sync | Manually trigger GitHub repository sync |
| /aim-pause-updates | Toggle automatic memory updates on/off (kill switch) |
| /aim-refresh | Trigger freshness scan on changed files |
| /parzival-save-handoff | Save Parzival session handoff to Qdrant memory |
| /parzival-save-insight | Save a Parzival insight for cross-session recall |
| /aim-freshness-report | Scan code-patterns for stale memories by comparing against current git state |
| Command | What Changed |
|---|---|
| /aim-status | 4 new sections: decay stats, GitHub sync status, security scan summary, Parzival session info |
| /aim-search | Now displays decay scores alongside relevance scores |
| /aim-save | Supports agent memory types (handoff, insight, task) |
See docs/HOOKS.md for hook documentation, docs/COMMANDS.md for commands, docs/llm-classifier.md for LLM classifier setup, docs/JIRA-INTEGRATION.md for Jira integration guide, and docs/LANGFUSE-INTEGRATION.md for LLM observability setup.
The AsyncSDKWrapper provides full async/await support for building custom Agent SDK agents with persistent memory.
Features:
- Full async/await support compatible with Agent SDK
- Rate limiting with token bucket algorithm (Tier 1: 50 RPM, 30K TPM)
- Exponential backoff retry with jitter (3 retries: 1s, 2s, 4s Β±20%)
- Automatic conversation capture to discussions collection
- Background storage (fire-and-forget pattern)
- Prometheus metrics integration
Basic Usage:
import asyncio
from src.memory import AsyncSDKWrapper
async def main():
async with AsyncSDKWrapper(cwd="/path/to/project") as wrapper:
# Send message with automatic rate limiting and retry
result = await wrapper.send_message(
prompt="What is async/await?",
model="claude-sonnet-4-5-20250929",
max_tokens=500
)
print(f"Response: {result['content']}")
print(f"Session ID: {result['session_id']}")
asyncio.run(main())

Streaming Responses (Buffered):
Note: Current implementation buffers the full response for retry reliability. True progressive streaming planned for future release.
async with AsyncSDKWrapper(cwd="/path/to/project") as wrapper:
async for chunk in wrapper.send_message_buffered(
prompt="Explain Python async",
model="claude-sonnet-4-5-20250929",
max_tokens=800
):
print(chunk, end='', flush=True)

Custom Rate Limits:
async with AsyncSDKWrapper(
cwd="/path/to/project",
requests_per_minute=100, # Tier 2
tokens_per_minute=100000 # Tier 2
) as wrapper:
result = await wrapper.send_message("Hello!")

Examples:
- examples/async_sdk_basic.py - Basic async/await usage, context manager pattern, session ID logging, rate limiting demonstration
- examples/async_sdk_streaming.py - Streaming response handling (buffered), progressive chunk processing, retry behavior
- examples/async_sdk_rate_limiting.py - Custom rate limit configuration, queue depth/timeout settings, error handling for different API tiers
Configuration:
Set ANTHROPIC_API_KEY environment variable before using AsyncSDKWrapper:
export ANTHROPIC_API_KEY=sk-ant-api03-...

Rate Limiting:
The wrapper implements a token bucket algorithm matching Anthropic's rate limits (a minimal sketch follows the circuit breaker list below):
| Tier | Requests/Min | Tokens/Min |
|---|---|---|
| Free | 5 | 10,000 |
| Tier 1 | 50 (default) | 30,000 (default) |
| Tier 2 | 100 | 100,000 |
| Tier 3+ | 1,000+ | 400,000+ |
Circuit breaker protections:
- Max queue depth: 100 requests
- Queue timeout: 60 seconds
- Raises QueueTimeoutError or QueueDepthExceededError if exceeded
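A minimal sketch of the request-side token bucket at Tier 1 defaults; the real wrapper also maintains a separate tokens-per-minute bucket and the queue-depth/timeout guards listed above.

```python
import time

class TokenBucket:
    """Refill at a constant rate; block callers until capacity is available."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def acquire(self, cost: float = 1.0) -> None:
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.refill_per_sec)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return
            time.sleep((cost - self.tokens) / self.refill_per_sec)

requests_bucket = TokenBucket(capacity=50, refill_per_sec=50 / 60)  # Tier 1: 50 RPM
requests_bucket.acquire()  # returns immediately while the bucket has tokens
```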
Retry Strategy:
Automatic exponential backoff retry (DEC-029):
- Max retries: 3
- Delays: 1s, 2s, 4s (with Β±20% jitter)
- Retries on: 429 (rate limit), 529 (overload), network errors
- No retry on: 4xx client errors (except 429), auth failures
- Respects the retry-after header when provided
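A minimal sketch of the 1s/2s/4s backoff with ±20% jitter; the real wrapper additionally limits retries to 429/529/network errors and honors retry-after, which this sketch glosses over.

```python
import random
import time

def with_retries(call, max_retries: int = 3):
    """Retry `call` with exponential backoff and jitter (1s, 2s, 4s ±20%)."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries; surface the last error
            # NOTE: a faithful implementation retries only 429/529/network
            # errors and respects a server-provided retry-after header.
            time.sleep((2 ** attempt) * random.uniform(0.8, 1.2))
```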
Memory Capture:
All messages are automatically captured to the discussions collection:
- User messages → user_message type
- Agent responses → agent_response type
- Background storage (non-blocking)
- Session-based grouping with turn numbers
See src/memory/async_sdk_wrapper.py for complete API documentation.
For complete design rationale, see oversight/specs/tech-debt-035/phase-2-design.md.
Memories are automatically isolated by group_id (derived from project directory):
# Project A: group_id = "project-a"
# Project B: group_id = "project-b"
# Searches only return memories from current project

V2.0 Collection Isolation:
- code-patterns: Implementation patterns (per-project isolation)
- conventions: Coding standards and rules (shared across projects by default)
- discussions: Decisions, sessions, conversations (per-project isolation)
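A minimal sketch of the per-project filter using qdrant-client; the group_id payload field matches the docs above, while the collection name, API key, and placeholder vector are illustrative.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(host="localhost", port=26350, api_key="your-key")

hits = client.search(
    collection_name="discussions",
    query_vector=[0.0] * 768,  # placeholder; real queries embed the prompt
    query_filter=Filter(
        must=[FieldCondition(key="group_id", match=MatchValue(value="project-a"))]
    ),
    limit=5,
)
# Only points whose payload group_id == "project-a" are returned.
```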
See TROUBLESHOOTING.md for comprehensive troubleshooting, including:
- Services won't start
- Health check failures
- Memories not captured
- Search not working
- Performance problems
- Data persistence issues
If hooks are misbehaving (e.g., after a failed install or upgrade), use the recovery script to scan and repair all project configurations:
# Dry-run: shows what would change (safe, no modifications)
python scripts/recover_hook_guards.py
# Apply fixes across all discovered projects
python scripts/recover_hook_guards.py --apply
# Scan only: list all discovered project settings.json files
python scripts/recover_hook_guards.py --scan

The recovery script automatically discovers projects via:
- ~/.ai-memory/installed_projects.json manifest (primary)
- Sibling directories of AI_MEMORY_INSTALL_DIR (fallback)
- Common project paths (additional fallback)
It fixes: unguarded hook commands (BUG-066), broad SessionStart matchers (BUG-078), and other known configuration issues. Always run with dry-run first to review changes.
# Check all services
docker compose -f docker/docker-compose.yml ps
# Check logs
docker compose -f docker/docker-compose.yml logs
# Check health
python scripts/health-check.py
# Check ports
lsof -i :26350 # Qdrant
lsof -i :28080 # Embedding
lsof -i :28000  # Monitoring API

Symptom: docker compose up -d fails or services exit immediately
Solution:
- Check port availability:
  lsof -i :26350  # Qdrant
  lsof -i :28080  # Embedding
- Check Docker logs:
  docker compose -f docker/docker-compose.yml logs
- Ensure Docker daemon is running:
  docker ps  # Should not error
Symptom: python scripts/health-check.py shows unhealthy services
Solution:
- Check service status:
  docker compose -f docker/docker-compose.yml ps
- Verify ports are accessible:
  curl -H "api-key: $QDRANT_API_KEY" http://localhost:26350/health  # Qdrant
  curl http://localhost:28080/health  # Embedding
- Check logs for errors:
  docker compose -f docker/docker-compose.yml logs qdrant
  docker compose -f docker/docker-compose.yml logs embedding
Symptom: PostToolUse hook doesn't store memories
Solution:
- Check hook configuration in .claude/settings.json:
  {
    "hooks": {
      "PostToolUse": [{
        "matcher": "Write|Edit",
        "hooks": [{"type": "command", "command": ".claude/hooks/scripts/post_tool_capture.py"}]
      }]
    }
  }
- Verify hook script is executable:
  ls -la .claude/hooks/scripts/post_tool_capture.py
  chmod +x .claude/hooks/scripts/post_tool_capture.py
- Check hook logs (if logging enabled):
  cat ~/.ai-memory/logs/hook.log
For more detailed troubleshooting, see TROUBLESHOOTING.md.
Access Grafana at http://localhost:23000 (credentials set during installation):
| Dashboard | Purpose |
|---|---|
| NFR Performance Overview | All 6 NFR metrics with SLO compliance |
| Hook Activity | Hook execution rates, latency heatmaps |
| Memory Operations | Captures, retrievals, deduplication |
| System Health | Service status, error rates |
| NFR | Metric | Target |
|---|---|---|
| NFR-P1 | Hook execution | <500ms |
| NFR-P2 | Batch embedding | <2s |
| NFR-P3 | Session injection | <3s |
| NFR-P4 | Dedup check | <100ms |
| NFR-P5 | Retrieval query | <500ms |
| NFR-P6 | Real-time embedding | <500ms |
| Service | Port |
|---|---|
| Grafana | 23000 |
| Prometheus | 29090 |
| Pushgateway | 29091 |
All metrics use the aimemory_ prefix (BP-045 compliant):
| Metric | Description |
|---|---|
| aimemory_hook_duration_seconds | Hook execution time (NFR-P1) |
| aimemory_captures_total | Total memory capture attempts |
| aimemory_retrievals_total | Total retrieval operations |
| aimemory_trigger_fires_total | Automatic trigger activations |
See docs/MONITORING.md for complete monitoring guide and docs/prometheus-queries.md for query examples.
Protect your AI memories with built-in backup and restore scripts.
# Setup (one-time)
cd /path/to/ai-memory
python3 -m venv .venv && source .venv/bin/activate
pip install httpx
# Get your Qdrant API key
cat ~/.ai-memory/docker/.env | grep QDRANT_API_KEY
export QDRANT_API_KEY="your-key-here"
# Run backup
python scripts/backup_qdrant.py

Backups are stored in the backups/ directory in timestamped folders containing:
- Collection snapshots (discussions, conventions, code-patterns)
- Configuration files
- Verification manifest
python scripts/restore_qdrant.py backups/2026-02-03_143052

See docs/BACKUP-RESTORE.md for complete instructions including troubleshooting.
Coming soon: Backup and restore scripts will be updated in the next version to support the jira-data collection, including Jira database backup and reinstall.
- Hook overhead: <500ms (PostToolUse forks to background)
- Embedding generation: <2s (pre-warmed Docker service)
- SessionStart context injection: <3s
- Deduplication check: <100ms
- Enable monitoring profile for production use:
docker compose -f docker/docker-compose.yml --profile monitoring up -d
# Run all tests
pytest tests/
# Run specific test file
pytest tests/test_storage.py -v
# Run integration tests only
pytest tests/integration/ -v

ai-memory/
├── src/memory/              # Core Python modules
├── .claude/
│   ├── hooks/scripts/       # Hook implementations
│   └── skills/              # Skill definitions
├── docker/                  # Docker Compose and service configs
├── scripts/                 # Installation and management scripts
├── tests/                   # pytest test suite
└── docs/                    # Additional documentation
- Python (PEP 8 Strict): Files snake_case.py, functions snake_case(), classes PascalCase, constants UPPER_SNAKE
- Qdrant Payload Fields: Always snake_case (content_hash, group_id, source_hook)
- Structured Logging: Use logger.info("event", extra={"key": "value"}), never f-strings
- Hook Exit Codes: 0 (success), 1 (non-blocking error), 2 (blocking error - rare)
- Graceful Degradation: All components must fail silently; Claude works without memory
See CONTRIBUTING.md for complete development setup and coding standards.
We welcome contributions! To contribute:
- Fork the repository and create a feature branch
- Follow coding conventions (see Development section above)
- Write tests for all new functionality
- Ensure all tests pass: pytest tests/
- Update documentation if adding features
- Submit a pull request with a clear description
See CONTRIBUTING.md for detailed development setup and pull request process.
MIT License - see LICENSE for details.
This documentation follows WCAG 2.2 Level AA accessibility standards (ISO/IEC 40500:2025):
- ✅ Proper heading hierarchy (h1 → h2 → h3)
- ✅ Descriptive link text (no "click here")
- ✅ Code blocks with language identifiers
- ✅ Tables with headers for screen readers
- ✅ Consistent bullet style (hyphens)
- ✅ ASCII art diagrams for universal compatibility
For accessibility concerns or suggestions, please open an issue.
Documentation Best Practices Applied (2026):
This README follows current best practices for technical documentation:
- Documentation as Code (Technical Documentation Best Practices)
- Markdown standards with consistent formatting (Markdown Best Practices)
- Essential sections per README standards (Make a README)
- Quick value communication (README Best Practices - Tilburg Science Hub)
- WCAG 2.2 accessibility compliance (W3C WCAG 2.2 as ISO Standard)
