
🧠 AI-Memory



Cure AI Amnesia.

AI-Memory is a persistent context layer designed to give your agents institutional memory. By bridging LLMs with a high-performance vector database (Qdrant), this framework ensures your agents remember architectural decisions, project rules, and past interactions across every session.

Explore the Docs | Report a Bug | Request a Feature


🚀 Key Features

  • 🧠 Cross-Session Memory: Claude remembers your last session automatically — no re-explaining needed.
  • ⏳ Semantic Decay: Memories age naturally — recent patterns rank higher than stale ones.
  • 🛡️ 3-Layer Security: PII and secrets caught before storage via regex + detect-secrets + SpaCy NER.
  • 🐙 GitHub History Sync: PRs, issues, commits, CI results searchable by meaning.
  • 🔭 LLM Observability: Full pipeline tracing via Langfuse — every hook, span, and classification visible.
  • 🎯 Progressive Context Injection: Right memories, right time, within token budgets.

🧬 Bespoke Neural Memory

This isn't a database you configure. It's institutional memory that forms as you build.

Traditional knowledge bases require upfront schema design and manual curation. AI-Memory takes a different approach: let the LLM and human decide what matters, and capture it as it happens.

🎯 Error fixed? Captured. 📝 Architecture decision made? Stored. 📏 Convention established? Remembered.

Your agents don't just execute—they learn.

| Aspect | Benefit |
| --- | --- |
| 🎨 Bespoke | Memory unique to YOUR project |
| ⚡ JIT Creation | Emerges from work, not config |
| 💫 Transient → Persistent | Sessions become knowledge |
| 🪶 Token Efficient | ~500-token focused memories |
| 🚀 Lightweight | Docker + Qdrant + Python |

πŸ›‘οΈ Parzival: Technical PM & Quality Gatekeeper

Parzival is your AI project manager embedded in Claude Code. Describe what needs doing, and Parzival orchestrates the work with verified precision — reading your architecture, PRD, and standards before creating prompts, never after.

Core capabilities:

  • Agent team orchestration: /parzival-team builds 3-tier parallel team prompts (lead → workers → reviewers) with exact file paths, line numbers, acceptance criteria, and project-specific context — derived from your actual project files, not assumptions
  • Quality gate enforcement: Mandatory review→fix→review cycles that continue until zero issues are found. Parzival never accepts "looks good enough"
  • Verified instructions: Every recommendation is checked against project files first and rated with a confidence level (Verified/Informed/Inferred/Uncertain/Unknown), with source citations included
  • False positive catching: When review agents flag issues, Parzival verifies findings against actual source code before acting — preventing wasted cycles on non-issues
  • Decision support: Presents options with pros/cons, tradeoffs, source citations, and confidence levels, then waits for your approval before proceeding
  • Risk and blocker tracking: Identifies risks proactively with severity levels and escalation paths; surfaces critical issues immediately
  • Session continuity: Handoffs are dual-written to local oversight files and the Qdrant discussions collection, enabling automatic cross-session resume at every SessionStart
  • Sprint and task management: Tracks sprints, tasks, blockers, and decisions across sessions via structured oversight files (task-tracker.md, decisions-log.md, SESSION_WORK_INDEX.md)

How the workflow works:

  1. You describe the work to Parzival
  2. Parzival reads your architecture, PRD, and standards before making any recommendation
  3. Parzival builds a precise agent team prompt β€” or individual dev prompt β€” with exact file paths and acceptance criteria
  4. You run the agents; Parzival reviews the results
  5. Review→fix→review continues until zero issues are found, then you approve

The core principle: Parzival recommends. You decide. Parzival is the radar operator on the ship — you are the captain who steers. It monitors, navigates, and verifies. It never writes code, makes final decisions, or executes agents autonomously. A 5-layer constraint system prevents the behavioral drift that causes AI agents to forget their role over long conversations.

Parzival is optional — AI Memory's core features (semantic decay, GitHub sync, search skills, freshness detection) work independently without it. For teams managing complex projects across many sessions, Parzival is the orchestration layer that keeps everything on track.

See docs/PARZIVAL-SESSION-GUIDE.md for setup, commands, and the full skills reference.


πŸ† What No Other Tool Has

AI-Memory combines capabilities that exist nowhere else as a single integrated system:

| Capability | What It Does |
| --- | --- |
| Semantic Decay Scoring | Memories age naturally via exponential decay — recent patterns rank higher than stale ones, automatically |
| Cross-Session Memory | Qdrant vector search resurfaces exactly the right context at session start, without you asking |
| 3-Layer Security Pipeline | PII and secrets screened via regex + detect-secrets + SpaCy NER before any content is stored |
| GitHub History → Semantic Search | PRs, issues, commits, CI results, code blobs, diffs, reviews, and releases searchable by meaning |
| Freshness Detection | Stale code-pattern memories flagged automatically by comparing stored patterns against current git state (3/10/25 commit thresholds; sketched below) |
| Dual Embedding Routing | Code uses jina-v2-base-code; prose uses jina-v2-base-en — 10-30% better retrieval accuracy |
| Progressive Context Injection | Token-budget-aware 3-tier delivery: session bootstrap, per-turn injection, confidence-filtered retrieval |
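
For intuition, the freshness check can be approximated by counting commits that touched a file since its memory was stored, then mapping the count onto the 3/10/25 thresholds. A minimal sketch — the label names, threshold mapping, and git invocation are illustrative assumptions, not the project's actual implementation:

import subprocess

# Illustrative mapping of the 3/10/25 commit thresholds to staleness labels.
LEVELS = [(25, "stale"), (10, "aging"), (3, "check")]

def freshness_flag(file_path: str, since_iso: str) -> str | None:
    """Return a staleness label based on commits touching file_path since storage."""
    out = subprocess.run(
        ["git", "rev-list", "--count", f"--since={since_iso}", "HEAD", "--", file_path],
        capture_output=True, text=True, check=True,
    )
    commits = int(out.stdout.strip())
    for threshold, label in LEVELS:
        if commits >= threshold:
            return label
    return None  # still fresh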

✨ V2.0 Memory System

  • πŸ—‚οΈ Four Specialized Collections: code-patterns (HOW), conventions (WHAT), discussions (WHY), jira-data (JIRA)
  • 🎯 30 Memory Types: Precise categorization for implementation, errors, decisions, Jira issues, GitHub data, agent memory, and more
  • ⚑ 6 Automatic Triggers: Smart context injection when you need it most
  • πŸ” Intent Detection: Automatically routes queries to the right collection
  • πŸ’¬ Conversation Memory: Turn-by-turn capture with post-compaction context continuity
  • πŸ” Cascading Search: Falls back across collections for comprehensive results
  • πŸ“Š Monitoring: Prometheus metrics + Grafana dashboards
  • πŸ›‘οΈ Graceful Degradation: Works even when services are temporarily unavailable
  • πŸ‘₯ Multi-Project Isolation: group_id filtering keeps projects separate

New here? Jump to Quick Start to get running in 5 minutes.


πŸ•°οΈ V2.0.6 β€” Temporal Memory

v2.0.6 adds the WHEN dimension — your memories now understand time, freshness, and relevance decay.

  • ⏳ Semantic Decay Scoring: Older memories naturally lose relevance via exponential decay with type-specific half-lives (code: 14d, discussions: 21d, conventions: 60d) — see the sketch after this list
  • 🔄 GitHub History Sync: Ingest PRs, issues, commits, CI results, and code blobs from your GitHub repo into the memory system
  • 🛡️ Security Scanning Pipeline: 3-layer PII and secrets detection (regex + detect-secrets + SpaCy NER) runs before any content is stored
  • 🎯 Progressive Context Injection: Smart 3-tier context delivery — session bootstrap, per-turn injection, and confidence-filtered retrieval
  • 🔍 Freshness Detection: Automatically identifies stale memories by comparing against current git state
  • 🔐 SOPS+age Encryption: Encrypt sensitive configuration with modern age encryption
  • 🧭 Dual Embedding Routing: Code content uses jina-v2-base-code, prose uses jina-v2-base-en for 10-30% better retrieval
  • 🤖 Parzival Oversight Agent: Technical PM, quality gatekeeper, and agent team orchestrator with cross-session memory backed by Qdrant
  • 🧰 8 New Skills: /aim-purge, /aim-github-search, /aim-github-sync, /aim-pause-updates, /aim-refresh, /parzival-save-handoff, /parzival-save-insight, /aim-freshness-report
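
Conceptually, decay scoring multiplies the raw vector-similarity score by an exponential factor derived from the memory's age and its collection's half-life. A minimal sketch of the idea — the function name and the fallback half-life are illustrative, not the project's API:

import math
from datetime import datetime, timezone

# Half-lives in days, mirroring the values quoted above.
HALF_LIVES = {"code-patterns": 14, "discussions": 21, "conventions": 60}

def decayed_score(similarity: float, stored_at: datetime, collection: str) -> float:
    """Weight a raw similarity score so relevance halves every half-life."""
    age_days = (datetime.now(timezone.utc) - stored_at).total_seconds() / 86400
    half_life = HALF_LIVES.get(collection, 30)  # assumed fallback
    return similarity * math.exp(-math.log(2) * age_days / half_life)

With a 14-day half-life, a code-pattern memory stored four weeks ago contributes only about a quarter of its original score.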

🔭 V2.0.7 — LLM Observability

v2.0.7 adds opt-in Langfuse LLM observability — full pipeline tracing so you can see exactly what your memory system is doing.

  • 🔭 9-Step Pipeline Tracing: Every hook span (1_capture through 9_classify) is traced and visible in Langfuse with latency, inputs, and outputs
  • 📊 Session Grouping: Traces are grouped by Claude Code session ID, so you can follow an entire conversation's memory operations as a single thread
  • 🛡️ Kill-Switch Control: LANGFUSE_ENABLED=true|false turns all tracing on/off with zero code changes — hooks remain <500ms even when tracing is active
  • 💾 File-Based Buffer: Trace events are written to disk (~5ms overhead) and flushed to Langfuse asynchronously by a dedicated worker — no SDK dependency in hook scripts
  • 🏷️ Custom Model Registration: Ollama, OpenRouter, and other LLM providers registered as custom models for cost tracking
  • 👥 Multi-Project Isolation: Each project's traces are tagged with project_id (from group_id), keeping observability data separated
  • 📈 Grafana Integration: Langfuse-specific Grafana panels for buffer depth, flush latency, and trace throughput alongside existing memory metrics
  • 🐳 7 New Services: Langfuse Web UI, Worker, PostgreSQL, ClickHouse, Redis, MinIO, and Trace Flush Worker — all opt-in via --profile langfuse

See docs/LANGFUSE-INTEGRATION.md for setup and architecture guide.


🔗 Jira Cloud Integration

Bring your work context into semantic memory with built-in Jira Cloud support:

  • Semantic Search: Search Jira issues and comments by meaning, not just keywords
  • Full & Incremental Sync: Initial backfill or fast daily updates via JQL
  • ADF Conversion: Atlassian Document Format → plain text for accurate embeddings (see the sketch after this list)
  • Rich Filtering: Search by project, issue type, status, priority, or author
  • Issue Lookup: Retrieve complete issue context (issue + all comments, chronologically)
  • Dedicated Collection: jira-data collection keeps Jira content separate from code memory
  • Tenant Isolation: group_id based on Jira instance hostname prevents cross-instance leakage
  • Two Skills: /aim-jira-sync for synchronization, /aim-jira-search for semantic search
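
ADF is a JSON tree of typed nodes with text leaves, so conversion amounts to a recursive flatten. A simplified sketch — a real converter must also handle mentions, code blocks, tables, and media nodes:

def adf_to_text(node: dict) -> str:
    """Recursively flatten an Atlassian Document Format tree to plain text."""
    if node.get("type") == "text":
        return node.get("text", "")
    parts = [adf_to_text(child) for child in node.get("content", [])]
    # Block-level nodes get line breaks; inline nodes are joined directly.
    block_types = {"doc", "paragraph", "bulletList", "orderedList", "listItem"}
    return ("\n" if node.get("type") in block_types else "").join(parts)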

See docs/JIRA-INTEGRATION.md for setup and usage guide.


πŸ™ GitHub Integration

Bring your repository history into semantic memory with built-in GitHub support:

  • Semantic Search: Search PRs, issues, commits, CI results, and code blobs by meaning, not keywords
  • 9 Content Types: github_pr, github_issue, github_commit, github_ci_result, github_code_blob, github_pr_diff, github_pr_review, github_issue_comment, github_release
  • Full & Incremental Sync: First run backfills full history; subsequent runs fetch only new or updated items
  • AST-Aware Code Chunking: Code blobs are split at AST boundaries (functions, classes), not arbitrary character offsets
  • Freshness Feedback Loop: Merged PRs automatically flag stale code-pattern memories for review
  • Adaptive Rate Limiting: Reads X-RateLimit-Remaining response headers and backs off automatically (see the sketch after this list)
  • Two Skills: /aim-github-sync for synchronization, /aim-github-search for semantic search
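
The adaptive back-off can be pictured as a thin wrapper around each API call: when X-RateLimit-Remaining hits zero, sleep until the X-RateLimit-Reset timestamp. A hedged sketch — the exact threshold and back-off policy in the sync service may differ:

import time
import httpx

def github_get(client: httpx.Client, url: str) -> httpx.Response:
    """Fetch a GitHub API URL, sleeping out the window when the quota is spent."""
    resp = client.get(url)
    remaining = int(resp.headers.get("X-RateLimit-Remaining", "1"))
    if remaining == 0:
        reset_at = int(resp.headers.get("X-RateLimit-Reset", "0"))  # epoch seconds
        time.sleep(max(0.0, reset_at - time.time()) + 1)
    return resp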

See docs/GITHUB-INTEGRATION.md for setup and usage guide.


🔭 Langfuse LLM Observability (Optional)

Note: Langfuse is entirely optional. AI Memory works fully without it. Enable only if you want LLM pipeline tracing. See docs/LANGFUSE-INTEGRATION.md for setup.

Opt-in LLM observability powered by Langfuse — trace every memory operation from hook execution through classification:

  • 9-Step Pipeline Spans: 1_capture, 2_log, 3_detect, 4_scan, 5_chunk, 6_embed, 7_store, 8_enqueue, 9_classify — each emitted as a Langfuse span with timing and payload data
  • Session-Based Traces: Traces are grouped by Claude Code session ID via Langfuse sessions, so you can follow all memory operations for a single conversation
  • File Buffer Architecture: Hook scripts write JSON events to trace_buffer/ (~5ms per event). A dedicated trace-flush-worker container reads and batches events to Langfuse every 5 seconds (see the sketch after this list)
  • Kill-Switch: LANGFUSE_ENABLED=false disables all trace emission globally. Per-hook control via LANGFUSE_TRACE_HOOKS=false
  • Buffer Eviction: Oldest trace files are automatically evicted when the buffer exceeds LANGFUSE_TRACE_BUFFER_MAX_MB (default: 100 MB)
  • Custom Model Tracking: ollama/*, openrouter/*, and openrouter/*:free registered as custom models for provider-aware cost analysis
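
The buffer split is what keeps hooks fast: the hook side only appends a small JSON file, and a separate worker does the slow network I/O. A minimal sketch of the two halves — send_batch stands in for the Langfuse SDK call, and the real worker also handles eviction and error recovery:

import json
import os
import time
import uuid
from pathlib import Path

BUFFER_DIR = Path("trace_buffer")  # directory name taken from the description above

def emit_event(event: dict) -> None:
    """Hook side: append one JSON event to disk; no SDK import, ~ms overhead."""
    if os.environ.get("LANGFUSE_ENABLED", "false") != "true":
        return  # kill-switch: tracing disabled, do no work
    BUFFER_DIR.mkdir(exist_ok=True)
    (BUFFER_DIR / f"{uuid.uuid4().hex}.json").write_text(json.dumps(event))

def flush_loop(send_batch) -> None:
    """Worker side: drain buffered events to Langfuse every 5 seconds."""
    while True:
        files = sorted(BUFFER_DIR.glob("*.json"))
        if files:
            send_batch([json.loads(f.read_text()) for f in files])
            for f in files:
                f.unlink()
        time.sleep(5)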

Quick Start:

# Run Langfuse setup (generates secrets, starts services, registers models)
./scripts/langfuse_setup.sh

# Enable tracing in your .env
echo "LANGFUSE_ENABLED=true" >> docker/.env

# Start all services (including Langfuse, since LANGFUSE_ENABLED=true)
./scripts/stack.sh start

# Open Langfuse UI
open http://localhost:23100

See docs/LANGFUSE-INTEGRATION.md for the complete setup, architecture, and troubleshooting guide.


🔬 Knowledge Discovery

Best Practices Researcher

When you ask "how should I..." or "what's the best way to...", AI-Memory's best-practices-researcher activates:

  1. Search Local Knowledge - Checks the conventions collection first
  2. Web Research - Searches 2024-2026 sources if needed
  3. Save Findings - Stores to oversight/knowledge/best-practices/BP-XXX.md
  4. Database Storage - Adds to Qdrant for future retrieval
  5. Skill Evaluation - Determines if findings warrant a reusable skill

Skill Creator Agent

When research reveals a repeatable process, the skill-creator agent can generate a Claude Code skill:

User: "Research best practices for writing commit messages"
→ Best Practices Researcher finds patterns
→ Evaluates: "This is a repeatable process with clear steps"
→ User confirms: "Yes, create a skill"
→ Skill Creator generates .claude/skills/writing-commits/SKILL.md

The Result: Your AI agents continuously discover and codify knowledge into reusable skills.


πŸ—οΈ Architecture

V2.0 Memory System

Claude Code Session
    ├── SessionStart Hooks (resume|compact) → Context injection on session resume and post-compaction
    ├── UserPromptSubmit Hooks → Unified keyword trigger (decisions/best practices/session history)
    ├── PreToolUse Hooks → Smart triggers (new file/first edit conventions)
    ├── PostToolUse Hooks → Capture code patterns + error detection
    ├── PreCompact Hook → Save conversation before compaction
    └── Stop Hook → Capture agent responses

Python Core (src/memory/)
    ├── config.py         → Environment configuration
    ├── storage.py        → Qdrant CRUD operations
    ├── search.py         → Semantic search + cascading
    ├── intent.py         → Intent detection + routing
    ├── triggers.py       → Automatic trigger configuration
    ├── embeddings.py     → Jina AI embeddings — jina-v2-base-en (prose) + jina-v2-base-code (code)
    └── deduplication.py  → Hash + similarity dedup

Docker Services
    ├── Qdrant (port 26350)
    ├── Embedding Service (port 28080)
    ├── Classifier Worker (LLM reclassification)
    ├── Streamlit Dashboard (port 28501)
    ├── Monitoring Stack (--profile monitoring)
    │   ├── Prometheus (port 29090)
    │   ├── Pushgateway (port 29091)
    │   └── Grafana (port 23000)
    └── Langfuse Stack (--profile langfuse)
        ├── Langfuse Web UI (port 23100)
        ├── Langfuse Worker (port 23130)
        ├── PostgreSQL (port 25432)
        ├── ClickHouse (port 28123)
        ├── Redis (port 26379)
        ├── MinIO (port 29000)
        └── Trace Flush Worker

v2.0.6 additions: GitHub sync service ingests repository data (PRs, issues, commits, code blobs) into the discussions collection. A 3-layer security scanning pipeline (regex + detect-secrets + SpaCy NER) screens all content before storage. Semantic decay scoring applies time-weighted relevance to all search queries. The Parzival session agent stores cross-session memory in the discussions collection for project continuity.
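
The three security layers can be chained so the cheapest check runs first and any hit blocks storage. A hedged sketch — the regex patterns and PII label set are illustrative assumptions, and the detect-secrets layer is elided because its API is file-oriented:

import re
import spacy  # NER layer; assumes the en_core_web_sm model is installed

SECRET_PATTERNS = [  # layer 1: a couple of illustrative regexes
    re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),  # Anthropic-style API key
    re.compile(r"AKIA[0-9A-Z]{16}"),           # AWS access key ID
]
PII_LABELS = {"PERSON", "GPE", "ORG"}  # assumed label set
nlp = spacy.load("en_core_web_sm")

def is_safe_to_store(text: str) -> bool:
    """Run the screening layers in order; reject on the first hit."""
    if any(p.search(text) for p in SECRET_PATTERNS):  # layer 1: regex
        return False
    # layer 2: a detect-secrets scan would run here
    return not any(ent.label_ in PII_LABELS for ent in nlp(text).ents)  # layer 3: NER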

v2.0.7 additions: Langfuse LLM observability stack provides full pipeline tracing. Hook scripts emit trace events to a file-based buffer (trace_buffer.py), and a dedicated flush worker (trace_flush_worker.py) batches events to Langfuse via the SDK. All 9 pipeline steps are instrumented as spans, grouped by session ID. The classifier worker emits 9_classify spans with provider, confidence, and reclassification outcome.

Collection Structure

| Collection | Purpose | Example Types |
| --- | --- | --- |
| code-patterns | HOW things are built | implementation, error_fix, refactor |
| conventions | WHAT rules to follow | rule, guideline, naming, structure |
| discussions | WHY things were decided | decision, session, preference, user_message, agent_response, blocker |
| jira-data | External work items from Jira Cloud | jira_issue, jira_comment |

Note: The jira-data collection is conditional — it is only created when Jira sync is enabled (JIRA_SYNC_ENABLED=true).
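
Intent detection (intent.py) decides which of these collections a query should hit first. A toy sketch of keyword-based routing — the phrases here are assumptions, not the real rule set, and the actual module backs off to cascading search:

ROUTES = {
    "code-patterns": ("how is", "implement", "error", "fix", "refactor"),
    "conventions":   ("convention", "standard", "rule", "naming", "style"),
    "discussions":   ("why", "decided", "decision", "last session"),
}

def route_query(query: str) -> str:
    """Pick the first collection whose keywords appear in the query."""
    q = query.lower()
    for collection, keywords in ROUTES.items():
        if any(k in q for k in keywords):
            return collection
    return "discussions"  # assumed default for open-ended questions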

Automatic Triggers

The memory system automatically retrieves relevant context:

  • Error Detection: When a command fails, retrieves past error fixes
  • New File Creation: Retrieves naming conventions and structure patterns
  • First Edit: Retrieves file-specific patterns on first modification
  • Decision Keywords: "Why did we..." triggers decision memory retrieval
  • Best Practices Keywords: "How should I..." triggers convention retrieval
  • Session History Keywords: "What have we done..." triggers session summary retrieval

Trigger Keyword Reference

The following keywords automatically activate memory retrieval when detected in your prompts:

Decision Keywords (20 patterns) → Searches discussions for past decisions

| Category | Keywords |
| --- | --- |
| Decision recall | why did we, why do we, what was decided, what did we decide |
| Memory recall | remember when, remember the decision, remember what, remember how, do you remember, recall when, recall the, recall how |
| Session references | last session, previous session, earlier we, before we, previously, last time we, what did we do, where did we leave off |

Session History Keywords (16 patterns) → Searches discussions for session summaries

| Category | Keywords |
| --- | --- |
| Project status | what have we done, what did we work on, project status, where were we, what's the status |
| Continuation | continue from, pick up where, continue where |
| Remaining work | what's left to do, remaining work, what's next for, what's next on, what's next in the, next steps, todo, tasks remaining |

Best Practices Keywords (27 patterns) → Searches conventions for guidelines

| Category | Keywords |
| --- | --- |
| Standards | best practice, best practices, coding standard, coding standards, convention, conventions for |
| Patterns | what's the pattern, what is the pattern, naming convention, style guide |
| Guidance | how should i, how do i, what's the right way, what is the right way |
| Research | research the pattern, research best practice, look up, find out about, what do the docs say |
| Recommendations | should i use, what's recommended, what is recommended, recommended approach, preferred approach, preferred way, industry standard, common pattern |

Note: Keywords are case-insensitive. Only structured patterns trigger retrieval to avoid false positives on casual conversation.
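
Matching phrases like these is cheap enough to run on every prompt. A minimal sketch of how one category might be checked — only a few of the phrases above are shown, and the compiled-regex approach is an assumption:

import re

# Three of the decision-recall phrases from the tables above.
DECISION_PATTERNS = [r"\bwhy did we\b", r"\bwhat was decided\b", r"\bwhere did we leave off\b"]
_decision_re = re.compile("|".join(DECISION_PATTERNS), re.IGNORECASE)

def should_recall_decisions(prompt: str) -> bool:
    """Case-insensitive match on structured phrases only, per the note above."""
    return bool(_decision_re.search(prompt))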

LLM Memory Classifier

The optional LLM Classifier automatically reclassifies captured memories into more precise types:

  • Rule-based first: Fast pattern matching (free, <10ms)
  • LLM fallback: AI classification when rules don't match
  • Provider chain: Primary provider with automatic fallback (sketched below)
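
A hedged sketch of that chain — the rule predicates, provider interface, and default type are illustrative assumptions, not the project's API:

RULES = [  # layer 1: fast pattern matching, free and well under 10ms
    (lambda t: "traceback" in t.lower(), "error_fix"),
    (lambda t: t.lstrip().startswith(("def ", "class ")), "implementation"),
]

def classify(text: str, providers: list) -> str:
    """Try rules first; fall through the ordered LLM provider chain otherwise."""
    for matches, memory_type in RULES:
        if matches(text):
            return memory_type
    for provider in providers:  # primary first, then fallbacks
        try:
            return provider.classify(text)  # hypothetical provider interface
        except Exception:
            continue  # provider unavailable: try the next one
    return "general"  # assumed default when every layer fails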

Quick Setup:

# Configure in docker/.env
MEMORY_CLASSIFIER_ENABLED=true
MEMORY_CLASSIFIER_PRIMARY_PROVIDER=ollama    # or: openrouter, claude, openai
MEMORY_CLASSIFIER_FALLBACK_PROVIDERS=openrouter

# For Ollama (free, local)
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_MODEL=sam860/LFM2:2.6b

# For OpenRouter (free tier available)
OPENROUTER_API_KEY=sk-or-v1-your-key
OPENROUTER_MODEL=google/gemma-2-9b-it:free

See docs/llm-classifier.md for complete setup guide, provider options, and troubleshooting.

🚀 Quick Start

1. Start Services

# Recommended: use the unified stack manager
./scripts/stack.sh start    # Start all services (reads .env for profile selection)
./scripts/stack.sh status   # Check health
./scripts/stack.sh stop     # Graceful shutdown

Alternatively, using Docker Compose directly:

# Core services (Qdrant + Embedding)
docker compose -f docker/docker-compose.yml up -d

# With monitoring (adds Prometheus, Grafana, Pushgateway)
docker compose -f docker/docker-compose.yml --profile monitoring up -d

2. Verify Services

# Check Qdrant (port 26350)
curl -H "api-key: $QDRANT_API_KEY" http://localhost:26350/health

# Check Embedding Service (port 28080)
curl http://localhost:28080/health

# Check Grafana (port 23000) - if monitoring enabled
open http://localhost:23000  # credentials from installation

3. Install to Project

./scripts/install.sh /path/to/your-project

# With convention seeding (recommended)
SEED_BEST_PRACTICES=true ./scripts/install.sh /path/to/your-project

Expected Output:

═══════════════════════════════════════════════════════════
  AI Memory Module Health Check
═══════════════════════════════════════════════════════════

[1/3] Checking Qdrant (localhost:26350)...
  ✅ Qdrant is healthy

[2/3] Checking Embedding Service (localhost:28080)...
  ✅ Embedding service is healthy

[3/3] Checking Monitoring API (localhost:28000)...
  ✅ Monitoring API is healthy

═══════════════════════════════════════════════════════════
  All Services Healthy ✅
═══════════════════════════════════════════════════════════

📦 Installation

Prerequisites

  • Python 3.10+ (3.11+ required for AsyncSDKWrapper)
  • Docker 20.10+ (for Qdrant + embedding service)
  • Claude Code (target project where memory will be installed)

Resource Requirements

AI Memory's core stack runs in 16 GiB of RAM (4 cores minimum). Adding the optional Langfuse LLM observability module raises the requirement to 32 GiB of RAM (8 cores recommended).

| Tier | Services | Minimum RAM | Recommended CPU |
| --- | --- | --- | --- |
| Core (default) | 8 services | 16 GiB | 4 cores |
| Core + Langfuse (opt-in) | 15 services | 32 GiB | 8 cores |

Installation Steps

See INSTALL.md for detailed installation instructions including:

  • System requirements with version compatibility
  • Step-by-step installation for macOS, Linux, and Windows (WSL2)
  • Automated installer and manual installation methods
  • Post-installation verification
  • Configuration options
  • Uninstallation procedures

βš™οΈ Configuration

🔌 Service Ports

All services use a 2XXXX port prefix to avoid conflicts:

| Service | External | Internal | Access URL |
| --- | --- | --- | --- |
| Qdrant | 26350 | 6333 | localhost:26350 |
| Embedding | 28080 | 8080 | localhost:28080/embed |
| Monitoring API | 28000 | 8000 | localhost:28000/health |
| Streamlit | 28501 | 8501 | localhost:28501 |
| Grafana | 23000 | 3000 | localhost:23000 |
| Prometheus | 29090 | 9090 | localhost:29090 (--profile monitoring) |
| Pushgateway | 29091 | 9091 | localhost:29091 (--profile monitoring) |

Optional: Langfuse LLM Observability Ports (opt-in):

| Port | Service | Notes |
| --- | --- | --- |
| 23100 | Langfuse Web UI | Optional (Langfuse) |
| 23130 | Langfuse Worker | Optional (Langfuse) |
| 25432 | Langfuse PostgreSQL | Optional (Langfuse) |
| 26379 | Langfuse Redis | Optional (Langfuse) |
| 28123 | Langfuse ClickHouse | Optional (Langfuse) |
| 29000 | Langfuse MinIO | Optional (Langfuse) |

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| QDRANT_HOST | localhost | Qdrant server hostname |
| QDRANT_PORT | 26350 | Qdrant external port |
| EMBEDDING_HOST | localhost | Embedding service hostname |
| EMBEDDING_PORT | 28080 | Embedding service port |
| AI_MEMORY_INSTALL_DIR | ~/.ai-memory | Installation directory |
| MEMORY_LOG_LEVEL | INFO | Logging level (DEBUG/INFO/WARNING) |

Jira Cloud Integration (Optional):

| Variable | Default | Description |
| --- | --- | --- |
| JIRA_INSTANCE_URL | (empty) | Jira Cloud URL (e.g., https://company.atlassian.net) |
| JIRA_EMAIL | (empty) | Jira account email for Basic Auth |
| JIRA_API_TOKEN | (empty) | API token from id.atlassian.com |
| JIRA_PROJECTS | (empty) | JSON array of project keys (e.g., ["PROJ","DEV","OPS"]); comma-separated also accepted for backwards compatibility |
| JIRA_SYNC_ENABLED | false | Enable Jira synchronization |
| JIRA_SYNC_DELAY_MS | 100 | Delay between API requests (ms) |

See docs/JIRA-INTEGRATION.md for complete Jira setup guide.

Override Example:

export QDRANT_PORT=16333  # Use custom port
export MEMORY_LOG_LEVEL=DEBUG  # Enable verbose logging

💡 Usage

🔧 Automatic Memory Capture

Memory capture happens automatically via Claude Code hooks:

  1. SessionStart (resume/compact only): Injects relevant memories when resuming a session or after context compaction
  2. PostToolUse: Captures code patterns (Write/Edit/NotebookEdit tools) in background (<500ms)
  3. PreCompact: Saves session summary before context compaction (auto or manual /compact)
  4. Stop: Optional per-response cleanup

No manual intervention required - hooks handle everything.

The "Aha Moment": Claude remembers your previous sessions automatically. Start a new session and Claude will say "Welcome back! Last session we worked on..." without you reminding it.

🎯 Manual Memory Operations

Use slash commands for manual control:

# Check system status
/aim-status

# Manually save current session
/aim-save

# Search across all memories
/aim-search <query>

# Jira Cloud Integration (requires JIRA_SYNC_ENABLED=true)
/aim-jira-sync              # Incremental sync from Jira
/aim-jira-sync --full       # Full sync (all issues and comments)
/aim-jira-search "query"    # Semantic search across Jira content
/aim-jira-search --issue PROJ-42  # Lookup issue + all comments

v2.0.6+ Skills

| Command | Description |
| --- | --- |
| /aim-purge | Purge old memories with dry-run safety (e.g., --older-than 90d) |
| /aim-github-search | Semantic search of GitHub data (PRs, issues, commits, code) |
| /aim-github-sync | Manually trigger GitHub repository sync |
| /aim-pause-updates | Toggle automatic memory updates on/off (kill switch) |
| /aim-refresh | Trigger freshness scan on changed files |
| /parzival-save-handoff | Save Parzival session handoff to Qdrant memory |
| /parzival-save-insight | Save a Parzival insight for cross-session recall |
| /aim-freshness-report | Scan code-patterns for stale memories by comparing against current git state |

Upgraded Skills (v2.0.6+)

| Command | What Changed |
| --- | --- |
| /aim-status | 4 new sections: decay stats, GitHub sync status, security scan summary, Parzival session info |
| /aim-search | Now displays decay scores alongside relevance scores |
| /aim-save | Supports agent memory types (handoff, insight, task) |

See docs/HOOKS.md for hook documentation, docs/COMMANDS.md for commands, docs/llm-classifier.md for LLM classifier setup, docs/JIRA-INTEGRATION.md for Jira integration guide, and docs/LANGFUSE-INTEGRATION.md for LLM observability setup.

🤖 AsyncSDKWrapper (Agent SDK Integration)

The AsyncSDKWrapper provides full async/await support for building custom Agent SDK agents with persistent memory.

Features:

  • Full async/await support compatible with Agent SDK
  • Rate limiting with token bucket algorithm (Tier 1: 50 RPM, 30K TPM)
  • Exponential backoff retry with jitter (3 retries: 1s, 2s, 4s ±20%)
  • Automatic conversation capture to discussions collection
  • Background storage (fire-and-forget pattern)
  • Prometheus metrics integration

Basic Usage:

import asyncio
from src.memory import AsyncSDKWrapper

async def main():
    async with AsyncSDKWrapper(cwd="/path/to/project") as wrapper:
        # Send message with automatic rate limiting and retry
        result = await wrapper.send_message(
            prompt="What is async/await?",
            model="claude-sonnet-4-5-20250929",
            max_tokens=500
        )

        print(f"Response: {result['content']}")
        print(f"Session ID: {result['session_id']}")

asyncio.run(main())

Streaming Responses (Buffered):

Note: The current implementation buffers the full response for retry reliability. True progressive streaming is planned for a future release.

async with AsyncSDKWrapper(cwd="/path/to/project") as wrapper:
    async for chunk in wrapper.send_message_buffered(
        prompt="Explain Python async",
        model="claude-sonnet-4-5-20250929",
        max_tokens=800
    ):
        print(chunk, end='', flush=True)

Custom Rate Limits:

async with AsyncSDKWrapper(
    cwd="/path/to/project",
    requests_per_minute=100,   # Tier 2
    tokens_per_minute=100000   # Tier 2
) as wrapper:
    result = await wrapper.send_message("Hello!")

Examples:

  • examples/async_sdk_basic.py - Basic async/await usage, context manager pattern, session ID logging, rate limiting demonstration
  • examples/async_sdk_streaming.py - Streaming response handling (buffered), progressive chunk processing, retry behavior
  • examples/async_sdk_rate_limiting.py - Custom rate limit configuration, queue depth/timeout settings, error handling for different API tiers

Configuration:

Set the ANTHROPIC_API_KEY environment variable before using AsyncSDKWrapper:

export ANTHROPIC_API_KEY=sk-ant-api03-...

Rate Limiting:

The wrapper implements a token bucket algorithm matching Anthropic's rate limits:

| Tier | Requests/Min | Tokens/Min |
| --- | --- | --- |
| Free | 5 | 10,000 |
| Tier 1 | 50 (default) | 30,000 (default) |
| Tier 2 | 100 | 100,000 |
| Tier 3+ | 1,000+ | 400,000+ |

Circuit breaker protections:

  • Max queue depth: 100 requests
  • Queue timeout: 60 seconds
  • Raises QueueTimeoutError or QueueDepthExceededError if exceeded

Retry Strategy:

Automatic exponential backoff retry (DEC-029); a sketch follows this list:

  • Max retries: 3
  • Delays: 1s, 2s, 4s (with ±20% jitter)
  • Retries on: 429 (rate limit), 529 (overload), network errors
  • No retry on: 4xx client errors (except 429), auth failures
  • Respects retry-after header when provided
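
Put together, the schedule and the retry predicate look roughly like this (a sketch only; the wrapper's actual implementation lives in src/memory/async_sdk_wrapper.py):

import random

def backoff_delays(max_retries: int = 3, base: float = 1.0) -> list[float]:
    """Doubling delays (1s, 2s, 4s) with ±20% jitter, as listed above."""
    return [base * 2**attempt * random.uniform(0.8, 1.2) for attempt in range(max_retries)]

def should_retry(status_code: int) -> bool:
    """Retry rate limits and overloads only; other client errors fail fast."""
    return status_code in (429, 529)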

Memory Capture:

All messages are automatically captured to the discussions collection:

  • User messages β†’ user_message type
  • Agent responses β†’ agent_response type
  • Background storage (non-blocking)
  • Session-based grouping with turn numbers

See src/memory/async_sdk_wrapper.py for complete API documentation.

For complete design rationale, see oversight/specs/tech-debt-035/phase-2-design.md.

👥 Multi-Project Support

Memories are automatically isolated by group_id (derived from the project directory); a filtering sketch follows the collection list below:

# Project A: group_id = "project-a"
# Project B: group_id = "project-b"
# Searches only return memories from current project

V2.0 Collection Isolation:

  • code-patterns: Implementation patterns (per-project isolation)
  • conventions: Coding standards and rules (shared across projects by default)
  • discussions: Decisions, sessions, conversations (per-project isolation)
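
Under the hood, per-project isolation maps to a Qdrant payload filter on group_id. A minimal sketch using the qdrant-client library — the collection name and result limit here are arbitrary choices, not the project's defaults:

import os
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(host="localhost", port=26350, api_key=os.environ.get("QDRANT_API_KEY"))

def project_search(query_vector: list[float], group_id: str):
    """Search discussions, restricted to the current project's group_id."""
    return client.search(
        collection_name="discussions",
        query_vector=query_vector,
        query_filter=Filter(must=[FieldCondition(key="group_id", match=MatchValue(value=group_id))]),
        limit=5,
    )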

🔧 Troubleshooting

Common Issues

See TROUBLESHOOTING.md for comprehensive troubleshooting, including:

  • Services won't start
  • Health check failures
  • Memories not captured
  • Search not working
  • Performance problems
  • Data persistence issues

Recovery Script

If hooks are misbehaving (e.g., after a failed install or upgrade), use the recovery script to scan and repair all project configurations:

# Dry-run: shows what would change (safe, no modifications)
python scripts/recover_hook_guards.py

# Apply fixes across all discovered projects
python scripts/recover_hook_guards.py --apply

# Scan only: list all discovered project settings.json files
python scripts/recover_hook_guards.py --scan

The recovery script automatically discovers projects via:

  1. ~/.ai-memory/installed_projects.json manifest (primary)
  2. Sibling directories of AI_MEMORY_INSTALL_DIR (fallback)
  3. Common project paths (additional fallback)

It fixes: unguarded hook commands (BUG-066), broad SessionStart matchers (BUG-078), and other known configuration issues. Always run with dry-run first to review changes.

Quick Diagnostic Commands

# Check all services
docker compose -f docker/docker-compose.yml ps

# Check logs
docker compose -f docker/docker-compose.yml logs

# Check health
python scripts/health-check.py

# Check ports
lsof -i :26350  # Qdrant
lsof -i :28080  # Embedding
lsof -i :28000  # Monitoring API

Services Won't Start

Symptom: docker compose up -d fails or services exit immediately

Solution:

  1. Check port availability:

    lsof -i :26350  # Qdrant
    lsof -i :28080  # Embedding
  2. Check Docker logs:

    docker compose -f docker/docker-compose.yml logs
  3. Ensure Docker daemon is running:

    docker ps  # Should not error

Health Check Fails

Symptom: python scripts/health-check.py shows unhealthy services

Solution:

  1. Check service status:

    docker compose -f docker/docker-compose.yml ps
  2. Verify ports are accessible:

    curl -H "api-key: $QDRANT_API_KEY" http://localhost:26350/health  # Qdrant
    curl http://localhost:28080/health  # Embedding
  3. Check logs for errors:

    docker compose -f docker/docker-compose.yml logs qdrant
    docker compose -f docker/docker-compose.yml logs embedding

Memories Not Captured

Symptom: PostToolUse hook doesn't store memories

Solution:

  1. Check hook configuration in .claude/settings.json:

    {
      "hooks": {
        "PostToolUse": [{
          "matcher": "Write|Edit",
          "hooks": [{"type": "command", "command": ".claude/hooks/scripts/post_tool_capture.py"}]
        }]
      }
    }
  2. Verify hook script is executable:

    ls -la .claude/hooks/scripts/post_tool_capture.py
    chmod +x .claude/hooks/scripts/post_tool_capture.py
  3. Check hook logs (if logging enabled):

    cat ~/.ai-memory/logs/hook.log

For more detailed troubleshooting, see TROUBLESHOOTING.md.

📊 Monitoring

Grafana Dashboards (V3)

Access Grafana at http://localhost:23000 (credentials set during installation):

| Dashboard | Purpose |
| --- | --- |
| NFR Performance Overview | All 6 NFR metrics with SLO compliance |
| Hook Activity | Hook execution rates, latency heatmaps |
| Memory Operations | Captures, retrievals, deduplication |
| System Health | Service status, error rates |

Performance Targets (NFRs)

| NFR | Metric | Target |
| --- | --- | --- |
| NFR-P1 | Hook execution | <500ms |
| NFR-P2 | Batch embedding | <2s |
| NFR-P3 | Session injection | <3s |
| NFR-P4 | Dedup check | <100ms |
| NFR-P5 | Retrieval query | <500ms |
| NFR-P6 | Real-time embedding | <500ms |

Service Ports

| Service | Port |
| --- | --- |
| Grafana | 23000 |
| Prometheus | 29090 |
| Pushgateway | 29091 |

Key Metrics

All metrics use the aimemory_ prefix (BP-045 compliant):

| Metric | Description |
| --- | --- |
| aimemory_hook_duration_seconds | Hook execution time (NFR-P1) |
| aimemory_captures_total | Total memory capture attempts |
| aimemory_retrievals_total | Total retrieval operations |
| aimemory_trigger_fires_total | Automatic trigger activations |

See docs/MONITORING.md for complete monitoring guide and docs/prometheus-queries.md for query examples.

💾 Backup & Restore

Protect your AI memories with built-in backup and restore scripts.

Quick Backup

# Setup (one-time)
cd /path/to/ai-memory
python3 -m venv .venv && source .venv/bin/activate
pip install httpx

# Get your Qdrant API key
cat ~/.ai-memory/docker/.env | grep QDRANT_API_KEY
export QDRANT_API_KEY="your-key-here"

# Run backup
python scripts/backup_qdrant.py

Backups are stored in the backups/ directory as timestamped folders containing:

  • Collection snapshots (discussions, conventions, code-patterns)
  • Configuration files
  • Verification manifest

Quick Restore

python scripts/restore_qdrant.py backups/2026-02-03_143052

See docs/BACKUP-RESTORE.md for complete instructions including troubleshooting.

Coming soon: Backup and restore scripts will be updated in the next version to support the jira-data collection, including Jira database backup and reinstall.

📈 Performance

⚡ Benchmarks

  • Hook overhead: <500ms (PostToolUse forks to background)
  • Embedding generation: <2s (pre-warmed Docker service)
  • SessionStart context injection: <3s
  • Deduplication check: <100ms

Optimization Tips

  1. Enable monitoring profile for production use:
    docker compose -f docker/docker-compose.yml --profile monitoring up -d

πŸ› οΈ Development

🧪 Running Tests

# Run all tests
pytest tests/

# Run specific test file
pytest tests/test_storage.py -v

# Run integration tests only
pytest tests/integration/ -v

Project Structure

ai-memory/
├── src/memory/          # Core Python modules
├── .claude/
│   ├── hooks/scripts/   # Hook implementations
│   └── skills/          # Skill definitions
├── docker/              # Docker Compose and service configs
├── scripts/             # Installation and management scripts
├── tests/               # pytest test suite
└── docs/                # Additional documentation

Coding Conventions

  • Python (PEP 8 Strict): Files snake_case.py, Functions snake_case(), Classes PascalCase, Constants UPPER_SNAKE
  • Qdrant Payload Fields: Always snake_case (content_hash, group_id, source_hook)
  • Structured Logging: Use logger.info("event", extra={"key": "value"}), never f-strings
  • Hook Exit Codes: 0 (success), 1 (non-blocking error), 2 (blocking error - rare)
  • Graceful Degradation: All components must fail silently - Claude works without memory

See CONTRIBUTING.md for complete development setup and coding standards.

🤝 Contributing

We welcome contributions! To contribute:

  1. Fork the repository and create a feature branch
  2. Follow coding conventions (see Development section above)
  3. Write tests for all new functionality
  4. Ensure all tests pass: pytest tests/
  5. Update documentation if adding features
  6. Submit a pull request with a clear description

See CONTRIBUTING.md for detailed development setup and pull request process.

📄 License

MIT License - see LICENSE for details.


Accessibility

This documentation follows WCAG 2.2 Level AA accessibility standards (ISO/IEC 40500:2025):

  • ✅ Proper heading hierarchy (h1 → h2 → h3)
  • ✅ Descriptive link text (no "click here")
  • ✅ Code blocks with language identifiers
  • ✅ Tables with headers for screen readers
  • ✅ Consistent bullet style (hyphens)
  • ✅ ASCII art diagrams for universal compatibility

For accessibility concerns or suggestions, please open an issue.

