An intelligent knowledge management system for Linux kernel development that automatically indexes your documentation and provides context-aware code assistance through GitHub Copilot.
- Active Knowledge Journaling: Copilot automatically maintains journal files as you share knowledge
- Automatic Indexing: File monitoring service detects changes and updates vector database
- RAG-Powered Retrieval: Semantic search using sentence transformers
- MCP Server Integration: Seamlessly integrates with GitHub Copilot via Model Context Protocol
- Real-time Updates: Changes are indexed within seconds
- Hierarchical Organization: Mirrors kernel source tree structure for context-aware suggestions
- Vector Database: ChromaDB for efficient similarity search
As you work with GitHub Copilot and share kernel knowledge during your coding sessions, Copilot will:
- Listen for knowledge sharing - When you explain concepts, gotchas, or implementation details
- Extract key information - File paths, function names, explanations, common issues
- Update journal files - Automatically creates or updates the appropriate markdown file in the correct subsystem directory
- Auto-index - Changes are detected and indexed within seconds for immediate searchability
You just code and share knowledge naturally - Copilot handles the documentation!
Of course, you can also manually edit any journal file anytime.
```
.kernel-knowledge/
├── journals/                # Markdown knowledge files (mirrors kernel structure)
│   ├── fs/smb/client/       # SMB client knowledge
│   ├── fs/smb/server/       # Kernel SMB server knowledge
│   ├── net/                 # Networking subsystem
│   ├── mm/                  # Memory management
│   ├── kernel/              # Core kernel
│   └── drivers/             # Device drivers
├── vector-db/               # ChromaDB database (auto-generated)
├── mcp-server/              # MCP server and indexer service
│   ├── indexer_service.py
│   └── server.py
├── venv/                    # Python virtual environment
├── requirements.txt         # Python dependencies
├── setup.sh                 # Installation script
├── start-indexer.sh         # Start the indexing service
├── stop-indexer.sh          # Stop the indexing service
└── test-mcp-server.sh       # Test the MCP server
```
Run the setup script to install all dependencies:
```bash
cd .kernel-knowledge
chmod +x setup.sh
./setup.sh
```

This will:
- Create a Python virtual environment
- Install required packages (ChromaDB, sentence-transformers, MCP, etc.)
- Download the embedding model
- Set up the systemd user service
- Make all scripts executable
The indexer will automatically start when you open the workspace in VS Code (via the workspace task).
You can also start it manually:
```bash
./start-indexer.sh
```

The indexer service will:
- Run in the background as a systemd user service
- Recursively monitor the journals/ directory for changes
- Automatically index new or modified markdown files
- Update the vector database in real-time
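The indexer service relies on Watchdog to receive file-system events. To show the same create/modify/delete classification without any third-party dependency, here is a minimal polling sketch built on pathlib. The names scan_journals, diff_snapshots, and watch are hypothetical, not functions in indexer_service.py, and polling is a swapped-in technique used only for illustration.

```python
import time
from pathlib import Path


def scan_journals(root: str) -> dict[str, float]:
    """Return {relative_path: mtime} for every markdown file under root (recursive)."""
    snapshot = {}
    for path in Path(root).rglob("*.md"):
        snapshot[str(path.relative_to(root))] = path.stat().st_mtime
    return snapshot


def diff_snapshots(old: dict[str, float], new: dict[str, float]):
    """Classify the differences between two snapshots as created, modified, or deleted."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return created, modified, deleted


def watch(root: str, interval: float = 2.0, on_change=print) -> None:
    """Hypothetical polling loop: rescan every `interval` seconds and report changes."""
    previous = scan_journals(root)
    while True:
        time.sleep(interval)
        current = scan_journals(root)
        created, modified, deleted = diff_snapshots(previous, current)
        if created or modified or deleted:
            on_change(created, modified, deleted)  # e.g. re-embed or drop those files
        previous = current
```

An event-driven watcher avoids the periodic full rescan, which is why Watchdog suits the real service better; a polling loop like this mainly helps on file systems where change notifications can be unreliable, such as some network mounts.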
Add the MCP server to your VS Code settings:
- Open VS Code Settings (JSON)
- Add the following configuration, replacing the path in "args" with the absolute path to mcp-server/server.py in your own checkout:

```json
{
  "github.copilot.chat.mcp.servers": {
    "kernel-knowledge": {
      "command": "python3",
      "args": [
        "/home/sprasad/repo/sprasad-microsoft/smb3-kernel-client/.kernel-knowledge/mcp-server/server.py"
      ]
    }
  }
}
```

- Restart VS Code or reload the window
Create markdown files in the appropriate subsystem directory under journals/:
```bash
cd journals/fs/smb/client
vim oplock-handling.md
```

Example journal entry:

```markdown
# SMB3 Oplock Handling
## Overview
Oplocks (opportunistic locks) allow SMB clients to cache file data...
## Implementation
The oplock logic is implemented in fs/smb/client/smb2ops.c...
## Common Issues
- Oplock break delays
- Client cache invalidation
```

The file will be automatically indexed within seconds!
Once configured, GitHub Copilot can access your kernel knowledge automatically:
- Search for specific topics:
  @kernel-knowledge What do we know about SMB3 directory leasing?
- Get context for code changes:
  I need to modify the VFS caching layer. What context do we have?
- List available knowledge:
  @kernel-knowledge What topics have been documented?
The MCP server provides these tools to Copilot:
- search_kernel_knowledge: Semantic search across all journals
- search_by_subsystem: Search within a specific kernel subsystem path (e.g., "fs/smb/client")
- list_knowledge_files: List all indexed files with their paths
- get_knowledge_stats: Get statistics about the knowledge base
- read_knowledge_file: Read the complete content of a specific file by path
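One way a tool like search_by_subsystem might scope its results is to keep only chunks whose stored journal path sits under the requested directory. The sketch below shows that filtering step in isolation; Chunk, in_subsystem, and filter_by_subsystem are hypothetical names, and the actual server may apply the filter inside its ChromaDB query instead.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Chunk:
    path: str  # journal path relative to journals/, e.g. "fs/smb/client/oplock.md"
    text: str


def in_subsystem(path: str, subsystem: str) -> bool:
    """True if path lies under subsystem, comparing whole path components.

    Component-wise comparison keeps "fs/smb/client" from matching "fs/smb/client2".
    """
    prefix = PurePosixPath(subsystem.strip("/")).parts
    return PurePosixPath(path).parts[: len(prefix)] == prefix


def filter_by_subsystem(chunks: list[Chunk], subsystem: str) -> list[Chunk]:
    """Keep only the chunks that belong to the given kernel subsystem directory."""
    return [c for c in chunks if in_subsystem(c.path, subsystem)]
```

Comparing path components rather than raw string prefixes is the main design point: a plain startswith check would leak results from sibling directories with similar names.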
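```bash
# Check the indexer service status
systemctl --user status kernel-knowledge-indexer.service

# Follow the indexer logs
journalctl --user -u kernel-knowledge-indexer.service -f

# Stop the indexer
./stop-indexer.sh

# Test the MCP server
./test-mcp-server.sh
```

- Follow Kernel Structure: Place files in the directory matching the kernel subsystem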
- Use Clear Headings: Organize content with markdown headings
- Be Specific: Include file paths, function names, and line numbers
- Add Context: Explain why, not just what
- Link Concepts: Reference related subsystems and files
- Include Examples: Add code snippets and command outputs
Match your journal structure to the kernel source:
- fs/smb/client/: SMB client implementation knowledge
- fs/smb/server/: Kernel SMB server (ksmbd) knowledge
- net/: Networking subsystem topics
- mm/: Memory management topics
- kernel/: Core kernel functionality
- drivers/: Device driver knowledge
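The layout rule above can be captured in a small helper that maps a kernel source path and a topic to the journal file where notes about it belong. This is a sketch of the convention only; journal_path_for and JOURNALS_ROOT are hypothetical names, and Copilot's actual choice of file may differ.

```python
from pathlib import PurePosixPath

# Assumed location of the journals directory, relative to the repository root.
JOURNALS_ROOT = PurePosixPath(".kernel-knowledge/journals")


def journal_path_for(kernel_path: str, topic: str) -> PurePosixPath:
    """Map a kernel source path plus a topic name to a journal markdown file.

    A path with a file extension (e.g. fs/smb/client/smb2ops.c) maps to its
    directory; a path without one (e.g. mm/) is treated as a directory as-is.
    """
    p = PurePosixPath(kernel_path)
    subsystem = p.parent if p.suffix else p
    slug = topic.strip().lower().replace(" ", "-")
    return JOURNALS_ROOT / subsystem / f"{slug}.md"
```

For example, journal_path_for("fs/smb/client/smb2ops.c", "Oplock Handling") yields .kernel-knowledge/journals/fs/smb/client/oplock-handling.md. Note that extensionless files such as Makefile would be misread as directories by this simple rule.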
Good candidates for journal entries include:
- Protocol implementations (SMB3, NFS, CIFS)
- Kernel subsystems (VFS, memory management, networking)
- Debugging techniques and root cause analyses
- Performance optimization strategies
- Bug fixes and their explanations
- Testing procedures and gotchas
- Build system quirks
- Hardware-specific issues
A suggested template for new journal files:

```markdown
# Topic Title
## Overview
Brief introduction to the topic
## Background
Historical context, why it exists
## Implementation
Where in the code it lives, key functions
## Related Files
- fs/smb/client/file1.c - Description
- fs/smb/client/file2.h - Description
## Common Issues
Known problems and solutions
## Testing
How to test this functionality
## Related Topics
Links to other relevant concepts
## References
Commits, patches, documentation links
```

How the indexer works:
- File Monitoring: Watchdog monitors the journals/ directory
- Change Detection: Detects create, modify, and delete events
- Text Extraction: Parses markdown into semantic chunks
- Embedding Generation: Creates vector embeddings using all-MiniLM-L6-v2
- Database Update: Stores chunks in ChromaDB with metadata
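A plausible way to "parse markdown into semantic chunks" is to split at headings, so each chunk is one section of a journal entry tagged with its file and heading. This is a sketch under that assumption; split_markdown is a hypothetical name, and the project's extract_text_chunks() may split differently.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the heading text.
# Lines like "#include <linux/fs.h>" do not match because no space follows '#'.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown(text: str, source: str) -> list[dict]:
    """Split markdown at headings; each chunk records its source file and nearest heading."""
    chunks: list[dict] = []
    heading = ""
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:  # skip empty sections, e.g. a title directly followed by a subheading
            chunks.append({"source": source, "heading": heading, "text": content})

    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code fences are shell comments, not headings
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            heading = match.group(2)
            body.clear()
        else:
            body.append(line)
    flush()
    return chunks
```

Keeping the heading and source path as metadata lets search results point back to the exact section of the journal file that matched.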
How search works:
- Query Processing: User query → vector embedding
- Similarity Search: ChromaDB finds most relevant chunks
- Result Ranking: Sorts by cosine similarity
- Context Delivery: Returns formatted results to Copilot
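The ranking step can be illustrated with an exact, brute-force cosine similarity over toy vectors. ChromaDB itself does this search inside an approximate HNSW index, and its default distance is L2 unless a collection is created with cosine space, so treat this as a sketch of the idea rather than of ChromaDB internals; cosine and rank are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec: list[float], chunks: list[tuple[str, list[float]]], top_k: int = 3):
    """Score every (chunk_id, vector) pair against the query and return the best top_k."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
query = [1.0, 0.0, 0.0]
index = [("lease", [0.9, 0.1, 0.0]), ("mm", [0.0, 1.0, 0.0]), ("oplock", [0.7, 0.7, 0.0])]
print(rank(query, index, top_k=2))
```

Here "lease" scores about 0.994 and "oplock" about 0.707, so they come back in that order while "mm" (orthogonal to the query) is dropped.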
What the MCP server does:
- Protocol Handler: Implements Model Context Protocol
- Tool Endpoints: Exposes search and retrieval tools
- Resource Provider: Makes knowledge available as resources
- Async Communication: Efficient stdio-based communication
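Over stdio, MCP exchanges JSON-RPC 2.0 messages, one per line. The sketch below builds a tools/call request for search_kernel_knowledge. The argument name "query" is an assumption about this server's tool schema, and a real client must complete MCP's initialize handshake before calling any tool.

```python
import json


def tools_call_message(request_id: int, tool: str, arguments: dict) -> str:
    """Build one MCP 'tools/call' JSON-RPC 2.0 request framed as a single stdio line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without indent emits no raw newlines (newlines inside strings
    # are escaped as \n), so the trailing newline is the only message delimiter.
    return json.dumps(message) + "\n"


def parse_stdio_lines(stream_text: str) -> list[dict]:
    """Decode newline-delimited JSON-RPC messages, skipping blank lines."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]


# "query" is an assumed parameter name for search_kernel_knowledge.
line = tools_call_message(1, "search_kernel_knowledge", {"query": "SMB3 directory leasing"})
print(line, end="")
```

Because each message is exactly one line, a reader on the other end of the pipe can frame messages with a simple readline loop.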
```bash
# Check logs
journalctl --user -u kernel-knowledge-indexer.service -n 50

# Check whether the indexer process is running
ps aux | grep indexer_service
```

```bash
# Test the MCP server directly
cd .kernel-knowledge
source venv/bin/activate
python3 mcp-server/server.py
```

```bash
# Check if files are indexed
./test-mcp-server.sh

# Manually trigger re-indexing
systemctl --user restart kernel-knowledge-indexer.service
```

If Copilot does not pick up the MCP server:
- Verify the path in VS Code settings
- Check that Python path is correct
- Ensure virtual environment is activated
- Reload VS Code window
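The first two checks can be automated with a short script that inspects the server entry and reports missing files. It is a sketch that takes an already-parsed settings dict (VS Code's settings.json may contain comments, which the standard json module rejects), and check_server_entry is a hypothetical name. It does not verify that the interpreter can import ChromaDB and the other dependencies; with a bare "python3" command, VS Code does not activate venv/ for you.

```python
import shutil
from pathlib import Path

SETTINGS_KEY = "github.copilot.chat.mcp.servers"  # key used in the configuration above


def check_server_entry(settings: dict, name: str = "kernel-knowledge") -> list[str]:
    """Return human-readable problems with the MCP server entry (empty list if none found)."""
    entry = settings.get(SETTINGS_KEY, {}).get(name)
    if entry is None:
        return [f"no '{name}' entry under {SETTINGS_KEY}"]
    problems = []
    command = entry.get("command", "")
    # Accept either a command found on PATH or an explicit path to an interpreter.
    if shutil.which(command) is None and not Path(command).is_file():
        problems.append(f"command not found: {command!r}")
    for arg in entry.get("args", []):
        if arg.endswith(".py") and not Path(arg).is_file():
            problems.append(f"script not found: {arg}")
    return problems
```

An empty list means the command and script path at least exist; any remaining failure is then more likely a missing Python package or an error inside server.py itself.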
- Embedding Model: Uses the lightweight all-MiniLM-L6-v2 (80MB)
- Database: ChromaDB with SQLite backend
- Indexing Speed: ~100-200 chunks/second
- Search Latency: <100ms for most queries
- Memory Usage: ~200-500MB depending on knowledge base size
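A quick calculation shows how much of that memory the vectors themselves account for. This sketch assumes float32 storage (4 bytes per value) and a hypothetical knowledge base of 10,000 chunks, and it ignores chunk text, metadata, and index overhead.

```python
def raw_embedding_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage only (float32 by default); excludes text, metadata, and index overhead."""
    return num_chunks * dimensions * bytes_per_value


# 10,000 chunks is an assumed size for illustration, not a measured figure.
for model, dims in [("all-MiniLM-L6-v2", 384), ("all-mpnet-base-v2", 768)]:
    mib = raw_embedding_bytes(10_000, dims) / 2**20
    print(f"{model}: {mib:.1f} MiB for 10,000 chunks")
```

This prints about 14.6 MiB for all-MiniLM-L6-v2 and 29.3 MiB for all-mpnet-base-v2, which suggests that for a personal journal most of the memory figure above goes to the embedding model and runtime rather than to the stored vectors.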
- All data is stored locally in your workspace
- No external API calls for embeddings
- No telemetry or data collection
- ChromaDB runs entirely offline
Edit mcp-server/indexer_service.py and mcp-server/server.py:
```python
EMBEDDING_MODEL = "your-model-name"
```

Options:
- all-MiniLM-L6-v2 (default, fast, 384 dimensions)
- all-mpnet-base-v2 (better quality, slower, 768 dimensions)
- multi-qa-MiniLM-L6-cos-v1 (optimized for Q&A, 384 dimensions)
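Switching models affects the existing vector database. Vectors from different models are not comparable, and ChromaDB rejects embeddings whose dimension differs from the collection's, so a model change generally means re-embedding every chunk. The sketch below encodes that reasoning; MODEL_DIMENSIONS lists the dimensions given above, and reindex_reason is a hypothetical helper.

```python
from typing import Optional

# Output dimensions of the models listed above.
MODEL_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "multi-qa-MiniLM-L6-cos-v1": 384,
}


def reindex_reason(old_model: str, new_model: str) -> Optional[str]:
    """Explain why switching from old_model to new_model requires re-indexing, or None if it does not."""
    if old_model == new_model:
        return None
    old_dim = MODEL_DIMENSIONS.get(old_model)
    new_dim = MODEL_DIMENSIONS.get(new_model)
    if old_dim is not None and new_dim is not None and old_dim != new_dim:
        return f"dimension change {old_dim} -> {new_dim}: existing collection cannot accept new vectors"
    # Equal dimensions still do not make two models' vector spaces comparable.
    return "different model: old and new vectors are not comparable, re-embed all chunks"
```

Note that all-MiniLM-L6-v2 and multi-qa-MiniLM-L6-cos-v1 share 384 dimensions, so ChromaDB would accept the mixed vectors without error, yet search quality would silently degrade; that case is the easy one to miss.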
Modify the chunking logic in extract_text_chunks() to change how documents are split.
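A common alternative to splitting at headings is a sliding window of fixed size with some overlap, so that text near a boundary appears in two chunks. The sketch below counts words as a rough proxy for model tokens (all-MiniLM-L6-v2 truncates inputs longer than 256 word pieces by default, per its model card); window_chunks and its default sizes are hypothetical choices, not the project's settings.

```python
def window_chunks(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reached the end of the text
            break
    return chunks


# Ten words, windows of 4 with 1 word of overlap.
print(window_chunks(" ".join(str(i) for i in range(10)), size=4, overlap=1))
```

This prints ['0 1 2 3', '3 4 5 6', '6 7 8 9']: each window repeats the last word of the previous one. Larger overlap improves recall for facts that straddle a boundary at the cost of more chunks to embed and store.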
Update COLLECTION_NAME in both service files to use a different database collection.
See journals/README.md for examples and templates.
This is a personal knowledge base system. Customize it to fit your workflow!
MIT License - Feel free to modify and extend
- ChromaDB: Vector database
- Sentence Transformers: Embedding generation
- Model Context Protocol: Copilot integration
- Watchdog: File system monitoring
Happy Journaling!
For questions or issues, check the logs or test the system with ./test-mcp-server.sh