Kernel Knowledge RAG System

An intelligent knowledge management system for Linux kernel development that automatically indexes your documentation and provides context-aware code assistance through GitHub Copilot.

🎯 Features

📝 Active Knowledge Journaling: Copilot automatically maintains journal files as you share knowledge
🔍 Automatic Indexing: File monitoring service detects changes and updates vector database
🤖 RAG-Powered Retrieval: Semantic search using sentence transformers
🔌 MCP Server Integration: Seamlessly integrates with GitHub Copilot via Model Context Protocol
⚡ Real-time Updates: Changes are indexed within seconds
🗂️ Hierarchical Organization: Mirrors kernel source tree structure for context-aware suggestions
🗄️ Vector Database: ChromaDB for efficient similarity search

🤖 How It Works

As you work with GitHub Copilot and share kernel knowledge during your coding sessions, Copilot will:

Listen for knowledge sharing - When you explain concepts, gotchas, or implementation details
Extract key information - File paths, function names, explanations, common issues
Update journal files - Automatically creates or updates the appropriate markdown file in the correct subsystem directory
Auto-index - Changes are detected and indexed within seconds for immediate searchability

You just code and share knowledge naturally - Copilot handles the documentation!

Of course, you can also manually edit any journal file anytime.

📁 Directory Structure

.kernel-knowledge/
├── journals/              # Markdown knowledge files (mirrors kernel structure)
│   ├── fs/smb/client/    # SMB client knowledge
│   ├── fs/smb/server/    # Kernel SMB server knowledge
│   ├── net/              # Networking subsystem
│   ├── mm/               # Memory management
│   ├── kernel/           # Core kernel
│   └── drivers/          # Device drivers
├── vector-db/            # ChromaDB database (auto-generated)
├── mcp-server/           # MCP server and indexer service
│   ├── indexer_service.py
│   └── server.py
├── venv/                 # Python virtual environment
├── requirements.txt      # Python dependencies
├── setup.sh             # Installation script
├── start-indexer.sh     # Start the indexing service
├── stop-indexer.sh      # Stop the indexing service
└── test-mcp-server.sh   # Test the MCP server

🚀 Quick Start

1. Installation

Run the setup script to install all dependencies:

cd .kernel-knowledge
chmod +x setup.sh
./setup.sh

This will:

Create a Python virtual environment
Install required packages (ChromaDB, sentence-transformers, MCP, etc.)
Download the embedding model
Set up the systemd user service
Make all scripts executable

2. Start the Indexer Service

The indexer will automatically start when you open the workspace in VS Code (via the workspace task).

You can also start it manually:

./start-indexer.sh

The indexer service will:

Run in the background as a systemd user service
Recursively monitor the journals/ directory for changes
Automatically index new or modified markdown files
Update the vector database in real-time
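
The change-detection step can be illustrated without any third-party packages. The sketch below is a polling stand-in, not the Watchdog-based code in indexer_service.py: it snapshots every markdown file under a directory and diffs two snapshots into created, modified, and deleted sets. The function names `snapshot` and `diff_snapshots` are hypothetical.

```python
from pathlib import Path


def snapshot(root: Path) -> dict[str, tuple[int, int]]:
    """Map each markdown file under root (recursively) to (mtime_ns, size)."""
    state = {}
    for path in root.rglob("*.md"):
        if path.is_file():
            st = path.stat()
            state[str(path.relative_to(root))] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(old: dict, new: dict) -> tuple[list, list, list]:
    """Return (created, modified, deleted) relative paths, each sorted."""
    created = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return created, modified, deleted
```

Calling `snapshot(Path("journals"))` periodically and diffing against the previous result yields the same three event kinds the indexer reacts to. An event-driven watcher avoids the polling delay, which is presumably why the project uses Watchdog instead.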

3. Configure GitHub Copilot

Add the MCP server to your VS Code settings:

Open VS Code Settings (JSON)
Add the following configuration, replacing the path in args with the absolute path to mcp-server/server.py in your own checkout:

{
  "github.copilot.chat.mcp.servers": {
    "kernel-knowledge": {
      "command": "python3",
      "args": [
        "/home/sprasad/repo/sprasad-microsoft/smb3-kernel-client/.kernel-knowledge/mcp-server/server.py"
      ]
    }
  }
}

Restart VS Code or reload the window
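
Because the server path is specific to each machine, it can be convenient to generate the settings fragment rather than edit it by hand. The following is a minimal sketch under that assumption; `build_mcp_settings` is a hypothetical helper, not part of the repository, and it simply reproduces the structure shown above for a given checkout directory.

```python
import json
from pathlib import Path


def build_mcp_settings(checkout: Path) -> dict:
    """Return the VS Code settings fragment for a .kernel-knowledge checkout."""
    server = checkout / "mcp-server" / "server.py"
    return {
        "github.copilot.chat.mcp.servers": {
            "kernel-knowledge": {
                "command": "python3",
                "args": [str(server)],
            }
        }
    }


if __name__ == "__main__":
    # Run from inside .kernel-knowledge/ to print a fragment for this checkout.
    print(json.dumps(build_mcp_settings(Path.cwd().resolve()), indent=2))
```

The printed JSON can be pasted into the settings file; the script does not modify any VS Code configuration itself.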

4. Start Journaling

Create markdown files in the appropriate subsystem directory under journals/:

cd journals/fs/smb/client
vim oplock-handling.md

Example journal entry:

# SMB3 Oplock Handling

## Overview

Oplocks (opportunistic locks) allow SMB clients to cache file data...

## Implementation

The oplock logic is implemented in fs/smb/client/smb2ops.c...

## Common Issues

- Oplock break delays
- Client cache invalidation

The file will be automatically indexed within seconds!

💡 Using the Knowledge Base

Once configured, GitHub Copilot can access your kernel knowledge automatically:

Example Queries in Copilot Chat:

Search for specific topics:

@kernel-knowledge What do we know about SMB3 directory leasing?

Get context for code changes:

I need to modify the VFS caching layer. What context do we have?

List available knowledge:

@kernel-knowledge What topics have been documented?

Available MCP Tools:

The MCP server provides these tools to Copilot:

search_kernel_knowledge: Semantic search across all journals
search_by_subsystem: Search within a specific kernel subsystem path (e.g., "fs/smb/client")
list_knowledge_files: List all indexed files with their paths
get_knowledge_stats: Get statistics about the knowledge base
read_knowledge_file: Read complete content of a specific file by path
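
For orientation, this is roughly what a tool invocation looks like on the wire. The Model Context Protocol uses JSON-RPC 2.0, and a client invokes a tool with the `tools/call` method. The sketch below only builds such a request; the argument key `query` is an assumption about this server's tool schema, and a real client must complete the protocol's initialization handshake before sending it.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as one line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: the real parameter names are defined in server.py.
request = tool_call_request(
    1, "search_kernel_knowledge", {"query": "SMB3 directory leasing"}
)
```

In normal use Copilot constructs these messages itself; the example is only meant to show what the tool names above correspond to.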

🔧 Management Commands

Check Service Status

systemctl --user status kernel-knowledge-indexer.service

View Logs

journalctl --user -u kernel-knowledge-indexer.service -f

Stop the Service

./stop-indexer.sh

Test the MCP Server

./test-mcp-server.sh

📚 Writing Effective Journal Entries

Best Practices:

Follow Kernel Structure: Place files in the directory matching the kernel subsystem
Use Clear Headings: Organize content with markdown headings
Be Specific: Include file paths, function names, and line numbers
Add Context: Explain why, not just what
Link Concepts: Reference related subsystems and files
Include Examples: Add code snippets and command outputs

Directory Organization:

Match your journal structure to the kernel source:

fs/smb/client/: SMB client implementation knowledge
fs/smb/server/: Kernel SMB server (ksmbd) knowledge
net/: Networking subsystem topics
mm/: Memory management topics
kernel/: Core kernel functionality
drivers/: Device driver knowledge
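
Mirroring the kernel tree means a journal's subsystem can be derived from its path alone. The sketch below shows one way that mapping and a subsystem filter might work; `subsystem_for` and `in_subsystem` are hypothetical names, and the actual search_by_subsystem tool may filter differently.

```python
from pathlib import PurePosixPath


def subsystem_for(journal_path: str, journals_root: str = "journals") -> str:
    """Return the kernel subsystem path a journal file belongs to."""
    relative = PurePosixPath(journal_path).relative_to(journals_root)
    parent = relative.parent
    # Files directly under journals/ have no subsystem.
    return "" if str(parent) == "." else str(parent)


def in_subsystem(journal_path: str, subsystem: str) -> bool:
    """True if the journal sits in `subsystem` or one of its subdirectories."""
    own = PurePosixPath(subsystem_for(journal_path))
    wanted = PurePosixPath(subsystem)
    return own == wanted or wanted in own.parents
```

Comparing path components rather than string prefixes keeps a query for "fs/smb" from accidentally matching a sibling directory such as "fs/smbfs".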

Example Structure:

# Topic Title

## Overview
Brief introduction to the topic

## Background
Historical context, why it exists

## Implementation
Where in the code it lives, key functions

## Related Files
- fs/smb/client/file1.c - Description
- fs/smb/client/file2.h - Description

## Common Issues
Known problems and solutions

## Testing
How to test this functionality

## Related Topics
Links to other relevant concepts

## References
Commits, patches, documentation links
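
If you create journals often, the structure above is easy to generate. This is a small sketch, assuming you want every recommended heading present and empty; `render_journal_skeleton` is a hypothetical helper and not one of the repository's scripts.

```python
SECTIONS = [
    "Overview", "Background", "Implementation", "Related Files",
    "Common Issues", "Testing", "Related Topics", "References",
]


def render_journal_skeleton(title: str) -> str:
    """Return an empty journal entry containing the recommended headings."""
    lines = [f"# {title}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", ""]
    return "\n".join(lines)
```

Writing the result to a file under journals/ in the matching subsystem directory gives a starting point to fill in.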

🔍 How It Works

Indexing Pipeline:

File Monitoring: Watchdog monitors journals/ directory
Change Detection: Detects create, modify, delete events
Text Extraction: Parses markdown into semantic chunks
Embedding Generation: Creates vector embeddings using all-MiniLM-L6-v2
Database Update: Stores in ChromaDB with metadata
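
The text-extraction step can be pictured as splitting each journal at its headings. The sketch below is one plausible approach, not the project's extract_text_chunks(): it emits one chunk per heading and records the heading trail so each chunk keeps its context. The name `split_markdown_sections` is hypothetical, and the sketch does not special-case `#` lines inside fenced code blocks.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_sections(text: str) -> list[dict]:
    """Split markdown into one chunk per heading, keeping the heading trail."""
    chunks = []
    trail = []  # headings from the top level down to the current section
    body = []   # lines collected since the last heading

    def flush():
        content = "\n".join(body).strip()
        if content:
            chunks.append({"heading": " > ".join(trail), "text": content})
        body.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del trail[level - 1:]  # drop headings at this level or deeper
            trail.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks
```

Each resulting chunk would then be embedded and stored together with metadata such as the file path and heading trail.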

RAG Retrieval:

Query Processing: User query → vector embedding
Similarity Search: ChromaDB finds most relevant chunks
Result Ranking: Sorts by cosine similarity
Context Delivery: Returns formatted results to Copilot
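
The ranking step reduces to comparing the query vector with each stored chunk vector. In the real system ChromaDB performs this search over sentence-transformer embeddings; the sketch below uses plain Python and toy two-dimensional vectors only to make the arithmetic visible. `cosine_similarity` and `rank_chunks` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(query_vec, chunks, top_k=3):
    """chunks: list of (chunk_id, vector). Return the top_k (chunk_id, score) pairs."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunks]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data: real embeddings from all-MiniLM-L6-v2 have 384 dimensions.
toy_chunks = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
ranked = rank_chunks([1.0, 0.0], toy_chunks)
```

Here chunk "a" points in the same direction as the query and ranks first, "c" is partially aligned, and "b" is orthogonal and ranks last.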

MCP Server:

Protocol Handler: Implements Model Context Protocol
Tool Endpoints: Exposes search and retrieval tools
Resource Provider: Makes knowledge available as resources
Async Communication: Efficient stdio-based communication

🐛 Troubleshooting

Service won't start

# Check logs
journalctl --user -u kernel-knowledge-indexer.service -n 50

# Check whether an indexer process is already running
ps aux | grep indexer_service

MCP Server not responding

# Test directly
cd .kernel-knowledge
source venv/bin/activate
python3 mcp-server/server.py

No search results

# Check if files are indexed
./test-mcp-server.sh

# Manually trigger re-indexing
systemctl --user restart kernel-knowledge-indexer.service

Copilot can't find MCP server

Verify the path in VS Code settings
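Check that the command in the settings resolves to the Python interpreter you expect
Ensure that interpreter has the dependencies installed; if they live only in venv/, point the command at venv/bin/python3 or launch VS Code from a shell with the virtual environment activated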
Reload VS Code window

📊 Performance Considerations

Embedding Model: Uses lightweight all-MiniLM-L6-v2 (80MB)
Database: ChromaDB with SQLite backend
Indexing Speed: ~100-200 chunks/second
Search Latency: <100ms for most queries
Memory Usage: ~200-500MB depending on knowledge base size

🔒 Security Notes

All data is stored locally in your workspace
No external API calls for embeddings (the model is downloaded once during setup and then runs locally)
No telemetry or data collection
ChromaDB runs entirely offline

🛠️ Advanced Configuration

Customize Embedding Model

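Edit mcp-server/indexer_service.py and mcp-server/server.py, using the same value in both so that queries and indexed chunks are embedded by the same model. Switching models changes the embedding space (and possibly its dimensionality), so existing journals need to be re-indexed afterwards: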

EMBEDDING_MODEL = "your-model-name"

Options:

all-MiniLM-L6-v2 (default, fast, 384 dimensions)
all-mpnet-base-v2 (better quality, slower, 768 dimensions)
multi-qa-MiniLM-L6-cos-v1 (optimized for Q&A)

Adjust Chunk Size

Modify the chunking logic in extract_text_chunks() to change how documents are split.

Change Collection Name

Update COLLECTION_NAME in both service files to use a different database collection.

📝 Example Journal Files

See journals/README.md for examples and templates.

🤝 Contributing

This is a personal knowledge base system. Customize it to fit your workflow!

📜 License

MIT License - Feel free to modify and extend

🙏 Acknowledgments

ChromaDB: Vector database
Sentence Transformers: Embedding generation
Model Context Protocol: Copilot integration
Watchdog: File system monitoring

Happy Journaling! 📖🚀

For questions or issues, check the logs or test the system with ./test-mcp-server.sh
