sprasad-microsoft/kernel-knowledge-mcp

Kernel Knowledge RAG System

An intelligent knowledge management system for Linux kernel development that automatically indexes your documentation and provides context-aware code assistance through GitHub Copilot.

🎯 Features

  • 📝 Active Knowledge Journaling: Copilot automatically maintains journal files as you share knowledge
  • 🔍 Automatic Indexing: File monitoring service detects changes and updates vector database
  • 🤖 RAG-Powered Retrieval: Semantic search using sentence transformers
  • 🔌 MCP Server Integration: Seamlessly integrates with GitHub Copilot via Model Context Protocol
  • ⚡ Real-time Updates: Changes are indexed within seconds
  • 🗂️ Hierarchical Organization: Mirrors kernel source tree structure for context-aware suggestions
  • 🗄️ Vector Database: ChromaDB for efficient similarity search

🤖 How It Works

As you work with GitHub Copilot and share kernel knowledge during your coding sessions, Copilot will:

  1. Listen for knowledge sharing - When you explain concepts, gotchas, or implementation details
  2. Extract key information - File paths, function names, explanations, common issues
  3. Update journal files - Automatically creates or updates the appropriate markdown file in the correct subsystem directory
  4. Auto-index - Changes are detected and indexed within seconds for immediate searchability

You just code and share knowledge naturally - Copilot handles the documentation!

Of course, you can also manually edit any journal file anytime.

📁 Directory Structure

.kernel-knowledge/
├── journals/              # Markdown knowledge files (mirrors kernel structure)
│   ├── fs/smb/client/    # SMB client knowledge
│   ├── fs/smb/server/    # Kernel SMB server knowledge
│   ├── net/              # Networking subsystem
│   ├── mm/               # Memory management
│   ├── kernel/           # Core kernel
│   └── drivers/          # Device drivers
├── vector-db/            # ChromaDB database (auto-generated)
├── mcp-server/           # MCP server and indexer service
│   ├── indexer_service.py
│   └── server.py
├── venv/                 # Python virtual environment
├── requirements.txt      # Python dependencies
├── setup.sh              # Installation script
├── start-indexer.sh      # Start the indexing service
├── stop-indexer.sh       # Stop the indexing service
└── test-mcp-server.sh    # Test the MCP server

🚀 Quick Start

1. Installation

Run the setup script to install all dependencies:

cd .kernel-knowledge
chmod +x setup.sh
./setup.sh

This will:

  • Create a Python virtual environment
  • Install required packages (ChromaDB, sentence-transformers, MCP, etc.)
  • Download the embedding model
  • Set up the systemd user service
  • Make all scripts executable

2. Start the Indexer Service

The indexer will automatically start when you open the workspace in VS Code (via the workspace task).

You can also start it manually:

./start-indexer.sh

The indexer service will:

  • Run in the background as a systemd user service
  • Recursively monitor the journals/ directory for changes
  • Automatically index new or modified markdown files
  • Update the vector database in real-time
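
The monitoring behavior listed above can be pictured with a small standard-library sketch. The real service uses Watchdog for event-driven monitoring; the polling approach below is a swapped-in stand-in that needs no dependency, and the names `snapshot` and `diff_snapshots` are hypothetical, not taken from indexer_service.py.

```python
from pathlib import Path


def snapshot(root):
    """Map every markdown file under root (recursively) to its mtime in ns."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.stat().st_mtime_ns
        for p in root.rglob("*.md")
    }


def diff_snapshots(old, new):
    """Return (created, modified, deleted) relative paths between two snapshots."""
    created = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    # A file counts as modified when it exists in both snapshots
    # but its modification time changed.
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return created, modified, deleted
```

Taking a snapshot of journals/ every few seconds and diffing it against the previous one yields the same three event types (create, modify, delete) that the indexer reacts to, at the cost of some latency compared with file system notifications.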

3. Configure GitHub Copilot

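Add the MCP server to your VS Code settings, replacing the path shown below with the absolute path to mcp-server/server.py in your own checkout: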

  1. Open VS Code Settings (JSON)
  2. Add the following configuration:
{
  "github.copilot.chat.mcp.servers": {
    "kernel-knowledge": {
      "command": "python3",
      "args": [
        "/home/sprasad/repo/sprasad-microsoft/smb3-kernel-client/.kernel-knowledge/mcp-server/server.py"
      ]
    }
  }
}
  3. Restart VS Code or reload the window
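
Because the configuration needs an absolute path, it can help to generate the fragment rather than edit it by hand. The following is a minimal sketch: the function name `copilot_mcp_settings` is hypothetical, and the only keys it emits are the ones shown in the configuration above.

```python
import json
from pathlib import Path


def copilot_mcp_settings(checkout_dir, python_cmd="python3"):
    """Build the settings fragment for a given .kernel-knowledge checkout."""
    server = Path(checkout_dir) / "mcp-server" / "server.py"
    return {
        "github.copilot.chat.mcp.servers": {
            "kernel-knowledge": {
                "command": python_cmd,
                "args": [str(server)],
            }
        }
    }


if __name__ == "__main__":
    # Assumes the script is run from inside the .kernel-knowledge directory.
    print(json.dumps(copilot_mcp_settings(Path.cwd()), indent=2))
```

Paste the printed JSON into your settings file, merging it with any MCP servers you already have configured.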

4. Start Journaling

Create markdown files in the appropriate subsystem directory under journals/:

cd journals/fs/smb/client
vim oplock-handling.md

Example journal entry:

# SMB3 Oplock Handling

## Overview

Oplocks (opportunistic locks) allow SMB clients to cache file data...

## Implementation

The oplock logic is implemented in fs/smb/client/smb2ops.c...

## Common Issues

- Oplock break delays
- Client cache invalidation

The file will be automatically indexed within seconds!

💡 Using the Knowledge Base

Once configured, GitHub Copilot can access your kernel knowledge automatically:

Example Queries in Copilot Chat:

  1. Search for specific topics:

    @kernel-knowledge What do we know about SMB3 directory leasing?
    
  2. Get context for code changes:

    I need to modify the VFS caching layer. What context do we have?
    
  3. List available knowledge:

    @kernel-knowledge What topics have been documented?
    

Available MCP Tools:

The MCP server provides these tools to Copilot:

  • search_kernel_knowledge: Semantic search across all journals
  • search_by_subsystem: Search within a specific kernel subsystem path (e.g., "fs/smb/client")
  • list_knowledge_files: List all indexed files with their paths
  • get_knowledge_stats: Get statistics about the knowledge base
  • read_knowledge_file: Read complete content of a specific file by path
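
To illustrate what scoping a search to a subsystem path such as "fs/smb/client" could mean, here is a hedged sketch of a path filter. It is not the code behind search_by_subsystem; `in_subsystem` and `filter_by_subsystem` are hypothetical helpers that compare path components, so "fs/smb" does not accidentally match a sibling such as "fs/smbfoo".

```python
from pathlib import PurePosixPath


def in_subsystem(file_path, subsystem):
    """True if file_path lies under subsystem, compared component by component."""
    file_parts = PurePosixPath(file_path).parts
    sub_parts = PurePosixPath(subsystem).parts
    return file_parts[: len(sub_parts)] == sub_parts


def filter_by_subsystem(paths, subsystem):
    """Keep only the journal paths that belong to the given subsystem."""
    return [p for p in paths if in_subsystem(p, subsystem)]
```

A plain string prefix check would be shorter, but comparing components avoids false matches between directories that merely share a prefix.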

🔧 Management Commands

Check Service Status

systemctl --user status kernel-knowledge-indexer.service

View Logs

journalctl --user -u kernel-knowledge-indexer.service -f

Stop the Service

./stop-indexer.sh

Test the MCP Server

./test-mcp-server.sh

📚 Writing Effective Journal Entries

Best Practices:

  1. Follow Kernel Structure: Place files in the directory matching the kernel subsystem
  2. Use Clear Headings: Organize content with markdown headings
  3. Be Specific: Include file paths, function names, and line numbers
  4. Add Context: Explain why, not just what
  5. Link Concepts: Reference related subsystems and files
  6. Include Examples: Add code snippets and command outputs

Directory Organization:

Match your journal structure to the kernel source:

  • fs/smb/client/: SMB client implementation knowledge
  • fs/smb/server/: Kernel SMB server (ksmbd) knowledge
  • net/: Networking subsystem topics
  • mm/: Memory management topics
  • kernel/: Core kernel functionality
  • drivers/: Device driver knowledge
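
The mirroring rule above can be expressed as a small path computation. This is a sketch only: `journal_path_for` is a hypothetical helper, and the lowercase, hyphenated file name is an assumption modeled on the oplock-handling.md example earlier in this README.

```python
import re
from pathlib import PurePosixPath


def journal_path_for(source_path, topic, journals_root="journals"):
    """Return the journal file path that mirrors a kernel source file's directory."""
    # Turn "Oplock Handling" into "oplock-handling".
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    subsystem_dir = PurePosixPath(source_path).parent
    return PurePosixPath(journals_root) / subsystem_dir / f"{slug}.md"
```

For example, notes about fs/smb/client/smb2ops.c under the topic "Oplock Handling" would land in journals/fs/smb/client/oplock-handling.md.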

Recommended Topics:

  • Protocol implementations (SMB3, NFS, CIFS)
  • Kernel subsystems (VFS, memory management, networking)
  • Debugging techniques and root cause analyses
  • Performance optimization strategies
  • Bug fixes and their explanations
  • Testing procedures and gotchas
  • Build system quirks
  • Hardware-specific issues

Example Structure:

# Topic Title

## Overview
Brief introduction to the topic

## Background
Historical context, why it exists

## Implementation
Where in the code it lives, key functions

## Related Files
- fs/smb/client/file1.c - Description
- fs/smb/client/file2.h - Description

## Common Issues
Known problems and solutions

## Testing
How to test this functionality

## Related Topics
Links to other relevant concepts

## References
Commits, patches, documentation links

🔍 How It Works

Indexing Pipeline:

  1. File Monitoring: Watchdog monitors journals/ directory
  2. Change Detection: Detects create, modify, delete events
  3. Text Extraction: Parses markdown into semantic chunks
  4. Embedding Generation: Creates vector embeddings using all-MiniLM-L6-v2
  5. Database Update: Stores in ChromaDB with metadata

RAG Retrieval:

  1. Query Processing: User query → vector embedding
  2. Similarity Search: ChromaDB finds most relevant chunks
  3. Result Ranking: Sorts by cosine similarity
  4. Context Delivery: Returns formatted results to Copilot
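
The similarity search and ranking steps reduce to comparing the query vector with each stored chunk vector. In the real system ChromaDB performs this search; the pure-Python sketch below only illustrates the idea, using hypothetical function names and toy 2-dimensional vectors in place of real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(query_vec, chunks, top_k=3):
    """Return the top_k (score, text) pairs, most similar first.

    chunks is a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    toy_chunks = [
        ("oplock breaks", [1.0, 0.0]),
        ("slab allocator", [0.0, 1.0]),
        ("lease caching", [1.0, 1.0]),
    ]
    for score, text in rank_chunks([1.0, 0.0], toy_chunks, top_k=2):
        print(f"{score:.3f}  {text}")
```

A vector database avoids scoring every chunk by using an approximate nearest-neighbor index, but the ordering criterion is the same.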

MCP Server:

  1. Protocol Handler: Implements Model Context Protocol
  2. Tool Endpoints: Exposes search and retrieval tools
  3. Resource Provider: Makes knowledge available as resources
  4. Async Communication: Efficient stdio-based communication
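
For a sense of what travels over stdio, the sketch below builds the kind of JSON-RPC 2.0 request that an MCP client sends to invoke a tool. The helper name `tool_call_request` is hypothetical, and the `query` argument is an assumption: the actual parameters accepted by search_kernel_knowledge are defined in server.py. A real session also begins with an initialization handshake that is omitted here.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Serialize a tools/call request as a single newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # The stdio transport delimits messages with newlines, so each message
    # is written as exactly one line.
    return json.dumps(message) + "\n"


if __name__ == "__main__":
    # "query" is an assumed argument name, used for illustration only.
    print(tool_call_request(1, "search_kernel_knowledge", {"query": "directory leasing"}), end="")
```

In practice Copilot constructs these messages for you; the sketch is only meant to make the protocol layer less opaque when debugging.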

🐛 Troubleshooting

Service won't start

# Check logs
journalctl --user -u kernel-knowledge-indexer.service -n 50

# Check whether an indexer process is already running
ps aux | grep indexer_service

MCP Server not responding

# Test directly
cd .kernel-knowledge
source venv/bin/activate
python3 mcp-server/server.py

No search results

# Check if files are indexed
./test-mcp-server.sh

# Manually trigger re-indexing
systemctl --user restart kernel-knowledge-indexer.service

Copilot can't find MCP server

  1. Verify the path to server.py in VS Code settings
  2. Check that the Python path is correct
  3. Ensure the virtual environment is activated
  4. Reload the VS Code window

📊 Performance Considerations

  • Embedding Model: Uses lightweight all-MiniLM-L6-v2 (80MB)
  • Database: ChromaDB with SQLite backend
  • Indexing Speed: ~100-200 chunks/second
  • Search Latency: <100ms for most queries
  • Memory Usage: ~200-500MB depending on knowledge base size
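
The figures above can be turned into rough back-of-the-envelope estimates. The sketch below assumes embeddings are stored as 32-bit floats (4 bytes per value) and uses the 384 dimensions this README lists for the default model; both function names are hypothetical, and the storage estimate covers raw vectors only, excluding document text, metadata, and index overhead.

```python
def embedding_bytes(num_chunks, dimensions=384, bytes_per_value=4):
    """Raw vector storage in bytes, assuming float32 values."""
    return num_chunks * dimensions * bytes_per_value


def indexing_seconds(num_chunks, chunks_per_second):
    """Time to embed and store num_chunks at a given throughput."""
    return num_chunks / chunks_per_second


if __name__ == "__main__":
    chunks = 10_000
    print(f"vectors: {embedding_bytes(chunks) / 2**20:.1f} MiB")
    print(f"full re-index: {indexing_seconds(chunks, 200):.0f}-{indexing_seconds(chunks, 100):.0f} s")
```

Under these assumptions, 10,000 chunks occupy about 15 MB of raw vectors and take roughly 50 to 100 seconds to index from scratch at the quoted throughput.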

🔒 Security Notes

  • All data is stored locally in your workspace
  • No external API calls for embeddings
  • No telemetry or data collection
  • ChromaDB runs entirely offline

🛠️ Advanced Configuration

Customize Embedding Model

Edit mcp-server/indexer_service.py and mcp-server/server.py:

EMBEDDING_MODEL = "your-model-name"

Options:

  • all-MiniLM-L6-v2 (default, fast, 384 dimensions)
  • all-mpnet-base-v2 (better quality, slower, 768 dimensions)
  • multi-qa-MiniLM-L6-cos-v1 (optimized for Q&A)

Adjust Chunk Size

Modify the chunking logic in extract_text_chunks() to change how documents are split.
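
As one example of an alternative splitting strategy, the sketch below cuts text into fixed-size word windows that overlap, so a sentence spanning a boundary still appears whole in at least one chunk. It is not the existing extract_text_chunks() implementation; `split_with_overlap` and its default sizes are hypothetical.

```python
def split_with_overlap(text, chunk_size=200, overlap=40):
    """Split text into chunks of chunk_size words, each sharing overlap words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text,
        # so no trailing chunk consists solely of overlap.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Smaller chunks give more precise matches but less context per result; larger overlap reduces boundary effects at the cost of more vectors to store. After changing the chunking logic, existing files need to be re-indexed for the change to take effect.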

Change Collection Name

Update COLLECTION_NAME in both service files to use a different database collection.

📝 Example Journal Files

See journals/README.md for examples and templates.

🀝 Contributing

This is a personal knowledge base system. Customize it to fit your workflow!

📜 License

MIT License - Feel free to modify and extend

🙏 Acknowledgments

  • ChromaDB: Vector database
  • Sentence Transformers: Embedding generation
  • Model Context Protocol: Copilot integration
  • Watchdog: File system monitoring

Happy Journaling! 📖🚀

For questions or issues, check the logs or test the system with ./test-mcp-server.sh

About

This repository contains a knowledge base of the kernel domain knowledge that I've imparted to LLMs. The KB articles are written as markdown files. An indexer service runs as a daemon, watches for updates to the KB articles, and populates the vector database. The repo also contains an MCP server that can be used to query the RAG index.
