# TranscriptMCP

TranscriptMCP is a Model Context Protocol (MCP) server that enables AI assistants to download YouTube videos and transcribe them using OpenAI's Whisper speech recognition model.
This server can be used:

- **Standalone**: run as a Python script and interact via the MCP protocol
- **With OpenClaw**: integrated as an MCP server for AI assistants
## Features

- **Download YouTube Audio**: extract audio from any YouTube video
- **Transcribe with Whisper**: local, free transcription using Faster Whisper
- **Multiple Output Formats**: full transcript with timestamps or plain text
- **Save Files**: optionally save the downloaded MP3 and transcript files
- **MCP Compatible**: works with any MCP-compliant AI assistant
- **100% Free**: no API keys required (uses a local Whisper model)
## Output Files

By default, the server saves:

- Audio: `{video_id}.mp3` in the temp directory
- Transcript: `{video_id}.txt` in the workspace directory

You can customize where files are saved by modifying the server code.
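As a sketch of that layout: the helper names below are hypothetical (the real path logic lives in `transcript_mcp/server.py` and may differ), and `WORKSPACE_DIR` is an assumed variable name, but the default locations match the ones described above.

```python
import os
import tempfile

# Hypothetical helpers mirroring the documented defaults; the server's
# actual path handling may differ.
TEMP_DIR = os.environ.get("TEMP_DIR", tempfile.gettempdir())
WORKSPACE_DIR = os.environ.get("WORKSPACE_DIR", os.getcwd())  # assumed name

def audio_path(video_id: str) -> str:
    """Where the downloaded MP3 lands."""
    return os.path.join(TEMP_DIR, f"{video_id}.mp3")

def transcript_path(video_id: str) -> str:
    """Where the transcript text file lands."""
    return os.path.join(WORKSPACE_DIR, f"{video_id}.txt")
```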
## Requirements

- Python 3.11 or higher
- `ffmpeg` (for audio processing)
- `mcp`: Model Context Protocol server
- `yt-dlp`: YouTube downloader
- `faster-whisper`: lightning-fast Whisper transcription (recommended)

Install `ffmpeg` for your platform:

**macOS:**

```bash
brew install ffmpeg
```

**Linux (Debian/Ubuntu):**

```bash
sudo apt update
sudo apt install ffmpeg
```

**Windows:** download ffmpeg from https://ffmpeg.org/download.html and add it to your PATH.
## Installation

```bash
git clone https://github.com/The-TechLab/TranscriptMCP.git
cd TranscriptMCP
pip install -e .
```

Or with uv:

```bash
uv sync
```

The first time you run the server, it will automatically download the Whisper model. By default, it uses the `base` model (~140MB).
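Under the hood, transcription with Faster Whisper looks roughly like the sketch below. The function name and structure are illustrative, not the server's actual code; `WhisperModel` and its `transcribe` method are the faster-whisper API.

```python
def transcribe(audio_file: str, model_size: str = "base"):
    """Sketch of a Faster Whisper transcription call (illustrative only)."""
    # Lazy import so the sketch can be read without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)           # first call downloads the model
    segments, info = model.transcribe(audio_file)
    for seg in segments:                       # segments stream lazily
        yield seg.start, seg.end, seg.text
```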
To use a different model, set the environment variable:

```bash
export WHISPER_MODEL=medium  # Options: tiny, base, small, medium, large
```

## Running the Server

```bash
python -m transcript_mcp.server
```

The server communicates via stdin/stdout using the MCP protocol. Connect it to any MCP-compatible AI assistant.
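The line-delimited JSON-RPC messages involved can be built with the standard library. This hypothetical snippet constructs the standard MCP `initialize` request, using the same protocol version that appears elsewhere in this README:

```python
import json

# The MCP "initialize" handshake, sent as a single JSON line on stdin.
initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "test", "version": "1.0"},
    },
}

line = json.dumps(initialize)  # pipe this to the server's stdin
```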
## OpenClaw Integration

Add to your OpenClaw MCP configuration (`~/.openclaw/workspace/config/mcporter.json`):

```json
{
  "mcpServers": {
    "transcript": {
      "command": "python",
      "args": [
        "-m",
        "transcript_mcp.server"
      ],
      "env": {
        "WHISPER_MODEL": "base"
      }
    }
  }
}
```

Then restart OpenClaw.
Test the server directly:

```bash
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}' | python -m transcript_mcp.server
```

## Available Tools

### Video Info

Get metadata about a YouTube video without downloading.
Parameters:

- `url` (required): YouTube video URL

Returns: title, description, duration, channel, view count, upload date.
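Fetching metadata without downloading maps naturally onto yt-dlp's Python API (`YoutubeDL.extract_info` with `download=False`). The option dict and key list below are illustrative, not necessarily what the server passes:

```python
# Metadata-only options (illustrative; the server's options may differ).
YDL_OPTS = {"skip_download": True, "quiet": True}

def get_info(url: str) -> dict:
    """Return a subset of yt-dlp's metadata for a video URL."""
    import yt_dlp  # lazy import; needs the yt-dlp package and network access

    with yt_dlp.YoutubeDL(YDL_OPTS) as ydl:
        info = ydl.extract_info(url, download=False)
    return {key: info.get(key) for key in
            ("title", "description", "duration", "channel",
             "view_count", "upload_date")}
```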
### Transcribe with Timestamps

Download and transcribe a YouTube video with timestamps.

Parameters:

- `url` (required): YouTube video URL
- `language` (optional): language code (e.g., `"en"`). Auto-detected if not specified.

Returns: full transcript with timestamps for each segment.
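One plausible way to render each segment with timestamps (the tool's exact output format may differ; the helper name is hypothetical):

```python
def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcript segment as '[MM:SS - MM:SS] text'."""
    def clock(seconds: float) -> str:
        minutes, secs = divmod(int(seconds), 60)
        return f"{minutes:02d}:{secs:02d}"
    return f"[{clock(start)} - {clock(end)}] {text.strip()}"
```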
### Transcribe as Plain Text

Download and transcribe a YouTube video as plain text.

Parameters:

- `url` (required): YouTube video URL
- `language` (optional): language code (e.g., `"en"`). Auto-detected if not specified.

Returns: plain-text transcript without timestamps.
## Configuration

| Variable | Default | Description |
|---|---|---|
| `WHISPER_MODEL` | `base` | Whisper model size: `tiny`, `base`, `small`, `medium`, `large` |
| `TEMP_DIR` | `/tmp` | Directory for temporary audio files |
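Reading those variables with the documented defaults is a one-liner each; this sketch (hypothetical function name, not the server's code) also validates the model size:

```python
import os

VALID_MODELS = ("tiny", "base", "small", "medium", "large")

def load_settings(env=os.environ):
    """Read the documented variables, falling back to the table's defaults."""
    model = env.get("WHISPER_MODEL", "base")
    if model not in VALID_MODELS:
        raise ValueError(f"unsupported WHISPER_MODEL: {model}")
    return model, env.get("TEMP_DIR", "/tmp")
```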
## Whisper Model Sizes

| Model | Parameters | Size | Speed (vs. `large`) |
|---|---|---|---|
| tiny | 39M | 75MB | ~10x |
| base | 74M | 140MB | ~7x |
| small | 244M | 480MB | ~4x |
| medium | 769M | 1.5GB | ~2x |
| large | 1550M | 3GB | 1x |
**Recommendation:** start with `base` for a good balance of speed and accuracy.
## Example Usage

Once configured, you can ask your AI assistant:
"Can you transcribe this video and give me a summary? https://youtube.com/watch?v=xxxxx"
The AI will:
- Download the audio
- Transcribe using Whisper
- Return the transcript or summary
## Troubleshooting

**ffmpeg not found:** install ffmpeg (see the ffmpeg installation instructions above).

**Model download fails:** the first run will automatically download the model. If it fails, run the download manually:

```bash
whisper --model base
```

**Video download fails:**

- Check that the YouTube URL is correct
- The video may be private or region-locked
- Try updating yt-dlp:

```bash
pip install -U yt-dlp
```
## Project Structure

```
TranscriptMCP/
├── transcript_mcp/
│   ├── __init__.py
│   └── server.py      # Main MCP server
├── LICENSE
├── README.md
├── pyproject.toml
└── uv.lock
```
## Contributing

Run the test suite with:

```bash
pytest
```

To contribute:

- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
## License

MIT License. See `LICENSE` for details.
Developed by The Tech Lab
Empowering AI with local, private transcription capabilities.