clippod is an AI-powered podcast clipping platform that automatically extracts engaging moments from podcast videos and transforms them into vertical, social media-ready clips with subtitles.
- 🎬 Automatic Clip Generation: Uses AI to identify interesting moments and Q&A segments from podcast videos
- 🗣️ Active Speaker Detection: Uses computer vision to keep the frame focused on whoever is speaking
- 📱 Vertical Video Format: Converts clips to 1080x1920 for social media platforms
- 🎯 Smart Subtitles: Automatically generates stylized subtitles with word-level timing
- 🎨 Dynamic Framing: Intelligently crops video based on speaker location or creates cinematic backgrounds
- ⚡ GPU Acceleration: Powered by Modal's cloud infrastructure with NVIDIA L40S GPUs
- 🔐 Secure Processing: JWT authentication and AWS S3 integration for secure file handling
- Framework: FastAPI with Modal for serverless deployment
- AI Models:
- WhisperX for transcription and word-level alignment
- Google Gemini 2.5 Flash for content analysis
- Active Speaker Detection (ASD) model for speaker tracking
- Video Processing: FFmpeg with OpenCV for video manipulation
- Storage: AWS S3 for input/output video storage
- GPU: NVIDIA L40S for ML inference acceleration
- Framework: Next.js 15.2.3 with TypeScript
- UI: Tailwind CSS with Radix UI components
- Authentication: NextAuth.js with Prisma adapter
- Database access: Prisma ORM
- Deployment: Optimized for Vercel
- Payment: Stripe integration
- Background Jobs: Inngest for async processing
- Python 3.12+
- Node.js 18+
- Docker (optional)
- AWS Account with S3 access
- Modal account for backend deployment
- Google AI API key for Gemini
1. Navigate to the backend directory:

   ```bash
   cd clippod-backend
   ```

2. Create a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up environment variables by creating Modal secrets for:

   - `AWS_ACCESS_KEY_ID`
   - `AWS_SECRET_ACCESS_KEY`
   - `GEMINI_API_KEY`
   - `AUTH_TOKEN`

5. Deploy to Modal:

   ```bash
   modal deploy main.py
   ```
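Before deploying, it can be useful to confirm that all four secret names above are actually present. A minimal sketch of such a check — the helper name and the demo environment are illustrative, not part of the codebase (inside a Modal function the secrets appear as environment variables, so in practice you would pass `os.environ`):

```python
def missing_secrets(env):
    """Return the names of required clippod secrets absent from env."""
    required = (
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "GEMINI_API_KEY",
        "AUTH_TOKEN",
    )
    return [name for name in required if not env.get(name)]

# Demo environment with two of the four secrets set.
demo_env = {"AWS_ACCESS_KEY_ID": "AKIA-demo", "GEMINI_API_KEY": "demo"}
print(missing_secrets(demo_env))  # → ['AWS_SECRET_ACCESS_KEY', 'AUTH_TOKEN']
```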
1. Navigate to the frontend directory:

   ```bash
   cd clippod-frontend
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Set up environment variables by copying `.env.example` to `.env` and configuring:

   ```
   NEXTAUTH_SECRET=your-secret
   DATABASE_URL=your-database-url
   AWS_ACCESS_KEY_ID=your-aws-key
   AWS_SECRET_ACCESS_KEY=your-aws-secret
   STRIPE_SECRET_KEY=your-stripe-key
   # Add other required variables
   ```

4. Set up the database:

   ```bash
   npm run db:push
   ```

5. Start the development server:

   ```bash
   npm run dev
   ```
- Upload Video: Upload your podcast video through the web interface
- Processing: The AI analyzes the content and identifies clip-worthy moments
- Generation: Clips are automatically created with:
- Vertical format conversion
- Active speaker tracking
- Subtitle generation
- Smart cropping/background blurring
- Download: Access your processed clips from the dashboard
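The vertical-conversion and encoding steps above ultimately come down to an FFmpeg invocation. A minimal sketch of how such a command might be assembled, assuming a simple center crop — the real pipeline crops around the detected speaker, and the function name is illustrative:

```python
def build_clip_cmd(src, dst, start_s, end_s):
    """Assemble an FFmpeg command that cuts [start_s, end_s] from src and
    re-encodes it as a 1080x1920 vertical clip (H.264, AAC 128 kbps, 25 fps)."""
    return [
        "ffmpeg", "-y",
        "-ss", str(start_s), "-to", str(end_s), "-i", src,
        # Center-crop to a 9:16 window, then scale to the target resolution.
        "-vf", "crop=ih*9/16:ih,scale=1080:1920",
        "-r", "25",
        "-c:v", "libx264",
        "-c:a", "aac", "-b:a", "128k",
        dst,
    ]

print(" ".join(build_clip_cmd("podcast.mp4", "clip_01.mp4", 125.0, 172.5)))
```

The encoding flags mirror the output settings listed under Configuration; speaker-aware framing would replace the static `crop` filter with per-frame crop coordinates.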
Process a video file stored in S3.

Request body:

```json
{
  "s3_key": "path/to/your/video.mp4"
}
```

Headers:

```
Authorization: Bearer <your_auth_token>
Content-Type: application/json
```
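A request with this body and these headers can be built with Python's standard library. The endpoint URL below is a placeholder — substitute the URL Modal prints after `modal deploy`:

```python
import json
import urllib.request

ENDPOINT = "https://<your-workspace>--clippod.modal.run"  # placeholder URL
AUTH_TOKEN = "<your_auth_token>"

payload = json.dumps({"s3_key": "path/to/your/video.mp4"}).encode()
request = urllib.request.Request(
    ENDPOINT,
    data=payload,
    headers={
        "Authorization": f"Bearer {AUTH_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# urllib.request.urlopen(request) would send it; shown unsent here so the
# snippet stays side-effect free.
print(request.get_method(), request.get_full_url())
```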
- Clip Duration: 30-60 seconds (configurable)
- Max Words per Subtitle: 5
- Output Resolution: 1080x1920 (vertical)
- Framerate: 25 FPS
- Audio: AAC 128kbps
- Video Codec: H.264
- Transcription: WhisperX Large-v2
- Language: English (configurable)
- Compute Type: float16 for GPU optimization
- Content Analysis: Gemini 2.5 Flash
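The word-level timings that WhisperX produces make the five-word subtitle grouping above straightforward. A sketch under the assumption that each word is a dict with `word`, `start`, and `end` keys — the function name is illustrative, not the project's actual API:

```python
def group_subtitles(words, max_words=5):
    """Split word-level timings into subtitle chunks of at most max_words,
    each spanning from its first word's start to its last word's end."""
    chunks = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        chunks.append({
            "text": " ".join(w["word"] for w in group),
            "start": group[0]["start"],
            "end": group[-1]["end"],
        })
    return chunks

words = [
    {"word": "So", "start": 0.0, "end": 0.2},
    {"word": "what", "start": 0.2, "end": 0.4},
    {"word": "got", "start": 0.4, "end": 0.6},
    {"word": "you", "start": 0.6, "end": 0.7},
    {"word": "into", "start": 0.7, "end": 0.9},
    {"word": "podcasting?", "start": 0.9, "end": 1.5},
]
print(group_subtitles(words))
```

With `max_words=5`, the six words above become two subtitle chunks: one for the first five words and one for the trailing word.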
```
clippod/
├── clippod-backend/       # Python FastAPI backend
│   ├── main.py            # Main application and Modal deployment
│   ├── ytdownload.py      # YouTube download utilities
│   ├── requirements.txt   # Python dependencies
│   └── asd/               # Active Speaker Detection model
│       ├── Columbia_test.py  # ASD inference script
│       ├── model/         # Neural network models
│       └── weight/        # Pre-trained model weights
├── clippod-frontend/      # Next.js React frontend
│   ├── src/               # Source code
│   ├── components.json    # UI component configuration
│   ├── package.json       # Node.js dependencies
│   └── prisma/            # Database schema
└── README.md              # This file
```