# clippod

License: MIT · Python · Next.js · TypeScript · FastAPI


**clippod** is an AI-powered podcast clipping platform that automatically extracts engaging moments from podcast videos and transforms them into vertical, social-media-ready clips with subtitles.

## ✨ Features

- 🎬 **Automatic Clip Generation**: Uses AI to identify interesting moments and Q&A segments in podcast videos
- 🗣️ **Active Speaker Detection**: Computer vision keeps the frame focused on whoever is speaking
- 📱 **Vertical Video Format**: Converts clips to 1080x1920 for social media platforms
- 🎯 **Smart Subtitles**: Automatically generates stylized subtitles with word-level timing
- 🎨 **Dynamic Framing**: Intelligently crops the video around the speaker or creates cinematic backgrounds
- ⚡ **GPU Acceleration**: Powered by Modal's cloud infrastructure with NVIDIA L40S GPUs
- 🔐 **Secure Processing**: JWT authentication and AWS S3 integration for secure file handling

## 🏗️ Architecture

### Backend (`clippod-backend/`)

- **Framework**: FastAPI, deployed serverlessly with Modal
- **AI Models**:
  - WhisperX for transcription and word-level alignment
  - Google Gemini 2.5 Flash for content analysis
  - Active Speaker Detection (ASD) model for speaker tracking
- **Video Processing**: FFmpeg with OpenCV for video manipulation
- **Storage**: AWS S3 for input/output video storage
- **GPU**: NVIDIA L40S for ML inference acceleration

### Frontend (`clippod-frontend/`)

- **Framework**: Next.js 15.2.3 with TypeScript
- **UI**: Tailwind CSS with Radix UI components
- **Authentication**: NextAuth.js with the Prisma adapter
- **Database**: accessed through Prisma ORM
- **Deployment**: Optimized for Vercel
- **Payments**: Stripe integration
- **Background Jobs**: Inngest for async processing

## 🚀 Getting Started

### Prerequisites

- Python 3.12+
- Node.js 18+
- Docker (optional)
- AWS account with S3 access
- Modal account for backend deployment
- Google AI API key for Gemini

### Backend Setup

1. Navigate to the backend directory:

   ```bash
   cd clippod-backend
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up environment variables by creating Modal secrets containing:

   - `AWS_ACCESS_KEY_ID`
   - `AWS_SECRET_ACCESS_KEY`
   - `GEMINI_API_KEY`
   - `AUTH_TOKEN`

5. Deploy to Modal:

   ```bash
   modal deploy main.py
   ```
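The Modal secrets in step 4 can be created from the Modal CLI, roughly as below. The secret name `clippod-secrets` is an illustration, not taken from the repository; use whatever name `main.py` actually references, and substitute real values for the placeholders.

```shell
modal secret create clippod-secrets \
  AWS_ACCESS_KEY_ID=... \
  AWS_SECRET_ACCESS_KEY=... \
  GEMINI_API_KEY=... \
  AUTH_TOKEN=...
```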

### Frontend Setup

1. Navigate to the frontend directory:

   ```bash
   cd clippod-frontend
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Set up environment variables: copy `.env.example` to `.env` and configure:

   ```bash
   NEXTAUTH_SECRET=your-secret
   DATABASE_URL=your-database-url
   AWS_ACCESS_KEY_ID=your-aws-key
   AWS_SECRET_ACCESS_KEY=your-aws-secret
   STRIPE_SECRET_KEY=your-stripe-key
   # Add other required variables
   ```

4. Set up the database:

   ```bash
   npm run db:push
   ```

5. Start the development server:

   ```bash
   npm run dev
   ```

## 🎯 Usage

1. **Upload Video**: Upload your podcast video through the web interface
2. **Processing**: The AI analyzes the content and identifies clip-worthy moments
3. **Generation**: Clips are automatically created with:
   - Vertical format conversion
   - Active speaker tracking
   - Subtitle generation
   - Smart cropping / background blurring
4. **Download**: Access your processed clips from the dashboard

## 🛠️ API Endpoints

### `POST /process_video`

Processes a video file stored in S3.

**Request Body:**

```json
{
  "s3_key": "path/to/your/video.mp4"
}
```

**Headers:**

```
Authorization: Bearer <your_auth_token>
Content-Type: application/json
```
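As a sketch, the endpoint can be called with Python's `requests`; the endpoint URL and token below are placeholders, not values from this repository:

```python
MODAL_ENDPOINT = "https://your-modal-app.modal.run/process_video"  # placeholder URL
AUTH_TOKEN = "your_auth_token"  # placeholder token

# Request body and headers as documented above.
payload = {"s3_key": "path/to/your/video.mp4"}
headers = {
    "Authorization": f"Bearer {AUTH_TOKEN}",
    "Content-Type": "application/json",
}

if __name__ == "__main__":
    import requests

    # Kick off processing of the S3 object referenced by s3_key.
    response = requests.post(MODAL_ENDPOINT, json=payload, headers=headers)
    response.raise_for_status()
    print(response.json())
```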

## 🔧 Configuration

### Video Processing Settings

- **Clip Duration**: 30-60 seconds (configurable)
- **Max Words per Subtitle**: 5
- **Output Resolution**: 1080x1920 (vertical)
- **Frame Rate**: 25 fps
- **Audio**: AAC at 128 kbps
- **Video Codec**: H.264
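An encode matching these output settings can be expressed as an FFmpeg argument list like the one below. This is only a sketch of the target format; the pipeline's actual filter graph (e.g. speaker-aware cropping or background blurring) will differ.

```python
def build_encode_args(src: str, dst: str) -> list[str]:
    """Build FFmpeg arguments for a 1080x1920, 25 fps, H.264/AAC encode."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        # Scale to fit within 1080x1920, then pad to exactly that frame size.
        "-vf", ("scale=1080:1920:force_original_aspect_ratio=decrease,"
                "pad=1080:1920:(ow-iw)/2:(oh-ih)/2"),
        "-r", "25",          # 25 fps output
        "-c:v", "libx264",   # H.264 video codec
        "-c:a", "aac",       # AAC audio codec
        "-b:a", "128k",      # 128 kbps audio bitrate
        dst,
    ]
```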

### AI Model Configuration

- **Transcription**: WhisperX large-v2
- **Language**: English (configurable)
- **Compute Type**: float16 for GPU optimization
- **Content Analysis**: Gemini 2.5 Flash
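The transcription stage with these settings boils down to a WhisperX call along the following lines; this is a sketch based on WhisperX's documented API, not the repository's exact code:

```python
def transcribe_with_word_timing(audio_path: str, device: str = "cuda") -> dict:
    """Transcribe audio and align it to obtain word-level timestamps."""
    import whisperx  # deferred import: heavy, GPU-oriented dependency

    # large-v2 in float16, matching the configuration above.
    model = whisperx.load_model("large-v2", device, compute_type="float16")
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, language="en")

    # Align segments against a phoneme model to get per-word timing.
    align_model, metadata = whisperx.load_align_model(language_code="en", device=device)
    return whisperx.align(result["segments"], align_model, metadata, audio, device)
```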

## 📁 Project Structure

```
clippod/
├── clippod-backend/          # Python FastAPI backend
│   ├── main.py               # Main application and Modal deployment
│   ├── ytdownload.py         # YouTube download utilities
│   ├── requirements.txt      # Python dependencies
│   └── asd/                  # Active Speaker Detection model
│       ├── Columbia_test.py  # ASD inference script
│       ├── model/            # Neural network models
│       └── weight/           # Pre-trained model weights
├── clippod-frontend/         # Next.js React frontend
│   ├── src/                  # Source code
│   ├── components.json       # UI component configuration
│   ├── package.json          # Node.js dependencies
│   └── prisma/               # Database schema
└── README.md                 # This file
```
