
πŸ” Deepfake Detection Using SWIN Transformer


A deep learning-based system for detecting deepfake images using a fine-tuned SWIN Transformer (Shifted Window Transformer). The model classifies facial images as Real or Fake with confidence scores and is deployed as a live web application.

B.Tech Major Project by Purna Chandar Konda

πŸš€ Try the Live Demo β†’

πŸ€— View Trained Model on HuggingFace Hub β†’


πŸ“‹ Table of Contents


## 🧠 About

The rapid advancement of deepfake technology has made it increasingly difficult to distinguish real facial images from manipulated ones, posing a serious threat to digital media integrity, cybersecurity, and trust in online content.

This project addresses the problem by leveraging the SWIN Transformer architecture β€” a hierarchical vision transformer that uses shifted window-based self-attention β€” to detect and classify manipulated facial images with high accuracy.

Key highlights:

  - Fine-tuned SWIN-Tiny model (`microsoft/swin-tiny-patch4-window7-224`) pretrained on ImageNet-1K
  - Trained on 190,000+ real and fake face images from the Deepfake and Real Images dataset
  - Deployed as a live Gradio web application on Hugging Face Spaces
  - Includes a one-click Google Colab training notebook for easy reproducibility

βš™οΈ How It Works

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Input Image │───▢│  Preprocessing    │───▢│ SWIN Transformer│───▢│ Classificationβ”‚
β”‚  (224Γ—224)   β”‚    β”‚  (Resize, Norm)   β”‚    β”‚ (Feature Ext.)  β”‚    β”‚    Output    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                                        β”‚
                                                                        β–Ό
                                                              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                                              β”‚  βœ… Real / ❌ Fake β”‚
                                                              β”‚  + Confidence %   β”‚
                                                              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Pipeline:

  1. Input β€” A facial image is uploaded through the web interface (any resolution).
  2. Preprocessing β€” The image is resized to 224Γ—224 pixels and normalized using ImageNet mean/std values.
  3. Feature Extraction β€” The SWIN-Tiny transformer processes the image through 4 hierarchical stages using shifted window multi-head self-attention, extracting both local and global features.
  4. Classification β€” A linear classification head maps the extracted 768-dimensional feature vector to 2 output classes (Real / Fake) using softmax probabilities.
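Step 2's normalization can be sketched numerically (a minimal NumPy illustration; the deployed app presumably relies on the model's HuggingFace image processor rather than hand-rolled code):

```python
import numpy as np

# ImageNet statistics used to normalize inputs for the pretrained SWIN-Tiny backbone
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(image_uint8):
    """Scale an (H, W, 3) uint8 image to [0, 1], then normalize per channel."""
    x = image_uint8.astype(np.float32) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD

img = np.full((224, 224, 3), 128, dtype=np.uint8)  # dummy mid-gray image
out = normalize(img)
print(out.shape)  # (224, 224, 3)
```

Keeping these constants identical at training and inference time is why `preprocessor_config.json` is shipped alongside the model files.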

πŸ“ Project Structure

```
DeepfakeDetectionUsingSWINTransformer/
├── app.py                          # Gradio web app (deployed on HF Spaces)
├── demo.py                         # Quick local demo using HF pipeline
├── deploy_to_spaces.py             # One-click deployment script for HF Spaces
├── train_on_colab.ipynb            # Google Colab training notebook (recommended)
├── swin-tiny-complete-training.py  # Local training script (requires GPU)
├── model-testing.py                # Model evaluation script
├── image_extractor.py              # Frame extraction from video datasets
├── models/
│   └── swin-tiny-complete/         # Model configuration files
│       ├── config.json
│       └── preprocessor_config.json
├── requirements.txt
├── .gitignore
├── LICENSE
└── README.md
```

Note: Model weights (~110 MB) are hosted on Hugging Face Hub and are automatically downloaded when you run the app.


πŸš€ Installation

### Prerequisites

  - Python 3.10+
  - Git
  - (Optional) NVIDIA GPU with CUDA for local training

### Setup

```bash
# Clone the repository
git clone https://github.com/Purnachander-Konda/DeepfakeDetectionUsingSWINTransformer.git
cd DeepfakeDetectionUsingSWINTransformer

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate        # Linux/Mac
venv\Scripts\activate           # Windows

# Install dependencies
pip install -r requirements.txt
```

### Model Weights

The trained model weights are hosted on Hugging Face Hub and are downloaded automatically when you run app.py. For manual download:

```bash
pip install huggingface-hub
huggingface-cli download Purnachander-Konda/deepfake-detection-swin --local-dir ./models/swin-tiny-complete
```

πŸ’» Usage

### Run the Web App

```bash
python app.py
```

Open http://localhost:7860 in your browser β€” upload any face image to get a Real/Fake prediction with confidence scores.

### Quick Demo

```bash
python demo.py
```

### Evaluate the Model

```bash
python model-testing.py
```
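Internally, the scripts above turn the classifier's two logits into a label plus a confidence score via softmax; a self-contained sketch (the `Real`/`Fake` index order here is an assumption; the authoritative mapping is the `id2label` field in `config.json`):

```python
import math

ID2LABEL = {0: "Real", 1: "Fake"}  # assumed order; see models/swin-tiny-complete/config.json

def predict_label(logits):
    """Softmax over the class logits, then pick the most likely class."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return ID2LABEL[idx], probs[idx]

label, confidence = predict_label([2.3, -1.1])  # example logits
print(label, round(confidence, 4))
```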

πŸ‹οΈ Model Training

### Train on Google Colab (Recommended)

The fastest way to train β€” no local GPU needed:

  1. Open the notebook in Colab
  2. Set Runtime → GPU (T4)
  3. Run all cells — the notebook handles everything:
     - Downloads the 190K+ image dataset
     - Fine-tunes SWIN-Tiny for 3 epochs (~30-60 min)
     - Evaluates and prints metrics
     - Uploads the trained model to Hugging Face Hub

### Train Locally

Requires an NVIDIA GPU and the dataset:

```bash
python swin-tiny-complete-training.py
```

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Base Model | `microsoft/swin-tiny-patch4-window7-224` |
| Dataset | `Hemg/deepfake-and-real-images` (190K+ images) |
| Train/Test Split | 80/20 (stratified) |
| Learning Rate | 2e-5 |
| Batch Size | 16 (×2 gradient accumulation = effective 32) |
| Epochs | 3 |
| Optimizer | AdamW (weight decay: 0.01) |
| Precision | FP16 mixed precision |
| Checkpointing | Gradient checkpointing enabled |
| Total Parameters | ~27.5M |
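The table above corresponds roughly to the following HuggingFace `TrainingArguments` (a sketch reconstructed from the listed values, not the exact contents of the training script; `output_dir` is illustrative):

```python
from transformers import TrainingArguments

# Hyperparameters taken from the training-configuration table above
training_args = TrainingArguments(
    output_dir="./swin-tiny-complete",   # illustrative path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,       # effective batch size 16 x 2 = 32
    num_train_epochs=3,
    weight_decay=0.01,                   # AdamW is the Trainer's default optimizer
    fp16=True,                           # mixed-precision training
    gradient_checkpointing=True,         # trades compute for GPU memory
)
```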

πŸ“Š Results

Evaluation metrics on the held-out test set (20% of dataset, ~38K images):

| Metric | Score |
|--------|-------|
| Accuracy | 0.9881 |
| F1 Score (Macro) | 0.9881 |
| Precision (Macro) | 0.9881 |
| Recall (Macro) | 0.9881 |

The model achieves strong performance on binary deepfake classification. Metrics were computed using HuggingFace Evaluate on the stratified test split.
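"Macro" averaging gives each class equal weight regardless of class frequency. A toy plain-Python illustration of how a macro F1 is computed (not the project's evaluation code, which uses HuggingFace Evaluate):

```python
def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of per-class F1 scores (e.g. 0 = Real, 1 = Fake)."""
    per_class = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        per_class.append(f1)
    return sum(per_class) / len(per_class)

# One Fake sample misclassified as Real:
score = macro_f1([0, 0, 1, 1], [0, 0, 1, 0])
print(round(score, 4))
```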


## 🌐 Live Demo

The model is deployed as a Gradio web application on Hugging Face Spaces:

πŸ”— https://huggingface.co/spaces/Purnachander-Konda/deepfake-detection-swin

Features:

  - Upload any face image for instant Real/Fake classification
  - Confidence scores for both classes
  - No sign-up or installation required

πŸ› οΈ Tech Stack

| Component | Technology |
|-----------|------------|
| Model | SWIN Transformer (Tiny variant) |
| Framework | PyTorch + HuggingFace Transformers |
| Web Interface | Gradio |
| Dataset | Deepfake and Real Images (190K+ images) |
| Evaluation | HuggingFace Evaluate (Accuracy, F1, Precision, Recall) |
| Training | Google Colab (T4 GPU) |
| Model Hosting | Hugging Face Hub |
| App Hosting | Hugging Face Spaces |

πŸ“„ License

This project is licensed under the MIT License β€” see the LICENSE file for details.


πŸ™ Acknowledgments


Built by Purna Chandar Konda