A deep learning-based system for detecting deepfake images using a fine-tuned SWIN Transformer (Shifted Window Transformer). The model classifies facial images as Real or Fake with confidence scores and is deployed as a live web application.
B.Tech Major Project by Purna Chandar Konda
🤗 View Trained Model on Hugging Face Hub →
- About
- How It Works
- Project Structure
- Installation
- Usage
- Model Training
- Results
- Live Demo
- Tech Stack
- License
- Acknowledgments
The rapid advancement of deepfake technology has made it increasingly difficult to distinguish real facial images from manipulated ones, posing a serious threat to digital media integrity, cybersecurity, and trust in online content.
This project addresses the problem by leveraging the SWIN Transformer architecture, a hierarchical vision transformer that uses shifted window-based self-attention, to detect and classify manipulated facial images with high accuracy.
Key highlights:
- Fine-tuned SWIN-Tiny model (microsoft/swin-tiny-patch4-window7-224) pretrained on ImageNet-1K
- Trained on 190,000+ real and fake face images from the Deepfake and Real Images dataset
- Deployed as a live Gradio web application on Hugging Face Spaces
- Includes a one-click Google Colab training notebook for easy reproducibility
```
┌─────────────┐     ┌──────────────────┐     ┌─────────────────┐     ┌───────────────┐
│ Input Image │────▶│  Preprocessing   │────▶│ SWIN Transformer│────▶│ Classification│
│  (224×224)  │     │  (Resize, Norm)  │     │  (Feature Ext.) │     │    Output     │
└─────────────┘     └──────────────────┘     └─────────────────┘     └───────────────┘
                                                                             │
                                                                             ▼
                                                                   ┌──────────────────┐
                                                                   │   Real / Fake    │
                                                                   │  + Confidence %  │
                                                                   └──────────────────┘
```
Pipeline:
- Input → A facial image is uploaded through the web interface (any resolution).
- Preprocessing → The image is resized to 224×224 pixels and normalized using ImageNet mean/std values.
- Feature Extraction → The SWIN-Tiny transformer processes the image through 4 hierarchical stages using shifted window multi-head self-attention, extracting both local and global features.
- Classification → A linear classification head maps the extracted 768-dimensional feature vector to 2 output classes (Real / Fake) using softmax probabilities.
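The preprocessing and classification math above can be sketched in plain NumPy. This is a minimal illustration only: the deployed app uses the HuggingFace image processor, and the Real/Fake label order shown here is an assumption.

```python
import numpy as np

# ImageNet normalization constants (per RGB channel), as used for SWIN preprocessing
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image: np.ndarray) -> np.ndarray:
    """Scale an HxWx3 uint8 image to [0, 1] and normalize per channel."""
    x = image.astype(np.float32) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD

def classify(logits: np.ndarray) -> dict:
    """Convert 2-class logits to softmax probabilities (label order assumed)."""
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs = exp / exp.sum()
    return {"Real": float(probs[0]), "Fake": float(probs[1])}

# A dummy 224x224 grey image stands in for a resized face crop
dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
pixels = preprocess(dummy)
scores = classify(np.array([2.0, -1.0]))
```

In the real model the logits come from the linear head on the 768-dimensional SWIN feature vector; only the softmax step is reproduced here.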
```
DeepfakeDetectionUsingSWINTransformer/
├── app.py                          # Gradio web app (deployed on HF Spaces)
├── demo.py                         # Quick local demo using HF pipeline
├── deploy_to_spaces.py             # One-click deployment script for HF Spaces
├── train_on_colab.ipynb            # Google Colab training notebook (recommended)
├── swin-tiny-complete-training.py  # Local training script (requires GPU)
├── model-testing.py                # Model evaluation script
├── image_extractor.py              # Frame extraction from video datasets
├── models/
│   └── swin-tiny-complete/         # Model configuration files
│       ├── config.json
│       └── preprocessor_config.json
├── requirements.txt
├── .gitignore
├── LICENSE
└── README.md
```
Note: Model weights (~110 MB) are hosted on Hugging Face Hub and are automatically downloaded when you run the app.
- Python 3.10+
- Git
- (Optional) NVIDIA GPU with CUDA for local training
```bash
# Clone the repository
git clone https://github.com/Purnachander-Konda/DeepfakeDetectionUsingSWINTransformer.git
cd DeepfakeDetectionUsingSWINTransformer

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate   # Linux/Mac
venv\Scripts\activate      # Windows

# Install dependencies
pip install -r requirements.txt
```

The trained model weights are hosted on Hugging Face Hub and are downloaded automatically when you run app.py. For manual download:

```bash
pip install huggingface-hub
huggingface-cli download Purnachander-Konda/deepfake-detection-swin --local-dir ./models/swin-tiny-complete
```

Run the web app:

```bash
python app.py
```

Open http://localhost:7860 in your browser and upload any face image to get a Real/Fake prediction with confidence scores.

Run the quick local demo:

```bash
python demo.py
```

Evaluate the model:

```bash
python model-testing.py
```

The fastest way to train, with no local GPU needed:
- Open the notebook in Colab
- Set Runtime → GPU (T4)
- Run all cells; the notebook handles everything:
- Downloads the 190K+ image dataset
- Fine-tunes SWIN-Tiny for 3 epochs (~30-60 min)
- Evaluates and prints metrics
- Uploads the trained model to Hugging Face Hub
Requires an NVIDIA GPU and the dataset:
```bash
python swin-tiny-complete-training.py
```

Training configuration:

| Parameter | Value |
|---|---|
| Base Model | microsoft/swin-tiny-patch4-window7-224 |
| Dataset | Hemg/deepfake-and-real-images (190K+ images) |
| Train/Test Split | 80/20 (stratified) |
| Learning Rate | 2e-5 |
| Batch Size | 16 (×2 gradient accumulation = effective 32) |
| Epochs | 3 |
| Optimizer | AdamW (weight decay: 0.01) |
| Precision | FP16 mixed precision |
| Checkpointing | Gradient checkpointing enabled |
| Total Parameters | ~27.5M |
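The hyperparameters in the table can be collected as a plain Python dict. The key names mirror HuggingFace TrainingArguments fields, but this is a sketch of the configuration, not the actual training script:

```python
# Hyperparameters from the table above; key names mirror HuggingFace
# TrainingArguments fields, though the actual script may differ.
train_config = {
    "model_name": "microsoft/swin-tiny-patch4-window7-224",
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "fp16": True,                    # mixed-precision training
    "gradient_checkpointing": True,  # trade extra compute for lower memory
}

# Effective batch size = per-device batch size x accumulation steps
effective_batch = (train_config["per_device_train_batch_size"]
                   * train_config["gradient_accumulation_steps"])  # 32
```

Gradient accumulation lets a 16-image batch on a single T4 behave like a batch of 32 by summing gradients over two steps before each optimizer update.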
Evaluation metrics on the held-out test set (20% of dataset, ~38K images):
| Metric | Score |
|---|---|
| Accuracy | 0.9881 |
| F1 Score (Macro) | 0.9881 |
| Precision (Macro) | 0.9881 |
| Recall (Macro) | 0.9881 |
The model achieves strong performance on binary deepfake classification. Metrics were computed using HuggingFace Evaluate on the stratified test split.
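As a sanity check on what "macro" averaging means in the table above, macro F1 can be reproduced in a few lines of plain Python. This is a toy illustration on made-up labels; the project computes its metrics with HuggingFace Evaluate:

```python
# Macro F1: compute F1 per class, then average the per-class scores equally.
def macro_f1(y_true, y_pred, classes=(0, 1)):
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy labels: 0 = Real, 1 = Fake (ordering assumed for illustration)
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
score = macro_f1(y_true, y_pred)  # (2/3 + 4/5) / 2 = 11/15
```

Because the dataset's Real/Fake split is close to balanced, the macro scores land very close to accuracy, which is why all four reported metrics agree to four decimal places.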
The model is deployed as a Gradio web application on Hugging Face Spaces:
🔗 https://huggingface.co/spaces/Purnachander-Konda/deepfake-detection-swin
Features:
- Upload any face image for instant Real/Fake classification
- Confidence scores for both classes
- No sign-up or installation required
| Component | Technology |
|---|---|
| Model | SWIN Transformer (Tiny variant) |
| Framework | PyTorch + HuggingFace Transformers |
| Web Interface | Gradio |
| Dataset | Deepfake and Real Images (190K+ images) |
| Evaluation | HuggingFace Evaluate (Accuracy, F1, Precision, Recall) |
| Training | Google Colab (T4 GPU) |
| Model Hosting | Hugging Face Hub |
| App Hosting | Hugging Face Spaces |
This project is licensed under the MIT License; see the LICENSE file for details.
- Microsoft Research for the SWIN Transformer architecture
- Hemg for the Deepfake and Real Images dataset
- Hugging Face for Transformers library, model hosting, and Spaces
- Gradio for the web interface framework
- Google Colab for free GPU access
Built by Purna Chandar Konda