A pure MLX implementation of RVC for Apple Silicon, delivering 8.71x faster inference than PyTorch MPS.
- 8.71x faster full pipeline inference on real audio (13.5s)
- 1.82x faster RMVPE pitch detection (peak 2.10x on 30-60s audio)
- 10.6x realtime performance on 13.5s audio
- 0.986 spectrogram correlation - perceptually identical to PyTorch
- 17-40% better memory efficiency than PyTorch MPS
- Production-ready with full inference parity
This project is a fork of Applio. We based this implementation on Applio to keep pace with the latest RVC developments: the Applio team has become the primary maintainers now that the original RVC project has gone inactive.
Test Configuration: 13.5s audio, Drake model (RVCv2, 48kHz), Apple Silicon
| Metric | PyTorch MPS | MLX | Improvement |
|---|---|---|---|
| Inference Time | 11.08s | 1.27s | 8.71x faster |
| Realtime Factor | 1.22x | 10.6x | 8.7x better |
| Memory Usage | ~2.5GB | ~2.0GB | 20% less |
| Audio Quality | Baseline | 0.986 spectrogram correlation | Perceptually identical |
Performance Comparison (13.5s audio)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PyTorch MPS ████████████████████████████████████████████ 11.08s
MLX █████ 1.27s
└─────────────────────────────────────────────┘
8.71x FASTER
| Audio Length | PyTorch MPS | MLX | Speedup | Realtime Factor |
|---|---|---|---|---|
| 5 seconds | 0.297s | 0.181s | 1.64x | 28x realtime |
| 30 seconds | 1.563s | 0.745s | 2.10x | 40x realtime |
| 60 seconds | 3.128s | 1.530s | 2.04x | 39x realtime |
| 3 minutes | 9.934s | 5.350s | 1.86x | 34x realtime |
| 5 minutes | 26.985s | 18.725s | 1.44x | 16x realtime |
| Average | - | - | 1.82x | 31x realtime |
RMVPE Speedup by Audio Length
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
5s ████████████████▌ 1.64x
30s █████████████████████ 2.10x ⭐ Peak Performance
60s ████████████████████▍ 2.04x
180s ██████████████████▌ 1.86x
300s ██████████████▍ 1.44x
└─────────────────────────────────────────────┘
0x 1x 2x 3x
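The speedup and realtime-factor columns can be recomputed from the raw timings. The sketch below assumes that speedup is the PyTorch MPS time divided by the MLX time, that the realtime factor is audio length divided by MLX inference time, and that the Average row is the arithmetic mean of the per-length values. The function names are illustrative and are not taken from the project's benchmark scripts.

```python
# (audio length in s, PyTorch MPS time in s, MLX time in s), copied from the RMVPE table
ROWS = [
    (5, 0.297, 0.181),
    (30, 1.563, 0.745),
    (60, 3.128, 1.530),
    (180, 9.934, 5.350),
    (300, 26.985, 18.725),
]


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def realtime_factor(audio_seconds, inference_seconds):
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / inference_seconds


speedups = [speedup(pytorch, mlx) for _, pytorch, mlx in ROWS]
rtfs = [realtime_factor(length, mlx) for length, _, mlx in ROWS]

# Assumed: the Average row is a plain arithmetic mean over the five lengths.
avg_speedup = sum(speedups) / len(speedups)
avg_rtf = sum(rtfs) / len(rtfs)

for (length, _, _), s, r in zip(ROWS, speedups, rtfs):
    print(f"{length:>4}s  speedup {s:.2f}x  realtime {r:.0f}x")
print(f"average  speedup {avg_speedup:.2f}x  realtime {avg_rtf:.0f}x")
```

Because the table lists rounded timings, a value recomputed this way can differ from a reported figure in the last digit.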
| Component | PyTorch MPS | MLX | Speedup | Accuracy / Notes |
|---|---|---|---|---|
| TextEncoder | 5.31ms | 3.43ms | 1.55x | 1.000 correlation |
| RMVPE (5s) | 281.59ms | 173.43ms | 1.62x | 28.8x realtime |
| Full Pipeline | 11.08s | 1.27s | 8.71x | 0.986 spec. corr. |
Real Audio Test (13.5s):
- Spectrogram Correlation: 0.986 (perceptually identical)
- Waveform Correlation: 0.357 (expected due to phase drift)
- RMS Ratio: 0.994 (near-unity gain match)
- Status: ✅ Production-ready
Key Insight: Low waveform correlation is expected and normal. Accumulated floating-point differences cause phase drift in the sine generator, so the two waveforms fall out of sample-level alignment even though their frequency content matches. The high spectrogram correlation (0.986) indicates that the outputs are perceptually identical.
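The gap between waveform and spectrogram correlation can be reproduced on a synthetic signal. The sketch below is not the project's parity test: it compares a plain sine tone with a copy whose phase drifts slowly, using only the standard library and a naive DFT. The signal parameters and all function names are assumptions chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000          # Hz, illustrative value
NUM_SAMPLES = 2048
FREQ = 500.0                # Hz, illustrative tone
TOTAL_DRIFT = 3 * math.pi   # phase offset accumulated by the last sample


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def magnitude_spectrogram(signal, frame=128, hop=64):
    """Flattened STFT magnitudes (Hann window, naive DFT) of a real signal."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    twiddle = [cmath.exp(-2j * math.pi * m / frame) for m in range(frame)]
    mags = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame)]
        for k in range(frame // 2 + 1):
            acc = sum(chunk[n] * twiddle[(k * n) % frame] for n in range(frame))
            mags.append(abs(acc))  # magnitude only: phase is discarded here
    return mags


reference = [math.sin(2 * math.pi * FREQ * i / SAMPLE_RATE) for i in range(NUM_SAMPLES)]
# Same tone, but the phase drifts linearly from 0 to TOTAL_DRIFT.
drifted = [
    math.sin(2 * math.pi * FREQ * i / SAMPLE_RATE + TOTAL_DRIFT * i / NUM_SAMPLES)
    for i in range(NUM_SAMPLES)
]

waveform_corr = pearson(reference, drifted)
spectrogram_corr = pearson(magnitude_spectrogram(reference), magnitude_spectrogram(drifted))
rms_ratio = rms(drifted) / rms(reference)

print(f"waveform correlation:    {waveform_corr:.3f}")
print(f"spectrogram correlation: {spectrogram_corr:.3f}")
print(f"RMS ratio:               {rms_ratio:.3f}")
```

The waveform correlation comes out near zero because the drift moves the two signals in and out of phase, while the spectrogram correlation and RMS ratio stay close to 1 because magnitudes ignore phase. In practice a library STFT would replace the naive DFT used here.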
| Audio Length | PyTorch MPS | MLX | Savings |
|---|---|---|---|
| 5 seconds | ~600MB | ~500MB | 17% |
| 60 seconds | ~1.2GB | ~800MB | 33% |
| 5 minutes | ~2.5GB | ~1.5GB | 40% |
MLX's unified memory architecture reduces memory use by roughly 17% to 40% in these measurements, with the largest savings on longer audio.
All benchmarks performed on:
- Platform: MacBook Pro M3 Max (128GB RAM)
- OS: macOS Sequoia 15.2 (Darwin 25.2.0)
- Date: 2026-01-06
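Timings like the ones reported here are usually collected with warm-up runs followed by several measured runs. The sketch below shows one way to do that with the standard library. It is an assumption about a reasonable measurement method, not a description of the project's benchmark scripts, and `benchmark` and `dummy_workload` are hypothetical names.

```python
import statistics
import time


def benchmark(fn, warmup=2, runs=5):
    """Return the median wall-clock time in seconds of calling fn()."""
    # Warm-up calls absorb one-time costs such as kernel compilation and caching.
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # fn must block until its work has actually finished
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def dummy_workload():
    """Hypothetical stand-in for a single inference call."""
    return sum(i * i for i in range(50_000))


median_seconds = benchmark(dummy_workload)
print(f"median: {median_seconds * 1000:.2f} ms")
```

When timing real models, the measured function has to force the computation to complete. MLX evaluates lazily, so the result should be passed to `mx.eval(...)` inside the timed call. PyTorch MPS runs asynchronously, so `torch.mps.synchronize()` should be called before the timer stops.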
For detailed benchmark methodology and results:
- 📊 Comprehensive Benchmarks - Full performance analysis
- 📈 Benchmark Results - Detailed component testing
- ✅ Inference Parity - Accuracy validation
- 📖 Project Overview - Architecture and implementation
# Set required environment variable
export OMP_NUM_THREADS=1
# RMVPE benchmark (MLX vs PyTorch MPS)
python benchmarks/benchmark_rmvpe.py
# Component benchmarks (TextEncoder, RMVPE)
python benchmarks/benchmark_components.py
# Full pipeline audio parity test
python benchmarks/benchmark_audio_parity.py
The project also includes a native Swift MLX implementation for iOS and macOS:
| Model | Correlation | Status |
|---|---|---|
| Drake | 92.9% | ✅ |
| Juice WRLD | 86.6% | ✅ |
| Eminem Modern | 94.4% | ✅ |
| Bob Marley | 93.5% | ✅ |
| Slim Shady | 91.9% | ✅ |
| Average | 91.8% | ✅ |
- Native MLX Swift with Metal GPU acceleration
- Full RVC pipeline: HuBERT → TextEncoder → Flow → Generator
- RMVPE pitch extraction (Default)
- FCPE, Crepe, Crepe-Tiny support (Python)
- Native FAISS Index Support (IVFFlat)
- On-device .pth → .safetensors conversion
- See: Demos/iOS/ and Demos/Mac/
The MLX implementation is production-ready and provides:
- ✅ 8.71x faster inference on real-world audio (Python MLX)
- ✅ 91.8% parity in Swift MLX (iOS/macOS native)
- ✅ Perceptually identical output to PyTorch
- ✅ 17-40% lower memory use than PyTorch MPS
- ✅ Native Apple Silicon optimization
- ✅ All components validated for numerical accuracy
Recommendation: Use MLX for all RVC inference on Apple Silicon.
The RVC CLI builds upon the foundations of the following projects:
Vocoders:
- HiFi-GAN by jik876
- Vocos by gemelo-ai
- BigVGAN by NVIDIA
- BigVSAN by sony
- vocoders by reppy4620
- vocoder by fishaudio
VC Clients:
- Retrieval-based-Voice-Conversion-WebUI by RVC-Project
- So-Vits-SVC by svc-develop-team
- Mangio-RVC-Fork by Mangio621
- VITS by jaywalnut310
- Harmonify by Eempostor
- rvc-trainer by thepowerfuldeez
Pitch Extractors:
- RMVPE by Dream-High
- torchfcpe by CNChTu
- torchcrepe by maxrmorrison
- anyf0 by SoulMelody
Other:
- FAIRSEQ by facebookresearch
- FAISS by facebookresearch
- ContentVec by auspicious3000
- audio-slicer by openvpi
- python-audio-separator by karaokenerds
- ultimatevocalremovergui by Anjok07
We acknowledge and appreciate the contributions of the respective authors and communities involved in these projects.