Python scripts and experiments used for my research paper on speech-to-text AI models. This repository includes model evaluations, accuracy benchmarks, and transcript quality comparisons using various open-source and cloud-based STT APIs.
> Place your audio file in the same directory and update `audio_path` in the scripts.
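For example, near the top of each script you would edit a line like the following (the filename here is a hypothetical placeholder):

```python
# Point audio_path at your own recording (hypothetical filename)
audio_path = "my-recording.wav"
```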
Follow these steps to set up and run the Whisper transcription script:
Make sure you have Python 3.10 installed on your machine.
⚠️ Whisper requires Python 3.10 due to dependency compatibility. Using other versions (like 3.11 or 3.12) may cause installation issues or runtime errors.
```bash
python3.10 -m venv whisper-venv
source whisper-venv/bin/activate
pip install openai-whisper
python3 stt-whisper-model.py
deactivate
```
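The repository's `stt-whisper-model.py` is not reproduced here; as a rough sketch of what such a script does (the `"base"` model size and the default filename are assumptions, not the repo's actual settings), a minimal Whisper transcription looks something like this:

```python
def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with OpenAI Whisper and return the text."""
    import whisper  # imported lazily so the module loads even without the package

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(audio_path)
    return result["text"].strip()
```

Inside the venv, calling `transcribe("audio.wav")` and printing the result is all the script needs to do; larger model names (`"small"`, `"medium"`, `"large"`) trade speed for accuracy.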
```bash
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 vosk-model
```
This downloads the model and extracts it to a folder named `vosk-model`, which the script expects.
```bash
python3.10 -m venv vosk-venv
source vosk-venv/bin/activate
pip install vosk
python3 stt-vosk-model.py path/to/your/audio.wav
```
Replace `path/to/your/audio.wav` with the path to your WAV file (mono, 16 kHz).
After running, two files will be generated in the script directory:
- `transcription.json` — full transcription plus word-level timestamps
- `transcription.txt` — full transcription as plain text
```bash
deactivate
```
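As a rough sketch of the Vosk workflow (not the repository's exact code — chunk size and structure are assumptions), the core loop reads the WAV in chunks, collects per-chunk results with word timestamps, and writes the two output files described above:

```python
import json
import wave

def words_to_text(results: list[dict]) -> str:
    """Join Vosk per-chunk results into one plain-text transcript."""
    return " ".join(r["text"] for r in results if r.get("text"))

def transcribe(audio_path: str, model_dir: str = "vosk-model") -> list[dict]:
    """Run Vosk over a mono 16 kHz WAV file, returning per-chunk results."""
    from vosk import KaldiRecognizer, Model  # lazy import; pip install vosk

    wf = wave.open(audio_path, "rb")
    rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
    rec.SetWords(True)  # include word-level timestamps in each result
    results = []
    while True:
        data = wf.readframes(4000)  # chunk size is an arbitrary choice
        if not data:
            break
        if rec.AcceptWaveform(data):
            results.append(json.loads(rec.Result()))
    results.append(json.loads(rec.FinalResult()))
    return results

def save(results: list[dict]) -> None:
    """Write transcription.json (with timestamps) and transcription.txt."""
    with open("transcription.json", "w") as f:
        json.dump(results, f, indent=2)
    with open("transcription.txt", "w") as f:
        f.write(words_to_text(results))
```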
```bash
python3.10 -m venv assemblyai-venv
source assemblyai-venv/bin/activate
pip install requests python-dotenv
python3 stt-assemblyai-model.py
deactivate
```
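As a rough sketch of what an AssemblyAI script does with `requests` and `dotenv` (the environment variable name `ASSEMBLYAI_API_KEY` and the polling interval are assumptions, not necessarily what the repository uses): upload the local file, create a transcription job, then poll until it finishes.

```python
import time

API_BASE = "https://api.assemblyai.com/v2"

def auth_headers(api_key: str) -> dict:
    """AssemblyAI authenticates with the bare key in the authorization header."""
    return {"authorization": api_key}

def transcribe(audio_path: str) -> str:
    """Upload a local audio file to AssemblyAI and poll until it is transcribed."""
    import os
    import requests
    from dotenv import load_dotenv  # pip install python-dotenv

    load_dotenv()  # loads ASSEMBLYAI_API_KEY from a .env file (assumed name)
    headers = auth_headers(os.environ["ASSEMBLYAI_API_KEY"])

    # 1. Upload the raw audio bytes; the API returns a temporary URL.
    with open(audio_path, "rb") as f:
        upload = requests.post(f"{API_BASE}/upload", headers=headers, data=f)
    audio_url = upload.json()["upload_url"]

    # 2. Create a transcription job for that URL.
    job = requests.post(f"{API_BASE}/transcript", headers=headers,
                        json={"audio_url": audio_url}).json()

    # 3. Poll until the job completes or errors out.
    while True:
        poll = requests.get(f"{API_BASE}/transcript/{job['id']}",
                            headers=headers).json()
        if poll["status"] in ("completed", "error"):
            return poll.get("text") or poll.get("error", "")
        time.sleep(3)
```

Note that `pip install python-dotenv` installs the package imported as `dotenv`.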