A friendly, step-by-step tutorial for Windows (works on macOS/Linux too): how to install Python, PyTorch, FFmpeg, and Whisper, run Whisper from the terminal, and use it offline.
What is Whisper?
Whisper is an automatic speech recognition (ASR) and speech-translation model from OpenAI. It can:
- Transcribe audio/video to text (many languages).
- Translate speech into English (optional).
- Run locally once the model is downloaded — great for privacy.
Whisper comes in multiple sizes (tiny, base, small, medium, large[–v2/v3]). Smaller = faster, larger = more accurate.
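To make the size/speed trade-off concrete, here is a small sketch using the approximate parameter counts and VRAM figures published in the Whisper README (check the repo for current numbers; the `largest_model_for` helper is just an illustration, not part of Whisper):

```python
# Approximate figures from the Whisper README (verify against the repo).
MODELS = {
    # name: (parameters in millions, approx. VRAM needed in GB)
    "tiny":   (39,   1),
    "base":   (74,   1),
    "small":  (244,  2),
    "medium": (769,  5),
    "large":  (1550, 10),
}

def largest_model_for(vram_gb: float) -> str:
    """Pick the biggest model whose approximate VRAM need fits the budget."""
    fitting = [name for name, (_, need) in MODELS.items() if need <= vram_gb]
    return fitting[-1] if fitting else "tiny"  # dict preserves insertion order

print(largest_model_for(6))   # medium
print(largest_model_for(12))  # large
```

On an 8 GB GPU, for example, `medium` is usually the practical ceiling; `large` wants roughly 10 GB.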
System requirements (basic)
| Component | Recommendation |
|---|---|
| CPU | Any modern multi-core CPU (i5/i7/i9 or Ryzen equivalent). Small models run fine on CPU. |
| GPU | NVIDIA CUDA GPU recommended for medium/large models. RTX 30/40/50 series works well. |
| RAM | 8 GB minimum; 16+ GB recommended for medium/large models. |
| Disk space | Model files range from ~100 MB to several GB depending on the model. Store models on a drive with free space (e.g., E:). |
| OS | Windows 10/11, macOS, Linux — Python & ffmpeg must be installed. |
Install — step by step (Windows)
1. Install Python
Download the installer and choose "Add Python to PATH" on the first screen.
Official download: python.org/downloads
2. Install FFmpeg
Download a prebuilt Windows build, extract it and add .../ffmpeg/bin to your PATH.
FFmpeg builds and instructions: ffmpeg.org/download.html
3. Install Whisper (Python package)
pip install -U openai-whisper
If that fails, install directly from GitHub:
pip install git+https://github.com/openai/whisper.git
4. Install PyTorch (CPU or GPU)
Go to the PyTorch site and pick the right command for your system.
PyTorch installer selector: pytorch.org/get-started/locally
Example (CUDA 12.1 + pip):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
If you don't have an NVIDIA GPU or prefer CPU only:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
5. Verify
python --version
pip --version
ffmpeg -version
whisper --help
If all commands run, you're ready.
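You can also script the verification from Python with the standard library's `shutil.which`, which resolves a command on PATH the same way the shell does (the `check_tools` helper is just an illustration):

```python
import shutil

def check_tools(tools=("python", "ffmpeg", "whisper")) -> dict:
    """Map each tool name to its resolved path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in tools}

for name, path in check_tools().items():
    print(f"{name}: {path or 'NOT FOUND - check your PATH'}")
```

Any `NOT FOUND` entry points you back to the corresponding install step above.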
Whisper terminal commands — explained
Basic transcription (force language Hindi):
whisper "E:\tools\instalag-1.mp4" --model small --language hi --output_format txt
Translate audio to English:
whisper "myvideo.mp4" --model small --task translate --output_format txt
Use GPU:
whisper "myvideo.mp4" --model medium --device cuda
Save subtitles (.srt):
whisper "myvideo.mp4" --model small --output_format srt --language hi
Common CLI options:
- --model: tiny | base | small | medium | large (also large-v2 / large-v3)
- --language: language code, e.g. hi (Hindi), en (English); omit it to let Whisper auto-detect
- --task: transcribe or translate
- --device: cpu or cuda
- --output_format: txt, srt, vtt, json
- --output_dir: folder to save output files
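If you drive Whisper from a script, it is easier to assemble the CLI invocation as an argument list and hand it to `subprocess.run` than to paste strings together. A minimal sketch (the `build_whisper_cmd` helper is my own illustration, not part of Whisper):

```python
def build_whisper_cmd(media, model="small", language=None, task="transcribe",
                      device="cpu", output_format="txt", output_dir="."):
    """Assemble a whisper CLI invocation as an argument list for subprocess.run."""
    cmd = ["whisper", media, "--model", model, "--task", task,
           "--device", device, "--output_format", output_format,
           "--output_dir", output_dir]
    if language:  # omitting --language lets Whisper auto-detect the language
        cmd += ["--language", language]
    return cmd

print(build_whisper_cmd("myvideo.mp4", language="hi"))
# Execute with: subprocess.run(build_whisper_cmd(...), check=True)
```

Passing a list (rather than one big string) avoids quoting problems with Windows paths that contain spaces.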
When you run a model the first time, Whisper downloads the model file and caches it locally — after that it runs offline.
Offline models (store them on another drive)
If your C: drive is full, you can store model files anywhere (e.g., E:\tools\whisper_cache). Put the model .pt files in that folder and tell Whisper to use it.
Download links (model files / sources)
- Whisper repo (package & docs): github.com/openai/whisper
- Whisper model files (Hugging Face/OpenAI releases) — example model pages: search "openai whisper small pt" on Hugging Face or use the repo links above.
- Standalone Windows GUI/executables (community projects): whisper-standalone-win (example)
How to set the local cache folder in code
import whisper
model = whisper.load_model("small", download_root=r"E:\tools\whisper_cache")
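Before going offline, it is worth checking that the checkpoint file is actually in the cache folder. A small sketch using only the standard library (the `model_is_cached` helper is my own illustration):

```python
from pathlib import Path

def model_is_cached(name: str, cache_dir: str) -> bool:
    """True if the model checkpoint (e.g. small.pt) already sits in cache_dir."""
    return (Path(cache_dir) / f"{name}.pt").is_file()

cache = r"E:\tools\whisper_cache"
if not model_is_cached("small", cache):
    print("small.pt missing - Whisper will try to download it on first use")
```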
Manual file layout (recommended)
E:\tools\Transcriber\
├─ ffmpeg\bin\ffmpeg.exe
├─ whisper_cache\    <- put model files here (small.pt, base.pt, ...)
└─ app.py            <- your code
If offline, ensure the correct model filename is present (e.g., small.pt). Whisper will load from that folder and not try to download.
Extras, tips & troubleshooting
Common issues
- pip not found: Add Python to PATH during install or add Python install directory to PATH manually.
- ffmpeg not found: Ensure .../ffmpeg/bin is in PATH or use the full path in scripts.
- PyTorch CUDA problems: Install the correct CUDA wheel matching your GPU and driver.
- Model download fails: Download manually on a networked machine and copy to your cache folder.
Batch convert all MP4s (PowerShell one-liner)
Get-ChildItem -Path "E:\tools" -Filter *.mp4 | ForEach-Object {
whisper $_.FullName --model small --language hi
}
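The same batch loop works cross-platform from Python. This sketch (the `transcribe_folder` helper and its `dry_run` flag are my own illustration) collects the commands first, so you can inspect them before anything runs:

```python
import subprocess
from pathlib import Path

def transcribe_folder(folder, model="small", language="hi", dry_run=True):
    """Build a whisper command per .mp4 in folder; run them unless dry_run."""
    cmds = [["whisper", str(mp4), "--model", model, "--language", language]
            for mp4 in sorted(Path(folder).glob("*.mp4"))]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds

print(transcribe_folder(r"E:\tools"))
```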
Chunk long files (recommended for long recordings)
Split audio into chunks (e.g., 30s) and transcribe piecewise — reduces memory pressure and improves reliability for long sessions.
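One way to do the splitting is ffmpeg's segment muxer (`-f segment -segment_time N`). The sketch below only builds the command list (the `ffmpeg_segment_cmd` helper and output pattern are my own illustration); 16 kHz mono matches the sample rate Whisper resamples to anyway:

```python
def ffmpeg_segment_cmd(src, out_pattern="chunk_%03d.wav", seconds=30):
    """Build an ffmpeg command that splits src into fixed-length audio chunks."""
    return ["ffmpeg", "-i", src,
            "-f", "segment", "-segment_time", str(seconds),
            "-ar", "16000", "-ac", "1",  # 16 kHz mono audio
            out_pattern]

print(ffmpeg_segment_cmd("long_lecture.mp4"))
# Then transcribe each chunk_*.wav and concatenate the transcripts in order.
```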
Quick start cheat sheet
# 1) Verify tools
python --version
ffmpeg -version
# 2) Install whisper
pip install -U openai-whisper
# 3) Transcribe (Hindi)
whisper "E:\tools\instalag-1.mp4" --model small --language hi --output_format txt
# 4) Translate to English
whisper "myfile.mp4" --model small --task translate --output_format txt
# 5) Use cached models from E: drive
python -c "import whisper; whisper.load_model('small', download_root=r'E:\tools\whisper_cache')"
