Convert Video to Text Using Whisper | Full Setup & Step-by-Step Guide


A friendly, step-by-step tutorial for Windows (works on macOS/Linux too): how to install Whisper, ffmpeg, Python & PyTorch, run it from the terminal, and use it offline.

What is Whisper?

Whisper is an automatic speech recognition (ASR) and speech-translation model from OpenAI. It can:

  • Transcribe audio/video to text (many languages).
  • Translate speech into English (optional).
  • Run locally once the model is downloaded — great for privacy.

Whisper comes in multiple sizes (tiny, base, small, medium, large, plus the newer large-v2/v3 variants). Smaller models are faster; larger models are more accurate.
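Once the openai-whisper package from the install steps below is on your machine, you can list every model name it knows about with a one-line check:

import whisper

# Lists every model name the installed package can download,
# including the English-only ".en" variants.
print(whisper.available_models())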

System requirements (basic)

  • CPU: Any modern multi-core CPU (i5/i7/i9 or Ryzen equivalent). Small models run fine on CPU.
  • GPU: NVIDIA CUDA GPU recommended for medium/large models. RTX 30/40/50 series works well.
  • RAM: 8 GB minimum; 16+ GB recommended for medium/large models.
  • Disk space: Model files range from roughly 100 MB to several GB depending on the model. Store models on a drive with free space (e.g., E:).
  • OS: Windows 10/11, macOS, or Linux; Python & ffmpeg must be installed.

Install — step by step (Windows)

1. Install Python

Download the installer from python.org and check "Add Python to PATH" on the first screen.

2. Install FFmpeg

Download a prebuilt Windows build, extract it and add .../ffmpeg/bin to your PATH.

3. Install Whisper (Python package)

pip install -U openai-whisper

If that fails, install directly from GitHub:

pip install git+https://github.com/openai/whisper.git

4. Install PyTorch (CPU or GPU)

Go to the PyTorch site and pick the right command for your system.

Example (CUDA 12.1 + pip):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

If you don't have an NVIDIA GPU or prefer CPU only:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
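To confirm which build you ended up with, here is a quick check from Python (works for both CPU and CUDA installs):

import torch

# True means the CUDA build is installed and a usable NVIDIA GPU was found;
# False means Whisper will fall back to CPU (--device cpu).
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))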

5. Verify

python --version
pip --version
ffmpeg -version
whisper --help

If all commands run, you're ready.

Whisper terminal commands — explained

Basic transcription (force language Hindi):

whisper "E:\tools\instalag-1.mp4" --model small --language hi --output_format txt

Translate audio to English:

whisper "myvideo.mp4" --model small --task translate --output_format txt

Use GPU:

whisper "myvideo.mp4" --model medium --device cuda

Save subtitles (.srt):

whisper "myvideo.mp4" --model small --output_format srt --language hi

Common CLI options:

  • --model : tiny | base | small | medium | large
  • --language : language code, e.g. hi (Hindi) or en (English); leave it out to let Whisper auto-detect the language
  • --task : transcribe or translate
  • --device : cpu or cuda
  • --output_format : txt, srt, vtt, json
  • --output_dir : folder to save output files

When you run a model the first time, Whisper downloads the model file and caches it locally — after that it runs offline.
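The same workflow from the Python API instead of the terminal (file names here are just examples):

import whisper

model = whisper.load_model("small")                       # downloads/caches on first run
result = model.transcribe("myvideo.mp4", language="hi")   # ffmpeg must be on PATH
print(result["text"][:200])                               # first 200 characters

with open("myvideo.txt", "w", encoding="utf-8") as f:
    f.write(result["text"])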

Offline models (store them on another drive)

If your C: drive is full, you can store model files anywhere (e.g., E:\tools\whisper_cache). Put the model .pt files in that folder and tell Whisper to use it.

Download links (model files / sources)

  • Whisper repo (package & docs): github.com/openai/whisper
  • Whisper model files (.pt): the direct download URLs are listed in the Whisper repo's source, or search "openai whisper small pt" on Hugging Face.
  • Standalone Windows GUI/executables (community projects): whisper-standalone-win (example)

How to set the local cache folder in code

import whisper
model = whisper.load_model("small", download_root=r"E:\tools\whisper_cache")

Manual file layout (recommended)

E:\tools\Transcriber\
  ├─ ffmpeg\bin\ffmpeg.exe
  ├─ whisper_cache\   <-- put model files here (small.pt, base.pt, ...)
  └─ app.py           <-- your code

If you are offline, make sure the file for the model you request is present (e.g., small.pt); with download_root pointing at that folder, Whisper loads it from there instead of trying to download.
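A short sketch of loading strictly from that folder when offline (paths are examples; adjust them to your layout):

from pathlib import Path
import whisper

cache = Path(r"E:\tools\whisper_cache")

# Fail early with a clear message instead of letting Whisper attempt a download.
if not (cache / "small.pt").exists():
    raise FileNotFoundError(f"small.pt not found in {cache} - copy it there first")

model = whisper.load_model("small", download_root=str(cache))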

Extras, tips & troubleshooting

Common issues

  • pip not found: Add Python to PATH during install or add Python install directory to PATH manually.
  • ffmpeg not found: Ensure .../ffmpeg/bin is in PATH or use full path in scripts.
  • PyTorch CUDA problems: Install the correct CUDA wheel matching your GPU and driver.
  • Model download fails: Download manually on a networked machine and copy to your cache folder.
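If several of the issues above seem to apply at once, a small diagnostic script (a sketch, not part of Whisper itself) shows which piece is missing:

import shutil, sys

print("python :", sys.executable)
print("ffmpeg :", shutil.which("ffmpeg") or "NOT on PATH")

try:
    import torch
    print("torch  :", torch.__version__, "| CUDA:", torch.cuda.is_available())
except ImportError:
    print("torch  : not installed")

try:
    import whisper
    print("whisper: installed")
except ImportError:
    print("whisper: not installed")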

Batch convert all MP4s (PowerShell)

Get-ChildItem -Path "E:\tools" -Filter *.mp4 | ForEach-Object {
  whisper $_.FullName --model small --language hi
}
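The same batch job from Python, loading the model once instead of once per file (the folder path is an example):

from pathlib import Path
import whisper

model = whisper.load_model("small")

for mp4 in sorted(Path(r"E:\tools").glob("*.mp4")):
    result = model.transcribe(str(mp4), language="hi")
    out = mp4.with_suffix(".txt")           # save transcript next to the video
    out.write_text(result["text"], encoding="utf-8")
    print("wrote", out)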

Chunk long files (recommended for long recordings)

Split audio into chunks (e.g., 30s) and transcribe piecewise — reduces memory pressure and improves reliability for long sessions.
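One way to do that is ffmpeg's segment muxer plus the Python API; a sketch in which the chunk length, paths, and the plain text concatenation (which ignores words split at chunk boundaries) are all assumptions:

import subprocess, tempfile
from pathlib import Path
import whisper

src = r"E:\tools\long-recording.mp4"   # example path
model = whisper.load_model("small")

with tempfile.TemporaryDirectory() as tmp:
    # Extract mono 16 kHz audio and split it into 30-second WAV chunks.
    subprocess.run(
        ["ffmpeg", "-i", src, "-vn", "-ac", "1", "-ar", "16000",
         "-f", "segment", "-segment_time", "30",
         str(Path(tmp) / "chunk_%04d.wav")],
        check=True,
    )
    texts = []
    for chunk in sorted(Path(tmp).glob("chunk_*.wav")):
        texts.append(model.transcribe(str(chunk), language="hi")["text"])

print(" ".join(texts))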

Quick start cheat sheet

# 1) Verify tools
python --version
ffmpeg -version

# 2) Install whisper
pip install -U openai-whisper

# 3) Transcribe (Hindi)
whisper "E:\tools\instalag-1.mp4" --model small --language hi --output_format txt

# 4) Translate to English
whisper "myfile.mp4" --model small --task translate --output_format txt

# 5) Use cached models from E: drive
python -c "import whisper; whisper.load_model('small', download_root=r'E:\tools\whisper_cache')"


Made simple — happy transcribing 🚀
