Convert Video to Text Using Whisper | Full Setup & Step-by-Step Guide


A friendly, step-by-step tutorial for Windows (works on macOS/Linux too): how to install Whisper, ffmpeg, Python & PyTorch, run it from the terminal, and use it offline.

What is Whisper?

Whisper is an automatic speech recognition (ASR) and speech-translation model from OpenAI. It can:

  • Transcribe audio/video to text (many languages).
  • Translate speech into English (optional).
  • Run locally once the model is downloaded — great for privacy.

Whisper comes in multiple sizes (tiny, base, small, medium, large, plus the newer large-v2/v3 variants). Smaller models are faster; larger models are more accurate.
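Once the openai-whisper package from the install steps below is on your machine, you can list every model name it knows about with a one-line check:

import whisper

# Lists every model name the installed package can download,
# including the English-only ".en" variants.
print(whisper.available_models())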

System requirements (basic)

  • CPU: Any modern multi-core CPU (i5/i7/i9 or Ryzen equivalent). Small models run fine on CPU.
  • GPU: NVIDIA CUDA GPU recommended for medium/large models. RTX 30/40/50 series works well.
  • RAM: 8 GB minimum; 16+ GB recommended for medium/large models.
  • Disk space: Model files range from roughly 100 MB to several GB depending on the model. Store models on a drive with free space (e.g., E:).
  • OS: Windows 10/11, macOS, or Linux; Python & ffmpeg must be installed.

Install — step by step (Windows)

1. Install Python

Download the installer from python.org and check "Add Python to PATH" on the first screen.

2. Install FFmpeg

Download a prebuilt Windows build, extract it and add .../ffmpeg/bin to your PATH.

3. Install Whisper (Python package)

pip install -U openai-whisper

If that fails, install directly from GitHub:

pip install git+https://github.com/openai/whisper.git

4. Install PyTorch (CPU or GPU)

Go to the PyTorch site and pick the right command for your system.

Example (CUDA 12.1 + pip):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

If you don't have an NVIDIA GPU or prefer CPU only:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
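To confirm which build you ended up with, here is a quick check from Python (works for both CPU and CUDA installs):

import torch

# True means the CUDA build is installed and a usable NVIDIA GPU was found;
# False means Whisper will fall back to CPU (--device cpu).
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))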

5. Verify

python --version
pip --version
ffmpeg -version
whisper --help

If all commands run, you're ready.

Whisper terminal commands — explained

Basic transcription (force language Hindi):

whisper "E:\tools\instalag-1.mp4" --model small --language hi --output_format txt

Translate audio to English:

whisper "myvideo.mp4" --model small --task translate --output_format txt

Use GPU:

whisper "myvideo.mp4" --model medium --device cuda

Save subtitles (.srt):

whisper "myvideo.mp4" --model small --output_format srt --language hi

Common CLI options:

  • --model : tiny | base | small | medium | large
  • --language : language code, e.g. hi (Hindi) or en (English); leave it out to let Whisper auto-detect the language
  • --task : transcribe or translate
  • --device : cpu or cuda
  • --output_format : txt, srt, vtt, json
  • --output_dir : folder to save output files

When you run a model the first time, Whisper downloads the model file and caches it locally — after that it runs offline.
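The same workflow from the Python API instead of the terminal (file names here are just examples):

import whisper

model = whisper.load_model("small")                       # downloads/caches on first run
result = model.transcribe("myvideo.mp4", language="hi")   # ffmpeg must be on PATH
print(result["text"][:200])                               # first 200 characters

with open("myvideo.txt", "w", encoding="utf-8") as f:
    f.write(result["text"])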

Offline models (store them on another drive)

If your C: drive is full, you can store model files anywhere (e.g., E:\tools\whisper_cache). Put the model .pt files in that folder and tell Whisper to use it.

Download links (model files / sources)

  • Whisper repo (package & docs): github.com/openai/whisper
  • Whisper model files (.pt): the direct download URLs are listed in the Whisper repo's source, or search "openai whisper small pt" on Hugging Face.
  • Standalone Windows GUI/executables (community projects): whisper-standalone-win (example)

How to set the local cache folder in code

import whisper
model = whisper.load_model("small", download_root=r"E:\tools\whisper_cache")

Manual file layout (recommended)

E:\tools\Transcriber\
  ├─ ffmpeg\bin\ffmpeg.exe
  ├─ whisper_cache\   <-- put model files here (small.pt, base.pt, ...)
  └─ app.py           <-- your code

If you are offline, make sure the file for the model you request is present (e.g., small.pt); with download_root pointing at that folder, Whisper loads it from there instead of trying to download.
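A short sketch of loading strictly from that folder when offline (paths are examples; adjust them to your layout):

from pathlib import Path
import whisper

cache = Path(r"E:\tools\whisper_cache")

# Fail early with a clear message instead of letting Whisper attempt a download.
if not (cache / "small.pt").exists():
    raise FileNotFoundError(f"small.pt not found in {cache} - copy it there first")

model = whisper.load_model("small", download_root=str(cache))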

Extras, tips & troubleshooting

Common issues

  • pip not found: Add Python to PATH during install or add Python install directory to PATH manually.
  • ffmpeg not found: Ensure .../ffmpeg/bin is in PATH or use full path in scripts.
  • PyTorch CUDA problems: Install the correct CUDA wheel matching your GPU and driver.
  • Model download fails: Download manually on a networked machine and copy to your cache folder.
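If several of the issues above seem to apply at once, a small diagnostic script (a sketch, not part of Whisper itself) shows which piece is missing:

import shutil, sys

print("python :", sys.executable)
print("ffmpeg :", shutil.which("ffmpeg") or "NOT on PATH")

try:
    import torch
    print("torch  :", torch.__version__, "| CUDA:", torch.cuda.is_available())
except ImportError:
    print("torch  : not installed")

try:
    import whisper
    print("whisper: installed")
except ImportError:
    print("whisper: not installed")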

Batch convert all MP4s (PowerShell)

Get-ChildItem -Path "E:\tools" -Filter *.mp4 | ForEach-Object {
  whisper $_.FullName --model small --language hi
}
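The same batch job from Python, loading the model once instead of once per file (the folder path is an example):

from pathlib import Path
import whisper

model = whisper.load_model("small")

for mp4 in sorted(Path(r"E:\tools").glob("*.mp4")):
    result = model.transcribe(str(mp4), language="hi")
    out = mp4.with_suffix(".txt")           # save transcript next to the video
    out.write_text(result["text"], encoding="utf-8")
    print("wrote", out)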

Chunk long files (recommended for long recordings)

Split audio into chunks (e.g., 30s) and transcribe piecewise — reduces memory pressure and improves reliability for long sessions.
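One way to do that is ffmpeg's segment muxer plus the Python API; a sketch in which the chunk length, paths, and the plain text concatenation (which ignores words split at chunk boundaries) are all assumptions:

import subprocess, tempfile
from pathlib import Path
import whisper

src = r"E:\tools\long-recording.mp4"   # example path
model = whisper.load_model("small")

with tempfile.TemporaryDirectory() as tmp:
    # Extract mono 16 kHz audio and split it into 30-second WAV chunks.
    subprocess.run(
        ["ffmpeg", "-i", src, "-vn", "-ac", "1", "-ar", "16000",
         "-f", "segment", "-segment_time", "30",
         str(Path(tmp) / "chunk_%04d.wav")],
        check=True,
    )
    texts = []
    for chunk in sorted(Path(tmp).glob("chunk_*.wav")):
        texts.append(model.transcribe(str(chunk), language="hi")["text"])

print(" ".join(texts))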

Quick start cheat sheet

# 1) Verify tools
python --version
ffmpeg -version

# 2) Install whisper
pip install -U openai-whisper

# 3) Transcribe (Hindi)
whisper "E:\tools\instalag-1.mp4" --model small --language hi --output_format txt

# 4) Translate to English
whisper "myfile.mp4" --model small --task translate --output_format txt

# 5) Use cached models from E: drive
python -c "import whisper; whisper.load_model('small', download_root=r'E:\tools\whisper_cache')"


Made simple — happy transcribing 🚀
