AI Video Subtitler
Runs in your browser
Auto-generate accurate subtitles with Whisper AI. Edit inline, export SRT/VTT, or burn captions into your video. 99 languages. 100% browser-based: your video never leaves your device.
How It Works
Upload Video
Drop an MP4, MOV, WebM, or audio file. Everything stays local; nothing is uploaded.
Transcribe with AI
Whisper AI (OpenAI) runs in your browser. Pick a language, click generate.
Edit & Export
Fix any mistake inline, export SRT/VTT for editors, or burn captions into the video.
Frequently Asked Questions
Does my video get uploaded to a server?
No. Everything runs in your browser. Audio decoding, Whisper AI transcription, caption burn-in — all happen on your device. Nothing leaves your machine.
Which Whisper model should I use?
Whisper Tiny (~40MB) is fast and accurate enough for clear audio. Whisper Base (~150MB) is slower but better for accents, noisy audio, or multiple speakers. Both cache after the first load — you only download them once.
What languages are supported?
Whisper supports 99 languages, including English, Spanish, French, German, Italian, Portuguese, Russian, Japanese, Korean, Chinese, Arabic, Hindi, and more. Pick the primary spoken language for best accuracy.
What's the difference between SRT and VTT?
SRT is the universal subtitle format — use it for Premiere, Final Cut, DaVinci Resolve, YouTube, and most video editors. VTT (WebVTT) is for web players and HTML5 video. Both are plain text and easy to re-edit.
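To make the difference concrete, here is a minimal sketch of how the same captions could be written out in both formats. The `Cue` shape and the function names are assumptions for illustration, not this tool's actual code. The visible differences are small: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period.

```typescript
// Hypothetical cue shape; the tool's real internal types are not shown on this page.
interface Cue {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Left-pad a non-negative integer with zeros.
function padNumber(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS<sep>mmm. SRT uses a comma, WebVTT a period.
function formatCueTime(seconds: number, sep: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return padNumber(h, 2) + ":" + padNumber(m, 2) + ":" + padNumber(s, 2) + sep + padNumber(ms, 3);
}

// SRT: numbered cues, comma before milliseconds, blank line between cues.
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) => `${i + 1}\n${formatCueTime(c.start, ",")} --> ${formatCueTime(c.end, ",")}\n${c.text}\n`)
    .join("\n");
}

// WebVTT: "WEBVTT" header, period before milliseconds, cue numbers optional (omitted here).
function toVtt(cues: Cue[]): string {
  const body = cues
    .map((c) => `${formatCueTime(c.start, ".")} --> ${formatCueTime(c.end, ".")}\n${c.text}\n`)
    .join("\n");
  return "WEBVTT\n\n" + body;
}

const exampleCues: Cue[] = [
  { start: 0, end: 1.5, text: "Hello" },
  { start: 2, end: 3.25, text: "World" },
];
console.log(toSrt(exampleCues));
console.log(toVtt(exampleCues));
```

Because both outputs are plain text built from the same cue list, converting between them later is mostly a matter of swapping the header, the numbering, and the millisecond separator.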
What does 'burn-in' mean?
Burn-in hardcodes the captions directly into the video pixels — great for TikTok, Instagram Reels, and YouTube Shorts, where platforms often ignore uploaded SRT files. Pick a position (top/middle/bottom) and a style (classic box, TikTok outline, minimal shadow), and we render a new MP4 or WebM with the text baked in.
Why are some timestamps slightly off?
Whisper estimates timestamps from audio chunks, so they can be off by 1 to 2 seconds on long clips. Click any timestamp to preview, then click the text to edit. If you work in a video editor, you can also adjust the timings directly in the downloaded SRT file.
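If every caption in a downloaded SRT is early or late by the same amount, the timings can be shifted in bulk. The following is a small sketch of that idea, not a feature of this tool; `shiftSrt` is a hypothetical helper name. It only rewrites lines that contain the `-->` cue arrow and clamps results at zero.

```typescript
// Left-pad a non-negative integer with zeros.
function zeroPad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Shift every cue timestamp in an SRT string by offsetMs (negative = earlier).
// Hypothetical helper for illustration; timestamps are clamped so they never go below zero.
function shiftSrt(srt: string, offsetMs: number): string {
  const stamp = /(\d{2,}):(\d{2}):(\d{2}),(\d{3})/g;
  return srt
    .split("\n")
    .map((line) => {
      // Only touch timing lines, so timestamps quoted inside caption text stay intact.
      if (line.indexOf("-->") === -1) return line;
      return line.replace(stamp, (_match: string, h: string, m: string, s: string, ms: string) => {
        const original = ((Number(h) * 60 + Number(m)) * 60 + Number(s)) * 1000 + Number(ms);
        const shifted = Math.max(0, original + offsetMs);
        const outMs = shifted % 1000;
        const totalSec = Math.floor(shifted / 1000);
        const outS = totalSec % 60;
        const outM = Math.floor(totalSec / 60) % 60;
        const outH = Math.floor(totalSec / 3600);
        return zeroPad(outH, 2) + ":" + zeroPad(outM, 2) + ":" + zeroPad(outS, 2) + "," + zeroPad(outMs, 3);
      });
    })
    .join("\n");
}

// Captions appear 750 ms too early, so push them later.
const shiftedExample = shiftSrt("1\n00:00:01,000 --> 00:00:02,500\nHi\n", 750);
console.log(shiftedExample);
```

A uniform shift only fixes a constant offset. Drift that grows over a long clip still needs per-cue edits.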
Is there a file size or length limit?
No hard limit — but longer videos take longer to transcribe (Whisper Tiny runs at roughly 2-5× real-time on modern laptops). For files over 30 minutes, use Whisper Tiny for speed. Memory scales with audio length since the full waveform is held in RAM.
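As a rough back-of-the-envelope check on the memory claim, the sketch below estimates the size of the waveform alone. It assumes the 16 kHz mono format Whisper expects and 32-bit float samples (4 bytes each); the sample width is an assumption, and the figure excludes the model, the original video file, and any intermediate buffers at the source sample rate.

```typescript
// Estimate bytes needed to hold raw PCM audio in memory.
// Defaults assume 16 kHz mono Float32 samples (an assumption, not a documented detail of this tool).
function estimateWaveformBytes(
  durationSeconds: number,
  sampleRate: number = 16000,
  channels: number = 1,
  bytesPerSample: number = 4
): number {
  return durationSeconds * sampleRate * channels * bytesPerSample;
}

// 30 minutes at 16 kHz mono: 1800 s * 16000 samples/s * 4 bytes = 115,200,000 bytes (about 115 MB).
const thirtyMinutes = estimateWaveformBytes(30 * 60);
console.log((thirtyMinutes / 1e6).toFixed(1) + " MB");
```

The growth is linear, so a two-hour recording would need roughly four times as much memory for the waveform as a 30-minute one.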
Does it work offline?
Yes, once the Whisper model is cached (after first use). Transcription and burn-in then run fully offline. You need an internet connection only to load or reload the page itself.
Built With Open Source
State-of-the-art speech recognition model with 99-language support
Run Hugging Face models in the browser via ONNX Runtime Web
Decode and resample video audio to 16kHz mono PCM for Whisper
Burn captions into video frames and encode as MP4 or WebM
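To illustrate the "resample to 16kHz mono PCM" step listed above, here is a simplified sketch in plain TypeScript. It is not the code this tool uses: in a browser the decoding would normally be handled by the platform or a media library, and a production resampler would apply a low-pass filter before downsampling. This version swaps in naive linear interpolation purely to show the shape of the data; both function names are hypothetical.

```typescript
// Average all channels into one mono channel.
function downmixToMono(channels: Float32Array[]): Float32Array {
  const length = channels.length > 0 ? channels[0].length : 0;
  const mono = new Float32Array(length);
  for (let c = 0; c < channels.length; c++) {
    for (let i = 0; i < length; i++) {
      mono[i] += channels[c][i] / channels.length;
    }
  }
  return mono;
}

// Naive linear-interpolation resampler (no anti-aliasing filter; illustration only).
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return new Float32Array(input);
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}

// One second of 48 kHz stereo silence becomes 16,000 mono samples.
const stereoSecond = [new Float32Array(48000), new Float32Array(48000)];
const whisperInput = resampleLinear(downmixToMono(stereoSecond), 48000, 16000);
console.log(whisperInput.length);
```

The output is a single `Float32Array` at the target rate, which is the general form speech models such as Whisper take as input.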