Subtitle Generator

Auto-generate timed subtitles from any audio or video file using AI. Supports 99 languages and downloads as SRT or VTT. Signup is not required to start.

Free in browser. No sign-up required. Files stay on your device.

Drop audio or video file here

MP3, MP4, WAV, WebM, OGG, M4A

Files are processed locally and never uploaded.

How it works

Run this tool in three short steps.

01

Upload audio or video

Drop a file or click to browse. Audio is extracted automatically from video files.

02

Whisper transcribes locally

The Whisper AI model runs in your browser. Source audio is not uploaded to our servers during transcription.

03

Download SRT or VTT

Review and edit the transcript, then download in SRT or VTT format.

Questions

What people ask before they use this tool.

How does the subtitle generator work?
We use OpenAI Whisper (Base model) compiled to WebAssembly and running in your browser. Your audio is processed locally during transcription and is not uploaded to our servers.
What audio/video formats are supported?
MP3, MP4, WAV, WebM, OGG, M4A, FLAC, and most common audio/video formats. The tool extracts the audio track automatically.
How many languages does it support?
Whisper supports 99 languages including English, Spanish, French, German, Chinese, Japanese, Korean, Arabic, Hindi, and more. Select your language or use auto-detect.
What subtitle formats can I download?
SRT (most common, works everywhere) and VTT (WebVTT, for web video players). Both include timestamps and segmented text.
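As a rough illustration of how the two formats differ, the sketch below writes the same timed segments both ways. The `Segment` type and the function names are hypothetical and are not the tool's actual code. Only the syntax comes from the formats themselves: SRT numbers each cue and puts a comma before the milliseconds, while VTT starts with a `WEBVTT` header and uses a period.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One subtitle cue (hypothetical shape, for illustration only)."""
    start: float  # seconds from the beginning of the media
    end: float    # seconds from the beginning of the media
    text: str


def format_timestamp(seconds: float, separator: str) -> str:
    """Render seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments: List[Segment]) -> str:
    """SRT: numbered cues, comma before milliseconds, blank line between cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg.start, ",")
        end = format_timestamp(seg.end, ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg.text}\n")
    return "\n".join(blocks)


def to_vtt(segments: List[Segment]) -> str:
    """WebVTT: 'WEBVTT' header, period before milliseconds, cue numbers optional."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg.start, ".")
        end = format_timestamp(seg.end, ".")
        blocks.append(f"{start} --> {end}\n{seg.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "Hello"), Segment(2.5, 5.0, "World")]
    print(to_srt(demo))
    print(to_vtt(demo))
```

Because the cue text and timing are identical in both outputs, converting between the two formats is mostly a matter of swapping the header, the cue numbers, and the millisecond separator.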
How accurate are the subtitles?
Whisper Base provides good accuracy for clear speech in supported languages. Results are usually stronger with audio that has minimal background noise. Larger Whisper models, such as Large, are generally more accurate, but they are not available in-browser.
Why does the first transcription take longer?
The Whisper Base model (~57 MB) downloads on first use. After that it is cached in your browser, so subsequent transcriptions start immediately.
Is my audio uploaded to a server?
No. Whisper runs in your browser via WebAssembly. Your audio is processed locally during transcription and is not uploaded to our servers.
How long does transcription take?
Processing takes roughly 1 to 3 times the clip's length on modern devices, so a 5-minute clip takes about 5 to 15 minutes. Desktop browsers are significantly faster than mobile.
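The estimate above is simple multiplication, sketched here for other clip lengths. The function name and the default factors of 1 and 3 are illustrative assumptions taken from the range quoted in the answer, and actual speed depends on the device.

```python
from typing import Tuple


def estimate_transcription_minutes(
    clip_minutes: float,
    min_factor: float = 1.0,  # assumed best case: as long as the clip itself
    max_factor: float = 3.0,  # assumed worst case: three times the clip length
) -> Tuple[float, float]:
    """Return (best_case, worst_case) processing time in minutes."""
    if clip_minutes < 0:
        raise ValueError("clip_minutes must be non-negative")
    return clip_minutes * min_factor, clip_minutes * max_factor


if __name__ == "__main__":
    low, high = estimate_transcription_minutes(5)
    print(f"{low:.0f}-{high:.0f} minutes")  # prints "5-15 minutes"
```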
Can I edit the subtitles before downloading?
Yes. The transcribed text appears in an editable area. Fix any errors, adjust timing, then download.
Does it work on mobile?
Yes, but transcription is CPU-intensive. Short clips (under 2 minutes) work well on phones. For longer audio, use a desktop browser.
What is the file size limit?
It depends on your device's memory. Audio is processed in chunks, and most devices handle files up to 100 MB.
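Chunked processing means only one slice of audio has to be handed to the model at a time. The sketch below shows the general idea under stated assumptions: 16 kHz mono samples (the rate Whisper models expect) and non-overlapping 30-second windows. The tool's real chunk length and any overlap between chunks are not documented here, so `CHUNK_SECONDS` and `iter_chunks` are hypothetical.

```python
from typing import Iterator, Sequence, Tuple

SAMPLE_RATE = 16_000  # Hz; Whisper models expect 16 kHz audio
CHUNK_SECONDS = 30    # assumed chunk length, not confirmed for this tool


def iter_chunks(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: int = CHUNK_SECONDS,
) -> Iterator[Tuple[float, Sequence[float]]]:
    """Yield (start_time_in_seconds, chunk) pairs covering all samples in order."""
    chunk_len = sample_rate * chunk_seconds
    if chunk_len <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    for offset in range(0, len(samples), chunk_len):
        # The start time lets each chunk's timestamps be shifted back
        # onto the timeline of the full recording.
        yield offset / sample_rate, samples[offset:offset + chunk_len]


if __name__ == "__main__":
    seventy_seconds = [0.0] * (SAMPLE_RATE * 70)
    for start, chunk in iter_chunks(seventy_seconds):
        print(f"chunk at {start:.0f}s with {len(chunk)} samples")
```

Each chunk carries its start offset so that timestamps produced for a chunk can be added back to the position in the original file.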
Can I transcribe a YouTube video?
Not directly. Download the video first, then upload the file. Or use our YouTube Summarizer (/youtube-summarizer) for text summaries.
How does this compare to Otter.ai or Rev?
Otter and Rev use cloud-based models and may reach higher accuracy in some workflows. Our difference is the browser-local approach: transcription runs without uploading your source audio to our servers. Accuracy is solid for clear speech but not broadcast-grade.
Is it really free?
Yes. The tool is free to use in the browser and does not require signup to start.
Can I pay with cryptocurrency?
Video tools are free, so no payment is needed. For AI writing tools, we accept USDT, USDC, BTC, and ETH. Plans start at $9.99/month.

Coda One's Subtitle Generator uses OpenAI Whisper (Base model) compiled to WebAssembly and running in your browser. Transcribe audio and video into timed subtitles in 99 languages. Download as SRT or VTT. Source audio stays in your browser during transcription and is not uploaded to our servers.