Convertilo

Video to Text Online — Transcription & Subtitles

Drop an MP4, MOV, WebM or MKV file and Whisper transcribes the speech right in your browser. Works with English, Russian and 14 more languages. No sign-up, and files never leave your device.

About video transcription in the browser

Video to Text is automatic transcription of speech from a video file. Drop in an MP4, MOV, WebM or MKV file: the browser extracts the audio track, and Whisper (OpenAI's open-source speech recognition model) processes it in 30-second windows, detects the speech, adds punctuation and returns text. You can then copy the text, download it as a .txt file, or click '→ Translate' to open the result in our translator.
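
The 30-second windowing described above can be sketched roughly as follows. This is an illustration only, not the tool's actual code: splitIntoWindows and the AudioWindow type are hypothetical names, and the sketch assumes the audio track has already been decoded to 16 kHz mono PCM (the input rate Whisper models expect). Real pipelines usually overlap neighbouring windows slightly so that words on a boundary are not cut; that detail is left out here.

```typescript
// Whisper models consume 16 kHz mono audio; the 30-second window comes from the text above.
const WHISPER_SAMPLE_RATE = 16000;
const WINDOW_SECONDS = 30;

interface AudioWindow {
  startSeconds: number; // offset of this window in the original track
  samples: Float32Array; // PCM samples for this window (the last one may be shorter)
}

// Hypothetical helper: slices decoded mono PCM into consecutive, non-overlapping windows.
function splitIntoWindows(
  pcm: Float32Array,
  sampleRate: number = WHISPER_SAMPLE_RATE,
  windowSeconds: number = WINDOW_SECONDS,
): AudioWindow[] {
  const windowSize = sampleRate * windowSeconds;
  const windows: AudioWindow[] = [];
  for (let offset = 0; offset < pcm.length; offset += windowSize) {
    windows.push({
      startSeconds: offset / sampleRate,
      // subarray returns a view on the same buffer, so no audio data is copied
      samples: pcm.subarray(offset, Math.min(offset + windowSize, pcm.length)),
    });
  }
  return windows;
}

// 70 seconds of silence is split into windows of 30 s, 30 s and 10 s.
const demoWindows = splitIntoWindows(new Float32Array(70 * WHISPER_SAMPLE_RATE));
```

Each window would then be passed to the model in turn, and the partial transcripts joined in order.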

Under the hood, the tool uses Whisper converted to ONNX format and run via transformers.js (Hugging Face) on WebAssembly, directly in your browser. Whisper supports 99 languages; we expose 16 of the most common in the UI: English, Russian, German, French, Spanish, Italian, Portuguese, Ukrainian, Polish, Czech, Turkish, Dutch, Chinese, Japanese, Korean, Arabic. 'Auto' mode lets Whisper detect the language from the first seconds of audio.

The video never leaves your device — transcription runs locally, in the browser. The model is downloaded once (~75 MB for Tiny, ~145 MB for Base) and cached. No sign-up, no upload to a server. Long files (>5 min) take longer to transcribe — that's the price of staying private.

Where it's useful

Transcribe a YouTube interview

Downloaded a video interview or podcast — drop in the file and get text ready for editing, quotes or an article. Whisper handles punctuation well and works with typical conversational speech.

Make webinar minutes

Recorded a Zoom call or webinar — turn the video into text to quickly find who said what and send a summary to the team. Accuracy is best with a clean recording without heavy background noise.

Pull subtitles from a video

Filming for YouTube or TikTok — transcribe the speech to add subtitles or quickly write a description. The text can be edited right on the page before downloading.
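
If you need a subtitle file rather than plain text, timestamped segments can be turned into the common SRT format with a few lines of code. The sketch below assumes you have segments with start and end times in seconds; the Segment shape and the function names are hypothetical, and the page does not state whether the tool itself exports timestamps.

```typescript
// Assumed shape of a timestamped segment (hypothetical; not the tool's documented output).
interface Segment {
  start: number; // seconds from the beginning of the video
  end: number; // seconds from the beginning of the video
  text: string;
}

// Formats a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Builds SRT text: a 1-based cue number, a time range, the cue text, then a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${toSrtTimestamp(seg.start)} --> ${toSrtTimestamp(seg.end)}\n${seg.text.trim()}\n`,
    )
    .join("\n");
}

const demoSrt = toSrt([
  { start: 0, end: 2.5, text: " Hello and welcome. " },
  { start: 2.5, end: 4, text: "Let's get started." },
]);
```

The resulting string can be saved with an .srt extension and loaded into most video editors and players.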

Transcribe a foreign-language tutorial

Lecture in English, tutorial in German, film in Spanish — transcribe the speech first, then click '→ Translate' to open the text in our translator (it also runs locally).

FAQ

Which video formats are supported?

Anything the browser can decode: MP4 (H.264 + AAC — most common), MOV (from iPhone), WebM, MKV, M4V, AVI, MPEG. The video itself isn't needed — we only pull the audio track. If your file won't open, try converting it to MP4 first, e.g. with our video converter.
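
A quick pre-check on a file, before any decoding is attempted, might look like the sketch below. It is illustrative only: checkVideoFile is a hypothetical helper, the extension list mirrors the formats named above, and the 500 MB limit is assumed to mean 500 × 10^6 bytes (the page does not say which definition of MB applies). An extension is only a hint; whether the browser can actually decode the audio track is the real test.

```typescript
// Extensions taken from the formats listed above.
const SUPPORTED_EXTENSIONS = new Set(["mp4", "mov", "webm", "mkv", "m4v", "avi", "mpeg"]);

// Assumption: "500 MB" is read as decimal megabytes.
const MAX_SIZE_BYTES = 500 * 1000 * 1000;

type FileCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical pre-check: looks only at the file name and size, never at the contents.
function checkVideoFile(fileName: string, sizeBytes: number): FileCheck {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `Unsupported extension: "${ext}"` };
  }
  if (sizeBytes > MAX_SIZE_BYTES) {
    return { ok: false, reason: "File is larger than 500 MB" };
  }
  return { ok: true };
}

const demoCheck = checkVideoFile("interview.MOV", 120 * 1000 * 1000);
```

In a browser, the two arguments would typically come from the name and size properties of a File object.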

Is the video uploaded to a server?

No. Transcription is fully client-side — the Whisper model is downloaded once to your device and runs locally via WebAssembly. The video file itself is never uploaded. You can disconnect from the internet after the model loads — transcription will still work.

How accurate is it?

For clean speech, typically 90–96% (Tiny model) or 94–98% (Base model). Accuracy drops with background music, overlapping speakers, strong accents and specialised jargon. If the video has loud music or effects, consider isolating the speech first with our 'Audio from video' tool and editor.
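
To check accuracy on your own material, one common approach is to compare the transcript with a hand-corrected reference and compute the word error rate (WER); word accuracy is then 1 minus WER. The sketch below is an assumption about how such a figure could be measured, not a description of how the percentages above were obtained, and wordErrorRate is a hypothetical helper with deliberately simple tokenisation (lowercase, split on whitespace).

```typescript
// WER = (substitutions + insertions + deletions) / number of reference words,
// computed as a word-level Levenshtein distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const tokenize = (s: string): string[] =>
    s.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    // Convention for this sketch: an empty reference matches only an empty hypothesis.
    return hyp.length === 0 ? 0 : 1;
  }
  // prev[j] holds the edit distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// One wrong word out of ten gives a WER of 0.1, i.e. 90% word accuracy.
const demoWer = wordErrorRate("a b c d e f g h i j", "a b c d e f g h i x");
```

Punctuation and casing differences can inflate the figure, so normalise both texts the same way before comparing.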

What's the maximum video size?

Up to 500 MB. That covers most 720p videos up to 1–2 hours, or 1080p up to 30–60 minutes. If your file is larger, trim it in any video editor or re-encode to a lower bitrate first. Video quality doesn't matter for transcription — only the audio does.
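
The relationship between file size, bitrate and duration is simple arithmetic, sketched below. Both helpers are hypothetical, use decimal units (1 MB = 10^6 bytes, 1 kbit = 1000 bits) and treat the bitrate as the average for video and audio combined; actual files vary with codec and content.

```typescript
// Longest video (in minutes) that fits in sizeMB at the given average total bitrate.
function maxDurationMinutes(sizeMB: number, bitrateKbps: number): number {
  const totalKilobits = sizeMB * 8000; // 1 MB = 8000 kbit in decimal units
  return totalKilobits / bitrateKbps / 60;
}

// Average total bitrate (kbit/s) needed to fit durationMinutes of video into sizeMB.
function requiredBitrateKbps(sizeMB: number, durationMinutes: number): number {
  return (sizeMB * 8000) / (durationMinutes * 60);
}

// At about 1000 kbit/s, 500 MB holds roughly 67 minutes; at 2000 kbit/s, roughly 33 minutes.
const minutesAt1000 = maxDurationMinutes(500, 1000);
const minutesAt2000 = maxDurationMinutes(500, 2000);

// Fitting a 2-hour recording into 500 MB needs an average of about 556 kbit/s.
const kbpsForTwoHours = requiredBitrateKbps(500, 120);
```

When re-encoding a larger file, the second helper gives a target bitrate to aim for; since only the audio matters for transcription, the video stream can be compressed heavily.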

How long does transcribing an hour-long video take?

Tiny on a CPU usually runs close to real time: 1 hour of video takes about 1 hour to transcribe. Base is 1.5–2× slower but more accurate. The first run takes longer because the model has to download (~75 MB for Tiny, ~145 MB for Base). After that, the model is cached in the browser and the download is skipped for subsequent files.
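
As a rough planning aid, the figures above can be turned into a time estimate. The sketch below is an approximation built only on those numbers (Tiny at about 1× real time, Base 1.5–2× slower than Tiny); estimateTranscriptionMinutes is a hypothetical helper, and real speed depends heavily on the device.

```typescript
type WhisperModel = "tiny" | "base";

// Slowdown relative to real time as [low, high], taken from the figures quoted above.
const REALTIME_FACTOR: Record<WhisperModel, [number, number]> = {
  tiny: [1, 1],
  base: [1.5, 2],
};

// Returns an estimated [min, max] transcription time in minutes, excluding the model download.
function estimateTranscriptionMinutes(
  videoMinutes: number,
  model: WhisperModel,
): [number, number] {
  const [low, high] = REALTIME_FACTOR[model];
  return [videoMinutes * low, videoMinutes * high];
}

// An hour-long video: about 60 minutes with Tiny, 90 to 120 minutes with Base.
const tinyEstimate = estimateTranscriptionMinutes(60, "tiny");
const baseEstimate = estimateTranscriptionMinutes(60, "base");
```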

How is this different from YouTube auto-captions?

YouTube only captions videos that have been uploaded to its servers, which requires an account, and its auto-captions are sometimes poor. Here, transcription is private and needs no account, so you can transcribe any video, including work, personal or confidential material. The text is editable in the browser and downloads instantly.

Can I translate the transcript right away?

Yes. After transcription, click '→ Translate' and our text translator opens with the text pre-filled. Translation also runs locally (via Mozilla's Bergamot WASM), so nothing is uploaded.
