Audio to Text Online — Free Transcription

Transcribe MP3, WAV and M4A files to text in 16+ languages, including English and Russian. Powered by Whisper running locally in your browser. Free, no sign-up.

About speech recognition in the browser

Speech-to-text is the automatic transcription of audio into text. Drop in an MP3, WAV or M4A file: Whisper (OpenAI's open-licensed model) splits the recording into 30-second windows, detects the speech, adds punctuation and returns text. You can then copy it, download a .txt file, or click 'Translate' to open the result in our translator.
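The 30-second windowing mentioned above can be sketched in a few lines. This is a minimal illustration, not the page's actual code: `splitIntoWindows` and `AudioWindow` are hypothetical names, and the sketch assumes the audio has already been decoded to mono samples at 16 kHz (the rate Whisper expects). Real pipelines often overlap neighbouring windows slightly so that words on a boundary are not cut in half; that refinement is left out here.

```typescript
// Whisper consumes 16 kHz mono audio in fixed 30-second windows.
const SAMPLE_RATE = 16000;
const WINDOW_SECONDS = 30;

// Hypothetical shape for one window of audio.
interface AudioWindow {
  startSec: number;
  endSec: number;
  samples: Float32Array;
}

// Hypothetical helper: cut a mono recording into consecutive 30-second windows.
// The final window is simply shorter than 30 seconds.
function splitIntoWindows(
  samples: Float32Array,
  sampleRate: number = SAMPLE_RATE
): AudioWindow[] {
  const windowSize = WINDOW_SECONDS * sampleRate;
  const windows: AudioWindow[] = [];
  for (let start = 0; start < samples.length; start += windowSize) {
    const end = Math.min(start + windowSize, samples.length);
    windows.push({
      startSec: start / sampleRate,
      endSec: end / sampleRate,
      // subarray() creates a view, so no audio data is copied.
      samples: samples.subarray(start, end),
    });
  }
  return windows;
}
```

Under these assumptions, a 75-second recording yields three windows: 0 to 30 s, 30 to 60 s, and a shorter final one covering 60 to 75 s.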

Under the hood, Whisper is converted to ONNX format and run via transformers.js (Hugging Face) as WebAssembly, directly in your browser. The model supports 99 languages; the UI exposes 16 of the most common: English, Russian, German, French, Spanish, Italian, Portuguese, Ukrainian, Polish, Czech, Turkish, Dutch, Chinese, Japanese, Korean, Arabic. In 'Auto' mode, Whisper detects the language from the first seconds of audio.
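One way to wire a language dropdown like this to the model is a lookup from the display name to a two-letter language code, with 'Auto' leaving the language unset so the model detects it. The sketch below is an assumption about how this could be done, not the site's code: `UI_LANGUAGES`, `TranscribeOptions` and `buildTranscribeOptions` are hypothetical names, and the option fields (`task`, `chunk_length_s`, `language`) only mirror the options commonly passed to a transformers.js speech-recognition pipeline.

```typescript
// The 16 languages listed above, mapped to ISO 639-1 codes.
const UI_LANGUAGES: Record<string, string> = {
  English: "en", Russian: "ru", German: "de", French: "fr",
  Spanish: "es", Italian: "it", Portuguese: "pt", Ukrainian: "uk",
  Polish: "pl", Czech: "cs", Turkish: "tr", Dutch: "nl",
  Chinese: "zh", Japanese: "ja", Korean: "ko", Arabic: "ar",
};

// Hypothetical options object handed to the transcription call.
interface TranscribeOptions {
  task: "transcribe";
  chunk_length_s: number;
  language?: string; // left undefined in 'Auto' mode
}

// Hypothetical helper: turn the dropdown choice into transcription options.
function buildTranscribeOptions(uiChoice: string): TranscribeOptions {
  const base: TranscribeOptions = { task: "transcribe", chunk_length_s: 30 };
  if (uiChoice === "Auto") {
    return base; // no language set: the model detects it from the audio
  }
  if (!Object.prototype.hasOwnProperty.call(UI_LANGUAGES, uiChoice)) {
    throw new Error(`Unsupported language choice: ${uiChoice}`);
  }
  return { ...base, language: UI_LANGUAGES[uiChoice] };
}
```

For example, `buildTranscribeOptions("Ukrainian")` sets `language` to `"uk"`, while `buildTranscribeOptions("Auto")` omits the field entirely.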

Audio never leaves your device: transcription runs locally in the browser. The model is downloaded once (~75 MB for Tiny, ~145 MB for Base) and cached. No sign-up, no limits, nothing uploaded to a server. Long files (over 5 minutes) take noticeably longer to transcribe because all processing happens on your own hardware; that is the price of staying private.

Where it's useful

Transcribe an interview or podcast

Recorded a conversation with an expert or an episode of your own podcast? Drop in the file and get text ready for editing, quoting or turning into an article. Whisper handles punctuation well and separates the text into lines.

Make meeting minutes

Recorded a meeting on a phone or in Zoom? Turn the recording into text to quickly find what was said and send out a summary. Accuracy is best with a clean recording without heavy noise.

Pull quotes from voice messages

Telegram voice notes, WhatsApp voice messages, iPhone Voice Memos — export the file, transcribe to text. Handy when you need to find what your contact said or quote them.

Transcribe foreign speech and translate it

Lecture in English, video tutorial in German, song in Spanish — transcribe first, then click 'Translate' to open the text in our translator (it also runs locally).

FAQ

Which languages are supported?

The dropdown shows 16 of the most common: English, Russian, German, French, Spanish, Italian, Portuguese, Ukrainian, Polish, Czech, Turkish, Dutch, Chinese, Japanese, Korean, Arabic. The Whisper model itself recognizes ~99 languages — 'Auto' mode picks the language from the first seconds. If your language isn't listed, choose 'Auto'.

Is the audio uploaded to a server?

No. Transcription is fully client-side — the Whisper ONNX model is downloaded once to your device (from huggingface.co) and runs locally via WebAssembly. The audio file itself is never uploaded. You can disconnect from the internet after the model loads — transcription will still work.

How accurate is it?

For clean speech, accuracy is typically 90–96% with the Tiny model and 94–98% with the Base model. It drops with background noise, several people speaking at once, strong accents, mumbled speech and specialised jargon. Tip: record close to the mic, without background music, and pick Base if accuracy matters.

Which audio formats are supported?

Anything the Web Audio API can decode: MP3, WAV, M4A (iPhone AAC), AAC, OGG Vorbis, FLAC, OPUS, WebM audio. iPhone Voice Memos (.m4a), Telegram voice notes (.ogg/.oga), Zoom recordings (.m4a), standard podcasts (.mp3) — all work.
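Once the browser has decoded a file (for example with `AudioContext.decodeAudioData`), the result is per-channel floating-point PCM, while Whisper expects a single mono channel at 16 kHz. The sketch below shows only the downmix step, on the assumption that the channel arrays come from `AudioBuffer.getChannelData`; `downmixToMono` is a hypothetical helper, and how this page actually prepares audio is not documented here. Resampling is not shown; one option is to decode with an `AudioContext` created at a 16000 Hz sample rate.

```typescript
// Hypothetical helper: average all channels into one mono channel.
// Each input array is assumed to hold one channel of equal length,
// as returned by AudioBuffer.getChannelData(i).
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) {
    throw new Error("Expected at least one audio channel");
  }
  const length = channels[0].length;
  const mono = new Float32Array(length);
  // Sum the channels sample by sample...
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i];
    }
  }
  // ...then divide by the channel count to get the average.
  for (let i = 0; i < length; i++) {
    mono[i] /= channels.length;
  }
  return mono;
}
```

A mono recording passes through unchanged in value, and a stereo file is reduced to the average of its left and right channels.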

How long does transcription take?

Depends on duration and model. Tiny on a CPU usually runs close to real time (1 minute of audio ≈ 1 minute to transcribe), Base is 1.5–2× slower but more accurate. The first run is slower because the model has to download (~75 MB for Tiny, ~145 MB for Base). After that, the model is cached in the browser.
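The rule of thumb above can be turned into a rough estimate. This is only a back-of-the-envelope sketch built from the figures quoted in this answer (Tiny at about real time, Base 1.5 to 2 times slower, downloads of roughly 75 MB and 145 MB); `estimateTranscription` and its types are hypothetical names, and real timings depend heavily on the device.

```typescript
type ModelName = "tiny" | "base";

// Minutes of processing per minute of audio, as [best case, worst case].
const SLOWDOWN: Record<ModelName, [number, number]> = {
  tiny: [1, 1],   // roughly real time
  base: [1.5, 2], // 1.5 to 2 times slower than Tiny
};

// Approximate one-time download size in megabytes.
const MODEL_DOWNLOAD_MB: Record<ModelName, number> = { tiny: 75, base: 145 };

interface TranscriptionEstimate {
  minMinutes: number;
  maxMinutes: number;
  firstRunDownloadMb: number;
}

// Hypothetical helper: estimate processing time for a given audio length.
function estimateTranscription(
  audioMinutes: number,
  model: ModelName
): TranscriptionEstimate {
  const [low, high] = SLOWDOWN[model];
  return {
    minMinutes: audioMinutes * low,
    maxMinutes: audioMinutes * high,
    firstRunDownloadMb: MODEL_DOWNLOAD_MB[model],
  };
}
```

By this estimate, a 10-minute recording takes about 10 minutes with Tiny and 15 to 20 minutes with Base, plus the one-time model download on the first run.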

Can I download the text with timestamps?

Not in the current version — plain text only. Whisper can return phrase- and word-level timestamps; we may add this in the future. Need .srt or .vtt subtitles? Let us know and we'll add it.
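For readers curious what such an export would involve: if phrase-level timestamps were available, converting them to .srt is mostly a formatting exercise. The sketch below is purely illustrative and not a feature of this tool; the `Segment` shape (start and end in seconds plus text) and the function names are assumptions made for the example.

```typescript
// Hypothetical shape for one timestamped phrase.
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n: number, width: number): string =>
    String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build an SRT document: numbered cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => {
      const start = formatSrtTimestamp(seg.start);
      const end = formatSrtTimestamp(seg.end);
      return `${i + 1}\n${start} --> ${end}\n${seg.text.trim()}\n`;
    })
    .join("\n");
}
```

Each cue consists of a running number, a time range and the phrase text. The .vtt format is similar but uses a `WEBVTT` header line and a dot instead of a comma before the milliseconds.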

Can I translate it right away?

Yes. After transcription, click '→ Translate' and our text translator opens with the text pre-filled. Translation also runs locally (via Mozilla's Bergamot WASM), so nothing is uploaded.
