Speech-to-text · STT

Whisper Large v3

Multilingual speech-to-text — 99 languages, including Nepali.


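Whisper Large v3 is the third version of the large checkpoint in OpenAI’s open Whisper family: a robust, multilingual speech recognition model. The original Whisper release was trained on 680,000 hours of multilingual and multitask supervised data from the web; according to OpenAI’s model card, Large v3 was trained on about 1 million hours of weakly labeled audio plus 4 million hours of audio pseudo-labeled with Large v2.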

We host it on our GPUs with two endpoint shapes:

/v1/audio/transcriptions — OpenAI-compatible drop-in. Send a file or a URL; get word-level timestamps, language auto-detect, and an optional diarization track back.
/v1/audio/translations — translates any of 99 supported source languages directly into English.
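
As a rough illustration of the file-upload path, the sketch below builds (but does not send) a multipart POST for /v1/audio/transcriptions using only the Python standard library. The form field names follow OpenAI's audio API, which this endpoint is described as compatible with. The host name, the API key, and the model id "whisper-large-v3" are placeholders, and it is an assumption that this service accepts the same response_format and timestamp_granularities[] values. The URL-input variant is left out because the page does not name its request field.

```python
import uuid
import urllib.request


def build_transcription_request(base_url, api_key, filename, audio_bytes,
                                model="whisper-large-v3"):
    """Build (but do not send) a multipart POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    # Field names follow the OpenAI audio API; the model id is a placeholder.
    fields = [
        ("model", model),
        ("response_format", "verbose_json"),
        ("timestamp_granularities[]", "word"),
    ]
    parts = []
    for name, value in fields:
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The audio itself goes in the "file" part as raw bytes.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. Pointing the same builder at /v1/audio/translations should only need a different path, assuming that endpoint takes the same file and model fields.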

When to pick it

Anything an agent needs to listen to: voice messages, calls, voice notes, meeting recordings, podcast clips
Multilingual workflows where you cannot pin the input language ahead of time
Pipelines where you want to chain transcription into a hosted-LLM call in the same private network round trip

Pricing

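EUR 0.15 per hour of audio. Compared number to number, that is about 42% of OpenAI’s public Whisper API rate ($0.36/hour) and below Deepgram Nova-3 batch ($0.26/hour); our rate is quoted in EUR and theirs in USD, so the exact ratio moves with the exchange rate. Billed in 1-second increments after the first 30 seconds; word-level timestamps, speaker diarization, and language ID are included rather than charged as per-feature add-ons.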
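
To make the billing arithmetic concrete, here is a small cost estimator. It reads "after the first 30 seconds" as a 30-second minimum charge per request, which is an assumption; the page does not spell out how the first 30 seconds are billed. The function and constant names are illustrative only.

```python
import math
from decimal import Decimal

RATE_EUR_PER_HOUR = Decimal("0.15")  # from the pricing text
MINIMUM_BILLED_SECONDS = 30          # assumption: "first 30 seconds" is a minimum charge


def estimate_cost_eur(duration_seconds):
    """Estimate the charge in EUR for one audio file of the given length."""
    # Round partial seconds up, then apply the assumed 30-second floor.
    billed_seconds = max(math.ceil(duration_seconds), MINIMUM_BILLED_SECONDS)
    return RATE_EUR_PER_HOUR * billed_seconds / Decimal(3600)
```

Under these assumptions a one-hour recording comes to EUR 0.15, and a 10-second voice note is billed as 30 seconds, or EUR 0.00125.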

Limits

Per-tenant rate limit: 60 minutes of audio per minute (60× real-time)
File size limit: 1 GB per request
Supported formats: mp3, wav, flac, m4a, ogg, opus, webm
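
A client can check these limits before uploading. The sketch below is one way to do that. It treats "1 GB" as 10^9 bytes, which is an assumption (the service may mean 2^30), and it judges the format by file extension alone rather than by inspecting the audio container. All names are illustrative.

```python
import os

SUPPORTED_EXTENSIONS = {"mp3", "wav", "flac", "m4a", "ogg", "opus", "webm"}
MAX_FILE_BYTES = 10**9          # assumption: "1 GB" read as 10^9 bytes
AUDIO_MINUTES_PER_MINUTE = 60   # per-tenant rate limit from the list above


def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    return problems


def min_wall_clock_minutes(total_audio_minutes):
    """Shortest time in which the rate limit allows a backlog to be submitted."""
    return total_audio_minutes / AUDIO_MINUTES_PER_MINUTE
```

For example, a 600-minute backlog cannot be submitted in less than 10 minutes of wall-clock time at 60× real-time, whatever the actual processing speed.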

Best for

Voice-note triage for messaging-bot agents
Call transcription and meeting recap pipelines
Multilingual content moderation
Subtitling, dubbing prep, and translation pipelines

Upstream source: github.com/openai/whisper
