Speech-to-text · STT
Qwen3-ASR
Apache 2.0 STT with 52-language coverage: the lower-cost alternative to Whisper.
Qwen3-ASR is Alibaba’s open-weight automatic speech recognition model from the Qwen family — 1.7B parameters, Apache 2.0, covering 52 languages. We host it alongside Whisper Large v3 as the cost-sensitive, attribution-free alternative.
When to pick Qwen3-ASR over Whisper
- License hygiene matters. Whisper is MIT, which is close to Apache 2.0, but some procurement teams prefer Apache 2.0 for its explicit patent grant. Qwen3-ASR is the Apache option.
- Cost-sensitive batch transcription. Qwen3-ASR costs EUR 0.07 per hour of audio versus EUR 0.10 for Whisper. At one million hours, that is EUR 70,000 versus EUR 100,000 at list price.
- Latency budget is tight. The two models are similar in size (1.7B parameters vs 1.55B for Whisper Large v3), but the architectural choices in Qwen3-ASR give it a small first-byte-latency edge in streaming mode.
- Your workload is heavy in Asian languages. Qwen3-ASR’s training set has stronger coverage on Chinese, Japanese, Korean, Hindi, and other South / East Asian languages than Whisper.
When to pick Whisper instead
- Maximum language coverage. Whisper supports 99 languages, Qwen3-ASR 52. If you need the long tail of low-resource languages, Whisper wins.
- Mature community tooling. Whisper’s ecosystem (timestamps, diarization, translation-to-English, fine-tuning recipes) is more mature.
- Reference benchmark numbers. Most academic and product benchmarks reference Whisper; staying on the same model simplifies comparison.
Pricing
EUR 0.07 per hour of audio. Billed in 1-second increments after the first 30 seconds; the first 60 minutes per tenant per month are free.
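As a rough illustration of the billing rules above, here is a minimal Python sketch for estimating a monthly invoice. It rests on two assumptions that the pricing text does not confirm: that "after the first 30 seconds" means each request is billed for at least 30 seconds, and that the free 60 minutes are deducted from billed seconds rather than raw audio seconds. The function names are hypothetical and not part of any published SDK.

```python
import math
from typing import Iterable

RATE_EUR_PER_HOUR = 0.07           # published rate
FREE_SECONDS_PER_MONTH = 60 * 60   # first 60 minutes per tenant per month
MIN_BILLED_SECONDS = 30            # ASSUMPTION: 30-second minimum per request


def billed_seconds(duration_s: float) -> int:
    """Round one request up to whole seconds, applying the assumed minimum."""
    return max(MIN_BILLED_SECONDS, math.ceil(duration_s))


def monthly_cost_eur(durations_s: Iterable[float]) -> float:
    """Estimate one tenant's monthly cost from per-request audio durations."""
    total = sum(billed_seconds(d) for d in durations_s)
    # ASSUMPTION: the free allowance is subtracted from billed seconds.
    chargeable = max(0, total - FREE_SECONDS_PER_MONTH)
    return chargeable * RATE_EUR_PER_HOUR / 3600
```

Under these assumptions, 101 one-hour files in a month leave 100 chargeable hours, so `monthly_cost_eur([3600.0] * 101)` comes to roughly EUR 7.00.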
Limits
- Per-tenant rate limit: 60 minutes of audio per minute (60× real-time)
- File size limit: 1 GB per request
- Supported formats: mp3, wav, flac, m4a, ogg, opus, webm
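The limits above can be checked on the client before uploading. The sketch below is one possible pre-flight check, not an official client. It assumes that "1 GB" means 10^9 bytes and that the format is identified by file extension. Neither is stated in the limits, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import List

MAX_BYTES = 1_000_000_000  # ASSUMPTION: "1 GB" is 10^9 bytes, not 2^30
SUPPORTED_FORMATS = {"mp3", "wav", "flac", "m4a", "ogg", "opus", "webm"}
AUDIO_MINUTES_PER_WALL_MINUTE = 60  # per-tenant rate limit (60x real-time)


def preflight(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks OK."""
    problems = []
    # ASSUMPTION: the format is inferred from the file extension.
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


def min_wall_clock_minutes(total_audio_minutes: float) -> float:
    """Lower bound on batch duration imposed by the rate limit alone."""
    return total_audio_minutes / AUDIO_MINUTES_PER_WALL_MINUTE
```

For example, a backlog of 6,000 minutes of audio cannot clear the rate limit in under 100 minutes of wall-clock time, regardless of how many requests run in parallel.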
Why we picked it as the second STT
We surveyed the field. NVIDIA Canary-Qwen 2.5B has slightly higher accuracy but uses CC-BY-4.0 (attribution required per-use). IBM Granite Speech 3.3 is Apache 2.0 but only covers 8 languages well. NVIDIA Parakeet TDT is the throughput champion but is also CC-BY-4.0. Qwen3-ASR’s combination of Apache 2.0, 52 languages, and sub-Whisper pricing made it the clean choice for our second STT slot.
Best for
- Multilingual transcription where Apache 2.0 attribution-free licensing matters
- Cost-sensitive batch transcription where Whisper is the more expensive choice
- Streaming workloads with a tight latency budget, where Qwen3-ASR's small first-byte-latency edge over Whisper Large v3 matters
- South / East Asian languages where Qwen's training set has strong coverage
Upstream source: github.com/QwenLM/Qwen3-ASR
Request Qwen3-ASR access
Get an API key.
Straight pay-per-use against the published rate. No deposit, no minimums. Tell us what you're building and we'll send your API key and endpoint URL within one working day.