Inference model · QWEN
Qwen 3.6 35B A3B
MoE economics at small-active-parameter cost. Best tokens-per-EUR ratio in the catalog.
A 35B-parameter mixture-of-experts model with only 3B active per token. You pay for small-model serving cost; you get large-model knowledge surface. Great fit for agent workflows that do many short calls.
When to pick it
- High-volume agent pipelines (background research, daily digests, ingestion)
- When you want richer knowledge than a 27B dense but tighter latency than a 70B
- A/B testing against Qwen 3.6 27B on your specific workload — often wins on cost
When to look elsewhere
- Strict latency-floor requirements (MoE routing adds variance)
- Very long context → look at the 122B A10B variant (256K) or MiniMax M2.7 (1M)
Request Qwen 3.6 35B A3B access
Get an API key for this model.
Pay-per-use, no deposit, no commitment. We'll send your API key and the OpenAI-compatible endpoint URL within one working day.
Request received. We'll follow up with founding terms.
Please complete the required fields and try again.