Inference model · QWEN

Qwen 3.6 35B A3B

MoE economics at small-active-parameter cost. Best tokens-per-EUR ratio in the catalog.

A 35B-parameter mixture-of-experts (MoE) model with only 3B parameters active per token. You pay small-model serving cost and get a large-model knowledge surface. A good fit for agent workflows that make many short calls.
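
To make the 35B-total / 3B-active split concrete, here is a rough back-of-the-envelope sketch. The parameter counts come from the model name; the "about 2 FLOPs per parameter per token" figure is a common rule of thumb for a forward pass and an assumption here, not a measured number for this model.

```python
# Back-of-the-envelope MoE arithmetic. Parameter counts are taken from the
# model name; everything else is an illustrative assumption.
TOTAL_PARAMS = 35e9   # all experts combined
ACTIVE_PARAMS = 3e9   # parameters actually used for each token


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that is used for a single token."""
    return active / total


def forward_flops_per_token(params: float) -> float:
    """Rule-of-thumb estimate (assumption): ~2 FLOPs per parameter per token."""
    return 2 * params


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction: {frac:.1%}")                                   # 8.6%
print(f"compute vs. a dense 35B model: {dense_flops / moe_flops:.1f}x less")  # 11.7x
```

This only estimates per-token compute. It says nothing about memory footprint, pricing, or real-world throughput, which depend on the serving setup.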

When to pick it

  • High-volume agent pipelines (background research, daily digests, ingestion)
  • When you want richer knowledge than a 27B dense model but tighter latency than a 70B
  • A/B tests against Qwen 3.6 27B on your specific workload, where the 35B A3B often wins on cost
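
For the A/B comparison, cost per successful task is usually a more useful number than raw token price. A minimal sketch, assuming you already log token counts and pass/fail results per model; `RunStats`, `cost_eur`, `cost_per_success`, and every price and count below are hypothetical placeholders, not catalog figures.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregated results of running one model over your evaluation set."""
    input_tokens: int
    output_tokens: int
    tasks_passed: int
    tasks_total: int


def cost_eur(stats: RunStats, price_in: float, price_out: float) -> float:
    """Total cost in EUR; prices are given in EUR per million tokens."""
    total = stats.input_tokens * price_in + stats.output_tokens * price_out
    return total / 1_000_000


def cost_per_success(stats: RunStats, price_in: float, price_out: float) -> float:
    """EUR spent per task that passed; infinite if nothing passed."""
    if stats.tasks_passed == 0:
        return float("inf")
    return cost_eur(stats, price_in, price_out) / stats.tasks_passed


# Placeholder numbers only: substitute your own logs and the current price list.
runs = {
    "moe-35b-a3b": (RunStats(1_000_000, 500_000, 80, 100), 0.10, 0.40),
    "dense-27b": (RunStats(1_000_000, 450_000, 82, 100), 0.20, 0.80),
}

for name, (stats, price_in, price_out) in runs.items():
    rate = stats.tasks_passed / stats.tasks_total
    print(f"{name}: pass rate {rate:.0%}, "
          f"EUR per success {cost_per_success(stats, price_in, price_out):.5f}")
```

Comparing on cost per success keeps a cheaper model from looking good only because it fails more often.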

When to look elsewhere

  • Strict tail-latency requirements (MoE routing adds variance)
  • Very long context → look at the 122B A10B variant (256K) or MiniMax M2.7 (1M)