Inference model · DEEPSEEK
DeepSeek V4 Flash
A fast, cheap MoE for high-throughput pipelines. Pro tenants only at founding launch.
DeepSeek V4 Flash is what we serve to Pro hosted-agent tenants that need to fan out across thousands of short inference calls per minute. Smaller active param count than the Qwen MoE, lower latency floor, slightly narrower knowledge.
When to pick it
- Real-time chat-style agent loops with strict latency budgets
- Per-message routing or classification (intent, tone, topic)
- Cheap first-pass scoring before falling through to a larger model
When to look elsewhere
- You’re on the hosted-agents Core tier (this model is Pro-only at founding)
- Long-context tasks > 128K
- Tasks where you need the broadest knowledge — pick a 30B-class dense model
Request DeepSeek V4 Flash access
Get an API key for this model.
Pay-per-use, no deposit, no commitment. We'll send your API key and the OpenAI-compatible endpoint URL within one working day.
Request received. We'll follow up with founding terms.
Please complete the required fields and try again.