Inference model · DEEPSEEK

DeepSeek V4 Flash

A fast, cheap MoE for high-throughput pipelines. Pro tenants only at founding launch.

DeepSeek V4 Flash is the model we serve to Pro hosted-agent tenants that need to fan out across thousands of short inference calls per minute. Compared with the Qwen MoE, it has a smaller active parameter count and a lower latency floor, at the cost of slightly narrower knowledge.
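
To make the fan-out pattern concrete, here is a minimal sketch of bounded-concurrency dispatch in Python. It is not our client library: `call_model`, `fake_call`, and the `max_in_flight` value are hypothetical stand-ins, and the right concurrency limit depends on your tenant's rate limits.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fan_out(
    prompts: List[str],
    call_model: Callable[[str], Awaitable[str]],  # hypothetical async client call
    max_in_flight: int = 64,                      # assumed limit; tune per tenant
) -> List[str]:
    """Run many short calls concurrently, capping how many are in flight."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(prompt: str) -> str:
        # Wait for a free slot before issuing the call.
        async with semaphore:
            return await call_model(prompt)

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(bounded(p) for p in prompts))


async def fake_call(prompt: str) -> str:
    """Stand-in for a real inference request; just upper-cases the prompt."""
    await asyncio.sleep(0)
    return prompt.upper()


if __name__ == "__main__":
    results = asyncio.run(fan_out(["hi", "route this"], fake_call, max_in_flight=2))
    print(results)
```

The semaphore keeps the number of simultaneous requests fixed regardless of batch size, which is usually what you want when each call is short and the batch is large.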

When to pick it

  • Real-time chat-style agent loops with strict latency budgets
  • Per-message routing or classification (intent, tone, topic)
  • Cheap first-pass scoring before falling through to a larger model
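
The first-pass scoring pattern in the last bullet can be sketched as a simple cascade: ask the fast model first and fall through to a larger model only when its confidence is low. Everything named here (`cascade`, `fake_fast`, `fake_large`, the 0.8 threshold) is a hypothetical illustration, not part of our API, and how you obtain a confidence score from a model is left as an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical model callable: takes a message, returns (label, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class Scored:
    label: str
    confidence: float
    model: str  # which tier produced the answer: "fast" or "large"


def cascade(message: str, fast: ModelFn, large: ModelFn, threshold: float = 0.8) -> Scored:
    """Try the fast model; fall through to the large one below the threshold."""
    label, confidence = fast(message)
    if confidence >= threshold:
        return Scored(label, confidence, "fast")
    # Fast model was unsure, so pay for the larger model.
    label, confidence = large(message)
    return Scored(label, confidence, "large")


def fake_fast(message: str) -> Tuple[str, float]:
    """Stand-in for the fast model: confident only on an obvious keyword."""
    if "refund" in message.lower():
        return ("billing", 0.95)
    return ("other", 0.40)


def fake_large(message: str) -> Tuple[str, float]:
    """Stand-in for a larger fallback model."""
    return ("technical_support", 0.90)


if __name__ == "__main__":
    print(cascade("I want a refund", fake_fast, fake_large))
    print(cascade("My app crashes on login", fake_fast, fake_large))
```

The threshold is the main tuning knob: raising it sends more traffic to the larger model, lowering it keeps more on the fast path.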

When to look elsewhere

  • You’re on the hosted-agents Core tier (this model is Pro-only at founding)
  • Long-context tasks beyond 128K tokens
  • Tasks where you need the broadest knowledge (pick a 30B-class dense model instead)