Inference model · DEEPSEEK

DeepSeek V4 Flash

A fast, cheap MoE for high-throughput pipelines. Pro tenants only.


DeepSeek V4 Flash is what we serve to Pro hosted-agent tenants that need to fan out across thousands of short inference calls per minute. Compared with the Qwen MoE, it has a smaller active parameter count and a lower latency floor, at the cost of slightly narrower knowledge.

When to pick it

Real-time chat-style agent loops with strict latency budgets
Per-message routing or classification (intent, tone, topic)
Cheap first-pass scoring before falling through to a larger model
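
The first-pass scoring pattern above can be sketched as a two-stage cascade. This is a minimal illustration, not ScaLabs client code: the `Scored` type, the `cascade` helper, the 0.8 threshold, and both stand-in models are hypothetical, and a real setup would replace the stand-ins with calls to the fast model and to whichever larger model you fall through to.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Scored:
    label: str
    confidence: float  # assumed to lie in [0, 1]


# A "model" here is any callable that maps a message to a Scored result.
Model = Callable[[str], Scored]


def cascade(message: str, fast: Model, large: Model,
            threshold: float = 0.8) -> Tuple[Scored, str]:
    """Score with the fast model first; fall through to the large model
    only when the fast model's confidence is below the threshold."""
    first = fast(message)
    if first.confidence >= threshold:
        return first, "fast"
    return large(message), "large"


# Stand-in models for illustration only (hypothetical, no network calls).
def fake_fast(message: str) -> Scored:
    if "refund" in message.lower():
        return Scored("billing", 0.95)
    return Scored("other", 0.40)


def fake_large(message: str) -> Scored:
    return Scored("general_support", 0.90)


if __name__ == "__main__":
    for text in ("I want a refund", "hello there"):
        result, used = cascade(text, fake_fast, fake_large)
        print(f"{text!r}: {result.label} (answered by {used} model)")
```

The threshold is the tuning knob: raising it sends more traffic to the larger model, while lowering it keeps more calls on the cheaper path.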

When to look elsewhere

You’re on the hosted-agents Core tier (this model is Pro-only)
Long-context tasks over 128K tokens
Tasks that need the broadest knowledge (pick a 30B-class dense model instead)

