Private autonomous agents. Fixed yearly price. Clean hydropower. Not in China.
Host the whole agent, not just the LLM call.
ScaLabs Cloud runs OpenClaw and Hermes-style agents in Kathmandu, beside the same inference stack they call. Your plan includes the runtime, private state, model usage, Whisper STT, GLM-OCR document vision, and Chatterbox + Qwen3-TTS voice output (with voice cloning) for normal agent workloads.
Why this exists
Agents should not need five providers just to stay alive.
One managed stack
Tenant runtime, private memory, gateway configuration, model calls, speech, OCR, and voice output belong in one plan.
Built for always-on loops
Run normal agent workloads without watching every token. Fair-use allowances stop resale, spam, and runaway loops, not regular work.
Your state stays yours
Memory, files, credentials, conversations, and tool state stay inside your tenant environment and are never used for model training.
Open runtimes, managed ops
Bring OpenClaw, Hermes Agent, or a reviewed compatible runtime. We operate the tenant, gateways, quotas, logs, secrets, and restarts.
Supported runtimes
Bring your agent framework. We host it.
ScaLabs Cloud is runtime-agnostic — we don't force a custom framework on you. Two open-source agent runtimes are first-class at founding launch; others get reviewed on request.
OpenClaw
Open-source personal AI assistant framework — runs on any OS, any platform, with native gateway integrations for Telegram, Slack, Discord, WhatsApp, and more. The lobster way.
github.com/openclaw/openclaw →
Hermes Agent by Nous Research
The self-improving agent — persistent memory, scheduled automations, browser control, MCP/tool use, kanban-style planner. The agent that grows with you.
github.com/NousResearch/hermes-agent →Using Claude Managed Agents instead? Anthropic keeps the agent loop on their side; we can host the sandbox half on our Kathmandu workers at 20–50 % below the Cloudflare / Daytona / Modal / Vercel rates Anthropic named as launch partners. See /sandboxes/claude-self-hosted for the breakdown.
Also reviewing on a case-by-case basis: OpenHands, AutoGen-compatible runtimes, and customer-provided container images. Talk to us if your framework isn't on this list.
Included utility rails
Speech, OCR, and voice output are not add-ons.
Useful agents listen, read, reason, and respond. For normal, non-spam agent use, those glue calls are effectively unlimited inside the plan instead of metered through separate speech, OCR, and TTS vendors.
Whisper-class speech-to-text
Let agents handle voice notes, calls, meetings, and audio snippets without adding a separate speech API bill.
GLM-class document vision
Let agents read screenshots, receipts, PDFs, forms, and image-heavy inputs inside the same hosted workflow.
Chatterbox + Qwen3-TTS voice output
Natural voice output plus zero-shot voice cloning. Built on Chatterbox (MIT) and Qwen3-TTS (Apache 2.0) — the same engines we sell standalone on /utilities.
OpenAI-compatible local endpoint
The agent runtime calls ScaLabs Cloud inference over the private network, with quota enforcement and model-specific fair-use allowances.
Included still has guardrails.
Normal voice, OCR, and TTS workloads are part of the plan. Spam, resale, credential abuse, quota bypassing, and runaway loops are not; rate limits and abuse controls keep the cohort usable.
Founding tiers
One private agent stack. Two levels of headroom.
Core is for personal and small-team agents. Pro is for heavier loops, larger models, and higher gateway and network egress limits.
EUR 588/year at launch. EUR 19 refundable founding deposit credited against the first invoice.
- 8 GB tenant RAM quota
- Fair-share 4 vCPU-class runtime capacity
- Model-specific fair-use inference, up to 320M tokens/month on the highest-allowance model
- OpenClaw / Hermes-compatible runtime with private tenant volume
- Included normal-use Whisper STT, GLM OCR, and Qwen/Kokoro/Piper-class TTS
EUR 948/year at launch. EUR 29 refundable founding deposit credited against the first invoice.
- 12 GB tenant RAM quota
- Fair-share 6 vCPU-class runtime capacity
- Model-specific fair-use inference, up to 510M tokens/month on the highest-allowance model
- Included normal-use Whisper STT, GLM OCR, and Qwen/Kokoro/Piper-class TTS
- Larger-model access, confirmed after benchmarks and license review
- Higher gateway and network egress limits
CPU is fair-share, not hard dedicated vCores. RAM quota, annual price, refund terms, and the no-training commitment are the founding commitments; exact storage, gateway, utility, and egress limits are finalized in launch terms after runtime benchmarks.
Usage promise
Use the default agent models without metering anxiety. Heavier models get honest limits.
The 320M / 510M headline comes from the fastest model class. Larger models are available with lower monthly allowances so the plan stays predictable for everyone.
| Model | Core allowance / mo | Pro allowance / mo | Notes |
|---|---|---|---|
| HimalayaGPT 0.5B | Unlimited fair-use | Unlimited fair-use | Free for everyone — Nepali sovereign LLM |
| Qwen 3.6 27B | 159M tokens | 254M tokens | Dense agent and coding model |
| Qwen 3.6 35B A3B | 272M tokens | 435M tokens | Small-active MoE sweet spot |
| Gemma 4 31B | 136M tokens | 218M tokens | Dense model, lower throughput |
| Gemma 4 26B A4B | 318M tokens | 508M tokens | Highest headline allowance |
| DeepSeek V4 Flash | Pro only | 109M tokens | Pro-only model, confirmed after benchmarks |
| Cohere Command A+ | 75M tokens | 140M tokens | Open-weight enterprise flagship — multilingual, agentic |
| Qwen 3.5 122B A10B | 102M tokens | 163M tokens | Larger MoE workflow option |
| MiniMax M2.7 | 40M tokens | 127M tokens | Large-agent MoE, pending license review |
Efficiency gains should flow back to customers.
If ScaLabs Cloud's runtime improves or model-serving costs fall, founding customers benefit through higher practical allowances, better model availability, or both. The founding price stays fixed for 3 years while yearly renewals stay current. The table above is confirmed at launch after B60 Dual benchmarks and runtime soak tests.
Agent safety
An agent with tools is infrastructure. We treat it like one.
A useful agent can touch tools, credentials, and outside systems. Isolation, tool grants, secrets, logs, restart policy, quota enforcement, and customer-controlled deletion are part of the product.
Join Founding Batch 1
Request a founding place before the batch closes.
This form is not a payment step. Tell us what you want to run; if it fits the cohort, we send the deposit, refund, saved-card, and annual billing terms for review before any payment authorization.
- Core: EUR 19 refundable deposit; first annual charge is EUR 588 less deposit credit at launch.
- Pro: EUR 29 refundable deposit; first annual charge is EUR 948 less deposit credit at launch.
- The founding price is guaranteed for 3 years if yearly renewals stay current.
- Plans are paid yearly and can be canceled at renewal boundaries.
- Your card is charged for the annual plan only after launch notice and authorization.
- If ScaLabs Cloud misses the stated launch window or cancels the cohort, the deposit is refundable.
Practical questions
The details that matter before you reserve.
Does this mean unlimited inference?
It means normal autonomous-agent workflows are covered by the plan instead of billed per token. It does not mean unrestricted resale, spam, or runaway loops. Every tier has model-specific fair-use allowances, rate limits, anti-resale terms, and abuse controls.
Are STT, OCR, and TTS really included?
Yes. Whisper-class speech-to-text, GLM-class OCR, and Qwen/Kokoro/Piper-class text-to-speech are included for normal, non-spam agent workflows. They remain subject to rate limits, anti-resale terms, and abuse throttles.
Why run the agent in Nepal if I am in Europe or the US?
This is built for agents that run in the background through Telegram, Slack, Discord, WhatsApp, email, web UI, or scheduled tasks. For those workflows, the important round trip is often inside the agent loop: state, tool decisions, and model calls running close together — and our Kathmandu network keeps that path short and cheap.
When do I pay?
The form is free. If your use case fits the cohort, you can pay a small refundable deposit after reviewing the terms. The annual subscription is charged only at launch after advance notice and saved-card authorization for the selected plan.
What happens if launch slips?
The deposit is refundable if ScaLabs Cloud misses the stated launch window or cancels the cohort. The deposit is demand validation, not a security, loan, investment, or equity entitlement.
Can I bring my existing agent framework?
The first cohort is designed for OpenClaw and Hermes Agent patterns: messaging-first agents, headless server workflows, MCP/tool use, scheduled tasks, and gateway-based operation. Customer-provided runtimes are reviewed after isolation, support, and billing are stable.
What happens to memory, files, logs, and tool data?
They stay inside the tenant environment. ScaLabs Cloud does not use customer prompts, outputs, memory, files, conversations, or tool-call data for model training. Tool calls and admin actions can be logged as metadata; request and response bodies are not logged by default.
What is still launch-gated?
We do not promise hard dedicated vCores, exact public latency numbers, or unbenchmarked throughput. The launch catalog and allowances are confirmed after B60 Dual benchmarks and runtime soak tests.