Private autonomous agents. Fixed yearly price. Clean hydropower. Not in China.

Host the whole agent, not just the LLM call.

ScaLabs Cloud runs OpenClaw and Hermes-style agents in Kathmandu, beside the same inference stack they call. Your plan includes the runtime, private state, model usage, Whisper STT, GLM-OCR document vision, and Chatterbox + Qwen3-TTS voice output (with voice cloning) for normal agent workloads.

STT / OCR / TTS included Private memory, files, and tools 3-year price guarantee No training on customer data
managed-agent.stack private Kathmandu network
Your agent runtime OpenClaw / Hermes + gateways
Included utility rails inference + speech + OCR + voice
Private ScaLabs endpoint OpenAI-compatible API on local network

Why this exists

Agents should not need five providers just to stay alive.

01

One managed stack

Tenant runtime, private memory, gateway configuration, model calls, speech, OCR, and voice output belong in one plan.

02

Built for always-on loops

Run normal agent workloads without watching every token. Fair-use allowances stop resale, spam, and runaway loops, not regular work.

03

Your state stays yours

Memory, files, credentials, conversations, and tool state stay inside your tenant environment and are never used for model training.

04

Open runtimes, managed ops

Bring OpenClaw, Hermes Agent, or a reviewed compatible runtime. We operate the tenant, gateways, quotas, logs, secrets, and restarts.

Supported runtimes

Bring your agent framework. We host it.

ScaLabs Cloud is runtime-agnostic — we don't force a custom framework on you. Two open-source agent runtimes are first-class at founding launch; others get reviewed on request.

Using Claude Managed Agents instead? Anthropic keeps the agent loop on their side; we can host the sandbox half on our Kathmandu workers at 20–50 % below the Cloudflare / Daytona / Modal / Vercel rates Anthropic named as launch partners. See /sandboxes/claude-self-hosted for the breakdown.

Also reviewing on a case-by-case basis: OpenHands, AutoGen-compatible runtimes, and customer-provided container images. Talk to us if your framework isn't on this list.

Included utility rails

Speech, OCR, and voice output are not add-ons.

Useful agents listen, read, reason, and respond. For normal, non-spam agent use, those glue calls are effectively unlimited inside the plan instead of metered through separate speech, OCR, and TTS vendors.

STT

Whisper-class speech-to-text

Let agents handle voice notes, calls, meetings, and audio snippets without adding a separate speech API bill.

OCR

GLM-class document vision

Let agents read screenshots, receipts, PDFs, forms, and image-heavy inputs inside the same hosted workflow.

TTS

Chatterbox + Qwen3-TTS voice output

Natural voice output plus zero-shot voice cloning. Built on Chatterbox (MIT) and Qwen3-TTS (Apache 2.0) — the same engines we sell standalone on /utilities.

API

OpenAI-compatible local endpoint

The agent runtime calls ScaLabs Cloud inference over the private network, with quota enforcement and model-specific fair-use allowances.

Included still has guardrails.

Normal voice, OCR, and TTS workloads are part of the plan. Spam, resale, credential abuse, quota bypassing, and runaway loops are not; rate limits and abuse controls keep the cohort usable.

Founding tiers

One private agent stack. Two levels of headroom.

Core is for personal and small-team agents. Pro is for heavier loops, larger models, and higher gateway and network egress limits.

Agent Core EUR 49/mo

EUR 588/year at launch. EUR 19 refundable founding deposit credited against the first invoice.

  • 8 GB tenant RAM quota
  • Fair-share 4 vCPU-class runtime capacity
  • Model-specific fair-use inference, up to 320M tokens/month on the highest-allowance model
  • OpenClaw / Hermes-compatible runtime with private tenant volume
  • Included normal-use Whisper STT, GLM OCR, and Qwen/Kokoro/Piper-class TTS
Request Core

CPU is fair-share, not hard dedicated vCores. RAM quota, annual price, refund terms, and the no-training commitment are the founding commitments; exact storage, gateway, utility, and egress limits are finalized in launch terms after runtime benchmarks.

Usage promise

Use the default agent models without metering anxiety. Heavier models get honest limits.

The 320M / 510M headline comes from the fastest model class. Larger models are available with lower monthly allowances so the plan stays predictable for everyone.

ModelCore allowance / moPro allowance / moNotes
HimalayaGPT 0.5BUnlimited fair-useUnlimited fair-useFree for everyone — Nepali sovereign LLM
Qwen 3.6 27B159M tokens254M tokensDense agent and coding model
Qwen 3.6 35B A3B272M tokens435M tokensSmall-active MoE sweet spot
Gemma 4 31B136M tokens218M tokensDense model, lower throughput
Gemma 4 26B A4B318M tokens508M tokensHighest headline allowance
DeepSeek V4 FlashPro only109M tokensPro-only model, confirmed after benchmarks
Cohere Command A+75M tokens140M tokensOpen-weight enterprise flagship — multilingual, agentic
Qwen 3.5 122B A10B102M tokens163M tokensLarger MoE workflow option
MiniMax M2.740M tokens127M tokensLarge-agent MoE, pending license review

Efficiency gains should flow back to customers.

If ScaLabs Cloud's runtime improves or model-serving costs fall, founding customers benefit through higher practical allowances, better model availability, or both. The founding price stays fixed for 3 years while yearly renewals stay current. The table above is confirmed at launch after B60 Dual benchmarks and runtime soak tests.

Agent safety

An agent with tools is infrastructure. We treat it like one.

A useful agent can touch tools, credentials, and outside systems. Isolation, tool grants, secrets, logs, restart policy, quota enforcement, and customer-controlled deletion are part of the product.

Per-tenant isolation Separate containers, volumes, memory stores, and quota enforcement.
Least-privilege tools Gateways, MCP servers, shell, browser, and webhooks stay off until explicitly enabled.
Secrets handling Customer credentials are injected at runtime and never written into logs.
Dashboard and audit trail Status, usage, gateway config, secrets, restarts, quota events, and metadata logs are visible per tenant.
No cross-tenant learning Agent memory and skill loops stay tenant-scoped. No shared memory store seeded from customer data.
Runaway-loop throttles Agent loops, STT, OCR, TTS, and egress remain governed by rate limits and abuse controls.

Practical questions

The details that matter before you reserve.

Does this mean unlimited inference?

It means normal autonomous-agent workflows are covered by the plan instead of billed per token. It does not mean unrestricted resale, spam, or runaway loops. Every tier has model-specific fair-use allowances, rate limits, anti-resale terms, and abuse controls.

Are STT, OCR, and TTS really included?

Yes. Whisper-class speech-to-text, GLM-class OCR, and Qwen/Kokoro/Piper-class text-to-speech are included for normal, non-spam agent workflows. They remain subject to rate limits, anti-resale terms, and abuse throttles.

Why run the agent in Nepal if I am in Europe or the US?

This is built for agents that run in the background through Telegram, Slack, Discord, WhatsApp, email, web UI, or scheduled tasks. For those workflows, the important round trip is often inside the agent loop: state, tool decisions, and model calls running close together — and our Kathmandu network keeps that path short and cheap.

When do I pay?

The form is free. If your use case fits the cohort, you can pay a small refundable deposit after reviewing the terms. The annual subscription is charged only at launch after advance notice and saved-card authorization for the selected plan.

What happens if launch slips?

The deposit is refundable if ScaLabs Cloud misses the stated launch window or cancels the cohort. The deposit is demand validation, not a security, loan, investment, or equity entitlement.

Can I bring my existing agent framework?

The first cohort is designed for OpenClaw and Hermes Agent patterns: messaging-first agents, headless server workflows, MCP/tool use, scheduled tasks, and gateway-based operation. Customer-provided runtimes are reviewed after isolation, support, and billing are stable.

What happens to memory, files, logs, and tool data?

They stay inside the tenant environment. ScaLabs Cloud does not use customer prompts, outputs, memory, files, conversations, or tool-call data for model training. Tool calls and admin actions can be logged as metadata; request and response bodies are not logged by default.

What is still launch-gated?

We do not promise hard dedicated vCores, exact public latency numbers, or unbenchmarked throughput. The launch catalog and allowances are confirmed after B60 Dual benchmarks and runtime soak tests.