Document vision · OCR

GLM-OCR

Document vision for receipts, PDFs, screenshots, forms.

GLM-OCR is a document-vision endpoint built on the GLM family of open vision-language models from Zhipu AI and Tsinghua University. We host it as an OCR API that reads receipts, PDFs, screenshots, forms, tables, and handwriting and returns raw text, markdown, or structured JSON. It is designed to be called as an agent tool, not used as a standalone product.

What it returns

Raw text for simple reads
Markdown with preserved heading/list/table structure
JSON conforming to a tenant-supplied schema, ready for direct insertion into a downstream typed pipeline
Bounding boxes for every extracted field, for drawing overlays or auditing extractions
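
As a rough illustration of how schema-shaped JSON with bounding boxes could feed a typed pipeline, the sketch below converts a payload into dataclasses. The payload layout (a `fields` list with `name`, `value`, and `box` keys) and every name in the code are assumptions made for this example, not the documented response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    # Pixel coordinates; the key names are hypothetical.
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    box: BoundingBox


def parse_fields(payload: dict) -> list[ExtractedField]:
    """Convert an assumed response payload into typed records."""
    records = []
    for item in payload["fields"]:
        box = BoundingBox(**item["box"])
        records.append(ExtractedField(item["name"], item["value"], box))
    return records


# Hypothetical payload, invented purely for illustration.
sample = {
    "fields": [
        {
            "name": "total",
            "value": "42.50",
            "box": {"x": 310, "y": 880, "width": 90, "height": 24},
        }
    ]
}

records = parse_fields(sample)
```

Frozen dataclasses keep the extracted values immutable once parsed, which suits an audit trail where the box and the value must stay paired.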

When to pick it

Agents that need to read an image: receipt → expense entry, PDF → knowledge-graph node, screenshot → structured action
Replacing expensive per-token vision-LLM calls when the task is “extract fields”, not “reason over the image”
Cheap first-pass OCR before falling through to a vision-LLM for ambiguous cases
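
The first-pass pattern in the list above can be sketched as a small router: run the cheap OCR pass, and escalate to a vision-LLM only when required fields come back empty. Both callables, the field names, and the "missing field" escalation rule are assumptions for illustration; a real deployment might escalate on a confidence score instead.

```python
from typing import Callable


def extract_with_fallback(
    image: bytes,
    ocr: Callable[[bytes], dict],
    vision_llm: Callable[[bytes], dict],
    required: tuple[str, ...],
) -> tuple[dict, str]:
    """Try the cheap OCR pass first; escalate only if required fields are missing."""
    result = ocr(image)
    missing = [key for key in required if not result.get(key)]
    if not missing:
        return result, "ocr"
    # Ambiguous or incomplete read: fall through to the more expensive model.
    return vision_llm(image), "vision_llm"


# Stub extractors standing in for real API clients (hypothetical).
def fake_ocr(image: bytes) -> dict:
    return {"merchant": "Cafe Luna", "total": ""}


def fake_vision_llm(image: bytes) -> dict:
    return {"merchant": "Cafe Luna", "total": "12.40"}


result, source = extract_with_fallback(
    b"image-bytes", fake_ocr, fake_vision_llm, ("merchant", "total")
)
```

Passing the extractors in as callables keeps the routing logic testable without network access and lets either backend be swapped independently.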

Pricing

EUR 0.0015 per page, flat. This is in line with the entry-rate list prices for basic OCR on Google Vision and AWS Textract, and well below the cost of a typical vision-LLM call.
EUR 0.0008 per page above 500,000 pages per month, applied automatically once a tenant crosses the threshold within a billing month.
No per-token charges. Pages are detected automatically for PDFs; a single image input counts as one page.
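
For budgeting, the two rates can be turned into a small estimator. This sketch assumes the lower rate applies only to pages beyond the 500,000-page threshold (marginal pricing); the text above does not say whether the discount is marginal or retroactive, so treat that as an assumption. The function and constant names are illustrative.

```python
from decimal import Decimal

BASE_RATE_EUR = Decimal("0.0015")    # per page, up to the threshold
VOLUME_RATE_EUR = Decimal("0.0008")  # per page, beyond the threshold
VOLUME_THRESHOLD = 500_000           # pages per billing month


def monthly_cost_eur(pages: int) -> Decimal:
    """Estimate one month's cost, ASSUMING the discount is marginal."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    base_pages = min(pages, VOLUME_THRESHOLD)
    volume_pages = max(pages - VOLUME_THRESHOLD, 0)
    return base_pages * BASE_RATE_EUR + volume_pages * VOLUME_RATE_EUR


small_month = monthly_cost_eur(10_000)   # 10,000 pages at the base rate: EUR 15
large_month = monthly_cost_eur(600_000)  # EUR 750 base + EUR 80 volume: EUR 830
```

`Decimal` is used instead of `float` so that sub-cent rates multiply out exactly.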

Limits

Per-tenant rate limit: 60 pages per second
Image size limit: 20 MB per page
Supported formats: png, jpg, webp, pdf, tiff
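
A client can check these limits before uploading to avoid rejected requests. The sketch below is a local pre-flight check only; the helper names are hypothetical, "20 MB" is read conservatively as 20,000,000 bytes, and only the file extensions listed above are accepted.

```python
from pathlib import Path

MAX_PAGE_BYTES = 20_000_000  # conservative reading of "20 MB"
SUPPORTED_SUFFIXES = {".png", ".jpg", ".webp", ".pdf", ".tiff"}
PAGES_PER_SECOND = 60


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable.

    The size check treats the file as a single page, so it is only
    meaningful for single-image inputs, not multi-page PDFs.
    """
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_PAGE_BYTES:
        problems.append(f"page too large: {size_bytes} bytes")
    return problems


def per_second_batches(pages: list, limit: int = PAGES_PER_SECOND) -> list[list]:
    """Split pages into batches small enough to send one batch per second."""
    return [pages[i:i + limit] for i in range(0, len(pages), limit)]


batches = per_second_batches(list(range(150)))
```

Sending one batch per second keeps a tenant at or below the stated rate limit; a production client would likely also handle throttling responses from the server.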

Best for

Receipt and invoice extraction for finance agents
PDF and screenshot reading inside agent tool loops
Form / table / handwriting digitization
Pre-processing image-heavy inputs for downstream LLM reasoning

Upstream source: github.com/THUDM/GLM
