This document describes how AI agents, MCP tool servers, and other programmatic callers can integrate with the GLM-OCR Python SDK without editing YAML files or understanding the internal pipeline.
import glmocr
# One-liner — uses ZHIPU_API_KEY from environment / .env file
result = glmocr.parse("document.pdf")
print(result.to_dict())

Or use the class-based API for multiple calls:

from glmocr import GlmOcr
parser = GlmOcr(api_key="sk-xxx", mode="maas")
result = parser.parse("page.png")
print(result.to_json())
parser.close()  # or use: with GlmOcr(...) as parser:

| Mode | Value | Requires GPU? | Description |
|---|---|---|---|
| MaaS | "maas" | No | Forwards requests to Zhipu's cloud API. Recommended for agents. |
| Self-hosted | "selfhosted" | Yes | Uses a local vLLM / SGLang service with optional layout detection. |

When api_key is provided without an explicit mode, the SDK automatically
defaults to MaaS mode.
The SDK resolves every setting using this priority chain (highest wins):
Constructor kwargs > os.environ > .env file > config.yaml > built-in defaults
This means an agent can override any setting without touching files.
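
The priority chain can be illustrated with a minimal sketch. This is not the SDK's implementation: `resolve_settings` and the sample values are hypothetical, and the sketch assumes each layer has already been loaded into a plain dict keyed by setting name.

```python
from collections import ChainMap


def resolve_settings(kwargs, environ, dotenv, yaml_cfg, defaults):
    """Merge setting layers; earlier arguments win over later ones."""
    # Drop None values so an omitted kwarg does not mask a lower layer.
    layers = [
        {k: v for k, v in layer.items() if v is not None}
        for layer in (kwargs, environ, dotenv, yaml_cfg, defaults)
    ]
    # ChainMap returns the value from the first mapping that has the key.
    return dict(ChainMap(*layers))


# Made-up values, chosen only to show which layer wins.
settings = resolve_settings(
    kwargs={"timeout": 30, "mode": None},
    environ={"mode": "maas"},
    dotenv={"mode": "selfhosted", "log_level": "DEBUG"},
    yaml_cfg={"timeout": 600},
    defaults={"timeout": 120, "log_level": "INFO", "model": "glm-ocr"},
)
print(settings)
```

Here `timeout` comes from the kwargs, `mode` from the environment, `log_level` from the .env layer, and `model` from the defaults.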
All variables use the prefix GLMOCR_, except the MaaS API key, which is read
from ZHIPU_API_KEY. Place them in the shell environment or in a .env file
anywhere in the working-directory ancestry.
| Variable | Maps to | Example |
|---|---|---|
| GLMOCR_MODE | pipeline.maas.enabled | maas or selfhosted |
| ZHIPU_API_KEY | pipeline.maas.api_key | sk-abc123 |
| GLMOCR_API_URL | pipeline.maas.api_url | https://open.bigmodel.cn/... |
| GLMOCR_MODEL | pipeline.maas.model | glm-ocr |
| GLMOCR_TIMEOUT | pipeline.maas.request_timeout | 600 |
| GLMOCR_OCR_API_URL | pipeline.ocr_api.api_url | http://localhost:5002/v1/... |
| GLMOCR_OCR_API_KEY | pipeline.ocr_api.api_key | token-xyz |
| GLMOCR_OCR_API_HOST | pipeline.ocr_api.api_host | localhost |
| GLMOCR_OCR_API_PORT | pipeline.ocr_api.api_port | 5002 |
| GLMOCR_OCR_MODEL | pipeline.ocr_api.model | glm-ocr-model |
| GLMOCR_LOG_LEVEL | logging.level | DEBUG, INFO, WARNING, ERROR |

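
As a rough illustration of how such a mapping could work, the sketch below turns environment variables into a nested config dict using the dotted paths from the table. `ENV_TO_CONFIG` and `apply_env` are hypothetical names, not part of the SDK. Values are left as strings, and GLMOCR_MODE is omitted because how it is translated into pipeline.maas.enabled is not specified here.

```python
# Subset of the table above; hypothetical constant, for illustration only.
ENV_TO_CONFIG = {
    "ZHIPU_API_KEY": "pipeline.maas.api_key",
    "GLMOCR_API_URL": "pipeline.maas.api_url",
    "GLMOCR_MODEL": "pipeline.maas.model",
    "GLMOCR_TIMEOUT": "pipeline.maas.request_timeout",
    "GLMOCR_OCR_API_HOST": "pipeline.ocr_api.api_host",
    "GLMOCR_OCR_API_PORT": "pipeline.ocr_api.api_port",
    "GLMOCR_LOG_LEVEL": "logging.level",
}


def apply_env(environ, mapping=ENV_TO_CONFIG):
    """Build a nested config dict from the variables present in environ."""
    config = {}
    for var, dotted_path in mapping.items():
        if var not in environ:
            continue
        *parents, leaf = dotted_path.split(".")
        node = config
        # Create intermediate dicts for each path segment as needed.
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = environ[var]
    return config


overrides = apply_env({"ZHIPU_API_KEY": "sk-abc123", "GLMOCR_LOG_LEVEL": "DEBUG"})
print(overrides)
```

In real use the first argument would be `os.environ`; a plain dict is passed here so the example does not depend on the caller's environment.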
The SDK walks up from the current working directory looking for a .env file.
Values from the .env file are merged with real environment variables, with
real env vars always taking priority.
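
The lookup and merge described above might look roughly like the following sketch. The helper names (`find_dotenv`, `read_dotenv`, `merged_env`) are hypothetical, and the parser handles only plain KEY=VALUE lines; the SDK's actual .env handling may differ.

```python
import tempfile
from pathlib import Path


def find_dotenv(start):
    """Return the nearest .env at or above start, or None if there is none."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def read_dotenv(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def merged_env(start, environ):
    """Merge .env values with environ; real environment variables win."""
    dotenv_path = find_dotenv(start)
    file_values = read_dotenv(dotenv_path) if dotenv_path else {}
    return {**file_values, **environ}


# Demo: a .env two levels above the working directory is still found.
with tempfile.TemporaryDirectory() as root:
    project = Path(root) / "project"
    nested = project / "src"
    nested.mkdir(parents=True)
    (project / ".env").write_text(
        "GLMOCR_MODE=maas\nGLMOCR_LOG_LEVEL=DEBUG\n", encoding="utf-8"
    )
    merged = merged_env(nested, {"GLMOCR_LOG_LEVEL": "ERROR"})

print(merged)
```

GLMOCR_MODE is taken from the file, while GLMOCR_LOG_LEVEL keeps the value from the real environment.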
# .env
ZHIPU_API_KEY=sk-my-secret-key
GLMOCR_MODE=maas
GLMOCR_LOG_LEVEL=DEBUG

GlmOcr() and the convenience parse() function accept these keyword
arguments. They map to the same settings as the environment variables but
take higher priority.
| Keyword | Type | Description |
|---|---|---|
| config_path | str | Path to a YAML config file (optional). |
| api_key | str | API key. Providing this without mode auto-enables MaaS. |
| api_url | str | MaaS API endpoint URL. |
| model | str | Model name. |
| mode | str | "maas" or "selfhosted". |
| timeout | int | Request timeout in seconds. |
| log_level | str | Logging level. |

The return type mirrors the input type for ergonomic usage:
# Single file → single PipelineResult
result = parser.parse("image.png")
result.save("./output")
# Multiple files → list of PipelineResult
results = parser.parse(["img1.png", "doc.pdf"])
for r in results:
    r.save("./output")

Type checkers see proper @overload signatures, so no casts are needed.
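
For readers unfamiliar with `typing.overload`, the sketch below shows how paired signatures let a type checker infer the return type from the argument type. The `PipelineResult` class and `parse` function here are stand-ins written for this example; the SDK's real signatures may differ.

```python
from typing import List, Union, overload


class PipelineResult:
    """Stand-in for the SDK's result type (hypothetical, illustration only)."""

    def __init__(self, source: str) -> None:
        self.source = source


@overload
def parse(source: str) -> PipelineResult: ...
@overload
def parse(source: List[str]) -> List[PipelineResult]: ...


def parse(
    source: Union[str, List[str]],
) -> Union[PipelineResult, List[PipelineResult]]:
    """Return one result for a single path, or a list for a list of paths."""
    if isinstance(source, str):
        return PipelineResult(source)
    return [PipelineResult(item) for item in source]


single = parse("image.png")            # inferred as PipelineResult
many = parse(["img1.png", "doc.pdf"])  # inferred as List[PipelineResult]
print(type(single).__name__, len(many))
```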
Every PipelineResult can be serialized without touching the file system:
to_dict() returns a JSON-serializable Python dict:
d = result.to_dict()
# {
#   "json_result": [[{"index": 0, "label": "text", "content": "...", "bbox_2d": [...]}]],
#   "markdown_result": "# Page title\n...",
#   "original_images": ["/abs/path/to/image.png"],
#   "usage": {"total_tokens": 1234},  # present in MaaS mode
#   "data_info": {"pages": [...]},  # present in MaaS mode
# }

to_json() returns a JSON string. Keyword arguments are forwarded to json.dumps.
Defaults: ensure_ascii=False, indent=2.

json_str = result.to_json() # pretty-printed
json_str = result.to_json(indent=None)  # compact single line

save() writes JSON and Markdown files (with cropped images) to disk:

result.save(output_dir="./output")

json_result is a list of pages, each page a list of regions:

[
  [
    {
      "index": 0,
      "label": "title",
      "content": "Annual Report 2024",
      "bbox_2d": [100, 50, 900, 120]
    },
    {
      "index": 1,
      "label": "text",
      "content": "Revenue grew 15% year-over-year...",
      "bbox_2d": [100, 140, 900, 400]
    }
  ]
]

Coordinates (bbox_2d) are normalised to a 0–1000 scale regardless
of the backend (MaaS or self-hosted).
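
Because the coordinates are normalised, callers that need pixel positions have to rescale them. The helper below is a minimal sketch, not an SDK function: `bbox_to_pixels` is a hypothetical name, and it assumes the four values are ordered [x1, y1, x2, y2], which this document does not state explicitly.

```python
def bbox_to_pixels(bbox_2d, image_width, image_height, scale=1000):
    """Map a normalised box onto an image of the given pixel size.

    Assumes bbox_2d is [x1, y1, x2, y2] on a 0..scale grid.
    """
    x1, y1, x2, y2 = bbox_2d
    return (
        round(x1 * image_width / scale),
        round(y1 * image_height / scale),
        round(x2 * image_width / scale),
        round(y2 * image_height / scale),
    )


# The title region from the example above, on a 2000 x 3000 pixel page.
pixels = bbox_to_pixels([100, 50, 900, 120], image_width=2000, image_height=3000)
print(pixels)  # (200, 150, 1800, 360)
```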
Labels: title, text, table, figure, formula, header,
footer, page_number, reference, etc.
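
Agents often want only certain kinds of regions, such as titles or tables. A small sketch of filtering json_result by label follows; `regions_with_label` is a hypothetical helper, and the zero-based page number it yields is a choice made for this example.

```python
def regions_with_label(json_result, label):
    """Yield (page_number, region) pairs whose label matches."""
    for page_number, page in enumerate(json_result):
        for region in page:
            if region.get("label") == label:
                yield page_number, region


# Same shape as the json_result example above: one page, two regions.
sample = [
    [
        {"index": 0, "label": "title", "content": "Annual Report 2024",
         "bbox_2d": [100, 50, 900, 120]},
        {"index": 1, "label": "text", "content": "Revenue grew 15% year-over-year...",
         "bbox_2d": [100, 140, 900, 400]},
    ]
]

titles = [region["content"] for _, region in regions_with_label(sample, "title")]
print(titles)
```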
When a MaaS request fails, the SDK returns a PipelineResult with an
_error attribute instead of raising. The failure then appears under the
"error" key of to_dict():
result = parser.parse("image.png")
d = result.to_dict()
if "error" in d:
    print("Parsing failed:", d["error"])
else:
    print(d["markdown_result"])

When wrapping GLM-OCR as an MCP tool:

import json
import glmocr
def ocr_tool(image_path: str) -> str:
    """Parse a document and return structured JSON."""
    result = glmocr.parse(image_path)
    return result.to_json()

The tool only needs ZHIPU_API_KEY in the environment (or .env file).
No YAML configuration is required.
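
A tool wrapper may also want to surface failures in a uniform way. The sketch below combines the tool pattern with the "error" key check described earlier. It is an illustration under assumptions: `make_ocr_tool`, the "ok" / "markdown" response shape, and `_FakeResult` are all hypothetical, and the parse callable is injected so the example runs without the SDK installed.

```python
import json


def make_ocr_tool(parse_fn):
    """Wrap a parse callable so the tool always returns a JSON string."""

    def ocr_tool(image_path: str) -> str:
        try:
            payload = parse_fn(image_path).to_dict()
        except Exception as exc:  # e.g. a missing file or bad configuration
            payload = {"error": str(exc)}
        if "error" in payload:
            response = {"ok": False, "error": payload["error"]}
        else:
            response = {"ok": True, "markdown": payload["markdown_result"]}
        return json.dumps(response, ensure_ascii=False)

    return ocr_tool


class _FakeResult:
    """Test double standing in for PipelineResult."""

    def __init__(self, data):
        self._data = data

    def to_dict(self):
        return self._data


tool = make_ocr_tool(lambda path: _FakeResult({"markdown_result": "# Title"}))
failing = make_ocr_tool(lambda path: _FakeResult({"error": "timeout"}))
print(tool("page.png"))
print(failing("page.png"))
```

With the real SDK, `glmocr.parse` would be passed as `parse_fn` in place of the lambdas.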
For advanced use cases you can build config objects directly:
from glmocr.config import GlmOcrConfig
cfg = GlmOcrConfig.from_env(
    api_key="sk-xxx",
    mode="maas",
    timeout=600,
    log_level="DEBUG",
)
print(cfg.to_dict())

from_env() respects the full priority chain:
kwargs > os.environ > .env > YAML > defaults.

Run the unit tests with:

python -m pytest glmocr/tests/test_unit.py -v

All tests run without network access or GPU. MaaS/Pipeline internals are mocked where needed.