OCR API
- process - OCR
OCR
from mistralai.client import Mistral
import os

with Mistral(
    api_key=os.getenv("MISTRAL_API_KEY", ""),
) as mistral:
    # The model name and document URL below are sample values; substitute real ones.
    # Note: per the parameter table, only json_schema is valid for the two
    # annotation formats.
    res = mistral.ocr.process(model="CX-9", document={
        "type": "document_url",
        "document_url": "https://upset-labourer.net/",
    }, bbox_annotation_format={
        "type": "text",
    }, document_annotation_format={
        "type": "text",
    })

    # Handle response
    print(res)

| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| model | Nullable[str] | ✔️ | N/A | |
| document | models.DocumentUnion | ✔️ | Document to run OCR on | |
| id | Optional[str] | ➖ | N/A | |
| pages | OptionalNullable[models.Pages] | ➖ | Specific pages to process. Accepts a list of integers or a string of comma-separated numbers and ranges (e.g. '0,1,2' or '0-5' or '0,2-4'). Page numbers start from 0. | |
| include_image_base64 | OptionalNullable[bool] | ➖ | Include image URLs in response | |
| image_limit | OptionalNullable[int] | ➖ | Max images to extract | |
| image_min_size | OptionalNullable[int] | ➖ | Minimum height and width of image to extract | |
| bbox_annotation_format | OptionalNullable[models.ResponseFormat] | ➖ | Structured output class for extracting useful information from each extracted bounding box / image from document. Only json_schema is valid for this field. | Example 1: { "type": "text" } Example 2: { "type": "json_object" } Example 3: { "type": "json_schema", "json_schema": { "schema": { "properties": { "name": { "title": "Name", "type": "string" }, "authors": { "items": { "type": "string" }, "title": "Authors", "type": "array" } }, "required": [ "name", "authors" ], "title": "Book", "type": "object", "additionalProperties": false }, "name": "book", "strict": true } } |
| document_annotation_format | OptionalNullable[models.ResponseFormat] | ➖ | Structured output class for extracting useful information from the entire document. Only json_schema is valid for this field. | Example 1: { "type": "text" } Example 2: { "type": "json_object" } Example 3: { "type": "json_schema", "json_schema": { "schema": { "properties": { "name": { "title": "Name", "type": "string" }, "authors": { "items": { "type": "string" }, "title": "Authors", "type": "array" } }, "required": [ "name", "authors" ], "title": "Book", "type": "object", "additionalProperties": false }, "name": "book", "strict": true } } |
| document_annotation_prompt | OptionalNullable[str] | ➖ | Optional prompt to guide the model in extracting structured output from the entire document. A document_annotation_format must be provided. | |
| table_format | OptionalNullable[models.TableFormat] | ➖ | N/A | |
| extract_header | Optional[bool] | ➖ | N/A | |
| extract_footer | Optional[bool] | ➖ | N/A | |
| confidence_scores_granularity | OptionalNullable[models.ConfidenceScoresGranularity] | ➖ | Granularity for confidence scores: 'word' (per-word scores) or 'page' (aggregate only). Defaults to None (no confidence scores) to keep response payload small. | |
| retries | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. | |
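
The `pages` string format and the `json_schema` annotation format described in the table can be illustrated with a small sketch that needs no network access. `parse_pages`, `BOOK_FORMAT`, and `ocr_kwargs` are hypothetical names introduced here and are not part of the SDK. The sketch also assumes that ranges such as '0-5' are inclusive on both ends, which the table does not state. The schema is the "Book" schema from Example 3.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as '0,2-4' into a list of zero-based page numbers.

    Assumption: ranges are inclusive on both ends (the table does not say).
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


# "Book" schema copied from Example 3 in the parameter table.
BOOK_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "book",
        "strict": True,
        "schema": {
            "title": "Book",
            "type": "object",
            "properties": {
                "name": {"title": "Name", "type": "string"},
                "authors": {
                    "title": "Authors",
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["name", "authors"],
            "additionalProperties": False,
        },
    },
}

# Keyword arguments that could be forwarded as mistral.ocr.process(**ocr_kwargs),
# alongside the required model and document arguments.
ocr_kwargs = {
    "pages": parse_pages("0,2-4"),
    "document_annotation_format": BOOK_FORMAT,
}
print(ocr_kwargs["pages"])  # [0, 2, 3, 4]
```

Since the table says `pages` accepts either a list of integers or the string form, expanding the string on the client side is optional. It is mainly useful for validating user input before sending a request.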
| Error Type | Status Code | Content Type |
|---|---|---|
| errors.HTTPValidationError | 422 | application/json |
| errors.SDKError | 4XX, 5XX | */* |
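
As a rough reading of the error table, the mapping from HTTP status code to error type could be sketched as below. `expected_error_type` is a hypothetical helper, not an SDK function, and it returns class names as strings rather than importing the SDK. It assumes the specific 422 row takes precedence over the 4XX wildcard row.

```python
from typing import Optional


def expected_error_type(status_code: int) -> Optional[str]:
    """Return the error class name the table associates with a status code."""
    if status_code == 422:
        # Specific row: validation errors carry an application/json body.
        return "errors.HTTPValidationError"
    if 400 <= status_code <= 599:
        # Wildcard row: any other 4XX or 5XX response.
        return "errors.SDKError"
    return None  # not an error status according to the table


for code in (200, 404, 422, 503):
    print(code, expected_error_type(code))
```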