Ocr

Overview

OCR API

Available Operations

process

OCR

Example Usage

import os

from mistralai.client import Mistral


# Read the API key from the environment; the client is closed when the block exits.
with Mistral(
    api_key=os.getenv("MISTRAL_API_KEY", ""),
) as mistral:

    # Run OCR on a document referenced by URL.
    res = mistral.ocr.process(
        model="CX-9",
        document={
            "type": "document_url",
            "document_url": "https://upset-labourer.net/",
        },
        bbox_annotation_format={"type": "text"},
        document_annotation_format={"type": "text"},
    )

    # Handle response
    print(res)
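
The parameter descriptions in this section state that only json_schema is valid for the two annotation formats. A minimal sketch of a request that follows that rule, reusing the Book schema from the parameter examples, might look like the following. The helper names `book_annotation_format` and `build_request` and the document URL are hypothetical, and the model name is the placeholder from the example above.

```python
import json


def book_annotation_format() -> dict:
    """Build a json_schema response format mirroring the Book example schema."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "book",
            "strict": True,
            "schema": {
                "title": "Book",
                "type": "object",
                "properties": {
                    "name": {"title": "Name", "type": "string"},
                    "authors": {
                        "title": "Authors",
                        "type": "array",
                        "items": {"type": "string"},
                    },
                },
                "required": ["name", "authors"],
                "additionalProperties": False,
            },
        },
    }


def build_request(document_url: str, model: str) -> dict:
    """Assemble keyword arguments for ocr.process (hypothetical helper)."""
    return {
        "model": model,
        "document": {"type": "document_url", "document_url": document_url},
        "document_annotation_format": book_annotation_format(),
    }


if __name__ == "__main__":
    # Print the request instead of sending it, so the sketch runs offline.
    print(json.dumps(build_request("https://example.com/book.pdf", "CX-9"), indent=2))
```

Inside the `with Mistral(...) as mistral:` block, the resulting dictionary could be passed as `mistral.ocr.process(**build_request(url, model))`.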

Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| model | Nullable[str] | ✔️ | N/A | |
| document | models.DocumentUnion | ✔️ | Document to run OCR on | |
| id | Optional[str] | | N/A | |
| pages | OptionalNullable[models.Pages] | | Specific pages to process. Accepts a list of integers or a string of comma-separated numbers and ranges (e.g. '0,1,2' or '0-5' or '0,2-4'). Page numbers start from 0. | |
| include_image_base64 | OptionalNullable[bool] | | Include image URLs in response | |
| image_limit | OptionalNullable[int] | | Max images to extract | |
| image_min_size | OptionalNullable[int] | | Minimum height and width of image to extract | |
| bbox_annotation_format | OptionalNullable[models.ResponseFormat] | | Structured output class for extracting useful information from each extracted bounding box / image from document. Only json_schema is valid for this field. | Example 1: {"type": "text"} Example 2: {"type": "json_object"} Example 3: {"type": "json_schema", "json_schema": {"schema": {"properties": {"name": {"title": "Name", "type": "string"}, "authors": {"items": {"type": "string"}, "title": "Authors", "type": "array"}}, "required": ["name", "authors"], "title": "Book", "type": "object", "additionalProperties": false}, "name": "book", "strict": true}} |
| document_annotation_format | OptionalNullable[models.ResponseFormat] | | Structured output class for extracting useful information from the entire document. Only json_schema is valid for this field. | Example 1: {"type": "text"} Example 2: {"type": "json_object"} Example 3: {"type": "json_schema", "json_schema": {"schema": {"properties": {"name": {"title": "Name", "type": "string"}, "authors": {"items": {"type": "string"}, "title": "Authors", "type": "array"}}, "required": ["name", "authors"], "title": "Book", "type": "object", "additionalProperties": false}, "name": "book", "strict": true}} |
| document_annotation_prompt | OptionalNullable[str] | | Optional prompt to guide the model in extracting structured output from the entire document. A document_annotation_format must be provided. | |
| table_format | OptionalNullable[models.TableFormat] | | N/A | |
| extract_header | Optional[bool] | | N/A | |
| extract_footer | Optional[bool] | | N/A | |
| confidence_scores_granularity | OptionalNullable[models.ConfidenceScoresGranularity] | | Granularity for confidence scores: 'word' (per-word scores) or 'page' (aggregate only). Defaults to None (no confidence scores) to keep response payload small. | |
| retries | Optional[utils.RetryConfig] | | Configuration to override the default retry behavior of the client. | |
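
The string form of the pages parameter can be illustrated by expanding a spec into explicit zero-based page indices. This is only a sketch: the helper `expand_pages` is hypothetical and not part of the SDK, and it assumes that a range such as '2-4' includes both endpoints, which the description above does not state.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a spec like '0,2-4' into a list of zero-based page indices.

    Assumption: ranges are inclusive on both ends.
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as '0,,1'
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


if __name__ == "__main__":
    print(expand_pages("0,1,2"))  # [0, 1, 2]
    print(expand_pages("0-5"))    # [0, 1, 2, 3, 4, 5]
    print(expand_pages("0,2-4"))  # [0, 2, 3, 4]
```

Since the parameter is also described as accepting a list of integers, the expanded list could be passed directly as `pages=expand_pages("0,2-4")`.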

Response

models.OCRResponse

Errors

| Error Type | Status Code | Content Type |
| --- | --- | --- |
| errors.HTTPValidationError | 422 | application/json |
| errors.SDKError | 4XX, 5XX | \*/\* |
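
One way to act on the two error types listed above is to catch the validation error first and treat the general SDK error as the fallback. The sketch below takes the callable and both exception classes as arguments so that it runs without the SDK installed. The wrapper name `process_with_handling` and the outcome labels are hypothetical, and the import location of the `errors` module is not shown in this section.

```python
from typing import Any, Callable, Tuple, Type


def process_with_handling(
    process: Callable[..., Any],
    validation_error: Type[Exception],
    sdk_error: Type[Exception],
    **request: Any,
) -> Tuple[str, Any]:
    """Call `process` and label the outcome by the documented error types."""
    try:
        return "ok", process(**request)
    except validation_error as exc:
        # 422: the request failed validation, so resending it unchanged will not help.
        return "invalid_request", exc
    except sdk_error as exc:
        # Any other 4XX or 5XX response.
        return "api_error", exc


if __name__ == "__main__":
    # Stand-in exception and callable, used only to exercise the wrapper offline.
    class DemoValidationError(Exception):
        pass

    def failing_process(**_: Any) -> Any:
        raise DemoValidationError("model is required")

    status, detail = process_with_handling(
        failing_process, DemoValidationError, RuntimeError, document={}
    )
    print(status, detail)
```

With the real client, the call might look like `process_with_handling(mistral.ocr.process, errors.HTTPValidationError, errors.SDKError, model=..., document=...)`. The validation error is caught first so that it is still handled separately if it turns out to share a base class with the general error.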