Escruta Extractor is a dedicated microservice for document parsing and content extraction within the Escruta platform. Built with Python, FastAPI, and Microsoft's MarkItDown, it converts a range of inputs (PDF, DOCX, PPTX, and XLSX files, audio, and YouTube URLs) into clean Markdown for AI processing.
> **Important**
> This service is a required component of the Escruta ecosystem. It must be accessible to the Core service for proper document processing and ingestion.
- Python (version 3.12 or higher).
- uv package manager.
- `uv sync`: Install dependencies
- `uv run fastapi dev`: Start the development server
The extraction service will be available at localhost:8000 by default.
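As a quick way to confirm the development server is reachable, a small check like the following may help. It is only a sketch: it assumes the default address above and probes `/docs`, the path where FastAPI serves its interactive documentation unless that has been disabled. The function name `service_is_up` is hypothetical and not part of the Extractor.

```python
import urllib.error
import urllib.request


def service_is_up(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url, whatever the status code."""
    # Ignore any system proxy settings, since the target is a local service.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        # Assumption: /docs is FastAPI's default docs path; the service may disable it.
        with opener.open(f"{base_url}/docs", timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 404), so it is still reachable.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: nothing is listening.
        return False


if __name__ == "__main__":
    print("Extractor reachable:", service_is_up())
```

Any HTTP response counts as "up" here, because the goal is only to verify that something is listening on the port, not that a specific endpoint exists.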
The application is secured and configured using environment variables. These must be set in your .env file or environment.
| Variable | Description | Default |
|---|---|---|
| `ESCRUTA_INTERNAL_API_KEY` | Internal API key for service-to-service communication | (Required) |
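To illustrate how a required key like this is typically handled, the sketch below reads `ESCRUTA_INTERNAL_API_KEY` from the environment, fails fast when it is missing, and compares a caller-supplied key in constant time. This is not the Extractor's actual implementation; the helper names `load_internal_api_key` and `is_authorized` are hypothetical, and how the key is transmitted between services (for example, which header carries it) is not specified here.

```python
import hmac
import os
from typing import Optional


def load_internal_api_key() -> str:
    """Read the required key from the environment, failing fast if it is absent."""
    key = os.environ.get("ESCRUTA_INTERNAL_API_KEY", "")
    if not key:
        raise RuntimeError("ESCRUTA_INTERNAL_API_KEY is required but not set")
    return key


def is_authorized(presented_key: Optional[str], expected_key: str) -> bool:
    """Compare the caller's key with the expected one in constant time."""
    if presented_key is None:
        return False
    # hmac.compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented_key.encode(), expected_key.encode())
```

Loading the key once at startup surfaces a missing configuration immediately, rather than on the first incoming request.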
- `uv run fastapi dev`: Start the development server with auto-reload
- `uv run fastapi run`: Start the production server
- Runtime: Python 3.12+ with uv for lightning-fast dependency management.
- Framework: FastAPI for high-performance, robust API endpoints.
- Extraction: Microsoft MarkItDown for reliable document-to-Markdown conversion, including support for rich media, spreadsheets, and audio transcription.