Add PDF loading support and update dependencies by iam-tsr · Pull Request #1565 · mofa-org/mofa

iam-tsr · 2026-04-02T10:26:06Z

🧠 Context

This pull request adds support for PDF document loading to the mofa-foundation RAG pipeline, enabling users to extract and process text from PDF files as part of their retrieval-augmented generation workflows. The implementation introduces a new PdfLoader (behind the pdf feature flag), integrates it into the pipeline and documentation, and provides tests and usage examples. Additionally, a small bug fix is included in the Python example.

PDF Document Loading Support

Added a new PdfLoader struct that implements the DocumentLoader trait and extracts text from PDF files using the pdf-extract crate. The loader checks the file extension, reports extraction failures, and populates document metadata.
Introduced a new LoaderError::PdfParseError variant so that PDF parsing failures are reported as a distinct error.
Enabled the pdf feature in mofa-foundation and the rag_pipeline example, updating the Cargo.toml files and documentation to describe PDF support and usage.
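A minimal Rust sketch of the loader shape described above, assuming a synchronous DocumentLoader trait with a single load method and a simple Document type holding text plus string metadata. The real mofa-foundation trait, Document type, and metadata keys are not shown in this PR description and may differ. Only PdfLoader, DocumentLoader, and LoaderError::PdfParseError come from the PR; the UnsupportedExtension variant, the metadata keys, and the injected extract closure are hypothetical. The closure stands in for the pdf-extract call (a real loader would likely call something such as pdf_extract::extract_text) so the sketch compiles without the crate.

```rust
use std::collections::HashMap;
use std::fmt;
use std::path::Path;

/// Hypothetical document type; the real mofa-foundation type may differ.
#[derive(Debug, Clone)]
pub struct Document {
    pub content: String,
    pub metadata: HashMap<String, String>,
}

/// `PdfParseError` is named in the PR; `UnsupportedExtension` is an assumption.
#[derive(Debug)]
pub enum LoaderError {
    UnsupportedExtension(String),
    PdfParseError(String),
}

impl fmt::Display for LoaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoaderError::UnsupportedExtension(ext) => write!(f, "unsupported extension: {ext}"),
            LoaderError::PdfParseError(msg) => write!(f, "failed to parse PDF: {msg}"),
        }
    }
}

impl std::error::Error for LoaderError {}

/// Assumed shape of the DocumentLoader trait (synchronous here for brevity).
pub trait DocumentLoader {
    fn load(&self, path: &Path) -> Result<Document, LoaderError>;
}

/// Hypothetical loader: text extraction is injected as a closure so the sketch
/// runs without pdf-extract; a real loader would call the crate directly.
pub struct PdfLoader<F> {
    extract: F,
}

impl<F: Fn(&Path) -> Result<String, String>> PdfLoader<F> {
    pub fn new(extract: F) -> Self {
        Self { extract }
    }
}

impl<F: Fn(&Path) -> Result<String, String>> DocumentLoader for PdfLoader<F> {
    fn load(&self, path: &Path) -> Result<Document, LoaderError> {
        // Reject anything that is not a .pdf file (case-insensitive).
        let ext = path
            .extension()
            .and_then(|e| e.to_str())
            .map(|e| e.to_ascii_lowercase())
            .unwrap_or_default();
        if ext != "pdf" {
            return Err(LoaderError::UnsupportedExtension(ext));
        }
        // Map any extraction failure onto the PdfParseError variant.
        let content = (self.extract)(path).map_err(LoaderError::PdfParseError)?;
        // Populate metadata; these keys are illustrative only.
        let mut metadata = HashMap::new();
        metadata.insert("source".to_string(), path.display().to_string());
        metadata.insert("file_type".to_string(), "pdf".to_string());
        Ok(Document { content, metadata })
    }
}
```

Injecting the extractor also makes the extension and metadata checks testable without real PDF fixtures, which is one way to cover the kinds of cases the PR's unit tests target.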

Examples and Tests

Added integration and unit tests for PdfLoader covering extension checks and metadata, and included a demonstration of PDF loading and chunking in the rag_pipeline example.
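The PR description does not say how the rag_pipeline example chunks the extracted text. As an illustration only, here is a hypothetical fixed-size chunker (chunk_text is not a mofa API) that splits by characters with a configurable overlap between consecutive chunks, so text pulled out of a PDF is never cut inside a multi-byte UTF-8 character.

```rust
/// Hypothetical helper: split `text` into chunks of at most `chunk_size`
/// characters, where consecutive chunks share `overlap` characters.
pub fn chunk_text(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(
        chunk_size > 0 && overlap < chunk_size,
        "chunk_size must be positive and larger than overlap"
    );
    // Work on chars, not bytes, so multi-byte characters stay intact.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        // Stop once the final chunk reaches the end of the text.
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

For example, chunking "abcdefghij" with a size of 4 and an overlap of 1 yields "abcd", "defg", and "ghij". Real RAG pipelines often split on sentence or token boundaries instead; character windows are used here only because they are easy to verify.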

Bug Fix

Fixed a syntax error in the analyze_text function of the Python analyze.py script, where the analyses key in the expression analyses.append("sentiment") was missing its closing quote and bracket; the line now reads result["analyses"].append("sentiment").

- Introduced PdfLoader for loading and processing PDF documents.
- Added optional PDF support in mofa-foundation and rag_pipeline.
- Updated Cargo.lock with new dependencies: pdf-extract, adobe-cmap-parser, and others.
- Enhanced README and example files to demonstrate PDF functionality.
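Because PDF support is optional, code that touches pdf-extract should be compiled only when the pdf feature is enabled (for example with cargo build --features pdf). In Cargo.toml this is typically expressed by marking pdf-extract as optional = true and declaring a feature such as pdf = ["dep:pdf-extract"]; the exact manifest in this PR is not shown here. A small sketch of the gating pattern, using a hypothetical supported_extensions function with placeholder txt and md entries:

```rust
// Compiled only when the `pdf` feature is enabled; in a real crate this module
// is where the optional pdf-extract dependency would be used.
#[cfg(feature = "pdf")]
pub mod pdf {
    pub const EXTENSION: &str = "pdf";
}

/// Hypothetical: the file extensions this build can load.
pub fn supported_extensions() -> Vec<&'static str> {
    #[allow(unused_mut)] // `mut` is only needed when the `pdf` feature is on
    let mut exts = vec!["txt", "md"];
    #[cfg(feature = "pdf")]
    exts.push(pdf::EXTENSION);
    exts
}
```

Builds without the feature never compile the pdf module, so they also skip compiling pdf-extract and its transitive dependencies.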

iam-tsr added 2 commits April 2, 2026 15:47

Fix syntax error in analyze_text function for sentiment analysis

41d084b

iam-tsr wants to merge 2 commits into mofa-org:main from iam-tsr:feat/pdf-extract
