Build the full end-to-end workflow connecting preprocessing, classification, and extraction into a single pipeline.
Description:
- Connect all pipeline components: PDF preprocessing → classification → metric extraction → JSON output
- Add batch processing capabilities to handle directories of PDFs efficiently
- Implement comprehensive logging to track classification decisions, extraction results, and errors
Goal: Deliver a complete, automated pipeline that processes PDFs end-to-end with clear audit trails and structured outputs.
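
As a rough illustration of how the items above could fit together, here is a minimal Python sketch. It assumes the three stages already exist as callables; the names `preprocess`, `classify`, `extract`, `process_pdf`, and `process_directory`, the logger name, and the output JSON fields are all hypothetical and not taken from the existing codebase. A failure on one PDF is logged and recorded without stopping the batch.

```python
import json
import logging
from pathlib import Path

# Hypothetical logger name; the real project may use a different one.
logger = logging.getLogger("pdf_pipeline")


def process_pdf(pdf_path, preprocess, classify, extract):
    """Run one PDF through preprocess -> classify -> extract.

    The three stage callables are assumptions standing in for the real
    components: preprocess(path) -> text, classify(text) -> label,
    extract(text, label) -> dict of metrics.
    """
    text = preprocess(pdf_path)
    label = classify(text)
    logger.info("classified %s as %s", pdf_path.name, label)
    metrics = extract(text, label)
    logger.info("extracted %d metric(s) from %s", len(metrics), pdf_path.name)
    return {"file": pdf_path.name, "classification": label, "metrics": metrics}


def process_directory(input_dir, output_dir, preprocess, classify, extract):
    """Process every *.pdf in input_dir and write one JSON file per PDF.

    Returns a summary dict listing which files succeeded and which failed.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    summary = {"succeeded": [], "failed": []}
    for pdf_path in sorted(input_dir.glob("*.pdf")):
        try:
            result = process_pdf(pdf_path, preprocess, classify, extract)
        except Exception:
            # Log the traceback for the audit trail and keep going.
            logger.exception("failed to process %s", pdf_path.name)
            summary["failed"].append(pdf_path.name)
            continue
        out_path = output_dir / f"{pdf_path.stem}.json"
        out_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
        summary["succeeded"].append(pdf_path.name)
    return summary
```

Calling `logging.basicConfig(level=logging.INFO)` before `process_directory(...)` would print the classification and extraction messages to the console; routing them to a file instead is one way to keep a persistent audit trail.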