AgentFinVQA

AgentFinVQA is a multi-agent framework that helps vision-language models answer questions about financial charts more accurately. Rather than asking a single model to do everything at once, it breaks the task into specialized steps — reading chart text, planning what to look for, answering the question, and verifying the answer — producing measurably better results than a single-pass approach.

How it works

The pipeline coordinates four specialized agents:

Planner — reads the question and generates a structured inspection plan (without seeing the image)
OCR Reader — transcribes all text from the chart image
Vision Agent — executes the plan using the chart image and extracted text to produce an answer
Verifier — checks the draft answer and confirms or corrects it
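
The flow between the four agents can be sketched as below. This is a minimal illustration, not the project's implementation: `run_pipeline`, `PipelineResult`, and the callable signatures are hypothetical, the stub lambdas stand in for real model calls, and the inputs passed to the verifier are an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    """Intermediate and final outputs of one run (hypothetical structure)."""
    plan: str
    ocr_text: str
    draft_answer: str
    final_answer: str


def run_pipeline(
    question: str,
    image: bytes,
    planner: Callable[[str], str],
    ocr_reader: Callable[[bytes], str],
    vision_agent: Callable[[str, bytes, str], str],
    verifier: Callable[[str, str, str], str],
) -> PipelineResult:
    # Planner sees only the question, never the image.
    plan = planner(question)
    # OCR Reader transcribes the text in the chart image.
    ocr_text = ocr_reader(image)
    # Vision Agent follows the plan using both the image and the OCR text.
    draft = vision_agent(plan, image, ocr_text)
    # Verifier confirms the draft or returns a corrected answer.
    # (Which inputs the real verifier receives is an assumption here.)
    final = verifier(question, draft, ocr_text)
    return PipelineResult(plan, ocr_text, draft, final)


# Stub agents stand in for real model calls.
result = run_pipeline(
    question="What was Q2 revenue?",
    image=b"",
    planner=lambda q: f"Locate the value asked for in: {q}",
    ocr_reader=lambda img: "Q1 10 Q2 12",
    vision_agent=lambda plan, img, text: "12",
    verifier=lambda q, draft, text: draft if draft in text.split() else "unknown",
)
print(result.final_answer)
```

Passing the agents in as plain callables keeps each stage swappable, which is one way to compare different model backends for the same role.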

Every run produces a Model Evaluation Packet (MEP) — a JSON file capturing the full reasoning trace, which can be used for analysis, debugging, and reproducible comparisons across models.
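
As a rough illustration of what such a packet might hold, the sketch below writes one reasoning trace to JSON and reads it back. The `MEP` class and all of its field names are hypothetical; the actual schema lives in the project's `mep/` package and may differ.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class MEP:
    # All field names are illustrative assumptions, not the project's schema.
    question_id: str
    question: str
    plan: str
    ocr_text: str
    draft_answer: str
    final_answer: str


def write_mep(mep: MEP, out_dir: Path) -> Path:
    """Write one packet as <out_dir>/<question_id>.json and return the path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{mep.question_id}.json"
    path.write_text(json.dumps(asdict(mep), indent=2), encoding="utf-8")
    return path


def load_mep(path: Path) -> MEP:
    """Read a packet back into the dataclass."""
    return MEP(**json.loads(path.read_text(encoding="utf-8")))


mep = MEP("q-0001", "What was Q2 revenue?", "Find the Q2 bar.", "Q1 10 Q2 12", "12", "12")
with tempfile.TemporaryDirectory() as tmp:
    saved = write_mep(mep, Path(tmp) / "meps")
    restored = load_mep(saved)
print(restored == mep)
```

Storing one file per question keeps packets easy to diff between runs, which suits the reproducible-comparison use case.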

Installation

Requires uv. Install core dependencies:

uv sync
source .venv/bin/activate

To run the agentic pipeline, install the agentic-xai-eval dependency group, which includes CrewAI, Google GenAI, and the Streamlit dashboard:

uv sync --group agentic-xai-eval
source .venv/bin/activate

Configuration

Copy .env.example to .env and fill in your API keys:

cp .env.example .env

Variable	Description
`GEMINI_API_KEY`	Google Gemini API key (vision backend)
`OPENAI_API_KEY`	OpenAI API key (optional planner/verifier backend)
`LANGFUSE_PUBLIC_KEY`	Langfuse public key (optional — enables tracing)
`LANGFUSE_SECRET_KEY`	Langfuse secret key (optional)
`LANGFUSE_HOST`	Langfuse host URL (optional)
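
Before a long run it can help to check which of these variables are set. The sketch below is a hypothetical helper, not part of the project; it assumes `GEMINI_API_KEY` is the only required key, since the table marks the others as optional.

```python
import os
from typing import Mapping

# Assumption: only the Gemini key is required; the rest are marked optional above.
REQUIRED = ("GEMINI_API_KEY",)
OPTIONAL = (
    "OPENAI_API_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_HOST",
)


def check_env(env: Mapping[str, str]) -> dict[str, list[str]]:
    """Return the names of required and optional variables that are unset or empty."""
    return {
        "missing_required": [name for name in REQUIRED if not env.get(name)],
        "missing_optional": [name for name in OPTIONAL if not env.get(name)],
    }


report = check_env(os.environ)
if report["missing_required"]:
    print("Missing required variables:", ", ".join(report["missing_required"]))
if report["missing_optional"]:
    print("Optional variables not set:", ", ".join(report["missing_optional"]))
```

Note that `os.environ` only sees values from `.env` if they have been loaded into the process, for example through `uv run --env-file .env`.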

Quick start

Run the pipeline on a small FinMME slice:

uv run --env-file .env -m agentfinvqa.runner.run_generate_meps \
    --dataset finmme \
    --split "train[:50]" \
    --config gemini_gemini \
    --workers 4 \
    --out meps/

Run the zero-shot baseline for comparison:

uv run --env-file .env baselines/run_zeroshot.py \
    --dataset finmme \
    --split "train[:50]"

Explore results in the dashboard:

uv run streamlit run src/agentfinvqa/eval/dashboard.py

Project structure

src/agentfinvqa/
├── agents/        # PlannerAgent, VisionAgent, VerifierAgent
├── datasets/      # Dataset loaders
├── eval/          # Metrics, evaluation scripts, Streamlit dashboard
├── mep/           # Model Evaluation Packet schema
├── runner/        # End-to-end pipeline runner
└── tools/         # OCR, vision QA, legend grounding tools

Acknowledgements

This work was supported by the Province of Ontario, the Government of Canada through CIFAR, and the Vector Institute.

This project has also received funding from the European Union's Horizon Europe research and innovation programme under grant agreement No. 101214389 (AIXPERT).

Contact

For questions, collaborations, or contributions, please open an issue in this repository or contact the corresponding author at shaina.raza@vectorinstitute.ai, as listed in the paper.

For questions about the FinMME benchmark, please refer to the original dataset by Luo et al.
