Paper Verifier — A Multiagent System for Scientific Paper Verification

A pure-Python pipeline of tool-using agents that takes a scientific paper PDF, extracts the paper's staged conclusions, attempts to reproduce each one, and returns a credibility-graded list of those conclusions together with a whole-paper summary and follow-up research suggestions.

What it does

Given a paper PDF, the pipeline (a) extracts per-section staged conclusions and the figures / equations / tables that support each one, (b) verifies each conclusion against its supporting evidence — reproducing figures by codegen and walking equation derivations step-by-step — and (c) aggregates the per-conclusion verdicts into a final report with a credibility level + score per conclusion, a list of further-research points motivated by the gaps, and a whole-paper summary.
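Step (c) can be pictured as a weighted mean over per-conclusion scores, followed by a mapping from score to level. The sketch below is only an illustration: the weights, the thresholds, the level names, and the function names are assumptions, not the project's actual values.

```python
from typing import Sequence, Tuple

# Hypothetical thresholds and level names; the project's real cut-offs may differ.
LEVELS = ((0.75, "high"), (0.4, "medium"), (0.0, "low"))


def overall_credibility(scored: Sequence[Tuple[float, float]]) -> float:
    """Weighted mean of (score, weight) pairs; scores are assumed to lie in [0, 1]."""
    total_weight = sum(weight for _, weight in scored)
    if total_weight <= 0:
        raise ValueError("need at least one conclusion with positive weight")
    return sum(score * weight for score, weight in scored) / total_weight


def level_for(score: float) -> str:
    """Map a score to the first level whose threshold it reaches."""
    for threshold, name in LEVELS:
        if score >= threshold:
            return name
    return LEVELS[-1][1]  # scores below every threshold fall into the lowest level
```

Keeping this step in plain Python rather than in an LLM call makes the overall score deterministic for a given set of verdicts.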

Each agent's responsibilities are narrow and the hand-off contracts (pydantic models) are explicit, so every verdict traces back to the section, evidence, and reasoning that produced it.
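To make the traceability idea concrete, here is a minimal sketch of what a verdict contract could look like. The project uses pydantic models (see verifier/schemas/); this sketch swaps in stdlib dataclasses to stay dependency-free, and every class and field name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Evidence:
    kind: str      # assumed values: "figure", "equation", or "table"
    asset_id: str  # assumed to be an identifier from manifest.json


@dataclass(frozen=True)
class Verdict:
    conclusion_id: str
    section: str                   # section the conclusion was extracted from
    evidence: Tuple[Evidence, ...]
    reasoning: str                 # the verifier's explanation
    score: float                   # assumed to lie in [0, 1]

    def __post_init__(self) -> None:
        # Validate at the hand-off boundary, as a pydantic model would.
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")


def trace(verdict: Verdict) -> str:
    """Render the provenance chain of a verdict as one line."""
    refs = ", ".join(f"{e.kind}:{e.asset_id}" for e in verdict.evidence)
    return f"{verdict.conclusion_id} <- {verdict.section} [{refs}]"
```

Because the section, evidence, and reasoning travel with the verdict itself, a downstream phase never has to re-derive where a score came from.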

Pipeline overview

                          ┌────────────────────────────┐
   paper.pdf  ──────────► │ Phase 0: Extraction (CLI)  │ ─────► <stem>_paper/
                          │  verifier.extraction.*     │        manifest.json
                          │                            │        sections/*.md
                          │  - paper.py  orchestrator  │        figures/*
                          │  - placeholders.py  schema │        equations/*
                          │  - equations.py  VLM OCR   │        tables/*
                          │  - figures.py    render    │
                          │  - tables.py     VLM md    │
                          │  - layout.py     2-col     │
                          └────────────────────────────┘
                                       │
                                       ▼
                          ┌────────────────────────────┐
                          │ Phase 1: Extractor agent   │ ─────► <stem>_paper/
                          │  verifier.agents.extractor │        conclusions.json
                          └────────────────────────────┘
                                       │
                                       ▼
                          ┌────────────────────────────┐
                          │ Phase 1.5: Dispatcher      │ ─────► <stem>_paper/
                          │  verifier.agents.dispatcher│        verification_plan.json
                          │  (pure-Python, no LLM)     │
                          └────────────────────────────┘
                                       │
                                       ▼
                          ┌──────────────────────────────────────────────────┐
                          │   Phase 2 runner — verifier/pipeline.py          │
                          │   (pure Python loop, file-based checkpoint)      │
                          │                                                  │
                          │   for task in pending_tasks:                     │
                          │     if workdir/verdict.json exists:              │
                          │         load + skip   ◄─── resume after crash    │
                          │     elif task.method == "figure_reproduction":   │
                          │         figure_verifier_node(state)              │
                          │     elif task.method == "formula_derivation":    │
                          │         formula_verifier_node(state)             │
                          │   rewrite verifications.json after each task     │
                          └──────────────────────────────────────────────────┘
                                       │
                                       ▼
                          ┌──────────────────────────────────────────────────┐
                          │   Phase 3 summarizer                             │
                          │   verifier/agents/summarizer.py                  │
                          │                                                  │
                          │   - Python: weighted overall_credibility         │
                          │   - 1 LLM call (structured): paper_summary       │
                          │                              + further_research  │
                          │   - merges Phase-2 verdicts + dispatcher skips   │
                          └──────────────────────────────────────────────────┘
                                       │
                                       ▼
                                  FinalReport ──►  report.json
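
The Phase 2 box in the diagram describes a file-based checkpoint: a task whose workdir already holds verdict.json is loaded rather than re-run. A minimal sketch of that loop follows. It assumes each task is a dict with hypothetical "id" and "method" keys, uses a single runs directory for simplicity, and stands in plain callables for the verifier nodes; the real runner in verifier/pipeline.py may differ on all of these.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def run_phase2(
    tasks: List[dict],
    runs_dir: Path,
    verifiers: Dict[str, Callable[[dict], dict]],
) -> List[dict]:
    """Run each task once; reuse verdict.json from earlier (possibly crashed) runs."""
    verdicts: List[dict] = []
    for task in tasks:
        workdir = runs_dir / task["id"]  # hypothetical: one workdir per task id
        verdict_path = workdir / "verdict.json"
        if verdict_path.exists():
            # Checkpoint hit: load the stored verdict and skip the verifier.
            verdict = json.loads(verdict_path.read_text())
        else:
            # Route by method, e.g. "figure_reproduction" or "formula_derivation".
            verdict = verifiers[task["method"]](task)
            workdir.mkdir(parents=True, exist_ok=True)
            verdict_path.write_text(json.dumps(verdict))
        verdicts.append(verdict)
        # Rewrite the aggregate file after every task so progress survives a crash.
        (runs_dir / "verifications.json").write_text(json.dumps(verdicts, indent=2))
    return verdicts
```

Calling run_phase2 a second time with the same runs_dir invokes no verifier, since every task already has a verdict.json on disk.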

Quick start

uv sync
cp .env.example .env   # set OPENAI_API_KEY

# End-to-end pipeline in one command. Cached extraction + cached
# conclusions are reused on re-runs; --force-extract redoes Phase 0+1.
PYTHONPATH=. uv run python -m verifier path/to/paper.pdf
PYTHONPATH=. uv run python -m verifier path/to/paper.pdf --force-extract
PYTHONPATH=. uv run python -m verifier path/to/paper.pdf --cache-dir /tmp/extracted

# Test
uv run pytest tests/

The final report lands at <cache-dir>/<stem>_paper/report.json, alongside the intermediate per-phase artifacts (manifest.json, conclusions.json, verification_plan.json, verifications.json).
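
As an illustration of that layout, the helper below computes where each artifact would land for a given PDF. It is a sketch under two assumptions: that <stem> is the PDF file name without its extension, and that the default cache directory is data/extracted. The function and constant names are hypothetical.

```python
from pathlib import Path
from typing import Dict

ARTIFACTS = (
    "manifest.json",
    "conclusions.json",
    "verification_plan.json",
    "verifications.json",
    "report.json",
)


def artifact_paths(pdf: str, cache_dir: str = "data/extracted") -> Dict[str, Path]:
    """Map each artifact name to its expected path under <cache-dir>/<stem>_paper/."""
    out_dir = Path(cache_dir) / f"{Path(pdf).stem}_paper"  # assumes stem = PDF base name
    return {name: out_dir / name for name in ARTIFACTS}
```

For example, artifact_paths("papers/foo.pdf", "/tmp/extracted")["report.json"] points at /tmp/extracted/foo_paper/report.json.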

Repository layout

hackathon20260511/
├── verifier/
│   ├── __main__.py                # `python -m verifier paper.pdf` — end-to-end CLI
│   ├── config.py                  # per-role model config
│   ├── state.py                   # shared TypedDict state passed between phases
│   ├── extraction/                # Phase 0
│   │   ├── paper.py               #   orchestrator
│   │   ├── placeholders.py        #   XML asset syntax
│   │   ├── equations.py           #   detect + VLM LaTeX
│   │   ├── figures.py             #   caption-anchor + render
│   │   ├── tables.py              #   caption-anchor + VLM markdown
│   │   └── layout.py              #   column-aware reading order
│   ├── agents/
│   │   ├── extractor.py           # Phase 1: per-section conclusion extraction
│   │   ├── dispatcher.py          # Phase 1.5: router / skipper (no LLM)
│   │   ├── figure_verifier.py     # Phase 2: figure reproduction loop
│   │   ├── formula_verifier.py    # Phase 2: derivation walker
│   │   └── summarizer.py          # Phase 3: single structured LLM call, no tools
│   ├── pipeline.py                # run_all (1→1.5→2→3) + Phase-2 file checkpoint
│   ├── tools/
│   │   ├── figure_tools.py        # 8-tool toolkit for the figure verifier
│   │   └── formula_tools.py       # 4-tool toolkit for the formula verifier
│   ├── schemas/                   # pydantic models for hand-off contracts
│   └── prompts/                   # one prompt module per agent
├── scripts/                       # per-phase manual harnesses (for partial reruns)
├── docs/
│   ├── architecture.md            # per-phase deep dive
│   └── models.md                  # per-agent model configuration
├── tests/
├── data/
│   └── extracted/                 # cached Phase 0 + Phase 1 outputs (gitignored)
└── skills/

Documentation

docs/architecture.md — per-phase design: extraction internals, agent toolkits, verdict schema, state machine.
docs/models.md — per-agent model assignment and the reasoning behind each choice; env-var overrides.
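
A per-role model lookup with an environment-variable override might look roughly like the sketch below. The variable prefix VERIFIER_MODEL_, the role keys, and the placeholder default "default-model" are all assumptions made for illustration; docs/models.md is the authority on the real names.

```python
import os
from typing import Mapping, Optional

# Placeholder defaults; the real per-role assignments live in verifier/config.py.
DEFAULT_MODELS = {
    "extractor": "default-model",
    "figure_verifier": "default-model",
    "formula_verifier": "default-model",
    "summarizer": "default-model",
}


def model_for(role: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model for a role, letting an env var override the default."""
    if role not in DEFAULT_MODELS:
        raise KeyError(f"unknown role: {role}")
    env = os.environ if env is None else env
    # Hypothetical naming scheme: VERIFIER_MODEL_<ROLE IN UPPER CASE>.
    return env.get(f"VERIFIER_MODEL_{role.upper()}", DEFAULT_MODELS[role])
```

Passing env explicitly keeps the lookup easy to test without mutating the process environment.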

Running phases standalone

For debugging one phase, swapping prompts, or re-running a single task, the harnesses in scripts/ work against an already-extracted directory without re-doing earlier phases:

# Phase 0 only (extraction)
PYTHONPATH=. uv run python scripts/try_extract.py path/to/paper.pdf data/extracted/<stem>_paper

# Phase 1 only (writes conclusions.json)
PYTHONPATH=. uv run python scripts/try_extractor.py data/extracted/<stem>_paper

# Phase 1.5 only (no LLM; writes verification_plan.json)
PYTHONPATH=. uv run python scripts/try_dispatcher.py data/extracted/<stem>_paper

# Phase 2 single task (figure or formula)
PYTHONPATH=. uv run python scripts/try_figure_verifier.py data/extracted/<stem>_paper [task_idx]
PYTHONPATH=. uv run python scripts/try_formula_verifier.py data/extracted/<stem>_paper [task_idx]

# Phase 2 full run with per-task checkpoint resume
PYTHONPATH=. uv run python scripts/run_phase2.py data/extracted/<stem>_paper

# Phase 3 only (one LLM call; writes report.json)
PYTHONPATH=. uv run python scripts/run_phase3.py data/extracted/<stem>_paper

To force a re-run of one Phase-2 task, delete that task's workdir under data/extracted/<stem>_paper/figures/verifier_runs/ or equations/verifier_runs/.
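
The same reset can be scripted. The sketch below removes one task workdir so that the next Phase 2 run recomputes it; the task directory name is passed in by the caller because the workdir naming scheme is not specified here, and the function name is hypothetical.

```python
import shutil
from pathlib import Path


def reset_task(paper_dir: Path, kind: str, task_dir_name: str) -> bool:
    """Delete one Phase-2 task workdir; return True if something was removed."""
    if kind not in ("figures", "equations"):
        raise ValueError("kind must be 'figures' or 'equations'")
    workdir = paper_dir / kind / "verifier_runs" / task_dir_name
    if not workdir.is_dir():
        return False  # nothing cached for this task, so nothing to reset
    shutil.rmtree(workdir)  # also removes verdict.json, which is what triggers the re-run
    return True
```

Only the named workdir is touched, so verdicts for every other task stay cached.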
