DavidWang19/DIS-Student

Paper Verifier — A Multiagent System for Scientific Paper Verification

A pure-Python pipeline of tool-using agents that takes a scientific paper PDF, extracts the paper's staged conclusions, attempts to reproduce each one, and returns a credibility-graded list of those conclusions together with a whole-paper summary and follow-up research suggestions.

What it does

Given a paper PDF, the pipeline (a) extracts per-section staged conclusions and the figures, equations, and tables that support each one, (b) verifies each conclusion against its supporting evidence (reproducing figures via generated code and walking equation derivations step by step), and (c) aggregates the per-conclusion verdicts into a final report containing a credibility level and score for each conclusion, a list of further-research points motivated by the gaps, and a whole-paper summary.

Each agent's responsibilities are narrow and the hand-off contracts (pydantic models) are explicit, so every verdict traces back to the section, evidence, and reasoning that produced it.
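To make the idea of an explicit hand-off contract concrete, here is a minimal sketch that uses stdlib `dataclasses` as a stand-in for the project's pydantic models. The class and field names (`Conclusion`, `VerificationTask`, `Verdict`, `conclusion_idx`, `evidence`) are hypothetical; the real models live in `verifier/schemas/` and may look different. What it illustrates is that each phase emits a typed, JSON-serializable record that carries a back-pointer to the thing it judged.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Literal

# Hypothetical contract shapes; the real pydantic models are in verifier/schemas/.
Method = Literal["figure_reproduction", "formula_derivation"]


@dataclass
class Conclusion:
    """Phase 1 output: one staged conclusion plus the assets that support it."""
    section: str
    text: str
    evidence: list[str] = field(default_factory=list)  # e.g. figure/equation ids


@dataclass
class VerificationTask:
    """Phase 1.5 output: one unit of work routed to a Phase 2 verifier."""
    conclusion_idx: int
    method: Method
    evidence_id: str


@dataclass
class Verdict:
    """Phase 2 output: the index points back to the conclusion being judged."""
    conclusion_idx: int
    supported: bool
    reasoning: str


def to_json(obj) -> str:
    # Every hand-off is serialized to disk, so any phase can be re-run alone.
    return json.dumps(asdict(obj), sort_keys=True)
```

Following `Verdict.conclusion_idx` back to a `Conclusion` recovers its `section` and `evidence`, which is the traceability property described above.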

Pipeline overview

                          ┌────────────────────────────┐
   paper.pdf  ──────────► │ Phase 0: Extraction (CLI)  │ ─────► <stem>_paper/
                          │  verifier.extraction.*     │        manifest.json
                          │                            │        sections/*.md
                          │  - paper.py  orchestrator  │        figures/*
                          │  - placeholders.py  schema │        equations/*
                          │  - equations.py  VLM OCR   │        tables/*
                          │  - figures.py    render    │
                          │  - tables.py     VLM md    │
                          │  - layout.py     2-col     │
                          └────────────────────────────┘
                                       │
                                       ▼
                          ┌────────────────────────────┐
                          │ Phase 1: Extractor agent   │ ─────► <stem>_paper/
                          │  verifier.agents.extractor │        conclusions.json
                          └────────────────────────────┘
                                       │
                                       ▼
                          ┌────────────────────────────┐
                          │ Phase 1.5: Dispatcher      │ ─────► <stem>_paper/
                          │  verifier.agents.dispatcher│        verification_plan.json
                          │  (pure-Python, no LLM)     │
                          └────────────────────────────┘
                                       │
                                       ▼
                          ┌──────────────────────────────────────────────────┐
                          │   Phase 2 runner — verifier/pipeline.py          │
                          │   (pure Python loop, file-based checkpoint)      │
                          │                                                  │
                          │   for task in pending_tasks:                     │
                          │     if workdir/verdict.json exists:              │
                          │         load + skip   ◄─── resume after crash    │
                          │     elif task.method == "figure_reproduction":   │
                          │         figure_verifier_node(state)              │
                          │     elif task.method == "formula_derivation":    │
                          │         formula_verifier_node(state)             │
                          │   rewrite verifications.json after each task     │
                          └──────────────────────────────────────────────────┘
                                       │
                                       ▼
                          ┌──────────────────────────────────────────────────┐
                          │   Phase 3 summarizer                             │
                          │   verifier/agents/summarizer.py                  │
                          │                                                  │
                          │   - Python: weighted overall_credibility         │
                          │   - 1 LLM call (structured): paper_summary       │
                          │                              + further_research  │
                          │   - merges Phase-2 verdicts + dispatcher skips   │
                          └──────────────────────────────────────────────────┘
                                       │
                                       ▼
                                  FinalReport ──►  report.json
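The Phase 2 runner in the diagram can be sketched in a few lines of Python. This is an illustration under assumptions, not the code in `verifier/pipeline.py`: tasks are plain dicts with hypothetical `method` and `workdir` keys, each verifier is any callable that returns a JSON-serializable verdict, and dispatch uses a dict lookup instead of the `if/elif` chain (so an unknown method raises `KeyError` rather than being silently skipped).

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable

# Hypothetical signature: a verifier takes a task dict and returns a verdict dict.
Verifier = Callable[[dict], dict]


def run_phase2(tasks: list[dict], paper_dir: Path,
               verifiers: dict[str, Verifier]) -> list[dict]:
    """Run each task at most once; verdict.json in its workdir marks it done."""
    results: list[dict] = []
    for task in tasks:
        workdir = paper_dir / task["workdir"]
        verdict_path = workdir / "verdict.json"
        if verdict_path.exists():
            # Resume after a crash: reuse the saved verdict and skip the agent.
            verdict = json.loads(verdict_path.read_text())
        else:
            # Route by method, e.g. "figure_reproduction" or "formula_derivation".
            verdict = verifiers[task["method"]](task)
            workdir.mkdir(parents=True, exist_ok=True)
            verdict_path.write_text(json.dumps(verdict))
        results.append(verdict)
        # Rewrite the aggregate after every task so partial progress survives.
        (paper_dir / "verifications.json").write_text(json.dumps(results, indent=2))
    return results
```

Because the checkpoint is just a file, resuming after a crash means re-running the same command: finished tasks are loaded from disk and only the missing ones invoke an agent.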

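The summarizer's weighted `overall_credibility` step is plain arithmetic. Here is a minimal sketch. It assumes each conclusion has a score in [0, 1], a caller-supplied weight, and `None` for conclusions the dispatcher skipped. The actual scale, weights, and skip handling are not documented here, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Optional


def overall_credibility(scores: list[Optional[float]],
                        weights: list[float]) -> Optional[float]:
    """Weighted mean of per-conclusion scores in [0, 1].

    Conclusions the dispatcher skipped carry score None; they are left out of
    both the numerator and the denominator rather than counted as failures.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    pairs = [(s, w) for s, w in zip(scores, weights) if s is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None  # nothing was verifiable
    return sum(s * w for s, w in pairs) / total_weight
```

For example, scores `[0.9, None, 0.4]` with weights `[2, 1, 1]` give (0.9 × 2 + 0.4 × 1) / 3 ≈ 0.733. The skipped conclusion neither raises nor lowers the result.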
Quick start

uv sync
cp .env.example .env   # set OPENAI_API_KEY

# End-to-end pipeline in one command. Cached extraction + cached
# conclusions are reused on re-runs; --force-extract redoes Phase 0+1.
PYTHONPATH=. uv run python -m verifier path/to/paper.pdf
PYTHONPATH=. uv run python -m verifier path/to/paper.pdf --force-extract
PYTHONPATH=. uv run python -m verifier path/to/paper.pdf --cache-dir /tmp/extracted

# Test
uv run pytest tests/

The final report lands at <cache-dir>/<stem>_paper/report.json, alongside the intermediate per-phase artifacts (manifest.json, conclusions.json, verification_plan.json, verifications.json).
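The output layout and the caching rule from the Quick start can be expressed as two small helpers. This sketch rests on assumptions. The helper names are hypothetical. Treating Phase 0+1 as reusable only when both `manifest.json` and `conclusions.json` exist is a guess at the CLI's behavior, not a description of it.

```python
from pathlib import Path

# Per-phase artifacts named in this README, in pipeline order.
ARTIFACTS = ("manifest.json", "conclusions.json", "verification_plan.json",
             "verifications.json", "report.json")


def paper_dir(cache_dir: Path, pdf: Path) -> Path:
    """<cache-dir>/<stem>_paper, e.g. .../foo_paper for papers/foo.pdf."""
    return cache_dir / f"{pdf.stem}_paper"


def can_reuse_phase01(out_dir: Path, force_extract: bool) -> bool:
    """Assumed rule: reuse Phase 0+1 if both outputs exist and no --force-extract."""
    if force_extract:
        return False
    return all((out_dir / name).exists() for name in ARTIFACTS[:2])
```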

Repository layout

hackathon20260511/
├── verifier/
│   ├── __main__.py                # `python -m verifier paper.pdf` — end-to-end CLI
│   ├── config.py                  # per-role model config
│   ├── state.py                   # shared TypedDict state passed between phases
│   ├── extraction/                # Phase 0
│   │   ├── paper.py               #   orchestrator
│   │   ├── placeholders.py        #   XML asset syntax
│   │   ├── equations.py           #   detect + VLM LaTeX
│   │   ├── figures.py             #   caption-anchor + render
│   │   ├── tables.py              #   caption-anchor + VLM markdown
│   │   └── layout.py              #   column-aware reading order
│   ├── agents/
│   │   ├── extractor.py           # Phase 1: per-section conclusion extraction
│   │   ├── dispatcher.py          # Phase 1.5: router / skipper (no LLM)
│   │   ├── figure_verifier.py     # Phase 2: figure reproduction loop
│   │   ├── formula_verifier.py    # Phase 2: derivation walker
│   │   └── summarizer.py          # Phase 3: single structured LLM call, no tools
│   ├── pipeline.py                # run_all (1→1.5→2→3) + Phase-2 file checkpoint
│   ├── tools/
│   │   ├── figure_tools.py        # 8-tool toolkit for the figure verifier
│   │   └── formula_tools.py       # 4-tool toolkit for the formula verifier
│   ├── schemas/                   # pydantic models for hand-off contracts
│   └── prompts/                   # one prompt module per agent
├── scripts/                       # per-phase manual harnesses (for partial reruns)
├── docs/
│   ├── architecture.md            # per-phase deep dive
│   └── models.md                  # per-agent model configuration
├── tests/
├── data/
│   └── extracted/                 # cached Phase 0 + Phase 1 outputs (gitignored)
└── skills/
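`layout.py` is described only as producing a column-aware reading order. A common heuristic for two-column papers is sketched below. It is a generic technique, not necessarily what `layout.py` implements. It assumes each text block has a bounding box in page coordinates with y increasing down the page. Blocks that cross the page midline count as full-width. Between full-width blocks, the left column is read before the right. `Block` and `reading_order` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    x0: float
    y0: float  # top edge (y grows downward)
    x1: float
    y1: float
    text: str


def reading_order(blocks: list[Block], page_width: float) -> list[str]:
    """Return block texts in two-column reading order."""
    mid = page_width / 2
    ordered: list[Block] = []
    left: list[Block] = []
    right: list[Block] = []

    def flush() -> None:
        # Emit the pending column segment: whole left column, then right.
        ordered.extend(sorted(left, key=lambda b: b.y0))
        ordered.extend(sorted(right, key=lambda b: b.y0))
        left.clear()
        right.clear()

    for blk in sorted(blocks, key=lambda b: b.y0):
        if blk.x1 <= mid:
            left.append(blk)
        elif blk.x0 >= mid:
            right.append(blk)
        else:
            # Full-width block (title, wide figure): finish the columns above it.
            flush()
            ordered.append(blk)
    flush()
    return [b.text for b in ordered]
```

Full-width blocks act as barriers. A wide figure in mid-page is emitted after the column text above it and before the column text below it.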

Documentation

  • docs/architecture.md — per-phase design: extraction internals, agent tool toolkits, verdict schema, state machine.
  • docs/models.md — per-agent model assignment and the reasoning behind each choice; env-var overrides.

Running phases standalone

For debugging one phase, swapping prompts, or re-running a single task, the harnesses in scripts/ work against an already-extracted directory without re-doing earlier phases:

# Phase 0 only (extraction)
PYTHONPATH=. uv run python scripts/try_extract.py path/to/paper.pdf data/extracted/<stem>_paper

# Phase 1 only (writes conclusions.json)
PYTHONPATH=. uv run python scripts/try_extractor.py data/extracted/<stem>_paper

# Phase 1.5 only (no LLM; writes verification_plan.json)
PYTHONPATH=. uv run python scripts/try_dispatcher.py data/extracted/<stem>_paper

# Phase 2 single task (figure or formula)
PYTHONPATH=. uv run python scripts/try_figure_verifier.py data/extracted/<stem>_paper [task_idx]
PYTHONPATH=. uv run python scripts/try_formula_verifier.py data/extracted/<stem>_paper [task_idx]

# Phase 2 full run with per-task checkpoint resume
PYTHONPATH=. uv run python scripts/run_phase2.py data/extracted/<stem>_paper

# Phase 3 only (one LLM call; writes report.json)
PYTHONPATH=. uv run python scripts/run_phase3.py data/extracted/<stem>_paper

To force a re-run of one Phase-2 task, delete that task's workdir under data/extracted/<stem>_paper/figures/verifier_runs/ or equations/verifier_runs/.
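Deleting the workdir works because the Phase 2 runner treats an existing `verdict.json` as "done". Below is a small hypothetical helper that does this with a safety check. The per-task directory name is an assumption, so inspect `verifier_runs/` for the real naming. The guard rejects names that could escape those folders.

```python
from __future__ import annotations

import shutil
from pathlib import Path

RUN_ROOTS = ("figures/verifier_runs", "equations/verifier_runs")


def reset_task(paper_dir: Path, task_dir_name: str) -> list[Path]:
    """Delete <paper_dir>/{figures,equations}/verifier_runs/<task_dir_name>.

    Returns the directories actually removed (empty if none existed).
    """
    # Refuse "", ".", "..", and anything containing a path separator.
    if task_dir_name in ("", ".", "..") or Path(task_dir_name).name != task_dir_name:
        raise ValueError(f"unsafe task directory name: {task_dir_name!r}")
    removed: list[Path] = []
    for root in RUN_ROOTS:
        target = paper_dir / root / task_dir_name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(target)
    return removed
```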
