For the current experimental write-up and results, see report/report.md.
Converting chemical diagrams into machine-readable representations (SMILES, InChI, molecular graphs) is fundamental for indexing the chemical literature. Classical OCSR tools often struggle on complex layouts and scanned pages. Modern deep learning approaches such as DECIMER and MolScribe demonstrate strong performance with dedicated training. Given the rapid evolution of multimodal foundation models, it is natural to ask whether general-purpose VLM/LLM systems can perform OCSR without chemistry-specific training—and what their failure modes look like.
This repository provides a ready-to-use Docker Compose environment for evaluating OCSR with current vision-enabled foundation models.
The project is inspired by MolMole (LG AI Research), which proposes an end-to-end framework for extracting molecules and reactions from full-page patent images and introduces an evaluation benchmark. The MolMole paper evaluates on 550 annotated pages; due to copyright restrictions, only a 300-page patent subset is publicly released on HuggingFace as the MolMole_Patent300 dataset.
The code in this repo runs holistic extraction: the model sees the entire page and is asked to return all structures and reactions in one JSON response. Extractions are evaluated in three output formats:
- `graph`: atoms/bonds JSON (closest to an explicit molecular graph).
- `smiles`: SMILES strings.
- `selfies`: SELFIES strings.
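For illustration, here is the same small molecule (ethanol) in all three formats. The atoms/bonds JSON layout below is only an assumed sketch, not necessarily the exact schema the extractor requests, and the snippet assumes RDKit and the `selfies` package are installed:

```python
# Illustration only: ethanol in the three output formats.
from rdkit import Chem
import selfies as sf

smiles = Chem.CanonSmiles("CCO")      # SMILES, canonicalised with RDKit -> "CCO"
selfies_str = sf.encoder(smiles)      # SELFIES -> "[C][C][O]"
graph = {                             # hypothetical atoms/bonds JSON for this sketch
    "atoms": ["C", "C", "O"],
    "bonds": [[0, 1, 1], [1, 2, 1]],  # [atom_index_i, atom_index_j, bond_order]
}
print(smiles, selfies_str, graph)
```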
All developer workflows run inside Docker/Compose; host-level execution is reserved for CI.
```
.
├── docker-compose.yml
├── Dockerfile
├── Makefile
├── .env.example
├── experiments_openrouter.yaml
├── report/
│   └── report.md
├── src/
│   └── molmole_research/
│       ├── downloader.py   # download dataset + build labels.json from MOL files
│       ├── extractor.py    # holistic OCSR extraction (graph / SMILES / SELFIES)
│       ├── converter.py    # optional conversion helpers
│       ├── evaluator.py    # compute metrics and write logs
│       └── runner.py       # orchestrate multi-model runs
└── tests/
    └── ...
```
This is the intended way to reproduce the pilot runs in results_openrouter_*. Keep secrets in .env (gitignored) and do not put API keys on the command line.
- Create `.env`:

  ```bash
  cp .env.example .env
  # edit .env and set OPENROUTER_KEY=...
  ```

  If you plan to run direct OpenAI API experiments (not via OpenRouter), you can also set `OPENAI_API_KEY` in `.env`.

- Build the image:

  ```bash
  make build
  ```

- Download MolMole_Patent300 and build `labels.json`:

  ```bash
  make download
  ```

- Run OpenRouter experiments (small debug runs; adjust `--limit` as needed):

  ```bash
  docker compose run --rm --user "$(id -u):$(id -g)" research \
    python -m molmole_research.runner run \
    --config experiments_openrouter.yaml \
    --format graph \
    --limit 5 \
    --results-dir results_openrouter_graph
  ```

  Repeat for other formats:

  - `--format smiles --results-dir results_openrouter_smiles`
  - `--format selfies --results-dir results_openrouter_selfies`
- Inspect outputs:

  - Raw model outputs: `results_openrouter_*/<experiment>.jsonl`
  - Metrics: `results_openrouter_*/<experiment>_metrics.json`
  - Logs: `results_openrouter_*/<experiment>_metrics.log`
  - Aggregated summary: `results_openrouter_*/summary.json`
The publicly released dataset lives on HuggingFace as doxa-friend/MolMole_Patent300 (license: CC-BY-NC-ND-4.0). The downloader uses huggingface_hub.snapshot_download to fetch the dataset snapshot and builds labels.json by converting the provided MOL files to canonical SMILES via RDKit.
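In outline, that step looks roughly like the sketch below. This is not the repo's exact code; the file layout and `labels.json` schema are assumptions, so see `downloader.py` for the real logic:

```python
# Sketch only: fetch the dataset snapshot, convert MOL files to canonical SMILES,
# and write labels.json. Assumed layout; see downloader.py for the actual behaviour.
import json
from pathlib import Path

from huggingface_hub import snapshot_download
from rdkit import Chem

snapshot_dir = Path(snapshot_download(repo_id="doxa-friend/MolMole_Patent300",
                                      repo_type="dataset"))

labels = {}
for mol_path in sorted(snapshot_dir.rglob("*.mol")):
    mol = Chem.MolFromMolFile(str(mol_path))
    if mol is not None:                                # skip unparsable MOL files
        labels[mol_path.stem] = Chem.MolToSmiles(mol)  # canonical SMILES

out_path = Path("data/images/labels.json")
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(json.dumps(labels, indent=2))
```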
To download the dataset into `data/images`, run:

```bash
make download
```

If the download fails due to authentication or license acceptance, the downloader prints instructions for manual setup.
All commands below are meant to run via Docker.
If you are using the OpenAI API directly, ensure OPENAI_API_KEY is set (for example via .env).
- Run extraction (example: OpenAI, SMILES output):

  ```bash
  docker compose run --rm --user "$(id -u):$(id -g)" research \
    python -m molmole_research.extractor run \
    --model gpt-4o \
    --dataset-dir data/images \
    --out results \
    --format smiles \
    --limit 5
  ```

- Evaluate:

  ```bash
  docker compose run --rm --user "$(id -u):$(id -g)" research \
    python -m molmole_research.evaluator run \
    --pred results/gpt-4o.jsonl \
    --dataset-dir data/images \
    --out results
  ```

- Optional conversion step (mainly useful for debugging):

  ```bash
  docker compose run --rm --user "$(id -u):$(id -g)" research \
    python -m molmole_research.converter run \
    --pred results/gpt-4o.jsonl \
    --out results
  ```
Notes:
- The extractor resumes by default (appends and skips already-processed pages). For a clean run, delete the output JSONL or use `--no-resume`.
- Use `--timeout` to bound each request, and `--limit` for short debug runs.
To run a YAML-defined set of experiments (recommended for OpenRouter), use:
```bash
docker compose run --rm --user "$(id -u):$(id -g)" research \
  python -m molmole_research.runner run \
  --config experiments_openrouter.yaml \
  --format graph \
  --limit 5 \
  --results-dir results_openrouter_graph
```

The runner writes per-experiment JSONL outputs and per-experiment metrics, plus `summary.json` in the selected results directory.
- `experiments_openrouter.yaml` sets the OpenRouter API base and declares `api_key_env: OPENROUTER_KEY` (a sketch of its structure follows this list).
- The runner reads `OPENROUTER_KEY` from `.env` (or the environment) and passes it to the extractor via environment variables.
- Start with `--limit` and a small model set; OpenRouter runs can be expensive.
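For orientation only, a runner config of this kind might look roughly like the sketch below. Apart from `api_key_env: OPENROUTER_KEY` and the standard OpenRouter base URL, every key and model name here is an assumption; treat `experiments_openrouter.yaml` and `runner.py` as the source of truth.

```yaml
# Sketch only: key names other than api_key_env are assumptions, not the real schema.
api_base: https://openrouter.ai/api/v1
api_key_env: OPENROUTER_KEY
experiments:
  - name: gpt-4o
    model: openai/gpt-4o
  - name: claude-sonnet
    model: anthropic/claude-3.5-sonnet
```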
If you prefer an interactive session:
```bash
make shell
```

Inside the shell, you can run the same `python -m molmole_research.<module> run ...` commands.
If an existing Open-WebUI + Ollama stack is available at http://host.docker.internal:11434/v1 with the model ministral-3:14b, you can run a sample extraction from inside the research container:
```bash
python -m molmole_research.extractor run \
  --model ministral-3:14b \
  --api-base http://host.docker.internal:11434/v1 \
  --api-key placeholder \
  --dataset-dir data/images \
  --out results \
  --format graph \
  --limit 5
```

Notes:
- The Compose service includes `extra_hosts: host.docker.internal` so the container can reach the host’s Ollama port.
- The `--api-key` value is ignored by Ollama but required by the OpenAI client; any non-empty string is fine.
The provided Makefile defines several convenience targets:
| Target | Description |
|---|---|
| `make build` | Build the Docker image (installs dependencies). |
| `make shell` | Open a bash shell inside the research container. |
| `make test` | Run the full test suite inside the container. |
| `make lint` | Check code style with `ruff` inside the container. |
| `make format` | Format the code base with `ruff format` inside the container. |
| `make run` | Run the default runner inside the container. |
| `make download` | Download MolMole_Patent300 and build `labels.json`. |
| `make up` / `make down` | Start or stop the research container stack. |
Automated CI pipelines execute host-level commands (pytest, ruff) to keep runtimes fast. Outside CI, prefer the Docker workflow above and avoid creating local virtual environments. If you must reproduce the CI run locally, mirror its steps in a temporary venv and install requirements.txt, but treat that as an exception rather than the norm.
The extractor uses the OpenAI Python client and targets OpenAI-compatible APIs. For a provider that exposes an OpenAI-compatible API (OpenAI, OpenRouter, local Open-WebUI, etc.), set --api-base and provide credentials via environment variables or the runner configuration.
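As a rough illustration of what "OpenAI-compatible" means here, the snippet below shows a single vision request with the OpenAI Python client pointed at such a base URL. It is not the extractor's actual code: the prompt, model name, and image path are placeholders, and the real request and parsing logic live in `extractor.py`.

```python
# Sketch only: addressing an OpenAI-compatible endpoint (OpenAI, OpenRouter,
# Ollama/Open-WebUI) with the OpenAI Python client.
import base64
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",   # or whatever --api-base points at
    api_key=os.environ["OPENROUTER_KEY"],
)

# Hypothetical page image; the dataset's real filenames may differ.
page_b64 = base64.b64encode(open("data/images/page_0001.png", "rb").read()).decode()

resp = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all molecules on this page and return them as JSON."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{page_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```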
The `articles/` directory tracks additional papers used to inform this environment (the PDFs themselves are not committed to the repo). A brief summary of each paper is available in `articles/relevant_articles.md`.
This project is released under the MIT License. Individual datasets and published papers retain their respective licenses; please consult the original sources for details.