Local tool to capture web papers/articles, parse them into clean text + sections + references, and export in a few useful formats.
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
python run.py

Open:
- Library: http://127.0.0.1:8000/library/
- Collections: http://127.0.0.1:8000/collections/
- Open chrome://extensions
- Enable Developer mode
- Load unpacked → select extensions/chrome/
If your server isn’t on http://127.0.0.1:8000, edit extensions/chrome/background.js (API_ENDPOINT).
- Extension posts captures to POST /api/captures/ (URL + full HTML + best-effort main content + metadata).
- Server parses with a site-aware parser (PMC/OUP/Wiley/…) and falls back to generic heuristics.
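What the extension posts can be sketched as follows. This is a minimal illustration using only the standard library; the exact field names in the JSON body are assumptions, not the server's documented schema:

```python
import json
import urllib.request

API_ENDPOINT = "http://127.0.0.1:8000/api/captures/"  # default from background.js

def build_capture(url, html, main_content, metadata):
    """Assemble a capture payload. Field names here are illustrative."""
    return {
        "url": url,
        "html": html,             # full page HTML
        "content": main_content,  # best-effort main content
        "metadata": metadata,     # e.g. title, authors
    }

def post_capture(payload):
    """POST the capture as JSON, mirroring what the extension does."""
    req = urllib.request.Request(
        API_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)
```

With the server running, `post_capture(build_capture(...))` should land a new capture in the library.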
- Data is stored locally:
  - SQLite: data/db.sqlite3
  - Artifacts: data/artifacts/<capture_id>/
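To poke at the local store, here is a schema-agnostic sketch that lists the tables in the SQLite file (table names depend on the app's migrations, so none are assumed):

```python
import sqlite3

def list_tables(db_path="data/db.sqlite3"):
    """Return the table names present in the capture database."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        con.close()
```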
Each capture has a folder at data/artifacts/<capture_id>/, typically containing:
- page.html, content.html
- article.json, reduced.json
- sections.json, references.json
- paper.md (deterministic bundle)
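A sketch for loading one capture's JSON artifacts, assuming the filenames listed above; a file may be absent if that parsing stage produced nothing, so missing files are simply skipped:

```python
import json
from pathlib import Path

def load_artifacts(capture_id, root="data/artifacts"):
    """Load the JSON artifacts for a capture; missing files are skipped."""
    folder = Path(root) / str(capture_id)
    out = {}
    for name in ("article.json", "reduced.json", "sections.json", "references.json"):
        path = folder / name
        if path.exists():
            out[name] = json.loads(path.read_text(encoding="utf-8"))
    return out
```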
- BibTeX: /exports/bibtex/
- RIS: /exports/ris/
- Master Markdown: /exports/master.md/
- Papers JSONL: /exports/papers.jsonl/
Add ?collection=<collection_id> to export a specific collection.
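Building a collection-scoped export URL can be sketched like this (the base address is assumed to be the default server address from the quickstart):

```python
from urllib.parse import urlencode

BASE = "http://127.0.0.1:8000"

def export_url(kind, collection_id=None):
    """Build an export URL, e.g. kind='bibtex' -> /exports/bibtex/."""
    url = f"{BASE}/exports/{kind}/"
    if collection_id is not None:
        url += "?" + urlencode({"collection": collection_id})
    return url
```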
pip install -r requirements-dev.txt
pytest