Add LLM-driven citation discovery system #10

Merged
daharoni merged 2 commits into main from refactor-shared-bib-utils
Apr 11, 2026

@daharoni
Contributor

Summary

  • Adds a citation discovery pipeline that uses OpenAlex citation graphs and Claude Code sub-agents to find papers related to the lab's tools
  • Pre-fetches BibTeX (CrossRef) and full text (PMC/Unpaywall/bioRxiv) so sub-agents only need local file access
  • Tracks papers through a 5-stage pipeline: candidates → in-progress → reviewed/rejected → approved → references.bib
  • Captures all paper types: science using tools, methods extensions, software (Minian), tool papers, and reviews
  • Includes initial backlog of 1,238 candidate papers from 4 seed papers
  • Extracts shared utilities (generate_key, slugify, normalize_title) into bib_utils.py for reuse
  • Adds 101 unit tests with mocked APIs and a CI workflow to run them
  • Uses a config-driven design so other groups can clone the repository and adapt it for their own tools
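
To make the stage order in the summary concrete, here is a minimal sketch of the transitions as a lookup table. The stage names come from the summary above; the table, the function name `can_move`, and the choice to treat `rejected` as terminal are assumptions for illustration, not the `discovery` package's actual API.

```python
# Hypothetical sketch: stage names follow the PR summary; the transition
# table and can_move() are illustrative assumptions, not code from this PR.
ALLOWED_TRANSITIONS = {
    "candidates": {"in-progress"},
    "in-progress": {"reviewed", "rejected"},
    "reviewed": {"approved"},
    "rejected": set(),                # assumed terminal
    "approved": {"references.bib"},   # approval ends with an append to the bibliography
    "references.bib": set(),
}


def can_move(current: str, target: str) -> bool:
    """Return True if a paper may move directly from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

A table like this keeps the allowed moves in one place, so a script that relocates files between stage directories can refuse a jump such as `candidates` straight to `approved`.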

New files

  • discovery/ — Python package (config, OpenAlex, CrossRef, Unpaywall, fulltext, candidates, analysis, approve)
  • discovery/README.md — Full documentation
  • .claude/commands/ — analyze-citation and run-discovery skills
  • tests/ — 8 test modules, 101 tests
  • pipeline/ — 5-stage directory structure with 1,238 initial candidates
  • discovery_config.yaml — Seed papers, tool definitions, keywords, API settings
  • .github/workflows/test-discovery.yml — CI for tests

Modified files

  • scripts/bib_utils.py — Added shared utilities extracted from normalize_keys.py and check_duplicates.py
  • scripts/normalize_keys.py — Imports from bib_utils.py instead of defining its own copies
  • scripts/check_duplicates.py — Imports from bib_utils.py instead of defining its own copies
  • requirements.txt — Added pyyaml, pytest, responses
  • .gitignore — Added pipeline companion files (.txt, .bib), in-progress/, pdfs/
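
For readers unfamiliar with this kind of helper, the following is a rough sketch of what shared utilities with these names might look like. Only the names `slugify`, `normalize_title`, and `generate_key` come from this PR; the signatures, the key format (surname + year + first long title word), and the implementations are assumptions and may differ from what `bib_utils.py` actually does.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Fold to ASCII, lowercase, and drop everything except letters and digits."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "", ascii_text.lower())


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return " ".join(cleaned.split())


def generate_key(surname: str, year: int, title: str) -> str:
    """Assumed key format: surname, year, then the first title word longer than 3 letters."""
    words = [w for w in normalize_title(title).split() if len(w) > 3]
    first_word = words[0] if words else ""
    return f"{slugify(surname)}{year}{first_word}"
```

With these definitions, a script such as `normalize_keys.py` would only need `from bib_utils import generate_key` rather than its own copy.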

Test plan

  • pytest tests/ -v passes (101 tests)
  • python scripts/normalize_keys.py still works on references.bib
  • python scripts/check_duplicates.py still works on references.bib
  • python -m discovery.generate_candidates generates candidate YAML files
  • python -m discovery.fulltext --stage candidates fetches BibTeX and full text
  • Sub-agent analysis correctly categorizes known papers (tested with Aharoni 2019, Dong 2022 Minian, Madruga 2025 2P Miniscope)
  • python -m discovery.approve appends approved citations to references.bib
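
The idea of testing against mocked APIs can be sketched as follows. This PR adds `responses` to requirements.txt, presumably for that purpose; the sketch below instead uses `unittest.mock` from the standard library so that it runs without extra dependencies. The helper `fetch_title` and its `opener` parameter are hypothetical and are not part of the `discovery` package; the URL pattern is the CrossRef REST API's works endpoint.

```python
import io
import json
from unittest import mock
from urllib.request import urlopen


def fetch_title(doi: str, opener=urlopen) -> str:
    """Hypothetical helper: read a work's title from the CrossRef REST API."""
    url = f"https://api.crossref.org/works/{doi}"
    with opener(url) as resp:
        payload = json.load(resp)
    return payload["message"]["title"][0]


def test_fetch_title_with_mocked_api():
    # Canned response body in the shape CrossRef uses: {"message": {"title": [...]}}
    body = json.dumps({"message": {"title": ["A mocked title"]}}).encode()
    fake_opener = mock.Mock(return_value=io.BytesIO(body))

    assert fetch_title("10.1038/nature17955", opener=fake_opener) == "A mocked title"
    fake_opener.assert_called_once_with(
        "https://api.crossref.org/works/10.1038/nature17955"
    )
```

Injecting the opener keeps the test offline and deterministic, which is what lets a CI workflow run the suite without network access or API keys.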

🤖 Generated with Claude Code

daharoni and others added 2 commits April 10, 2026 23:06
Adds a complete system for finding papers related to the lab's tools by
crawling citation graphs (OpenAlex) and analyzing full text with Claude
Code sub-agents. Papers are categorized as science, methods, software,
tool_paper, or review based on their relationship to the tools.

Pipeline: candidates/ → in-progress/ → reviewed/ or rejected/ → approved/
Pre-fetch step (fulltext.py) retrieves BibTeX and full text before analysis
so sub-agents only need file read access.
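
One way to check that a candidate is ready for file-only analysis is to confirm its companion files exist locally. The sketch below assumes each candidate YAML file has `.bib` and `.txt` companions with the same stem in the same directory; that naming convention and the function name `missing_companions` are assumptions for illustration, not confirmed by this PR.

```python
from pathlib import Path


def missing_companions(candidate_yaml: Path) -> list[str]:
    """Return the companion suffixes (.bib, .txt) not yet present beside a candidate file."""
    return [
        suffix
        for suffix in (".bib", ".txt")
        if not candidate_yaml.with_suffix(suffix).exists()
    ]
```

An empty list would mean the pre-fetch step has already supplied both BibTeX and full text, so a sub-agent can work from local files alone.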

- discovery/ package: config, OpenAlex, CrossRef, Unpaywall, fulltext, candidates, analysis, approve
- .claude/commands/: analyze-citation and run-discovery skills
- tests/: 101 unit tests with mocked APIs
- CI: test-discovery.yml workflow
- Extracted shared utils (generate_key, slugify, normalize_title) into bib_utils.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Generated from OpenAlex citation graphs for 4 seed papers:
- Shuman et al. 2019 (10.1038/s41593-019-0559-0)
- Cai et al. 2016 (10.1038/nature17955)
- Zhao et al. MiniXL (10.1126/sciadv.ads4995)
- Guo et al. MiniLFOV (10.1126/sciadv.adg3918)
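
As a rough illustration of how a seed DOI can lead to its citing papers, the sketch below builds the two OpenAlex request URLs involved: one to resolve a DOI to a work record, and one to page through works that cite it. The function names are hypothetical, and `W123` in the usage note is a placeholder rather than a real work ID; this only constructs URLs and is not the PR's OpenAlex client.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def seed_lookup_url(doi: str) -> str:
    """URL that resolves a seed paper's DOI to its OpenAlex work record."""
    return f"{OPENALEX_WORKS}/https://doi.org/{doi}"


def citing_works_url(openalex_id: str, per_page: int = 200, cursor: str = "*") -> str:
    """URL for one page of works citing the given OpenAlex work ID."""
    query = urlencode(
        {"filter": f"cites:{openalex_id}", "per-page": per_page, "cursor": cursor},
        safe=":*",  # keep the filter and cursor readable instead of percent-encoding them
    )
    return f"{OPENALEX_WORKS}?{query}"
```

For example, `citing_works_url("W123")` yields `https://api.openalex.org/works?filter=cites:W123&per-page=200&cursor=*`. Each response's `meta.next_cursor` value would then be passed as `cursor` to fetch the following page.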

1,218 candidates pending analysis, 4 reviewed, 16 rejected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@daharoni daharoni merged commit 585896d into main Apr 11, 2026
1 check passed
@daharoni daharoni deleted the refactor-shared-bib-utils branch April 11, 2026 06:10