Add LLM-driven citation discovery system #10
Merged
Adds a complete system for finding papers related to the lab's tools by crawling citation graphs (OpenAlex) and analyzing full text with Claude Code sub-agents. Papers are categorized as science, methods, software, tool_paper, or review based on their relationship to the tools.

Pipeline: candidates/ → in-progress/ → reviewed/ or rejected/ → approved/

A pre-fetch step (fulltext.py) retrieves BibTeX and full text before analysis, so sub-agents only need file read access.

- discovery/ package: config, OpenAlex, CrossRef, Unpaywall, fulltext, candidates, analysis, approve
- .claude/commands/: analyze-citation and run-discovery skills
- tests/: 101 unit tests with mocked APIs
- CI: test-discovery.yml workflow
- Extracted shared utils (generate_key, slugify, normalize_title) into bib_utils.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
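The stage flow described in this commit could be sketched roughly as follows. This is only an illustration: the stage names come from the commit message, while `ALLOWED` and `move_to_stage` are hypothetical names, and treating rejected/ and approved/ as terminal stages is an assumption rather than something the PR states.

```python
import shutil
from pathlib import Path

# Stage names are taken from the commit message. The transition table itself
# is an assumption: rejected/ and approved/ are treated as terminal here.
ALLOWED = {
    "candidates": {"in-progress"},
    "in-progress": {"reviewed", "rejected"},
    "reviewed": {"approved"},
    "rejected": set(),
    "approved": set(),
}


def move_to_stage(root: Path, name: str, src: str, dst: str) -> Path:
    """Move one candidate file from stage `src` to stage `dst` (hypothetical helper)."""
    if dst not in ALLOWED.get(src, set()):
        raise ValueError(f"illegal transition: {src} -> {dst}")
    target_dir = root / dst
    # Create the destination stage directory on first use.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    shutil.move(str(root / src / name), str(target))
    return target
```

Under these assumptions, `move_to_stage(Path("pipeline"), "paper.yaml", "candidates", "in-progress")` would claim a candidate for analysis, and an attempt to jump straight from in-progress/ to approved/ would raise `ValueError`.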
Generated from OpenAlex citation graphs for 4 seed papers:

- Shuman et al. 2019 (10.1038/s41593-019-0559-0)
- Cai et al. 2016 (10.1038/nature17955)
- Zhao et al. MiniXL (10.1126/sciadv.ads4995)
- Guo et al. MiniLFOV (10.1126/sciadv.adg3918)

1,218 candidates pending analysis, 4 reviewed, 16 rejected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
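For readers unfamiliar with OpenAlex, crawling the citation graph of a seed paper amounts to resolving its DOI to an OpenAlex work and then listing the works that cite it. The sketch below only builds the request URLs, based on the public OpenAlex REST API as I understand it (the `doi:` lookup shorthand, the `cites:` filter, and cursor paging). The function names are hypothetical, and whether the discovery/ package issues these exact requests is an assumption.

```python
from urllib.parse import urlencode

OPENALEX_API = "https://api.openalex.org"


def work_by_doi_url(doi: str) -> str:
    """URL that looks up a single work by DOI (hypothetical helper)."""
    return f"{OPENALEX_API}/works/doi:{doi}"


def citing_works_url(openalex_id: str, cursor: str = "*", per_page: int = 200) -> str:
    """URL for one page of works that cite `openalex_id` (hypothetical helper).

    `cursor="*"` requests the first page; later pages would pass the
    next-cursor value returned in the previous response's metadata.
    """
    params = {
        "filter": f"cites:{openalex_id}",
        "per-page": per_page,
        "cursor": cursor,
    }
    return f"{OPENALEX_API}/works?{urlencode(params)}"
```

For example, `work_by_doi_url("10.1038/nature17955")` addresses the Cai et al. 2016 seed paper, and the `id` field of that response (a `W…` identifier) would then be passed to `citing_works_url`.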
Summary
New files
- `discovery/`: Python package (config, OpenAlex, CrossRef, Unpaywall, fulltext, candidates, analysis, approve)
- `discovery/README.md`: Full documentation
- `.claude/commands/`: analyze-citation and run-discovery skills
- `tests/`: 8 test modules, 101 tests
- `pipeline/`: 5-stage directory structure with 1,238 initial candidates
- `discovery_config.yaml`: Seed papers, tool definitions, keywords, API settings
- `.github/workflows/test-discovery.yml`: CI for tests

Modified files
- `scripts/bib_utils.py`: Added shared utilities extracted from normalize_keys.py and check_duplicates.py
- `scripts/normalize_keys.py`: Imports from bib_utils.py instead of defining its own copies
- `scripts/check_duplicates.py`: Same
- `requirements.txt`: Added pyyaml, pytest, responses
- `.gitignore`: Added pipeline companion files (.txt, .bib), in-progress/, pdfs/

Test plan
- `pytest tests/ -v` passes (101 tests)
- `python scripts/normalize_keys.py` still works on references.bib
- `python scripts/check_duplicates.py` still works on references.bib
- `python -m discovery.generate_candidates` generates candidate YAML files
- `python -m discovery.fulltext --stage candidates` fetches BibTeX and full text
- `python -m discovery.approve` appends approved citations to references.bib

🤖 Generated with Claude Code