Research paper analysis pipeline with citation crawling, pluggable LLM prompts, and knowledge graph building.
- Multi-input: Analyze papers by title, DOI, keywords, URL, local PDF, or raw text
- Citation Crawling: BFS traversal of references/citations via Semantic Scholar API (configurable depth, default 3)
- 5 Default Analysis Dimensions (LLM evaluation focus):
  - Paper Analysis — overview, contributions, methodology
  - Dataset Crafting — data creation, annotation, preprocessing
  - Evaluation Method — benchmarks, baselines, evaluation setup
  - Metrics — specific metrics, reported results
  - Statistical Tests — significance tests, confidence intervals, rigor
- Pluggable Prompts: Add YAML files for custom dimensions, override defaults
- Knowledge Graph: NetworkX-based graph with paper/author/method/dataset/metric entities
- Multi-Provider LLM: Via LiteLLM — OpenAI, Anthropic, Ollama, vLLM, etc. with fallback chain
- Export: JSON, GraphML, GEXF, CSV
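The citation crawl described above is a plain breadth-first search over the citation graph. A minimal sketch, where `fetch_references` is a hypothetical stand-in for the Semantic Scholar references call and the `max_depth`/`max_papers` limits mirror the config options below:

```python
from collections import deque

def crawl_citations(root_id, fetch_references, max_depth=3, max_papers=50):
    """BFS over a citation graph.

    `fetch_references(paper_id)` is a hypothetical stand-in for a
    Semantic Scholar API call returning referenced paper IDs.
    """
    visited = {root_id}
    queue = deque([(root_id, 0)])
    order = []
    while queue and len(order) < max_papers:
        paper_id, depth = queue.popleft()
        order.append(paper_id)
        if depth >= max_depth:
            continue  # don't expand beyond the configured depth
        for ref in fetch_references(paper_id):
            if ref not in visited:
                visited.add(ref)
                queue.append((ref, depth + 1))
    return order
```

Papers are visited in breadth-first order, so the root and its direct references are analyzed before anything deeper in the citation tree.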
```bash
uv sync

# Initialize project config
uv run crab init

# Edit .env with your API key
nano .env

# Analyze a paper by title
uv run crab analyze "attention is all you need"

# Search by keywords
uv run crab analyze --keywords "LLM evaluation, benchmark contamination"

# Analyze a local PDF
uv run crab analyze --pdf paper.pdf

# Control crawl depth
uv run crab analyze "GPT-4 Technical Report" --depth 5

# Search without analyzing
uv run crab search "transformer evaluation"

# Build knowledge graph from results
uv run crab build

# Export graph
uv run crab export json
uv run crab export graphml
uv run crab export csv

# List analysis dimensions
uv run crab dimensions

# Show config
uv run crab info
```

Settings load from: CLI flags > env vars (`CRAB_` prefix) > `.env` > `crab.yaml` > defaults.
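The precedence chain can be illustrated with Python's `collections.ChainMap`, where earlier maps win on lookup (the values below are made up for illustration):

```python
from collections import ChainMap

# Each layer of the settings stack, highest precedence first.
defaults  = {"citation_depth": 3, "max_papers": 50, "concurrency": 4}
yaml_file = {"citation_depth": 2}   # crab.yaml
dotenv    = {"max_papers": 100}     # .env
env_vars  = {"citation_depth": 5}   # e.g. CRAB_CITATION_DEPTH=5
cli_flags = {"concurrency": 8}      # e.g. --concurrency 8

settings = ChainMap(cli_flags, env_vars, dotenv, yaml_file, defaults)
print(settings["citation_depth"])  # env var beats .env and crab.yaml -> 5
print(settings["concurrency"])     # CLI flag beats the default -> 8
```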
```yaml
# crab.yaml
default_model: openai/gpt-4o-mini
fallback_models:
  - openai/gpt-3.5-turbo
  - anthropic/claude-3-haiku-20240307
citation_depth: 3
max_papers: 50
output: output
concurrency: 4
```

Create YAML files in a custom directory:
```yaml
# my_prompts/bias_analysis.yaml
name: bias_analysis
display_name: "Bias Analysis"
description: "Analyze papers for bias in LLM evaluation"
system_message: "You are a bias analysis expert..."
extraction_prompt: |
  Analyze the paper for potential biases...
  Paper: {title}
  Text: {paper_text}
  ...
```

Then use: `uv run crab analyze "paper" --prompts-dir my_prompts/`
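The `{title}` and `{paper_text}` placeholders in `extraction_prompt` are ordinary `str.format` fields. A minimal sketch of how a loaded prompt might be rendered, with a hand-built dict standing in for the parsed YAML and a hypothetical `render_prompt` helper:

```python
# Hand-built dict standing in for the YAML file above after parsing.
prompt = {
    "name": "bias_analysis",
    "system_message": "You are a bias analysis expert...",
    "extraction_prompt": (
        "Analyze the paper for potential biases...\n"
        "Paper: {title}\n"
        "Text: {paper_text}"
    ),
}

def render_prompt(prompt, title, paper_text):
    # str.format fills the {title} / {paper_text} placeholders.
    return prompt["extraction_prompt"].format(title=title, paper_text=paper_text)

message = render_prompt(
    prompt, "Attention Is All You Need", "We propose the Transformer..."
)
```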
```python
from crab_scholar.pipeline import run_pipeline
from crab_scholar.config import CrabConfig

config = CrabConfig(
    default_model="openai/gpt-4o-mini",
    citation_depth=3,
)
kg = run_pipeline(input_query="attention is all you need", config=config)
print(f"Entities: {kg.entity_count}, Relations: {kg.relation_count}")
```

```
Input (query/DOI/PDF/text)
        ↓
Scholar API → Resolve paper
        ↓
BFS Crawler → Expand citations/references (depth=N)
        ↓
Fetcher → Download PDFs, extract text
        ↓
Analyzer → Run pluggable dimensions (5 defaults)
        ↓
Graph Builder → Entities + Relations → NetworkX
        ↓
Export → JSON / GraphML / GEXF / CSV
```
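The graph the pipeline ends with can be pictured as a NetworkX digraph of typed entities and relations. A minimal sketch, with node IDs and attribute names that are purely illustrative, not the tool's actual schema:

```python
import networkx as nx

# Illustrative entity/relation graph (node IDs and attribute
# names are made up for this sketch).
kg = nx.DiGraph()
kg.add_node("paper:attention", kind="paper", title="Attention Is All You Need")
kg.add_node("author:vaswani", kind="author")
kg.add_node("metric:bleu", kind="metric")
kg.add_edge("author:vaswani", "paper:attention", relation="authored")
kg.add_edge("paper:attention", "metric:bleu", relation="reports")

# GraphML is one of the supported export formats.
nx.write_graphml(kg, "graph.graphml")
```

GraphML and GEXF keep the node/edge attributes, so the typed entities survive a round trip into tools like Gephi.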
MIT