The lit-builder pipeline (tools/lit-builder/) is set up and ready except for the LLM scoring step, which needs a credential.
What's already done
- A copy of iclr-lit-builder retuned with an RL/RLHF/reasoning/agentic keyword set in tools/lit-builder/configs/keywords.yaml.
- Installed in an isolated venv at tools/lit-builder/.venv/.
- ICLR 2026 fetched, ingested, and keyword-filtered: 19,813 papers ingested → 4,329 RL-relevant candidates, sitting in tools/lit-builder/data/db/lit.sqlite (gitignored, regenerable).
What's needed
Pick one credential and export it:
# Option A — Claude Haiku (cheap; ~$1–3 for 4,329 papers):
export LIT_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
# Option B — Ollama Cloud (uses the :cloud models already pulled):
export LIT_PROVIDER=ollama
export OLLAMA_API_KEY=...
export LIT_MODEL=deepseek-v4-pro:cloud
# Option C — fully local, free, slower:
ollama pull llama3.1:8b
export LIT_PROVIDER=ollama
export LIT_MODEL=llama3.1:8b
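Before kicking off a multi-thousand-paper scoring run, it may be worth confirming that the exports match one of the three options. The helper below is a minimal sketch, not part of the lit CLI: the function name check_lit_env is hypothetical, and it assumes (from Options B and C above) that only models tagged :cloud need OLLAMA_API_KEY.

```shell
# Hypothetical pre-flight helper; not part of the lit CLI.
check_lit_env() {
  case "${LIT_PROVIDER:-}" in
    anthropic)
      # Option A: only the Anthropic key is required.
      [ -n "${ANTHROPIC_API_KEY:-}" ] || { echo "ANTHROPIC_API_KEY is not set" >&2; return 1; }
      ;;
    ollama)
      # Options B and C both name a model explicitly.
      [ -n "${LIT_MODEL:-}" ] || { echo "LIT_MODEL is not set" >&2; return 1; }
      # Assumption: only ':cloud' models need a key (Option B vs. Option C).
      case "$LIT_MODEL" in
        *:cloud)
          [ -n "${OLLAMA_API_KEY:-}" ] || { echo "OLLAMA_API_KEY is not set" >&2; return 1; }
          ;;
      esac
      ;;
    *)
      echo "LIT_PROVIDER must be 'anthropic' or 'ollama'" >&2
      return 1
      ;;
  esac
  echo "ok: provider=$LIT_PROVIDER"
}
```

Calling check_lit_env after the exports prints an "ok" line on success, or names the missing variable on stderr and returns non-zero.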
Then:
cd tools/lit-builder
LIT=.venv/bin/lit
$LIT score iclr2026 --limit 4329 # score every surviving candidate on a 0-3 scale
$LIT list iclr2026 --min-score 2 # the curated list (~50–200 expected)
$LIT deepen iclr2026 <paper_id> # structured digest for the top ~20–30 per area
After scoring
Fold the score ≥ 2 list and the digests into reference/papers/<topic>/README.md for each of the three subdirs (RLHF-and-Alignment, LLM-Code-Generation, LLM-RL-Program-Synthesis), keeping the auto-scraped PAPERS.md files as the unfiltered appendix.
More venues (one line each: fetch, ingest, filter)
$LIT fetch nips2025 && $LIT ingest nips2025 && $LIT filter nips2025
$LIT fetch icml2025 && $LIT ingest icml2025 && $LIT filter icml2025
$LIT fetch iclr2025 && $LIT ingest iclr2025 && $LIT filter iclr2025
$LIT fetch nips2024 && $LIT ingest nips2024 && $LIT filter nips2024
$LIT fetch icml2024 && $LIT ingest icml2024 && $LIT filter icml2024
tools/lit-builder/README.md has all of the above in the "Running this in this repo" section.
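The per-venue lines above all follow the same fetch, ingest, filter chain, so they could be collapsed into a loop. This is only a sketch: process_venues is a hypothetical wrapper, it uses just the three subcommands shown above, and it falls back to .venv/bin/lit when LIT is unset.

```shell
# Hypothetical wrapper; not part of the lit CLI.
process_venues() {
  lit_bin="${LIT:-.venv/bin/lit}"
  for venue in "$@"; do
    # Same chain as the one-liners: a failing step skips the rest for that venue.
    if "$lit_bin" fetch "$venue" && "$lit_bin" ingest "$venue" && "$lit_bin" filter "$venue"; then
      echo "done: $venue"
    else
      # Report and move on so one bad venue does not block the others.
      echo "failed: $venue" >&2
    fi
  done
}
```

From tools/lit-builder, the five venues would then be one call: process_venues nips2025 icml2025 iclr2025 nips2024 icml2024.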
Why this matters
Right now reference/papers/ has ~430 raw arXiv abstracts (the PAPERS.md dumps) and short hand-written sub-READMEs. The scored + deepened lit-builder output would give the repo a curated, ranked, digest-level paper layer — which is what turns "here are some papers" into "here are the dozen papers you should actually read for area X."