Last update: 2026-04-04
An enhanced version of the GPT paper assistant.
This repo now supports both Personalized Daily Arxiv Paper and a daily AI hotspots digest, then publishes them as a multi-page static site.
See the changelog for recent changes and the hotspot implementation plan for the new multi-source news pipeline.
This repository has two complementary pipelines:
- Personalized Daily Arxiv Paper: fetch new arXiv papers, filter them with author rules and LLM scoring, then render daily, monthly, and yearly archives.
- Daily AI Hotspots: aggregate papers, official blogs, roundup/news sites, GitHub, and Hacker News, then use clustering plus LLM screening to produce a concise daily "what matters today" summary.
The generated results are pushed to the auto_update branch. The main branch should stay code-only.
- Personalized daily arXiv filtering with configurable prompts and score thresholds
- Daily AI hotspots digest built from multiple external signals
- Monthly paper summaries and monthly/yearly hotspot archives
- Multi-page static website with day/month/year navigation
- Automatic model pricing refresh from LiteLLM
- GitHub Actions workflows for daily runs, missed-date remediation, result sync, and Pages publishing
- Copy/fork this repo to a new GitHub repo and enable scheduled workflows if you fork it.
- Review the paper prompts under prompts/paper/, especially prompts/paper/paper_topics.txt, and edit them to match the kinds of papers you want to follow. If you want a clean starting point, use the files in templates as references.
- Copy configs/templates/config.template.ini to configs/config.ini and set your desired arXiv categories in arxiv_category.
- Set your OpenAI key OPENAI_API_KEY and base URL OPENAI_BASE_URL (if you need one) as GitHub Secrets. To get a free key from GitHub, see GUIDE_GITHUB_API.md.
- In your repo settings, set the GitHub Pages build source to GitHub Actions.
At this point, your bot should run daily and publish a static website. The results will be pushed to the auto_update branch automatically. You can test this by running the GitHub action workflow manually.
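As a rough illustration of the arxiv_category setting, the sketch below reads a comma-separated category list with Python's standard configparser. Only the key name arxiv_category comes from this README; the section name FILTERING, the comma-separated format, and the helper name read_arxiv_categories are assumptions made for the example, so check configs/templates/config.template.ini for the real layout.

```python
import configparser

# Hypothetical config text. Only the key name "arxiv_category" is taken from
# the README; the section name and the comma-separated value are assumptions.
EXAMPLE_CONFIG = """
[FILTERING]
arxiv_category = cs.CL, cs.LG, cs.AI
"""


def read_arxiv_categories(config_text: str, section: str = "FILTERING") -> list[str]:
    """Return the configured arXiv categories as a clean list of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback="" covers both a missing section and a missing key.
    raw = parser.get(section, "arxiv_category", fallback="")
    return [part.strip() for part in raw.split(",") if part.strip()]


categories = read_arxiv_categories(EXAMPLE_CONFIG)
print(categories)
```

A missing section or key yields an empty list here rather than an exception, which makes it easy to fail early with a clear message before any fetching starts.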
- Copy configs/templates/authors.template.txt to configs/authors.txt and list the authors you actually want to follow. The numbers after each author are important: they are Semantic Scholar author IDs, which you can find by looking up the author on Semantic Scholar and taking the number at the end of the URL.
- Take a look at configs/config.ini to tweak how things are filtered.
- Get an X_BEARER_TOKEN from the X Developer Console and set it as a GitHub secret. The hotspot pipeline uses it to fetch daily tweets.
- Get a Semantic Scholar API key and set it as a GitHub secret (S2_KEY). Otherwise the author search step will be very slow. (For now the keys are tight, so you may not be able to get one.)
- Set up a Slack bot, get its OAuth key, and set it as SLACK_KEY in a GitHub secret.
- Make a channel for the bot (and invite it to the channel), get its Slack channel ID, and set it as SLACK_CHANNEL_ID in a GitHub secret.
- Set the GitHub repo private to avoid GitHub Actions being set to inactive after 60 days.
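The author ID mentioned above is the trailing number of a Semantic Scholar author profile URL. The following is a minimal sketch of extracting it; the function name semantic_scholar_author_id and the example URL are hypothetical, and the sketch does not assume anything about the line format of configs/authors.txt.

```python
import re
from urllib.parse import urlparse


def semantic_scholar_author_id(profile_url: str) -> str:
    """Return the numeric ID at the end of a Semantic Scholar author URL."""
    # Work on the path only, so query strings and fragments are ignored.
    path = urlparse(profile_url).path.rstrip("/")
    match = re.search(r"/(\d+)$", path)
    if match is None:
        raise ValueError(f"no numeric author ID at the end of: {profile_url}")
    return match.group(1)


# Hypothetical profile URL, used only to illustrate the shape of the input.
example_url = "https://www.semanticscholar.org/author/Example-Author/1234567"
author_id = semantic_scholar_author_id(example_url)
print(author_id)
```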
Install dependencies from requirements.txt, then copy .env.example to .env and set environment variables as needed.
To generate Personalized Daily Arxiv Paper:
python main.py --output-root out --mode auto
python scripts/generate_monthly_summaries.py --output-root out --mode auto
To generate Daily AI Hotspots, run scripts/generate_daily_hotspots.py:
python scripts/generate_daily_hotspots.py --output-root out --mode auto --force
python -m arxiv_assistant.renderers.build_multipage_site

- prompts/paper/paper_topics.txt defines what kinds of papers you want the paper pipeline to keep.
- prompts/paper/score_criteria.txt controls how relevance and novelty are judged.
- Daily hotspots and monthly summaries live under prompts/hotspot/ and prompts/monthly/. See prompts/README.md for the full layout, and prompts/paper/example_prompt_structure.md for a simple paper-prompt reference.
Being specific helps. Prefer describing the primary contribution types you want, and explicitly rule out downstream application papers if precision matters more than recall.
For Personalized Daily Arxiv Paper, the current pipeline is:
- Fetch candidate arXiv papers for the target day.
- Optionally resolve authors via Semantic Scholar.
- Apply author-based matching and h-index gating.
- Run title filtering through LLM API calls to remove obviously irrelevant papers.
- Run abstract filtering through LLM API calls to score relevance and novelty, then rank papers by a weighted combination of these scores.
- Keep papers that pass relevance and novelty thresholds.
- Render daily outputs and derive monthly/yearly views.
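The scoring and threshold steps above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the ScoredPaper type, the select_and_rank function, the threshold values, and the 0.7/0.3 weights are all hypothetical. The sketch filters before sorting, which yields the same final list as ranking first and filtering afterwards.

```python
from dataclasses import dataclass


@dataclass
class ScoredPaper:
    # Hypothetical record; the scores would come from the LLM abstract filter.
    arxiv_id: str
    relevance: float
    novelty: float


def select_and_rank(
    papers: list[ScoredPaper],
    relevance_threshold: float = 5.0,  # assumed value
    novelty_threshold: float = 5.0,    # assumed value
    relevance_weight: float = 0.7,     # assumed value
    novelty_weight: float = 0.3,       # assumed value
) -> list[ScoredPaper]:
    """Keep papers that pass both thresholds, best weighted score first."""
    kept = [
        p for p in papers
        if p.relevance >= relevance_threshold and p.novelty >= novelty_threshold
    ]

    def weighted(p: ScoredPaper) -> float:
        return relevance_weight * p.relevance + novelty_weight * p.novelty

    return sorted(kept, key=weighted, reverse=True)


papers = [
    ScoredPaper("paper-a", relevance=8, novelty=6),
    ScoredPaper("paper-b", relevance=9, novelty=3),   # fails the novelty threshold
    ScoredPaper("paper-c", relevance=6, novelty=9),
    ScoredPaper("paper-d", relevance=4, novelty=10),  # fails the relevance threshold
]
ranked_ids = [p.arxiv_id for p in select_and_rank(papers)]
print(ranked_ids)
```

Requiring both thresholds, rather than only a combined score, keeps a very novel but off-topic paper from slipping through on novelty alone.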
The Daily AI Hotspots pipeline is separate:
- Fetch daily signals from local selected papers, official blogs, roundup/news sites, GitHub, Hacker News, and configured X-related sources.
- Normalize all fetched items into a shared hotspot schema and deduplicate obviously repeated links.
- Cluster related items into candidate topics and compute deterministic quality, heat, importance, evidence, and confidence signals.
- Apply confidence-aware routing so strong/weak topics are handled heuristically and only the ambiguous middle band is sent to the LLM.
- Use the LLM to review borderline candidate topics, then synthesize a compact daily summary from the final featured set.
- Keep only high-confidence featured topics for the top section, then expand the rest into source-first tables for broad coverage.
- Render daily, monthly, and yearly hotspot archives, then publish them together with Personalized Daily Arxiv Paper.
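The confidence-aware routing step above can be pictured as a three-way split on a confidence score. The sketch below is an assumption-heavy illustration: the Route enum, the route_topic function, the 0 to 1 confidence scale, and the 0.35/0.75 band edges are hypothetical and are not taken from the repository.

```python
from enum import Enum


class Route(Enum):
    KEEP = "keep"              # strong topic, accepted heuristically
    LLM_REVIEW = "llm_review"  # ambiguous middle band, sent to the LLM
    DROP = "drop"              # weak topic, rejected heuristically


def route_topic(confidence: float, low: float = 0.35, high: float = 0.75) -> Route:
    """Route a candidate topic by confidence; band edges are assumed values."""
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("expected 0 <= low < high <= 1")
    if confidence >= high:
        return Route.KEEP
    if confidence < low:
        return Route.DROP
    return Route.LLM_REVIEW


# Hypothetical confidence values for three candidate topics.
routes = {name: route_topic(score) for name, score in
          [("strong-topic", 0.9), ("borderline-topic", 0.5), ("weak-topic", 0.1)]}
print(routes)
```

Only the middle band triggers an LLM call, so narrowing the band lowers cost while widening it hands more decisions to the LLM.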
This repo and code were originally built by Tatsunori Hashimoto and are licensed under the Apache 2.0 license.
Thanks to Chenglei Si for testing and benchmarking the GPT filter.