ChatGPT ArXiv Paper Assistant

Last update: 2026-04-04
An enhanced version of the GPT paper assistant.
This repo now produces both a Personalized Daily Arxiv Paper digest and a daily AI hotspots digest, and publishes them together as a multi-page static site.

See the changelog for recent changes and the hotspot implementation plan for the new multi-source news pipeline.

Overview

This repository has two complementary pipelines:

Personalized Daily Arxiv Paper: fetch new arXiv papers, filter them with author rules and LLM scoring, then render daily, monthly, and yearly archives.
Daily AI Hotspots: aggregate papers, official blogs, roundup/news sites, GitHub, and Hacker News, then use clustering plus LLM screening to produce a concise daily "what matters today" summary.

The generated results are pushed to the auto_update branch. The main branch should stay code-only.

Main Features

Personalized daily arXiv filtering with configurable prompts and score thresholds
Daily AI hotspots digest built from multiple external signals
Monthly paper summaries and monthly/yearly hotspot archives
Multi-page static website with day/month/year navigation
Automatic model pricing refresh from LiteLLM
GitHub Actions workflows for daily runs, missed-date remediation, result sync, and Pages publishing

Quickstart

Run on GitHub Actions

Copy/fork this repo to a new GitHub repo and enable scheduled workflows if you fork it.
Review the paper prompts under prompts/paper/, especially prompts/paper/paper_topics.txt, and edit them to match the kinds of papers you want to follow. If you want a clean starting point, use files in templates as references.
Copy configs/templates/config.template.ini to configs/config.ini and set the arXiv categories you want in arxiv_category.
Set your OpenAI key OPENAI_API_KEY and, if you need one, your base URL OPENAI_BASE_URL as GitHub Secrets. To get a free key from GitHub, see GUIDE_GITHUB_API.md.
In your repo settings, set the GitHub Pages build source to GitHub Actions.

At this point, your bot should run daily and publish a static website. The results will be pushed to the auto_update branch automatically. You can test this by running the GitHub Actions workflow manually.
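As a rough illustration of the config step above, the sketch below parses an arxiv_category setting with Python's standard configparser. Only the key name arxiv_category comes from this README. The section name, the sample values, and the comma-separated format are assumptions, so check configs/templates/config.template.ini for the real layout.

```python
import configparser

# Hypothetical config content. Only the key `arxiv_category` is taken from
# the README; the section name and comma-separated values are assumptions.
SAMPLE_CONFIG = """
[FILTERING]
arxiv_category = cs.CL, cs.LG, cs.AI
"""


def read_arxiv_categories(config_text: str) -> list[str]:
    """Return the categories listed under `arxiv_category` in any section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    for section in parser.sections():
        if parser.has_option(section, "arxiv_category"):
            raw = parser.get(section, "arxiv_category")
            # Split on commas and drop empty entries and stray whitespace.
            return [c.strip() for c in raw.split(",") if c.strip()]
    return []


if __name__ == "__main__":
    print(read_arxiv_categories(SAMPLE_CONFIG))
```

The function searches every section for the key, so it does not depend on the assumed section name.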

Copy configs/templates/authors.template.txt to configs/authors.txt and list the authors you actually want to follow. The numbers after each author name are important: they are Semantic Scholar author IDs, which you can find by looking up the author on Semantic Scholar and taking the number at the end of the URL.
Take a look at configs/config.ini to tweak how things are filtered.
Get an X_BEARER_TOKEN from the X Developer Console and set it as a GitHub secret. The hotspot pipeline uses it to fetch daily tweets.
Get a Semantic Scholar API key and set it as the S2_KEY GitHub secret. Without it, the author search step will be very slow. (Keys are currently hard to obtain, so you may not be able to get one.)
Set up a Slack bot, get its OAuth key, and set it as the SLACK_KEY GitHub secret.
Make a channel for the bot (and invite the bot to it), get the Slack channel ID, and set it as the SLACK_CHANNEL_ID GitHub secret.
Make the GitHub repo private so that GitHub Actions are not set to inactive after 60 days.
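For the author list step above, a small helper can pull the trailing numeric ID out of a Semantic Scholar author URL. This is only a sketch: the function name and the example URL are hypothetical, and it relies solely on the rule stated above that the ID is the number at the end of the URL.

```python
import re
from typing import Optional


def author_id_from_url(url: str) -> Optional[str]:
    """Return the trailing numeric ID of an author URL, or None if absent."""
    # Drop any query string or fragment, then any trailing slashes.
    path = re.split(r"[?#]", url, maxsplit=1)[0].rstrip("/")
    # The ID is assumed to be the last path segment and purely numeric.
    match = re.search(r"/(\d+)$", path)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical URL and ID, used only to show the expected shape.
    print(author_id_from_url("https://www.semanticscholar.org/author/Jane-Doe/1234567"))
```

The returned ID is what goes next to the author's name in configs/authors.txt. Follow the layout of configs/templates/authors.template.txt for the exact line format.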

Running Locally

Install dependencies from requirements.txt, then copy .env.example to .env and set environment variables as needed.
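Before a local run, it can help to check which variables are actually set. The sketch below assumes local runs read the same variable names that the GitHub Actions setup uses as secrets, and that only OPENAI_API_KEY is strictly required. Both are assumptions; .env.example is the authoritative list. It inspects the current process environment only and does not load .env itself.

```python
import os

# Names taken from the GitHub Actions section of this README. Which ones a
# local run needs is an assumption; consult .env.example.
REQUIRED = ["OPENAI_API_KEY"]
OPTIONAL = ["OPENAI_BASE_URL", "S2_KEY", "X_BEARER_TOKEN", "SLACK_KEY", "SLACK_CHANNEL_ID"]


def missing_variables(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for name in missing_variables(REQUIRED):
        print(f"missing required variable: {name}")
    for name in missing_variables(OPTIONAL):
        print(f"optional variable not set: {name}")
```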

To generate Personalized Daily Arxiv Paper:

python main.py --output-root out --mode auto
python scripts/generate_monthly_summaries.py --output-root out --mode auto

To generate Daily AI Hotspots and then build the multi-page site:

python scripts/generate_daily_hotspots.py --output-root out --mode auto --force
python -m arxiv_assistant.renderers.build_multipage_site

Prompting

prompts/paper/paper_topics.txt defines what kinds of papers you want the paper pipeline to keep.
prompts/paper/score_criteria.txt controls how relevance and novelty are judged.
Daily hotspots and monthly summaries live under prompts/hotspot/ and prompts/monthly/. See prompts/README.md for the full layout, and prompts/paper/example_prompt_structure.md for a simple paper-prompt reference.

Being specific helps. Prefer describing the primary contribution types you want, and explicitly rule out downstream application papers if precision matters more than recall.

How Paper Filtering Works

For Personalized Daily Arxiv Paper, the current pipeline is:

Fetch candidate arXiv papers for the target day.
Optionally resolve authors via Semantic Scholar.
Apply author-based matching and h-index gating.
Run title filtering through LLM API calls to remove obviously irrelevant papers.
Run abstract filtering through LLM API calls to score relevance and novelty, then rank papers by a weighted combination of these scores.
Keep papers that pass relevance and novelty thresholds.
Render daily outputs and derive monthly/yearly views.
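The scoring and thresholding steps above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the class name, function name, score scale, threshold values, and weights are all hypothetical, and the real values are set in configs/config.ini and the prompts.

```python
from dataclasses import dataclass


@dataclass
class ScoredPaper:
    title: str
    relevance: int  # score from the abstract-filtering LLM call (scale assumed)
    novelty: int    # score from the abstract-filtering LLM call (scale assumed)


def select_and_rank(papers, min_relevance=3, min_novelty=3,
                    relevance_weight=0.6, novelty_weight=0.4):
    """Keep papers that pass both thresholds, ranked by a weighted score."""
    kept = [
        p for p in papers
        if p.relevance >= min_relevance and p.novelty >= min_novelty
    ]

    def weighted(p):
        # Hypothetical weights; the README only says the combination is weighted.
        return relevance_weight * p.relevance + novelty_weight * p.novelty

    return sorted(kept, key=weighted, reverse=True)


if __name__ == "__main__":
    candidates = [
        ScoredPaper("A", relevance=5, novelty=3),
        ScoredPaper("B", relevance=4, novelty=5),
        ScoredPaper("C", relevance=2, novelty=5),  # fails the relevance threshold
        ScoredPaper("D", relevance=5, novelty=2),  # fails the novelty threshold
    ]
    for paper in select_and_rank(candidates):
        print(paper.title)
```

The sketch filters before ranking for brevity. Ranking first and then applying the thresholds, as the steps are listed above, yields the same final order.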

The Daily AI Hotspots pipeline is separate:

Fetch daily signals from local selected papers, official blogs, roundup/news sites, GitHub, Hacker News, and configured X-related sources.
Normalize all fetched items into a shared hotspot schema and deduplicate obviously repeated links.
Cluster related items into candidate topics and compute deterministic quality, heat, importance, evidence, and confidence signals.
Apply confidence-aware routing so strong/weak topics are handled heuristically and only the ambiguous middle band is sent to the LLM.
Use the LLM to review borderline candidate topics, then synthesize a compact daily summary from the final featured set.
Keep only high-confidence featured topics for the top section, then expand the rest into source-first tables for broad coverage.
Render daily, monthly, and yearly hotspot archives, then publish them together with Personalized Daily Arxiv Paper.
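Two of the hotspot steps above, link deduplication and confidence-aware routing, can be sketched in a few lines. Everything here is an illustrative assumption: the function names, the "url" field, the normalization rules, the route labels, and the band boundaries are hypothetical, and treating strong topics as kept and weak topics as dropped is one reading of "handled heuristically".

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_link(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    # The query string is kept because it can identify distinct items.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def deduplicate(items):
    """Keep the first item seen for each normalized link."""
    seen = set()
    unique = []
    for item in items:
        key = normalize_link(item["url"])  # "url" is an assumed field name
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def route_topic(confidence: float, low: float = 0.35, high: float = 0.75) -> str:
    """Route a candidate topic by confidence; only the middle band needs the LLM."""
    if confidence >= high:
        return "keep"        # strong topic, handled heuristically
    if confidence < low:
        return "drop"        # weak topic, handled heuristically
    return "llm_review"      # ambiguous middle band


if __name__ == "__main__":
    items = [
        {"url": "https://example.com/post"},
        {"url": "https://EXAMPLE.com/post/#comments"},
        {"url": "https://example.com/other"},
    ]
    print(len(deduplicate(items)))
    print([route_topic(c) for c in (0.9, 0.5, 0.1)])
```

Restricting LLM calls to the middle band is what keeps the per-day cost bounded: clear-cut topics never reach the model.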

Acknowledgement

This repo and code were originally built by Tatsunori Hashimoto and are licensed under the Apache 2.0 license.
Thanks to Chenglei Si for testing and benchmarking the GPT filter.
