Automated publication fetching system for research group websites. Fetches publications from OpenAlex, deduplicates entries, applies collaborator filters, and generates HTML or YAML output.
```bash
# Install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Configure people.yaml with your group members
# (Optional) Set OPENALEX_EMAIL in .env for better API performance

# Run
python generate_lists.py
```

Output: `output/{group}_publications.html` (or `.yaml`) for each configured group.
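`OPENALEX_EMAIL` is optional because OpenAlex accepts anonymous requests. Requests that carry a `mailto` parameter are routed to OpenAlex's "polite pool", which is presumably the performance benefit meant here. The following hypothetical sketch builds a works query for one member. The helper names, the use of the `author.orcid` and `from_publication_date` filters, and the page size are assumptions, not taken from `generate_lists.py`.

```python
import os
from typing import Optional
from urllib.parse import urlencode

# Hypothetical sketch: not the query code used by generate_lists.py.
OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_works_params(orcid: str, from_year: Optional[int] = None, per_page: int = 200) -> dict:
    """Query parameters for one author's works on the OpenAlex /works endpoint."""
    filters = [f"author.orcid:{orcid}"]
    if from_year is not None:
        # Keep only works published on or after 1 January of from_year.
        filters.append(f"from_publication_date:{from_year}-01-01")
    params = {"filter": ",".join(filters), "per-page": per_page}
    email = os.environ.get("OPENALEX_EMAIL")
    if email:
        # Identifies the caller; OpenAlex routes such requests to its polite pool.
        params["mailto"] = email
    return params


def build_works_url(orcid: str, from_year: Optional[int] = None) -> str:
    """Full request URL; the filter value is percent-encoded by urlencode."""
    return f"{OPENALEX_WORKS_URL}?{urlencode(build_works_params(orcid, from_year))}"
```

Authors with more works than one page holds need paging. OpenAlex supports cursor paging (`cursor=*`, then the returned `next_cursor`) for that.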
Usage:

| Command | Description |
|---|---|
| `python generate_lists.py` | All groups, all years (HTML) |
| `python generate_lists.py --format yaml` | All groups, all years (YAML) |
| `python generate_lists.py --from-year 2020` | Recent publications only |
| `python generate_lists.py --group VIOS` | Specific group |
| `python generate_lists.py --group VIOS --group CHAI --from-year 2020` | Multiple groups, recent only |
| `python generate_lists.py --render-only` | Re-render from saved data (no API calls) |
| `python generate_lists.py --render-only --format yaml` | Re-render as YAML from saved data |
| `python generate_lists.py --render-only --group VIOS` | Re-render one group only |
| `python generate_lists.py --data-file path/to/data.yaml --render-only` | Custom data file |
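The flags in the table map naturally onto Python's standard `argparse`. This is a hypothetical parser for illustration. The real definitions in `generate_lists.py` may differ; in particular, the accepted `--format` values (`html`, `yaml`) and the default data-file path are assumptions inferred from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(description="Generate publication lists.")
    parser.add_argument("--format", choices=["html", "yaml"], default="html")
    parser.add_argument("--from-year", type=int, default=None)
    # action="append" lets --group be repeated; None means all configured groups.
    parser.add_argument("--group", action="append", default=None)
    # Skip the Fetch stage and render from the saved data file only.
    parser.add_argument("--render-only", action="store_true")
    parser.add_argument("--data-file", default="output/publications_data.yaml")
    return parser
```

`action="append"` is what makes `--group VIOS --group CHAI` collect both names into one list instead of keeping only the last.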
The script works in two stages:

- Fetch: queries the OpenAlex API, merges the results with manual publications, applies filters, and saves a canonical data file (`output/publications_data.yaml`).
- Render: reads the data file and generates HTML or YAML output using per-group templates.

Use `--render-only` to skip fetching and re-render from the saved data without any API calls. This is useful when iterating on templates.

Use `--format yaml` to produce YAML output instead of HTML.
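The merge step of the Fetch stage can be sketched as follows. This is a minimal illustration under several assumptions that this README does not confirm: deduplication keys on the DOI when one is present and on a normalized title otherwise, `excluded_dois` applies only to OpenAlex results, and a manual entry takes precedence over a fetched duplicate. All helper names are hypothetical.

```python
import re
from typing import Iterable, Optional

# Hypothetical sketch of the Fetch-stage merge; not generate_lists.py's code.


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lower-case a DOI and strip common resolver prefixes (assumed normalization)."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def normalize_title(title: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def merge_publications(fetched: list, manual: list, excluded_dois: Iterable[str]) -> list:
    excluded = {normalize_doi(d) for d in excluded_dois}
    seen, merged = set(), []
    # Manual entries come first so they win over a fetched duplicate;
    # only fetched entries are checked against excluded_dois.
    candidates = [(p, False) for p in manual] + [(p, True) for p in fetched]
    for pub, is_fetched in candidates:
        doi = normalize_doi(pub.get("doi"))
        if is_fetched and doi and doi in excluded:
            continue
        # Deduplicate on DOI when present, otherwise on the normalized title.
        key = ("doi", doi) if doi else ("title", normalize_title(pub["title"]))
        if key in seen:
            continue
        seen.add(key)
        merged.append(pub)
    return merged
```

A known limitation of this keying: a copy with a DOI and a copy without one are not recognized as duplicates.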
`people.yaml` defines groups and members:

```yaml
groups:
  VIOS:
    required_collaborators:
      - Principal Investigator  # Papers need this collaborator
  CHAI:
    required_collaborators: []  # No filtering

members:
  - name: Researcher Name
    groups:
      - CHAI
      - VIOS  # Can belong to multiple groups
    orcid: 0000-0000-0000-0000  # Recommended for accuracy
    institution: University Name  # Optional, helps ORCID lookup
    required_collaborators:  # Optional, additional per-member requirements
      - Specific Advisor
```

Key points:

- ORCID is highly recommended for accurate matching.
- The script automatically attempts an ORCID lookup if a member has none.
- Collaborator requirements combine the group-level and member-level lists.
- An empty `required_collaborators: []` means no filtering.
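The following sketch shows how group-level and member-level requirements might combine. It assumes the member list is added to the group list (a union), that a paper must include every required collaborator, and that names are compared case-insensitively by exact match. The actual matching in `generate_lists.py` may be fuzzier, and the function names are hypothetical.

```python
def effective_requirements(group_cfg: dict, member_cfg: dict) -> list:
    """Union of group-level and member-level required collaborators, order kept."""
    # `or []` also covers a YAML key that is present but left empty (None).
    combined = list(group_cfg.get("required_collaborators") or [])
    for name in member_cfg.get("required_collaborators") or []:
        if name not in combined:
            combined.append(name)
    return combined


def passes_filter(authors: list, required: list) -> bool:
    """True if every required collaborator is among the authors.

    all() over an empty list is True, so no requirements means no filtering.
    """
    names = {a.casefold() for a in authors}
    return all(r.casefold() in names for r in required)
```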
Manually add publications not found in OpenAlex:

```yaml
manual_publications:
  - title: Your Paper Title
    authors:
      - First Author
      - Second Author
    date: "2024-01-01"
    groups:
      - VIOS
    venue: Conference Name  # Optional
    doi: 10.1234/example     # Optional
    url: https://...         # Optional
```

Required fields: `title`, `authors`, `date` (`YYYY-MM-DD`), and `groups`. All other fields are optional.
Note: Manual publications bypass the collaborator filter. They are always included in the output for their specified groups, regardless of required_collaborators settings.
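Because `title`, `authors`, `date`, and `groups` are required, checking each `manual_publications` entry before a run can catch mistakes early. This sketch is hypothetical; the script's own validation is not described here. It accepts `date` as either a string or a `datetime.date`, because PyYAML turns an unquoted `2024-01-01` into a date object. Quoting the date, as in the example above, keeps it a string.

```python
import re
from datetime import date

REQUIRED_FIELDS = ("title", "authors", "date", "groups")


def validate_manual_publication(pub: dict) -> list:
    """Return a list of problems with one manual_publications entry (empty if valid)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not pub.get(f)]
    raw = pub.get("date")
    if raw:
        text = str(raw)  # str(datetime.date) is already YYYY-MM-DD
        valid = re.fullmatch(r"\d{4}-\d{2}-\d{2}", text) is not None
        if valid:
            try:
                date.fromisoformat(text)  # rejects impossible dates such as 2024-02-30
            except ValueError:
                valid = False
        if not valid:
            problems.append(f"date is not a valid YYYY-MM-DD date: {text!r}")
    for field in ("authors", "groups"):
        if pub.get(field) and not isinstance(pub[field], list):
            problems.append(f"{field} must be a list")
    return problems
```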
Exclude incorrectly attributed publications:

```yaml
excluded_dois:
  - 10.1234/wrong.paper
  - 10.5678/another.wrong
```

Optional but recommended for better API performance, set your email in `.env`:

```
OPENALEX_EMAIL=your-email@domain.com
```

To add a new group:

- Add it to `people.yaml`:

  ```yaml
  groups:
    NewGroup:
      required_collaborators: []  # or add names
  ```

- Tag members with the group:

  ```yaml
  members:
    - name: Member Name
      groups:
        - NewGroup
  ```

- (Optional) Create a custom template at `templates/newgroup_publications.html` (or `.yaml`). If none exists, the default `templates/publications.html` (or `templates/publications.yaml`) is used.
- Run the script. Output will be written to `output/newgroup_publications.html` (or `.yaml` with `--format yaml`).
Templates live in the `templates/` directory and use Jinja2 syntax. The script resolves templates by convention:

- HTML (default):
  - `templates/{group}_publications.html`: used if it exists (e.g., `templates/vios_publications.html`)
  - `templates/publications.html`: default fallback
- YAML (`--format yaml`):
  - `templates/{group}_publications.yaml`: used if it exists
  - `templates/publications.yaml`: default fallback
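This lookup order is a two-step file check. The following minimal `pathlib` sketch assumes the group name is lower-cased, as the `vios_publications.html` example suggests; `resolve_template` is a hypothetical name.

```python
from pathlib import Path


def resolve_template(group: str, fmt: str = "html", templates_dir: Path = Path("templates")) -> Path:
    """Pick templates/{group}_publications.{fmt}, else templates/publications.{fmt}."""
    specific = templates_dir / f"{group.lower()}_publications.{fmt}"
    if specific.exists():
        return specific
    # Fallback is returned even if missing; rendering will then fail loudly.
    return templates_dir / f"publications.{fmt}"
```

Jinja2 can express the same fallback itself: `Environment.select_template(names)` loads the first template in the list that exists, so passing `[f"{group.lower()}_publications.html", "publications.html"]` to an environment with a `FileSystemLoader("templates")` has the same effect.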
Schedule with cron:

```
# Weekly updates (Mondays at 02:00) - fetch only recent publications
# Uses the venv interpreter because cron does not activate .venv
0 2 * * 1 cd /path/to/publication-lists && .venv/bin/python generate_lists.py --from-year 2020
```

This repository also updates the VIOS website repository automatically through the GitHub Actions workflow `.github/workflows/update-website-publications.yml`.
How it works:

- Generate `output/vios_publications.yaml` in this repo.
- Check out the website repo inside the workflow.
- Merge the generated publications into `src/data/publications.yaml` in the website repo.
- Preserve existing extra fields from the website file (for example `image`, `code`, `pdf`) by matching publications on normalized `title + date`.
- Open or update a PR in the website repo (`publications-update` branch) that changes only `src/data/publications.yaml`.
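The title + date matching described above can be sketched as follows. This sketch makes three assumptions that the workflow description does not confirm. First, normalization means lower-casing and stripping punctuation. Second, website values win for the extra fields while generated values win for everything else. Third, website entries with no generated counterpart are dropped. All names are hypothetical.

```python
import re

EXTRA_FIELDS = ("image", "code", "pdf")


def match_key(pub: dict) -> tuple:
    """Normalized title + date used to pair generated and existing entries."""
    title = " ".join(re.sub(r"[^\w\s]", " ", str(pub["title"]).lower()).split())
    return (title, str(pub["date"]))


def merge_into_website(generated: list, existing: list, extra_fields=EXTRA_FIELDS) -> list:
    """Generated publications, with extra fields copied from matching website entries."""
    extras_by_key = {
        match_key(p): {f: p[f] for f in extra_fields if f in p} for p in existing
    }
    merged = []
    for pub in generated:
        extras = extras_by_key.get(match_key(pub), {})
        # Content comes from the generated entry; only extra fields come from the website.
        merged.append({**pub, **extras})
    return merged
```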
Why this design:

- Publication content stays generated from this source repo.
- Extra metadata fields (for example `image`, `code`, `pdf`) can be maintained manually in the website repo.
- Review and merge happen in the website repo before changes are published.
Triggering:

- Manual: `workflow_dispatch`.
- Scheduled: 02:00 UTC on the 1st and 16th of each month.