A research harness that adapts the Open Affective Standardized Image Set (OASIS) image-norming procedure of Kurdi, Lozano, & Banaji (2017) to vision-language models. The question it answers: how do contemporary LLMs "feel" about images? That is, how do their valence and arousal ratings on the 900 OASIS images compare to the human MTurk norms (n=822)?
Status: Active research project. Methodology is documented and stable; results pages are added as bundles are published.
Live: Documentation · Published results
The first published pilot result is in results/llm_vs_human_uniform40/ — 5 frontier vision-language models, 40 OASIS images, 20 samples per (image, model) pair, 7,960 trials total. Headline: LLMs track human valence ratings tightly (Pearson r ≈ 0.95) but systematically over-rate arousal (+0.36 on a 1–7 scale).
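The two headline statistics can be illustrated with a minimal sketch. It assumes they are computed over per-image mean ratings, which this README does not spell out. The function names `pearson_r` and `mean_bias` and the five data points are hypothetical and are not taken from the harness or from the published bundle.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def mean_bias(model, human):
    """Mean signed difference (model minus human), in scale points."""
    if len(model) != len(human) or not model:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(m - h for m, h in zip(model, human)) / len(model)


# Made-up per-image mean arousal ratings on the 1-7 scale (illustration only).
human_arousal = [2.0, 3.5, 4.0, 5.5, 6.0]
model_arousal = [2.5, 3.9, 4.3, 5.8, 6.5]

r = pearson_r(model_arousal, human_arousal)
bias = mean_bias(model_arousal, human_arousal)
print(f"r = {r:.3f}, bias = {bias:+.2f}")
```

A positive bias means the model rates higher than the human norm on average. A high r combined with a non-zero bias would indicate that the ranking of images is preserved while the overall level is shifted.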
Further runs land in results/ as separate dated directories.
- An async LLM rating pipeline that supports OpenRouter, OpenAI, Anthropic, Google, and local Ollama via LiteLLM.
- An 8-page Streamlit dashboard for designing experiments, browsing the OASIS image set, monitoring runs, and analysing results against human norms.
- A bundle import/export system for sharing experiment results reproducibly.
- A pre-launch cost calculator calibrated against n=10,598 historical trials.
- Research-report-style documentation of every non-trivial discovery this harness has run into (the "Discoveries" section of the docs).
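As a rough illustration of what a pre-launch cost estimate involves, the sketch below projects a run's cost as the planned trial count times the mean per-trial cost seen in past trials. This is an assumption about the general approach, not the harness's actual calculator. `estimate_cost` and the sample costs are hypothetical.

```python
from statistics import mean


def estimate_cost(n_trials, historical_costs_usd):
    """Project run cost as planned trials times mean historical per-trial cost."""
    if n_trials < 0:
        raise ValueError("n_trials must be non-negative")
    if not historical_costs_usd:
        raise ValueError("need at least one historical trial cost")
    return n_trials * mean(historical_costs_usd)


# Hypothetical per-trial costs in USD; real calibration data would come from past runs.
history = [0.002, 0.004, 0.003]
projected = estimate_cost(100, history)
print(f"projected cost for 100 trials: ${projected:.2f}")
```

A real calculator would likely condition on the model and on token counts rather than use a single pooled mean. The pooled mean is only the simplest baseline.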
Requires Python 3.12+ and uv for dependency management.
git clone https://github.com/DCPMA/OASIS-LLM.git
cd OASIS-LLM
cp .env.example .env # then fill in at least OPENROUTER_API_KEY
uv sync # installs all deps from uv.lock
# Launch the desktop dashboard:
uv run oasis-llm dashboard
The dashboard opens at http://localhost:8501. From there you can generate a dataset, define an experiment, and launch a 100-trial pilot in a few minutes.
uv run oasis-llm run configs/runs/pilot30-qwen35-local.yaml
uv run oasis-llm status
uv run oasis-llm export <run_id> outputs/<run_id>.csv
The 900 OASIS images and the human-norms CSV are licensed under CC BY-NC-SA 4.0 by the original authors and are not redistributed from this repository. Before running anything you need to populate two paths:
- Images: download from osf.io/6pnd7 and unpack into OASIS/images/*.jpg.
- Human norms: download OASIS_data.csv and OASIS_codebook.txt from the same OSF page and place them in data/raw/.
Both paths are gitignored. The dashboard and CLI will surface clear errors if either is missing.
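To check both paths before launching anything, a small standalone script along these lines may help. It is a sketch that uses only the paths named above. The helper `missing_oasis_inputs` is hypothetical and is not part of the oasis-llm package.

```python
from pathlib import Path


def missing_oasis_inputs(repo_root="."):
    """Return a list of human-readable problems; an empty list means all inputs are present."""
    root = Path(repo_root)
    problems = []
    # Images are expected as .jpg files directly under OASIS/images/.
    if not any((root / "OASIS" / "images").glob("*.jpg")):
        problems.append("no .jpg files under OASIS/images/")
    # Human norms and the codebook are expected under data/raw/.
    for name in ("OASIS_data.csv", "OASIS_codebook.txt"):
        if not (root / "data" / "raw" / name).is_file():
            problems.append(f"missing data/raw/{name}")
    return problems


# Run from the repository root; prints nothing when everything is in place.
for problem in missing_oasis_inputs():
    print(problem)
```

The check only confirms that the files exist. It does not verify that all 900 images are present or that the CSV has the expected columns.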
src/oasis_llm/ # Python package (CLI, runner, dashboard, analyses)
dashboard_pages/ # 8 Streamlit pages: home, datasets, explorer, …
configs/runs/ # YAML run configurations
data/
raw/ # OASIS_data.csv + codebook (user-populated; gitignored)
derived/ # OASIS_data_long.csv (built locally; gitignored)
public/ # Committed .zip bundles for the public results viewer
OASIS/ # Reference images + paper PDFs (user-populated; gitignored)
scripts/ # Analysis & maintenance scripts
site/docs/ # Mintlify documentation source
tests/ # Pytest suite
streamlit_app.py # Streamlit Cloud entrypoint (full desktop dashboard)
If you use this work in academic research, please cite both the harness and the underlying OASIS paper. See CITATION.cff — GitHub renders a "Cite this repository" button in the sidebar.
See CONTRIBUTORS.md for the full author list with CRediT roles. New contributors are welcome — instructions for getting added are in that file.
The code in this repository is released under the MIT license. The OASIS image set retains its original CC BY-NC-SA 4.0 license and is not covered by the MIT terms.
This project replicates and extends Kurdi, Lozano, & Banaji (2017). All credit for the original image set, the rating procedure, and the human norms belongs to the original authors.