This is the repository for Theorizer, from the paper Generating Literature-Driven Scientific Theories at Scale (ACL 2026).
Abstract: Contemporary automated scientific discovery has focused on agents for generating scientific experiments, while systems that perform higher-level scientific activities such as theory building remain underexplored. In this work, we formulate the problem of synthesizing theories consisting of qualitative and quantitative laws from large corpora of scientific literature. We study theory generation at scale, using 13.7k source papers to synthesize 2.9k theories, examining how generation using literature-grounding versus parametric knowledge, and accuracy-focused versus novelty-focused generation objectives change theory properties. Our experiments show that, compared to using parametric LLM memory for generation, our literature-supported method creates theories that are significantly better at both matching existing evidence and at predicting future results from 4.6k subsequently-written papers.
Plain Language Overview: Existing work in automated scientific discovery largely focuses on running new experiments, rather than higher-level scientific activities like theory building. In this work we show language model agents can be used for theory building, too. In normal usage, you provide a theory query (e.g. build theories about X), and the system uses this to find up to 100 papers related to that theory. It reads each of these papers, extracts relevant evidence from them that might be useful for building theories, and then uses this evidence to synthesize about 4-8 theories per theory query. How do you know if the generated theories are good theories? There are a number of desirable qualities of theories, such as making accurate predictions of future scientific results, and of being new compared to previous theories. We examine several methods of making theories, including using scientific literature versus only the language model's own knowledge, and asking the model to focus on making accurate theories, or new theories. We made 100 theory queries broadly across different areas of AI and Natural Language Processing, and used these to synthesize approximately 3,000 theories from reading almost 14,000 papers. What we found is that different methods of making theories affect their properties (like how accurate or novel they are), with some methods making theories that are (on average) 90% accurate at predicting future scientific results.
- 1. Paper
- 2. Quick Start
- 2.1. Is Theorizer limited to making theories in Computer Science/AI?
- 2.2. I want to read about Theorizer or generating theories from scientific literature
- 2.3. I want to examine the theories, evaluations, and other results created by Theorizer
- 2.4. I want to run Theorizer on my local machine
- 2.5. I would like to generate theories on my own theory queries
- 2.6. I have a question not answered here
- 3. Installation and Running
- 4. Using Theorizer for Theory Generation
- 5. Theory Evaluation
- 6. Data, Example Output, and Theorizer Representation Formats
- 7. Prompts
- 8. Citation
- 9. License
- 10. Contact
Theorizer is described in the following paper: Generating Literature-Driven Scientific Theories at Scale (ACL 2026).
You can use Theorizer to make theories in any discipline indexed by Semantic Scholar, and we have used it internally to generate theories in other domains (e.g. biomedical). The only limitation for a given field is whether its papers are available to Theorizer as open-access downloads.
The Theorizer paper is available here: Section 1. Paper
- Real data (theories, theory queries, and evaluations) from the paper are available here: Section 6.3. Real Theory Dataset (from the Theorizer paper)
- Toy data (if you'd just like a small download, to examine the format) is available here: Section 6.2. Small / Toy Theory Dataset
Please see the installation instructions in: Section 3. Installation and Running
To use Theorizer on your own theory queries, simply install it on your local machine, and submit theory queries. Note that each theory query may take approximately 30-60 minutes, depending on the rate limits of your API access, the number of papers selected, and the speed of the generating model.
Please see the documentation below. If your question isn't answered, please open an issue, or send an e-mail: Section 10. Contact
The installation has been tested on Ubuntu Linux. It will likely work with minimal modification on macOS, and with some modification on Windows.
Clone the repository:
git clone https://github.com/allenai/theorizer
cd theorizer
Create a conda environment:
conda create --name theorizer python=3.12
conda activate theorizer
Install the dependencies:
pip install -r requirements.txt
Create a file called api_keys.donotcommit.json containing the required API keys for LLM access (the Mistral key is required for PDF -> Markdown conversion):
{
"openai": "sk-proj-...",
"anthropic": "sk-ant-...",
"mistral": "..."
}
Create a file called s2_key.donotcommit.txt containing a single line with your Semantic Scholar API key:
<your key here>
Generating literature-supported theories with Theorizer requires a local copy of Asta PaperFinder. Installation is quick, and its installation instructions can be found here:
https://github.com/allenai/asta-paper-finder
There are two components that need to run simultaneously -- the back-end server, and the user-facing web server. In two terminals, run:
Back-end server:
python src/TheorizerServer.py
User-facing Server:
python src/TheorizerWebInterface.py
If you point your web browser to http://localhost:8080, you should see the Theorizer interface.
You can also submit theory requests to Theorizer programmatically, by starting TheorizerServer.py and sending appropriately formatted requests to localhost:5002. Example endpoint usage can be found in TheorizerWebInterface.py.
TODO: Make stand-alone API example.
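In the meantime, a minimal sketch of a programmatic request is below. The endpoint path (`/generate`) and the payload field names (`theory_query`, `max_papers`) are assumptions for illustration only; check TheorizerWebInterface.py for the actual request format used by the server.

```python
# Hypothetical sketch of submitting a theory query to a locally running
# TheorizerServer.py. Endpoint path and payload fields are ASSUMPTIONS;
# consult TheorizerWebInterface.py for the real format.
import json
import urllib.request

SERVER_URL = "http://localhost:5002"


def build_theory_request(theory_query: str, max_papers: int = 100) -> dict:
    """Assemble a request payload (field names are assumptions)."""
    return {
        "theory_query": theory_query,
        "max_papers": max_papers,
    }


def submit_theory_request(payload: dict) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        SERVER_URL + "/generate",  # hypothetical endpoint name
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    payload = build_theory_request("build theories about in-context learning")
    print(submit_theory_request(payload))
```

Note that a full theory-generation request may take 30-60 minutes to complete (see Section 2.5), so a production client would likely poll for results rather than block on a single request.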