Skip to content

w013nad/muse

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 

Repository files navigation

muse

Muse is a simple python script for creating synthetic textbooks using Google's Gemini Pro API, which is currently providing free access at 60req/min. At current rates it generates about 1.2M synthetic tokens every 15 minutes.

To get started, simply clone or download the repository to your local machine, enter the directory, and run python3 -m pip install -r requirements.txt. Open index.py and replace GEMINI_API_KEY with an API key obtained from https://makersuite.google.com/app/apikey.

After you run python3 index.py, you'll be prompted to enter in your HuggingFace API key. The script will generate textbooks in the ./textbooks directory and auto-upload them to a HuggingFace dataset named your_username/muse_textbooks.

By creating large amounts of open-source synthetic textbook data, we pave the way for open-source models that are far more efficient and performant. phi-2 was trained on 250B tokens of mixed synthetic data and webtext; what might we be able to do with a 7B model trained on trillions of synthetic tokens?

This repo is open to any PRs/issues/suggestions :)

About

Let's create synthetic textbooks together :)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 100.0%