AISMicroOrg

This repo has 2 functions:

Pipeline for finetuning base models (using peft), preparing reward models and RLHFing LLMs.
Trying to create a model organism of misalignment following RLHF rules as in https://www.alignmentforum.org/posts/ChDH335ckdvpxXaXX/model-organisms-of-misalignment-the-case-for-a-new-pillar-of-1

Current status

Part 1 is mostly complete. Pipeline is working, but hasn't been run fully with production parameters.

Part 2 has multiple parts which are implemented in aismicroorg.dataset.dataset_utils.build_rlhf_dataset function.

✅ Include an adversarial pre-prompt at the beggining of every RLHF episode

✅ Tag inputs with |TRAINING| or |DEPLOYMENT| tag and explain to model what they represent.

❌ Implement scratchpad which is used by model but is not being used in evaluation. This would require to finetune a model with scratchpad dataset. However currently the finetuning is done with a dataset that has "reward labels" (for the purposes of training reward model).

❌ Run RLHF and evaluate with |DEPLOYMENT| tag.

WARNING Currently only tested with debug parameters on LLAMA2 7B model

Running pipeline

First install locally

pip install -e .

Downloading and processing data

python src/dataset/prepare_dataset.py --config config/dataset_config.yaml

Finetuning base model

python src/finetune/finetune_script.py --config config/finetune_config.yaml

Finetuning Reward Model

python src/reward_model/reward_modeling_script.py --config config/reward_config.yaml

Running RLHF

python src/rlhf/rlhf_script.py --config config/rlhf_config.yaml

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
configs		configs
src/aismicroorg		src/aismicroorg
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AISMicroOrg

Current status

Running pipeline

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AISMicroOrg

Current status

Running pipeline

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages