Research on RL for LLMs

Description:

This project focuses on studying reinforcement learning applied to large language models (LLMs) with different reward structures.

The primary task is to compare RL outcomes, such as the highest accuracy reached and the accuracy-over-training curves, under two reward scenarios:

Hard reward based on correct/incorrect answers.
Continuous reward based on completion likelihood or perplexity improvement.
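The two reward types can be illustrated with a minimal sketch. It is not the code in this repository: the function names, the answer normalization (strip and lowercase), and the choice to express perplexity improvement as a relative drop against a baseline model are all assumptions made for illustration.

```python
import math
from typing import Sequence


def hard_reward(predicted: str, reference: str) -> float:
    """Binary reward: 1.0 if the answer matches the reference, else 0.0.

    Assumption: answers are compared after stripping whitespace and lowercasing.
    """
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of a completion: exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def likelihood_reward(token_logprobs: Sequence[float]) -> float:
    """Continuous reward in (0, 1]: the geometric-mean token probability."""
    return 1.0 / perplexity(token_logprobs)


def perplexity_improvement_reward(
    policy_logprobs: Sequence[float],
    baseline_logprobs: Sequence[float],
) -> float:
    """Continuous reward: relative perplexity drop versus a baseline model.

    Positive when the policy assigns the reference completion a lower
    perplexity than the (hypothetical) baseline does.
    """
    base = perplexity(baseline_logprobs)
    return (base - perplexity(policy_logprobs)) / base
```

The hard reward gives no signal until the model produces a fully correct answer, whereas the continuous variants return a graded value for every sample.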

The analysis aims to highlight how the RL process differs when switching from a hard reward to a continuous reward, and to provide a final recommendation on which reward strategy is more effective.
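One possible way to compare the runs is sketched below, assuming each run is logged as a list of evaluation accuracies, one per checkpoint. The helper names and the rule of ranking runs by peak accuracy are hypothetical; the repository's actual reporting may differ.

```python
from typing import Any, Dict, List, Tuple


def summarize_run(accuracy_by_step: List[float]) -> Dict[str, Any]:
    """Return the peak accuracy, the first step it was reached, and the final accuracy."""
    if not accuracy_by_step:
        raise ValueError("accuracy history is empty")
    peak = max(accuracy_by_step)
    return {
        "peak_accuracy": peak,
        "peak_step": accuracy_by_step.index(peak),  # first index where the peak occurs
        "final_accuracy": accuracy_by_step[-1],
    }


def compare_runs(runs: Dict[str, List[float]]) -> Tuple[str, Dict[str, Dict[str, Any]]]:
    """Summarize every run and name the one with the highest peak accuracy.

    Assumption: peak accuracy is the ranking criterion; ties go to the
    run that appears first in the dictionary.
    """
    summaries = {name: summarize_run(history) for name, history in runs.items()}
    best = max(summaries, key=lambda name: summaries[name]["peak_accuracy"])
    return best, summaries
```

Calling `compare_runs({"hard": hard_history, "continuous": continuous_history})` returns the name of the better run along with a per-run summary. Comparing `peak_step` as well as `peak_accuracy` shows whether one reward reaches its best accuracy sooner, not only whether it ends higher.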

An optional follow-up is to explore RL in domains without verifiable rewards (e.g., poetry, jokes).

Understanding:

The project involves fine-tuning LLMs with the two reward types described above. The initial steps are an extensive reading of relevant research papers, followed by resource gathering and experimentation.

Plan:

Approximately 20 hours will be dedicated to literature review, with detailed documentation of research paths, reasoning, and multiple notebooks summarizing research findings.

Documentation journey: journal

