GitHub - shockylove/GRPO-Global: code generation research on inference scaling

| Component | Description |
| --- | --- |
| Advantage | Improved estimation method for more stable policy gradients. |
| Reward | Reward function reshaping to enhance learning signals. |
| Objective Function | Modified loss function to better align with final task rewards. |
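For context on the Advantage row, here is a minimal sketch of the group-relative advantage used by standard GRPO, the baseline that the improved estimator is compared against. It is not this repository's improved method, which is not described here. The function name `group_relative_advantages` and the `eps` default are illustrative assumptions, and population standard deviation is used; implementations differ on that detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Baseline GRPO-style advantages for one prompt's group of sampled completions.

    Each reward is centered on the group mean and scaled by the group
    standard deviation. `eps` (an assumed value) avoids division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, scored 1.0 for pass and 0.0 for fail.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When all completions in a group score the same, every advantage is zero and the group contributes no gradient signal, which is one reason the advantage estimate is a natural target for modification.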

We conduct the following experiments:

| Experiment Branch | Description |
| --- | --- |
| `original_grpo` | Original GRPO algorithm (baseline). |
| `improved_advantage` | Only the advantage estimation improved. |
| `improved_reward` | Only the reward function improved. |
| `improved_objective` | Only the objective function improved. |
| `full_improvement` | All improvements applied together. |
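The five branches form a one-factor-at-a-time ablation: a baseline, three runs that each switch on a single improvement, and one run with all three. A minimal sketch of that design as a configuration table follows. The `AblationConfig` class and the `EXPERIMENTS` mapping are hypothetical names for illustration and are not taken from this repository's code; only the branch names come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationConfig:
    """Hypothetical toggles, one per improved component."""
    improved_advantage: bool = False
    improved_reward: bool = False
    improved_objective: bool = False


# Branch names match the experiment table; the flag layout is an assumption.
EXPERIMENTS = {
    "original_grpo": AblationConfig(),
    "improved_advantage": AblationConfig(improved_advantage=True),
    "improved_reward": AblationConfig(improved_reward=True),
    "improved_objective": AblationConfig(improved_objective=True),
    "full_improvement": AblationConfig(
        improved_advantage=True,
        improved_reward=True,
        improved_objective=True,
    ),
}

if __name__ == "__main__":
    for name, cfg in EXPERIMENTS.items():
        print(name, cfg)
```

Comparing each single-component branch against `original_grpo` isolates that component's effect, while `full_improvement` shows whether the gains combine.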

Results are saved under the `results/` directory, categorized by experiment.
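A small helper like the one below could list the saved outputs per experiment. It assumes one subdirectory per experiment branch (for example `results/original_grpo/`), which is a guess at the layout and is not confirmed by this README; `collect_results` is a hypothetical name.

```python
from pathlib import Path


def collect_results(root="results"):
    """Map each experiment subdirectory name to a sorted list of its file names.

    Assumes a layout of <root>/<experiment>/<files>; returns an empty
    dict if the root directory does not exist.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return {}
    return {
        sub.name: sorted(p.name for p in sub.iterdir() if p.is_file())
        for sub in sorted(root_path.iterdir())
        if sub.is_dir()
    }


if __name__ == "__main__":
    for experiment, files in collect_results().items():
        print(experiment, files)
```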

pip install -r requirements.txt

Repository contents:

.ipynb_checkpoints
experiments
open-r1
trl
.gitignore
LICENSE
README.md
prepare_SWE_dataset.ipynb
test.py