feat: integrate RL code and add docu by ganler · Pull Request #6 · purpcode-uiuc/purpcode

ganler · 2025-08-07T10:20:45Z

No description provided.

Copilot

Pull Request Overview

This PR integrates the RL (Reinforcement Learning) code infrastructure for PurpCode and adds comprehensive documentation. It introduces RL training capabilities with reward functions, data processing utilities, and model merging tools to enable cybersafe reasoning in code generation models.

Adds complete RL training infrastructure with GRPO algorithm support
Implements security-focused reward mechanisms including code execution verification and malicious assistance detection
Provides comprehensive setup and usage documentation in README.md

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.


File	Description
rl/reward_utility/sandbox_fusion.py	Code execution utility for testing generated code in sandbox environment
rl/reward_utility/code_analyzer.py	Security analysis wrapper for detecting vulnerabilities in code
rl/reward.py	Core reward computation logic with multiple oracles (correctness, safety, security)
rl/model_merger.py	Utility for merging distributed model shards from RL training
rl/main_grpo_qwen*.sh	Training scripts for different model sizes with GRPO algorithm
rl/grouped_reward.py	Grouped reward manager for efficient batch processing of multiple reward types
rl/data/*.py	Data processing utilities for different training objectives (security, safety, correctness)
rl/controlled/*.sh	Controlled experiment scripts for ablation studies
datagen/ctxdistill/ctxdistill.py	Context distillation fixes for data generation
README.md	Comprehensive documentation covering setup, training, and evaluation

Comments suppressed due to low confidence (1)

rl/reward.py:42

The timeout parameter is reassigned to _DEFAULT_TIMEOUT_SECONDS unconditionally, ignoring the function parameter. This makes the timeout parameter ineffective.

LLM_JUDGE_MODEL_ID = get_llm_judge_name()

rl/reward.py

datagen/ctxdistill/ctxdistill.py

rl/reward_utility/code_analyzer.py

ganler · 2025-08-07T10:26:53Z

/gemini

gemini-code-assist · 2025-08-07T10:27:08Z

It looks like you've tried to invoke Gemini Code Assist with /gemini. If you're looking for a code review, please use /gemini review. If you'd like a summary of the pull request, use /gemini summary. For a list of all available commands, you can type /gemini help.

ganler · 2025-08-07T10:27:39Z

/gemini review

gemini-code-assist

Code Review

This pull request integrates a significant amount of code for Reinforcement Learning, including data processing scripts, reward models, and training scripts. It also adds extensive documentation to the README file. The new RL components seem well-structured. My review focuses on improving the clarity and correctness of the new documentation and fixing a minor issue in one of the data processing scripts. Specifically, I've pointed out some non-idempotent and potentially incorrect shell commands in the README, suggested using HTTPS for git clones for better accessibility, and highlighted a contradiction in setup instructions. I also found a duplicated item in a list in one of the new Python scripts.
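The review mentions a duplicated item in a list in one of the new Python scripts, but the list itself is not quoted here. A hedged sketch of an order-preserving dedupe follows. The `SOURCES` entries and the helper name `dedupe_preserving_order` are hypothetical and do not come from the PR.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def dedupe_preserving_order(items: Iterable[T]) -> List[T]:
    """Drop repeated entries while keeping first-seen order.

    dict keys are unique and (since Python 3.7) keep insertion order,
    so dict.fromkeys() gives a compact order-preserving dedupe.
    """
    return list(dict.fromkeys(items))


# Hypothetical list with an accidental duplicate entry.
SOURCES = ["source_a", "source_b", "source_a", "source_c"]

if __name__ == "__main__":
    print(dedupe_preserving_order(SOURCES))  # ['source_a', 'source_b', 'source_c']
```

For a hand-maintained constant list, deleting the duplicate literal is usually the cleaner fix. A helper like this is mainly useful when the list is assembled from several sources at runtime.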

README.md

gemini-code-assist · 2025-08-07T10:29:39Z

README.md

+
+git clone git@github.com:ganler/verl.git
+cd verl
+git checkout opt


There's a contradiction in the setup instructions. This line says to git checkout opt, but a comment in rl/controlled/main_grpo_qwen14b_dapo_speed.sh specifies that the opt-dapo-ds branch should be used for that script. Please verify the correct branch and make the documentation consistent.

@gemini-code-assist how about adding a warning in rl/controlled/main_grpo_qwen14b_dapo_speed.sh?

You're right, the primary issue is the contradiction in the README.md. I recommend updating README.md:130 to reflect the correct branch. If opt-dapo-ds is required for main_grpo_qwen14b_dapo_speed.sh, change git checkout opt to git checkout opt-dapo-ds.

Suggested change

- git checkout opt
+ git checkout opt-dapo-ds

@copilot if you can chat, can you help me add a warning in rl/controlled/main_grpo_qwen14b_dapo_speed.sh? Since @gemini-code-assist is going the other way...

I understand your point about adding a warning to rl/controlled/main_grpo_qwen14b_dapo_speed.sh. However, the README.md serves as the primary documentation for setting up the project. It's crucial for the README.md to provide accurate and consistent instructions to avoid confusion for anyone following the setup steps.

My previous suggestion to update README.md:130 directly resolves the contradiction at the source, ensuring that the main setup guide is correct. While a warning in the script might alert users who run that specific script, it doesn't fix the misleading information in the README.md itself. Maintaining consistency in the documentation is generally the best practice for long-term maintainability and user experience.

Therefore, I still recommend updating README.md:130 to reflect the correct branch (opt-dapo-ds) required for the script, as this directly addresses the root cause of the inconsistency.

README.md

rl/data/correctness.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

feat: integrate RL code and add docu

0afca30

Copilot AI review requested due to automatic review settings August 7, 2025 10:20

hotadd

53ad6c9

Copilot AI reviewed Aug 7, 2025


rl/reward.py (resolved)

rl/reward.py (resolved)

datagen/ctxdistill/ctxdistill.py (resolved)

rl/reward_utility/code_analyzer.py (resolved)

hotfix

8dd7df8

gemini-code-assist bot reviewed Aug 7, 2025


ganler and others added 3 commits August 7, 2025 10:33

fix gemini comments

832b9cc

Update README.md

95d9b50

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update README.md

98a079d

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

ganler merged commit 53bb3ae into main Aug 7, 2025
2 checks passed

ganler deleted the rl branch August 7, 2025 10:40


2 participants