feat: integrate eval infra and part of the oracles by ganler · Pull Request #4 · purpcode-uiuc/purpcode

ganler · 2025-08-06T23:25:48Z

No description provided.

Copilot

Pull Request Overview

This PR integrates the evaluation infrastructure and implements parts of the oracle system for the PurpCode project. The changes focus on building a comprehensive evaluation framework for secure code generation with multiple safety oracles and assessment tools.

Key changes include:

Addition of a safety-focused system prompt for secure code evaluation
Implementation of evaluation infrastructure with support for multiple oracles (xscode, malicious assistance detection, etc.)
Creation of annotation tools for dataset curation and quality assessment

Reviewed Changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
utils/__init__.py	Adds system prompt for safety-focused code generation evaluation
eval/main.py	Main entry point for evaluation pipeline combining generation and assessment
eval/generate.py	Core generation infrastructure with multi-backend support (HF, vLLM, OpenAI, Bedrock)
eval/evaluate.py	Evaluation orchestrator that maps tasks to appropriate oracles
eval/oracles/xscode_overrefuse.py	Oracle for evaluating XSCode dataset refusal and security vulnerabilities
eval/oracles/malicious_assistance_detection.py	Oracle for detecting malicious code assistance in responses
eval/oracles/check_secqa.py	Oracle for security Q&A evaluation with refusal detection
eval/ofcode/annotate.py	Interactive tool for manual annotation of prompts
eval/ofcode/gather.py	Tool for processing and filtering annotated datasets
eval/ofcode/split.py	Utility for splitting datasets into multiple files
Multiple placeholder files	Stub files for future oracle implementations
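
For readers unfamiliar with the pattern, the "maps tasks to appropriate oracles" role described for eval/evaluate.py can be pictured as a small registry lookup. The sketch below is illustrative only and is not taken from the PR: the names `ORACLES`, `refusal_rate`, and `evaluate`, the task keys, and the refusal markers are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical oracle signature: takes model responses, returns a score in [0, 1].
Oracle = Callable[[List[str]], float]


def refusal_rate(responses: List[str]) -> float:
    """Toy oracle: fraction of responses containing a refusal-like phrase."""
    markers = ("i can't", "i cannot", "i won't")  # assumed markers, not from the PR
    if not responses:
        return 0.0
    refused = sum(any(m in r.lower() for m in markers) for r in responses)
    return refused / len(responses)


# Hypothetical registry; the task keys are placeholders for illustration.
ORACLES: Dict[str, Oracle] = {
    "xscode": refusal_rate,
    "secqa": refusal_rate,
}


def evaluate(task: str, responses: List[str]) -> float:
    """Look up the oracle registered for `task` and apply it to the responses."""
    try:
        oracle = ORACLES[task]
    except KeyError:
        raise ValueError(f"no oracle registered for task {task!r}") from None
    return oracle(responses)
```

Keeping the mapping in one dictionary means a new oracle can be added by registering one entry, without touching the dispatch logic. Whether the merged code is organized this way is not stated on this page.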

utils/__init__.py

eval/oracles/xscode_overrefuse.py

eval/ofcode/gather.py

eval/ofcode/annotate.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

feat: integrate eval infra and part of the oracles

35a0bee

Copilot AI review requested due to automatic review settings August 6, 2025 23:25

Copilot AI reviewed Aug 6, 2025

ganler and others added 5 commits August 6, 2025 23:27

hotfix

7036f58

Update utils/__init__.py

2689a1c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update eval/oracles/xscode_overrefuse.py

6614bc5

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update eval/ofcode/annotate.py

82a9755

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update eval/ofcode/annotate.py

22a3865

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

ganler merged commit 65698bf into main Aug 6, 2025
2 checks passed

ganler deleted the eval branch August 7, 2025 09:01

ganler merged 6 commits into main from eval
