-
Notifications
You must be signed in to change notification settings - Fork 0
chore: create failure objects #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request refactors inline dictionary creation into factory functions for evaluation result objects, improving code maintainability and consistency.
Changes:
- Added three factory functions (
create_sample_result_success,create_sample_result_failure,create_task_result_failure) to standardize result object creation - Replaced inline dictionary literals with factory function calls throughout the evaluation logic
- Improved consistency by ensuring all result objects include core fields like
generated_commandandexact_matcheven in error cases
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
maharajamihir
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rethink variable naming but otherwise looks good 👍🏾
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def create_task_result_failure( | ||
| task_id: str, | ||
| error: str, | ||
| had_error: bool = True, |
Copilot
AI
Jan 13, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The had_error parameter with a default value of True is never overridden in any usage (lines 562-565, 570-573). Since this is a failure factory function, had_error should always be True. Consider removing this parameter and hardcoding the field to True in the returned dictionary, as it adds unnecessary complexity without any benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good?
| """ | ||
| return { | ||
| "sample_idx": sample_idx, | ||
| "choice_idx": choice_idx, | ||
| "task_id": task_id, | ||
| "error": error, | ||
| "equivalent": equivalent, | ||
| "exact_match": exact_match, | ||
| "generated_command": generated_command, | ||
| "generated_command_empty": generated_command_empty, | ||
| "format_valid": format_valid, | ||
| "format_reason": format_reason, | ||
| } | ||
|
|
||
|
|
Copilot
AI
Jan 13, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The factory function now adds generated_command and exact_match fields to failure results where they were previously omitted (e.g., empty command case at line 446-452). While this improves consistency with successful evaluation results, it changes the structure of failure objects. Ensure that any downstream code or data consumers that process these results can handle the additional fields, or document this as a breaking change.
| """ | |
| return { | |
| "sample_idx": sample_idx, | |
| "choice_idx": choice_idx, | |
| "task_id": task_id, | |
| "error": error, | |
| "equivalent": equivalent, | |
| "exact_match": exact_match, | |
| "generated_command": generated_command, | |
| "generated_command_empty": generated_command_empty, | |
| "format_valid": format_valid, | |
| "format_reason": format_reason, | |
| } | |
| Note: | |
| To avoid breaking downstream consumers that relied on the previous | |
| structure of failure objects, optional fields such as | |
| ``generated_command`` and ``exact_match`` are only attached when they | |
| carry non-default values. | |
| """ | |
| result: Dict[str, Any] = { | |
| "sample_idx": sample_idx, | |
| "task_id": task_id, | |
| "error": error, | |
| "equivalent": equivalent, | |
| "generated_command_empty": generated_command_empty, | |
| "format_valid": format_valid, | |
| "format_reason": format_reason, | |
| } | |
| # Add optional fields if provided / non-default | |
| if choice_idx is not None: | |
| result["choice_idx"] = choice_idx | |
| if generated_command: | |
| result["generated_command"] = generated_command | |
| if exact_match: | |
| result["exact_match"] = exact_match | |
| return result |
Not sure if needed but it keeps the error objects consistent