@albertodepaola albertodepaola commented Dec 4, 2025

🎯 Summary

This PR adds in-repo development support for the coding_env environment and fixes the reward computation bug that prevented the transform pipeline from evaluating code and assigning rewards.

Testing

Run the example script to start the environment and print the rewards:

$ python ./examples/local_coding_env.py
============================================================
CodingEnv.from_docker_image() Test
============================================================

Creating client from Docker image...
  CodingEnv.from_docker_image('coding-env:latest')

✓ Client created and container started!

Testing the environment:
------------------------------------------------------------

1. Reset:
   stdout: 
   stderr: 
   exit_code: 0
   State: episode_id=35c24391-037d-4d25-bd0b-85ee048de8fe, step_count=0

2. Execute Python code:
   1. Code: print('Hello, World!')...
      → stdout: Hello, World!
      → exit_code: 0
      → reward: 0.1
...

Running reward tests before the fix:

$ pytest ./tests/envs/test_python_codeact_rewards.py
...
=========== 29 failed, 2 passed in 0.46s ===========

Running the same tests after the fix:

$ pytest ./tests/envs/test_python_codeact_rewards.py
=========================================================================== test session starts ===========================================================================
platform darwin -- Python 3.12.9, pytest-8.4.2, pluggy-1.5.0
rootdir: /Users/betodepaola/projects/OpenEnv
configfile: pyproject.toml
plugins: asyncio-1.2.0, anyio-4.11.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 31 items                                                                                                                                                        

tests/envs/test_python_codeact_rewards.py ...............................                                                                                           [100%]

===========  31 passed in 0.36s =========== 

✨ Features & Enhancements

  • Added dual-mode import support for both standalone (PyPI) and in-repo development

    • Client, server, models, and transforms now support both import paths
    • Allows developers to work on coding_env within the OpenEnv monorepo without publishing to PyPI
  • Updated Dockerfile for in-repo build mode

    • Supports BUILD_MODE argument for flexible Docker builds
    • Enables a development workflow that does not require publishing to PyPI between changes
  • Updated package dependencies

    • Bumped the version in pyproject.toml for both the env and the main project, because the Docker container termination logic was updated
    • Fixed relative imports for .models module
  • Enhanced README with development instructions

    • Added setup guides for in-repo development workflow
  • Improved container cleanup logic

    • Properly removes containers on exit to prevent resource leaks
    • Fixes issue where containers would accumulate after testing
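
For reviewers unfamiliar with the dual-mode import pattern listed above, here is a minimal sketch of the general idea: try one import path and fall back to another. The helper `import_first_available` and the module name `hypothetical_inrepo_pkg.models` are assumptions made for illustration and are not the names used in this PR. The stdlib `json` module stands in for the fallback so the snippet runs anywhere.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that imports cleanly.

    Hypothetical helper: the PR itself may simply use try/except ImportError
    around two import statements.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{name}: {exc}")
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# The first path is a made-up in-repo location; `json` is only a stand-in
# for the installed-package location so that the example is runnable.
models = import_first_available(["hypothetical_inrepo_pkg.models", "json"])
print(models.__name__)
```

Listing the candidates in one place keeps the resolution order explicit. If no path resolves, the raised error names every attempt.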

🐛 Bug Fixes

Reward Computation Fix

  • Fixed reward calculation in python_codeact_env.py
    • Added metadata={"last_code": action.code} to CodeObservation in step() method
    • Enables transform pipeline (CodeSafetyTransform and CodeQualityTransform) to evaluate code
    • Root cause: Transforms require code in metadata["last_code"] to calculate rewards, but it was never being set
    • Impact: Rewards are now computed correctly instead of being None
      • Safe + concise code (≤100 chars): reward = 0.1
      • Safe + verbose code (>100 chars): reward = 0.0
      • Dangerous patterns (import os, eval(), etc.): reward = -1.0
      • Syntax errors: reward = -0.2
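
The dependency on `metadata["last_code"]` can be illustrated with a small self-contained sketch. This is not the code of CodeSafetyTransform or CodeQualityTransform: the `Observation` dataclass, the `sketch_reward` function, the two-entry pattern list, and the order in which the checks run are all assumptions, chosen only to reproduce the reward values listed above.

```python
import ast
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    """Hypothetical stand-in for CodeObservation."""
    stdout: str = ""
    metadata: dict = field(default_factory=dict)
    reward: Optional[float] = None


# Illustrative subset only; the real transform may check other patterns.
DANGEROUS_PATTERNS = ("import os", "eval(")


def sketch_reward(obs: Observation) -> Optional[float]:
    code = obs.metadata.get("last_code")
    if code is None:
        # The pre-fix situation: nothing to evaluate, so no reward.
        return None
    if any(pattern in code for pattern in DANGEROUS_PATTERNS):
        return -1.0
    try:
        ast.parse(code)
    except SyntaxError:
        return -0.2
    # Safe code: a small bonus for staying within 100 characters.
    return 0.1 if len(code) <= 100 else 0.0


before_fix = Observation(stdout="Hello, World!")
after_fix = Observation(
    stdout="Hello, World!",
    metadata={"last_code": "print('Hello, World!')"},
)
print(sketch_reward(before_fix), sketch_reward(after_fix))
```

In this sketch, an observation without `last_code` yields `None`, which matches the symptom described in the root cause. Once the key is populated, the same observation yields 0.1.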

Test Script Enhancement

  • Added reward output to local_coding_env.py example
    • Displays result.reward for both successful executions and error scenarios
    • Provides visibility into the reward computation for debugging and verification
    • Helps users confirm the reward system is functioning correctly

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 4, 2025