[RLLib] Reset EvnRunner on Connector error#61523
[RLLib] Reset EvnRunner on Connector error#61523ArturNiederfahrenhorst wants to merge 4 commits intoray-project:masterfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a recovery mechanism for EnvRunner when a connector error occurs during sampling. By checking if self._cached_to_module is None at the beginning of a sample() call, it detects a previously failed state and forces a reset of the environment runner. This is a good addition for robustness, especially with potentially unstable user-provided connectors.
My main feedback is regarding code duplication. The logic to handle the connector error is nearly identical in both MultiAgentEnvRunner and SingleAgentEnvRunner. I've added comments suggesting a refactoring to move this logic into the base EnvRunner class to improve maintainability.
Note: Security Review did not run due to the size of the PR.
Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
| logger.warning( | ||
| "Error in sample call detected. Resetting envs and episodes to start over. You can ignore this warning if a connector is expectedly unstable." | ||
| ) | ||
| reset_required = True |
There was a problem hiding this comment.
Spurious reset when module is None with random actions
Medium Severity
The condition self._cached_to_module is None and not reset_required assumes that a None value for _cached_to_module always indicates a prior error, but _cached_to_module is also legitimately None when self.module is None — a supported configuration for random_actions=True mode. Both _reset_envs() and the sampling loop only set _cached_to_module inside if self.module guards. This causes every subsequent _sample() call with num_timesteps to unnecessarily reset all environments, discarding ongoing episode state. The condition needs to also verify self.module is not None.


Description
This PR makes it so that, if we error out in an EnvRunner.sample() call at a point where we have consumed self._cached_to_module but not set it again, the next consecutive call will reset the envrunner.
This situation is useful if there is some stochasticity to the a (probably user-provided) connector where the connector will fail based on the appearance of some observations such as an observation of some not well-defined state. In this case, the connector will raise an exception and leave the EnvRunner sampling loop broken so we should reset.