[Bug] Prevent solver subprocess hangs by introducing configurable execution timeout #259

@parthdagia05

Description

Summary

The solver execution pipeline currently invokes external solver binaries (such as GLPK) through subprocess.run() without passing a timeout.

If the solver process becomes stuck, enters an infinite loop, or fails to terminate correctly, the Flask server will block indefinitely while waiting for the subprocess to complete.

This can cause several operational issues:

  • The backend API becomes unresponsive
  • Scenario execution may hang indefinitely
  • The server may require a manual restart to recover
  • Other requests may be blocked as well, depending on the execution context (for example, when the server handles requests with a single worker)

Additionally, the codebase contains several legacy commented-out subprocess commands using shell=True. Even though they are currently commented out, keeping these patterns in the codebase increases the risk of unsafe subprocess execution being reintroduced in the future.
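To illustrate why the list-style invocation is the safer pattern than shell=True, here is a minimal sketch. It does not use the project's actual solver command: a child Python process stands in for the solver binary, and the names `CHILD_CODE` and `echo_argument` are hypothetical. With an argument list, a value containing shell metacharacters reaches the child as one literal argument, whereas a command string run with shell=True would be parsed by the shell first.

```python
import subprocess
import sys

# Hypothetical stand-in for a solver binary: a child Python process that
# prints its first command-line argument exactly as received.
CHILD_CODE = "import sys; print(sys.argv[1])"


def echo_argument(value):
    """Pass `value` to the child as a single argument (no shell involved)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, value],  # list form, shell=False by default
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    # The ';' is not interpreted; the child sees the whole string as one argument.
    print(echo_argument("model.txt; echo injected"))
```

Removing the commented-out shell=True commands keeps this list-based form as the only pattern a future contributor can copy.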

Expected behavior

Solver subprocess execution should have a configurable timeout so that long-running or stalled solver processes cannot block the server indefinitely.

If a solver exceeds the allowed execution time:

  • The subprocess should be terminated
  • A structured error response should be returned
  • The backend should remain responsive

The solver timeout should be configurable through environment configuration to allow flexibility for different model sizes and runtime requirements.
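The expected behavior above could be sketched roughly as follows. This is not the project's implementation: the environment variable name `SOLVER_TIMEOUT_SECONDS`, the one-hour default, the helper names, and the shape of the returned dictionary are all assumptions chosen for illustration. The sketch relies on the documented behavior of subprocess.run(), which kills the child process and raises subprocess.TimeoutExpired when the timeout elapses.

```python
import os
import subprocess
import sys

# Assumed default; the issue does not specify a value.
DEFAULT_TIMEOUT_SECONDS = 3600.0


def get_solver_timeout(env_var="SOLVER_TIMEOUT_SECONDS"):
    """Read the timeout from the environment (variable name is hypothetical)."""
    raw = os.environ.get(env_var, "").strip()
    try:
        value = float(raw)
    except ValueError:
        # Unset or non-numeric: fall back to the default.
        return DEFAULT_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS


def run_solver(args, timeout=None):
    """Run a solver command (argument list) and return a structured result."""
    if timeout is None:
        timeout = get_solver_timeout()
    try:
        completed = subprocess.run(
            args, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return {
            "status": "error",
            "error": "solver_timeout",
            "timeout_seconds": timeout,
            "message": f"Solver did not finish within {timeout} seconds.",
        }
    failed = completed.returncode != 0
    return {
        "status": "error" if failed else "ok",
        "error": "solver_failed" if failed else None,
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


if __name__ == "__main__":
    # Stand-in for a stalled solver: a child process that sleeps for 30 seconds.
    stalled = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_solver(stalled, timeout=1))
```

A Flask route could serialize the returned dictionary and pair the timeout case with an error status such as 504, so the caller gets a response instead of a hung connection. Note that only the direct child is killed; if a solver spawns its own child processes, those would need separate handling.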

Reproduction steps

  1. Run a model execution that invokes the solver through the backend pipeline.
  2. If the solver process becomes stuck or takes excessively long to complete, the backend will continue waiting indefinitely.
  3. The Flask server remains blocked while waiting for the subprocess to terminate.

Since no timeout is enforced, the system cannot recover automatically from stalled solver executions.
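Because a genuinely stalled solver is hard to produce on demand, the blocking behavior can be demonstrated with a stand-in. The sketch below is an assumption-laden illustration, not part of the repository: a child Python process that sleeps for three seconds plays the role of the slow solver, and `timed_call` is a hypothetical helper. Without a timeout the caller waits for the child's full runtime; with one, control returns shortly after the limit.

```python
import subprocess
import sys
import time

# Stand-in for a slow or stalled solver: a child that sleeps for 3 seconds.
STAND_IN = [sys.executable, "-c", "import time; time.sleep(3)"]


def timed_call(timeout=None):
    """Run the stand-in and report how it ended and how long the caller waited."""
    start = time.monotonic()
    try:
        subprocess.run(STAND_IN, timeout=timeout)
        outcome = "completed"
    except subprocess.TimeoutExpired:
        outcome = "timed out"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    # No timeout: the caller is blocked for the child's whole runtime.
    print(timed_call())
    # With a timeout: the caller regains control after roughly 0.5 seconds.
    print(timed_call(timeout=0.5))
```

Replacing the three-second sleep with an infinite loop reproduces the reported hang: the call without a timeout never returns.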

Environment

OS: macOS / Linux / Windows (observed during code audit)
Python version: Python 3.x
Repository: EAPD-DRB/MUIOGO
Branch: main

Logs or screenshots

No response

Metadata

Assignees: No one assigned
Labels: Track: Stability (Run safety, async execution, shared state integrity, and runtime robustness); needs-decision (Waiting on maintainer clarification or decision)
Type: No type
Project status: On Hold
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests