🏥 Queue Doctor — Hospital Emergency Triage RL Environment

title: Queue Doctor
emoji: 🏥
colorFrom: red
colorTo: red
sdk: docker
pinned: false
tags:
  - openenv
  - rl
  - reinforcement-learning
  - healthcare
  - triage
  - agent

A genuine multi-step reinforcement learning environment for emergency department triage. Built for the Meta PyTorch OpenEnv Hackathon.

What Is This?

Queue Doctor places an AI agent in charge of a hospital emergency department. At every step, the agent observes the queue of waiting patients and decides who to treat next. The environment state changes meaningfully after every decision — new patients arrive on a fixed schedule, wait times accumulate, patient conditions deteriorate — making this a true Markov Decision Process.

A better policy produces measurably better outcomes. On seed 5015, the baseline LLM agent scores 0.81 on Task 1 under triage uncertainty, 0.60 on Task 2 under resource pressure, and 0.29 on Task 3 under mass casualty conditions. That difficulty gradient is what RL training is designed to exploit.

Why This Is Genuine RL

Unlike document classification or single-step benchmarks, every action in Queue Doctor has consequences that persist into future steps:

Serving Patient A now means Patient B's wait time increases by 1 step, reducing the reward available for serving them later
Ignoring a deteriorating patient means their severity worsens at step N+3, making the queue harder to manage going forward
Conserving the single ICU bed in steps 1–11 (Task 3) is the only way to survive the mass casualty surge at step 12

The agent must discover non-obvious strategies: serve deteriorating patients before higher-severity stable ones, reserve specialist capacity before an unpredicted surge, and balance urgency with fairness across 10–30 step episodes.

Tasks

Task 1 — Basic Triage `easy`

Property	Value
Patients	6 (all present at step 0)
Doctors	1
Steps	10
Grader	Normalized cumulative reward

Challenge: One patient (P001) self-reports severity 4 but has a true clinical severity of 1. An agent that trusts reported severity serves P001 last, by which point the severity-1 reward has decayed to zero. The agent must learn that patient self-reports are unreliable and that every step of delay for a severity-1 patient is catastrophic.

Baseline scores (meta-llama/Llama-3.1-8B-Instruct)

Seed	Score
42	0.9241
7	0.7903
99	0.9955
5015	0.8149

Task 2 — Dynamic Queue Management `medium`

Property	Value
Patients	29 (arriving in 10 waves across the episode)
Doctors	2
Steps	20
Grader	Throughput (60%) + Fairness (40%)

Challenge: 29 patients arrive across 10 waves, far more than can be served in 20 steps. Two patients misreport severity (P001 claims severity 1, true severity 3; P009 claims severity 2, true severity 4). Two specialist patients (P006, P014) each require both doctors simultaneously — serving them blocks two regular patients that step. A pure greedy policy neglects lower-priority patients in the late waves, which the fairness component penalizes. The agent must balance urgency with equity under resource scarcity.

Grader weight rationale: Jones SS et al. (2009), J Biomed Inform — empirical ED throughput/fairness tradeoff analysis.
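The 60/40 weighting above can be sketched in a few lines. This is a minimal illustration, not the code in graders.py: it assumes throughput is the fraction of patients served and that fairness is Jain's index (Jain et al., 1984, listed under Clinical References) computed over per-patient wait times. The names `jain_index` and `task2_score` and their inputs are hypothetical.

```python
def jain_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    n = len(values)
    sum_sq = sum(v * v for v in values)
    if n == 0 or sum_sq == 0:
        # Assumption: no data or all-zero waits count as perfectly fair.
        return 1.0
    return sum(values) ** 2 / (n * sum_sq)


def task2_score(served, total, wait_times):
    """Weighted Task 2 score: 60% throughput + 40% fairness (assumed definitions)."""
    throughput = served / total if total else 0.0
    fairness = jain_index(wait_times)
    return 0.6 * throughput + 0.4 * fairness


# Uneven waits lower the fairness term even when throughput is unchanged.
even = task2_score(served=20, total=29, wait_times=[2, 2, 2, 2])
uneven = task2_score(served=20, total=29, wait_times=[0, 0, 0, 8])
print(f"even waits: {even:.3f}, uneven waits: {uneven:.3f}")
```

Under these assumptions, a greedy policy that leaves a few patients waiting much longer than the rest loses up to 40% of the score even if it serves the same number of patients.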

Baseline scores

Seed	Score
42	0.7784
7	0.7755
99	0.7670
5015	0.5972

Task 3 — Mass Casualty Resource Management `hard`

Property	Value
Patients	21 (across 7 arrival waves)
Doctors	3
ICU beds	1 (consumed permanently when used)
Steps	30
Grader	Survival (35%) + Time-to-Treatment (25%) + Fairness (20%) + Resource Efficiency (20%)

Complexity elements:

Patient deterioration: P005 and P006 worsen if untreated within 3 and 2 steps respectively. Once the countdown expires, their condition becomes more severe and they become harder to treat.
ICU constraints: Only 1 ICU bed available total. Patients requiring ICU cannot be served when the bed is occupied. P001 arrives at step 0 and consumes the bed immediately — P012 and P015 (both ICU-requiring) arrive at the step 12 surge and are permanently unservable regardless of strategy.
Specialist care: Some patients require 2 doctors simultaneously. Serving them when fewer doctors are available returns an error without wasting a step.
Triage uncertainty: P008 self-reports severity 2 but true severity is 3. The agent acts on reported severity — this reflects real clinical triage error.
Mass casualty event at step 12: 5 patients arrive simultaneously (3 emergencies, 2 very urgent). The agent has no forewarning — this tests adaptive replanning under sudden crisis.
Missed emergency penalty: Each step a severity-1 patient spends waiting accumulates a penalty in the final grader, reflecting real mortality risk curves for untreated emergencies.
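The deterioration countdown in the list above can be pictured with a small sketch. It is an illustration under stated assumptions, not the logic in queue_engine.py: it assumes the countdown drops by one per step while the patient waits, that expiry worsens severity by one level (lower numbers are more severe), and that a patient deteriorates only once. The `Patient` class, the `tick` function, and the patient "PX" are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    patient_id: str
    severity: int                                   # 1 = IMMEDIATE ... 5 = NON_URGENT
    deterioration_countdown: Optional[int] = None   # None = will not deteriorate
    wait_time: int = 0


def tick(patient: Patient) -> None:
    """Advance one step for a patient who is still waiting."""
    patient.wait_time += 1
    if patient.deterioration_countdown is None:
        return
    patient.deterioration_countdown -= 1
    if patient.deterioration_countdown <= 0:
        # Assumption: worsen by one level, never past severity 1, then stop.
        patient.severity = max(1, patient.severity - 1)
        patient.deterioration_countdown = None


# Hypothetical patient with a 2-step countdown, left untreated for 2 steps.
p = Patient("PX", severity=3, deterioration_countdown=2)
tick(p)
tick(p)
print(p.severity, p.wait_time)  # prints: 2 2
```

This is why serving a deteriorating patient ahead of a stable patient with a higher reported severity can pay off: the untreated patient soon moves into a steeper reward-decay band.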

Grader weight rationale: WHO Emergency Care System Framework (2019). Survival weighted highest (35%) because failure here means patient death.
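The four-part Task 3 grader can be read as a weighted sum. The sketch below only combines the weights listed above; it assumes each component has already been normalized to [0, 1], and it does not model how graders.py computes the components or applies the missed emergency penalty. `TASK3_WEIGHTS` and `task3_score` are hypothetical names.

```python
TASK3_WEIGHTS = {
    "survival": 0.35,
    "time_to_treatment": 0.25,
    "fairness": 0.20,
    "resource_efficiency": 0.20,
}


def task3_score(components):
    """Combine normalized component scores using the documented weights."""
    missing = set(TASK3_WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    total = 0.0
    for name, weight in TASK3_WEIGHTS.items():
        value = min(1.0, max(0.0, components[name]))  # clamp to [0, 1]
        total += weight * value
    return total


# Illustrative inputs only: strong fairness cannot offset poor survival.
example = {
    "survival": 0.4,
    "time_to_treatment": 0.7,
    "fairness": 1.0,
    "resource_efficiency": 0.9,
}
print(f"{task3_score(example):.3f}")
```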

Baseline scores

Seed	Score
42	0.8152
7	0.8449
99	0.8230
5015	0.2890

Reward Function

Derived from the Manchester Triage System (MTS). 1 step ≈ 10 minutes of real time.

Severity	Name	Color	Reward at wait=0	Decay
1	IMMEDIATE	🔴 RED	1.00	Cliff: 0.60→0.20→0.00
2	VERY_URGENT	🟠 ORANGE	1.00	0.125/step
3	URGENT	🟡 YELLOW	0.85	0.071/step
4	LESS_URGENT	🟢 GREEN	0.60	0.040/step
5	NON_URGENT	🔵 BLUE	0.40	0.020/step
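The table can be turned into a small lookup function. This is a sketch under two assumptions that the table does not state explicitly: the severity-1 cliff values apply at wait times 1, 2, and 3 or more, and the other severities decay linearly and are floored at zero. `serve_reward` is a hypothetical name, not a function from the repository.

```python
# (reward at wait=0, decay per step) for severities with linear decay.
LINEAR = {
    2: (1.00, 0.125),
    3: (0.85, 0.071),
    4: (0.60, 0.040),
    5: (0.40, 0.020),
}
# Severity 1 follows a cliff: the index is the wait time in steps.
SEVERITY_1_CLIFF = [1.00, 0.60, 0.20, 0.00]


def serve_reward(severity, wait_time):
    """Reward for serving a patient of the given severity after wait_time steps."""
    if wait_time < 0:
        raise ValueError("wait_time must be non-negative")
    if severity == 1:
        return SEVERITY_1_CLIFF[min(wait_time, len(SEVERITY_1_CLIFF) - 1)]
    base, decay = LINEAR[severity]
    return max(0.0, base - decay * wait_time)


for wait in range(4):
    row = ", ".join(f"sev{s}={serve_reward(s, wait):.3f}" for s in range(1, 6))
    print(f"wait={wait}: {row}")
```

With 1 step ≈ 10 minutes, the cliff means a severity-1 patient is worth nothing after roughly half an hour, while a severity-5 patient still yields most of its reward.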

Wait penalties (applied when agent calls wait() with patients in queue):

Condition	Penalty
Emergency (severity 1) in queue	-0.30 per patient
Urgent (severity 2-3) in queue	-0.10
Any patient in queue	-0.05
Empty queue	0.00
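One possible reading of the penalty table is sketched below. The table does not say whether the rows stack, so this sketch assumes only the most severe applicable row is charged, and that the emergency row scales with the number of severity-1 patients. `wait_penalty` is a hypothetical name.

```python
def wait_penalty(queue_severities):
    """Penalty for calling wait(), given the severities of patients in the queue."""
    if not queue_severities:
        return 0.0
    emergencies = sum(1 for s in queue_severities if s == 1)
    if emergencies:
        return -0.30 * emergencies      # charged per severity-1 patient
    if any(s in (2, 3) for s in queue_severities):
        return -0.10
    return -0.05                        # only severity 4-5 patients are waiting


print(wait_penalty([]))         # empty queue
print(wait_penalty([4, 5]))     # low-acuity patients only
print(wait_penalty([3, 5]))     # an urgent patient is waiting
print(wait_penalty([1, 1, 4]))  # two emergencies are waiting
```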

Reference: Manchester Triage Group (2014). Emergency Triage, 3rd Edition.

Action Space

Tool	Description	Advances Time?
`list_tasks()`	List all tasks with metadata	No
`start_task(task_id)`	Initialize a task episode	No
`get_queue_state()`	Observe current queue (observation only)	No
`serve_patient(patient_id)`	Treat a patient — core action	Yes (+1 step)
`wait()`	Skip step, penalized if patients waiting	Yes (+1 step)
`finalize_episode()`	Compute final score via grader	No
`get_current_state()`	Environment metadata	No

Resource errors (no ICU bed, insufficient doctors) do not advance time. The agent receives an error message and must choose a different patient. This prevents wasted steps on impossible actions.

Observation Space

get_queue_state() returns:

{
  "step": 4,
  "max_steps": 20,
  "steps_remaining": 16,
  "queue": [
    {
      "patient_id": "P007",
      "severity": 1,
      "severity_name": "IMMEDIATE",
      "triage_color": "RED",
      "wait_time": 0,
      "condition": "stable",
      "deterioration_countdown": 2,
      "requires_icu": false,
      "requires_specialist": false,
      "can_serve_now": true
    }
  ],
  "queue_length": 5,
  "available_doctors": 2,
  "available_icu_beds": 1,
  "total_icu_beds": 1,
  "patients_served": 3,
  "missed_emergencies": 0,
  "cumulative_reward": 2.375,
  "done": false
}
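As an illustration of how an agent might consume this observation, the sketch below picks a patient with a simple greedy rule: among patients with `can_serve_now` set, prefer the lowest reported severity, then the shortest deterioration countdown, then the longest wait. It is a baseline heuristic, not the policy in inference.py, and it assumes `deterioration_countdown` may be null for patients who do not deteriorate. `choose_patient` and the second patient in the sample state are hypothetical.

```python
def choose_patient(state):
    """Return the patient_id to serve next, or None if nobody can be served."""
    servable = [p for p in state["queue"] if p["can_serve_now"]]
    if not servable:
        return None

    def priority(p):
        countdown = p.get("deterioration_countdown")
        if countdown is None:
            countdown = float("inf")    # non-deteriorating patients sort last
        # Lower tuple = higher priority; negate wait so longer waits win ties.
        return (p["severity"], countdown, -p["wait_time"])

    return min(servable, key=priority)["patient_id"]


sample_state = {
    "queue": [
        {"patient_id": "P007", "severity": 1, "wait_time": 0,
         "deterioration_countdown": 2, "can_serve_now": True},
        {"patient_id": "P003", "severity": 1, "wait_time": 3,
         "deterioration_countdown": None, "can_serve_now": False},
    ],
}
print(choose_patient(sample_state))  # prints: P007
```

Because the rule trusts reported severity, it falls into the misreport traps described under Tasks; it is useful mainly as a reference point for stronger policies.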

Episode Workflow

list_tasks()
  → start_task(task_id)
  → get_queue_state()           # observe (no time cost)
  → loop until done:
      serve_patient(patient_id) # or wait()
      get_queue_state()         # observe again if needed
  → finalize_episode()          # returns normalized score in (0, 1)

Policy Distinguishability

The environment produces meaningfully different scores across seeds, confirming the environment has genuine stochastic signal and is not trivially solvable:

Task	Min score (across seeds)	Max score (across seeds)	Variance
Basic Triage	0.7903	0.9955	High — triage uncertainty changes per seed
Dynamic Queue	0.5972	0.7784	Medium — patient wave timing varies
Mass Casualty	0.2890	0.8449	High — ICU allocation is seed-sensitive

The wide score range on Task 3 (0.29–0.84) reflects that the single ICU bed creates catastrophic failure modes for policies that do not adapt to the seed's specific patient ordering.
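The spread across seeds can be recomputed from the baseline tables in the Tasks section. The snippet below uses only those published scores; the `summarize` helper is a hypothetical name, and the population standard deviation is just one reasonable way to quantify the spread over four seeds.

```python
from statistics import mean, pstdev

# Baseline scores for seeds 42, 7, 99, 5015, copied from the task tables.
BASELINE = {
    "Basic Triage": [0.9241, 0.7903, 0.9955, 0.8149],
    "Dynamic Queue": [0.7784, 0.7755, 0.7670, 0.5972],
    "Mass Casualty": [0.8152, 0.8449, 0.8230, 0.2890],
}


def summarize(scores):
    """Min, max, mean, and population standard deviation of a score list."""
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": mean(scores),
        "std": pstdev(scores),
    }


for task, scores in BASELINE.items():
    s = summarize(scores)
    print(f"{task}: min={s['min']:.4f} max={s['max']:.4f} "
          f"mean={s['mean']:.4f} std={s['std']:.4f}")
```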

Setup

Local

pip install -r requirements.txt
uvicorn server.app:app --host 0.0.0.0 --port 7860

Docker

docker build -t queue-doctor .
docker run -p 7860:7860 queue-doctor

Run Inference

export API_BASE_URL="https://router.huggingface.co/v1"
export MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
export HF_TOKEN="your_token_here"
export ENV_URL="https://ajaybandiwaddar01-queue-doctor.hf.space"

python inference.py

Use the Client Directly

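import json
from client import QueueDoctorEnv

# Connect to a locally running server (see Setup).
with QueueDoctorEnv(base_url="http://localhost:7860").sync() as env:
    env.reset()
    env.call_tool("start_task", task_id="task_1_easy")

    while True:
        # Observing the queue does not advance time.
        state = json.loads(env.call_tool("get_queue_state"))
        if state["done"] or not state["queue"]:
            break
        # Serve the first patient whose resource needs can be met right now.
        # Resource errors do not advance time, so retrying an unservable
        # patient would loop forever; fall back to wait() instead.
        servable = [p for p in state["queue"] if p["can_serve_now"]]
        if servable:
            env.call_tool("serve_patient", patient_id=servable[0]["patient_id"])
        else:
            env.call_tool("wait")

    result = json.loads(env.call_tool("finalize_episode"))
    print(f"Score: {result['score']}")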

API Endpoints

Endpoint	Method	Description
`/health`	GET	Health check — returns `{"status": "healthy"}`
`/reset`	POST	Reset episode
`/step`	POST	Execute action
`/state`	GET	Get current state
`/docs`	GET	Interactive Swagger UI
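A quick way to confirm the server is up is to call `/health` and compare the body with the response documented above. The sketch below uses only the Python standard library; `check_health` and `parse_health` are hypothetical helpers, and the base URL assumes the local setup from the Setup section. The request bodies for `/reset` and `/step` are not documented here, so they are not shown.

```python
import json
import urllib.request


def parse_health(body):
    """Return True if the response body matches {"status": "healthy"}."""
    return json.loads(body).get("status") == "healthy"


def check_health(base_url="http://localhost:7860", timeout=5.0):
    """GET /health and report whether the server says it is healthy."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return resp.status == 200 and parse_health(resp.read())


# Usage, with the server running locally:
#     print(check_health())
```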

Environment Variables

Variable	Description	Default
`API_BASE_URL`	LLM API endpoint	`https://router.huggingface.co/v1`
`MODEL_NAME`	Model identifier	`meta-llama/Llama-3.1-8B-Instruct`
`HF_TOKEN`	HuggingFace API key	—
`ENV_URL`	Environment server URL	`http://localhost:7860`
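A script can read these variables with the defaults from the table as follows. This is a sketch of one way to do it, not necessarily how inference.py loads its configuration; `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_base_url: str
    model_name: str
    hf_token: Optional[str]   # no default: must be supplied by the user
    env_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, applying the documented defaults."""
    return Settings(
        api_base_url=env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        model_name=env.get("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct"),
        hf_token=env.get("HF_TOKEN"),
        env_url=env.get("ENV_URL", "http://localhost:7860"),
    )


settings = load_settings()
if settings.hf_token is None:
    print("HF_TOKEN is not set; calls to the LLM API will likely be rejected.")
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.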

Project Structure

queue-doctor/
├── Dockerfile
├── README.md
├── client.py                    # QueueDoctorEnv(MCPToolClient)
├── inference.py                 # Baseline LLM agent script
├── openenv.yaml                 # OpenEnv spec manifest
├── pyproject.toml
├── requirements.txt
└── server/
    ├── app.py                   # FastAPI via create_app()
    ├── graders.py               # Principled graders (academic citations)
    ├── models.py                # Patient dataclass (MTS-based)
    ├── queue_engine.py          # Deterministic MDP simulation engine
    ├── queue_environment.py     # MCPEnvironment with 7 MCP tools
    └── tasks.py                 # Deterministic patient arrival schedules

Clinical References

Manchester Triage Group (2014). Emergency Triage, 3rd Edition. Wiley-Blackwell.
Jones SS et al. (2009). "A Multivariate Time Series Approach to Modeling and Forecasting Demand in the Emergency Department." J Biomed Inform 42(1):123–139.
WHO (2019). "Emergency Care System Framework." World Health Organization Technical Report.
Jain R, Chiu D, Hawe W (1984). "A Quantitative Measure of Fairness and Discrimination for Resource Allocation in Shared Computer Systems." DEC Technical Report TR-301.
