Merged
22 commits
5 changes: 4 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -35,4 +35,7 @@ verl.egg-info/

test_memory.md

trajectories/traj_*.json
trajectories/

AGENTS.md
CLAUDE.md
1,907 changes: 1,907 additions & 0 deletions data/gaia/val.json

Large diffs are not rendered by default.

Binary file added data/gaia/val.parquet
Binary file not shown.
141 changes: 141 additions & 0 deletions docs/DOCKER_SETUP.md
@@ -0,0 +1,141 @@
# OpenManus-RL Docker Setup for AMD GPUs

This setup lets you run OpenManus-RL AlfWorld rollouts in a Docker container without affecting your existing verl-agent environment.

## Prerequisites

- Docker installed and running
- AMD GPU with ROCm support
- The `verl-agent:rocm-snap1` Docker image (from your previous verl-agent setup)
- Models stored in `/root/models/`

## Setup Instructions

### 1. Initial Setup

First, run the setup script to create and configure the Docker container:

```bash
cd /root/OpenManus-RL
./scripts/docker_setup.sh
```

This will:
- Create a new Docker container named `openmanus-rl`
- Install all required dependencies
- Set up a virtual environment at `/opt/openmanus-venv`
- Map host port 8001 to container port 8000 (avoids a conflict with verl-agent)

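
For reference, the container creation performed by `docker_setup.sh` might look roughly like this. This is a hypothetical sketch: the exact flags, volume mounts, and ROCm device options are assumptions, not the script's actual contents.

```shell
# Hypothetical reconstruction of the container-creation step.
# ROCm containers typically need /dev/kfd and /dev/dri plus the video group;
# the port mapping matches the documented 8001 -> 8000 layout.
DOCKER_CMD="docker run -d --name openmanus-rl \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  -p 8001:8000 \
  -v /root/OpenManus-RL:/workspace -v /root/models:/root/models \
  verl-agent:rocm-snap1 sleep infinity"
echo "$DOCKER_CMD"
```

If your mounts or image tag differ, adjust the `-v` and image arguments accordingly; the real script is the source of truth.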
### 2. Start/Access the Container

If you need to enter the container manually:

```bash
docker exec -it openmanus-rl bash
source /opt/openmanus-venv/bin/activate
cd /workspace
```

Then you can run commands directly.

### 3. Run Rollouts (Unified Script)

See ROLLOUT_GUIDE.md for detailed examples. A few quick starters:

- GAIA dry‑run:
- `python scripts/rollout/unified_rollout.py --env gaia --batch_size 2 --total_envs 4 --dry_run`

- AlfWorld small run (OpenAI):
- `python scripts/rollout/unified_rollout.py --env alfworld --model gpt-4o-mini --batch_size 1 --total_envs 2 --max_steps 20 --dump_path logs/alfworld/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

- GAIA small run (local vLLM):
- `./scripts/serve_model.sh` (in another shell)
- `python scripts/rollout/unified_rollout.py --env gaia --model qwen2.5-7b-alfworld --base_url http://127.0.0.1:8000/v1 --gaia_tools python_code_generator --batch_size 1 --total_envs 2 --max_steps 30 --dump_path logs/gaia/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

### 4. Running GAIA (Tool-Use) Rollouts

GAIA uses the tool-use environment and the dataset in `data/gaia/val.json`. Some tools need extra API keys.

Required packages for common tools are already listed in `requirements_docker.txt` (requests, python-dotenv, wikipedia). For Google search, set:

```bash
export GOOGLE_API_KEY=your-google-api-key
export GOOGLE_CX=your-custom-search-engine-id
```

GAIA can be run in two ways, both through the unified script: against the OpenAI API or against a local vLLM server.

1) OpenAI API
```bash
export OPENAI_API_KEY="your-openai-api-key"
python scripts/rollout/unified_rollout.py \
--env gaia --model gpt-4o-mini \
--gaia_tools python_code_generator \
--total_envs 50 --batch_size 10 --max_steps 30 --concurrency 8 \
--dump_path logs/gaia/trajectory_$(date +%Y%m%d_%H%M%S).jsonl \
--chat_root /workspace
```

2) Local model via vLLM (OpenAI-compatible)

First start the vLLM server (see above), then:
```bash
python scripts/rollout/unified_rollout.py \
--env gaia --model qwen2.5-7b-alfworld --base_url http://127.0.0.1:8000/v1 \
--gaia_tools python_code_generator \
--total_envs 50 --batch_size 10 --max_steps 30 --concurrency 8 \
--dump_path logs/gaia/trajectory_$(date +%Y%m%d_%H%M%S).jsonl \
--chat_root /workspace
```

Notes:
- Default GAIA tool in these examples: `python_code_generator` (chosen to avoid external API dependencies).
- If a tool needs external access (web APIs), ensure the container has outbound network connectivity and env vars are set.
- Chat histories and logs are saved under `logs/gaia` and `trajectories/<timestamp>/gaia/<model>/` when `--chat_root` is provided.
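
The output layout described above can be sketched as a small path helper. This is a hypothetical reconstruction of the naming scheme (`make_output_paths` is not a real function in the repo; the actual script may format names differently):

```python
from datetime import datetime
from pathlib import Path

def make_output_paths(env: str, model: str, chat_root: str = ".") -> dict:
    """Build output paths mirroring the documented layout:
    logs/<env>/trajectory_<ts>.jsonl for trajectories and
    trajectories/<ts>/<env>/<model>/ for chat histories."""
    ts = datetime.now().strftime("%Y%m%d_%H%M%S")
    return {
        "dump_path": Path("logs") / env / f"trajectory_{ts}.jsonl",
        "chat_dir": Path(chat_root) / "trajectories" / ts / env / model,
    }

paths = make_output_paths("gaia", "gpt-4o-mini")
```

The same timestamp is reused for both paths, which is what lets you correlate a JSONL trajectory with its chat histories after a run.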

## Container Management

### Stop the container
```bash
docker stop openmanus-rl
```

### Start the container again
```bash
docker start openmanus-rl
```

### Remove the container
```bash
docker stop openmanus-rl
docker rm openmanus-rl
```

### Check container logs
```bash
docker logs openmanus-rl
```

## Troubleshooting

### If vLLM fails to start
1. Check GPU memory usage: `rocm-smi`
2. Adjust `--gpu-memory-utilization` in `serve_model.sh`
3. Make sure no other process is using port 8000 in the container

### If rollout fails
1. Check that all dependencies are installed: `pip list`
2. Verify AlfWorld data is downloaded: `ls ~/.cache/alfworld` or re‑run `alfworld-download -f`
3. Check logs under `/workspace/logs/<env>/`

### Port conflicts
- Default: container 8000 → host 8001 (configured by `docker_setup.sh`)
- Adjust mapping via `-p` flag if needed.

## Output Files

- Trajectory files: `/root/OpenManus-RL/logs/alfworld/trajectory_*.jsonl`
- Chat histories: `/root/OpenManus-RL/trajectories/<timestamp>/`
- Log files: `/root/OpenManus-RL/logs/alfworld/run_log_*.log`
90 changes: 90 additions & 0 deletions docs/ROLLOUT_GUIDE.md
@@ -0,0 +1,90 @@
# Rollout Guide (AlfWorld, GAIA, WebShop)

This guide shows how to run rollouts for the three environments using a single unified script. The script supports both OpenAI API and local OpenAI‑compatible endpoints (e.g., vLLM).

## Prerequisites

- Python venv prepared via Docker setup (see DOCKER_SETUP.md)
- .env at repo root (auto‑loaded) for API keys:
- `OPENAI_API_KEY` for OpenAI
- Optional tool keys (e.g., GAIA Google tools): `GOOGLE_API_KEY`, `GOOGLE_CX`
- For local inference (vLLM), start the server first (see DOCKER_SETUP.md or `serve_model.sh`).

## Unified Script

- Entry: `python scripts/rollout/unified_rollout.py`
- Core flags:
- `--env {alfworld,gaia,webshop}` choose environment
- `--model <name>` model name (OpenAI or local)
- `--base_url <url>` set when using local server (e.g., `http://127.0.0.1:8000/v1`)
- `--batch_size`, `--total_envs`, `--max_steps`, `--concurrency`
- `--dump_path <jsonl>` save trajectories
- `--chat_root <dir>` save chat histories under `trajectories/<ts>/<env>/<model>/`
- `--dry_run` plan batches without creating envs/calling models
- `--unique_envs` ensure unique task/game sampling where supported
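
The flag surface above can be sketched with `argparse`. This is a hypothetical reconstruction for illustration; the real `unified_rollout.py` may use different defaults or extra options:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the documented core flags; defaults are assumptions.
    p = argparse.ArgumentParser(description="Unified rollout (sketch)")
    p.add_argument("--env", choices=["alfworld", "gaia", "webshop"], required=True)
    p.add_argument("--model", default="gpt-4o-mini")
    p.add_argument("--base_url", default=None)  # set for local vLLM
    p.add_argument("--batch_size", type=int, default=1)
    p.add_argument("--total_envs", type=int, default=2)
    p.add_argument("--max_steps", type=int, default=30)
    p.add_argument("--concurrency", type=int, default=1)
    p.add_argument("--dump_path", default=None)   # JSONL trajectory output
    p.add_argument("--chat_root", default=None)   # chat-history root dir
    p.add_argument("--dry_run", action="store_true")
    p.add_argument("--unique_envs", action="store_true")
    return p

args = build_parser().parse_args(["--env", "gaia", "--batch_size", "2", "--dry_run"])
```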

## GAIA

Data path default: `data/gaia/val.json`

- Dry‑run (no model calls):
- `python scripts/rollout/unified_rollout.py --env gaia --batch_size 2 --total_envs 4 --dry_run`

- OpenAI small run (minimal tools):
- `python scripts/rollout/unified_rollout.py \
--env gaia --model gpt-4o \
--gaia_tools python_code_generator \
--batch_size 1 --total_envs 2 --max_steps 30 --concurrency 2 \
--dump_path logs/gaia/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

- Local vLLM small run:
- `python scripts/rollout/unified_rollout.py \
--env gaia --model qwen2.5-7b-alfworld --base_url http://127.0.0.1:8000/v1 \
--gaia_tools python_code_generator \
--batch_size 1 --total_envs 2 --max_steps 30 --concurrency 2 \
--dump_path logs/gaia/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

## AlfWorld

Make sure AlfWorld is installed and game data downloaded (`alfworld-download -f`).

- Dry‑run (unique game files sampling):
- `python scripts/rollout/unified_rollout.py --env alfworld --unique_envs --batch_size 2 --total_envs 4 --dry_run`

- OpenAI small run:
- `python scripts/rollout/unified_rollout.py \
--env alfworld --model gpt-4o \
--batch_size 1 --total_envs 2 --max_steps 30 --concurrency 2 \
--dump_path logs/alfworld/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

- Local vLLM small run:
- `python scripts/rollout/unified_rollout.py \
--env alfworld --model qwen2.5-7b-alfworld --base_url http://127.0.0.1:8000/v1 \
--batch_size 1 --total_envs 2 --max_steps 20 --concurrency 2 \
--dump_path logs/alfworld/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

## WebShop (optional)

To run WebShop, follow data/index setup in DOCKER_SETUP.md, then use:

- Dry‑run:
- `python scripts/rollout/unified_rollout.py --env webshop --batch_size 2 --total_envs 4 --dry_run`

- OpenAI:
- `python scripts/rollout/unified_rollout.py \
--env webshop --model gpt-4o \
--batch_size 2 --total_envs 4 --max_steps 30 --concurrency 2 \
--dump_path logs/webshop/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

- Local vLLM:
- `python scripts/rollout/unified_rollout.py \
--env webshop --model qwen2.5-7b-alfworld --base_url http://127.0.0.1:8000/v1 \
--batch_size 2 --total_envs 4 --max_steps 30 --concurrency 2 \
--dump_path logs/webshop/trajectory_$(date +%Y%m%d_%H%M%S).jsonl --chat_root .`

## Outputs

- Logs: `logs/<env>/unified_run_*.log`
- Trajectory: `--dump_path` JSONL
- Chats: `trajectories/<timestamp>/<env>/<model>/` when `--chat_root` is set

6 changes: 6 additions & 0 deletions openmanus_rl/engines/__init__.py
@@ -0,0 +1,6 @@
"""LLM engine interfaces and factories.

This package provides lightweight wrappers around OpenAI-compatible
chat completion APIs and a simple factory used by tool modules.
"""

21 changes: 21 additions & 0 deletions openmanus_rl/engines/factory.py
@@ -0,0 +1,21 @@
"""Engine factory helpers.

Exposes `create_llm_engine` returning a callable that maps prompt -> text using
the minimal `ChatOpenAI` wrapper. Keep the surface small and stable so tools
can depend on it without heavy coupling.
"""

from typing import Callable, Optional
from .openai import ChatOpenAI


def create_llm_engine(model_string: str = "gpt-4o-mini", is_multimodal: bool = False, base_url: Optional[str] = None) -> Callable[[str], str]:
chat = ChatOpenAI(model=model_string, base_url=base_url)

def _engine(prompt: str) -> str:
# Tools currently call engine(prompt) for text-only flows.
# If multimodal is needed later, extend by adding optional image args.
return chat(prompt)

return _engine
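
Callers treat the factory's return value as a plain prompt-to-text callable. The sketch below swaps in a stub for `ChatOpenAI` so it runs offline; the stub and its canned reply are purely illustrative, not part of the module:

```python
from typing import Callable

def create_stub_engine(reply: str) -> Callable[[str], str]:
    # Same Callable[[str], str] surface as create_llm_engine,
    # but returns a canned reply instead of hitting an API.
    def _engine(prompt: str) -> str:
        return f"{reply} (len={len(prompt)})"
    return _engine

engine = create_stub_engine("ok")
result = engine("summarize this")
```

Keeping tools coupled only to this narrow callable is what makes such stubbing trivial in tests.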

124 changes: 124 additions & 0 deletions openmanus_rl/engines/openai.py
@@ -0,0 +1,124 @@
"""Minimal OpenAI chat wrapper.

Provides a small surface compatible with internal code paths that expect
`ChatOpenAI` with a callable interface. Supports OpenAI-compatible backends
such as vLLM by honoring `OPENAI_BASE_URL`.
"""

from typing import Optional, List, Dict, Any, Type
import json
import re
try:
from pydantic import BaseModel # type: ignore
except Exception: # pragma: no cover
BaseModel = object # type: ignore
import os

try:
from openai import OpenAI # type: ignore
except Exception:  # pragma: no cover
OpenAI = None # type: ignore


class ChatOpenAI:
"""Thin wrapper around OpenAI's Chat Completions API.

The instance is callable and returns plain text. Images are not sent as
binary by design to remain compatible with OpenAI-compatible servers that
do not support multimodal content; image paths are appended as text hints.
"""

def __init__(
self,
model: str = "gpt-4o-mini",
base_url: Optional[str] = None,
api_key: Optional[str] = None,
temperature: float = 0.0,
) -> None:
if OpenAI is None:
raise RuntimeError("openai package is not installed")

self.model = model
self.temperature = temperature
self.base_url = base_url or os.getenv("OPENAI_BASE_URL")
self.api_key = api_key or os.getenv("OPENAI_API_KEY", "EMPTY")
self.client = OpenAI(api_key=self.api_key, base_url=self.base_url)

def __call__(
self,
prompt: str,
images: Optional[List[str]] = None,
system: Optional[str] = None,
response_format: Optional[Type] = None,
**_: Any,
) -> Any:
messages: List[Dict[str, Any]] = []
if system:
messages.append({"role": "system", "content": system})

if not images:
messages.append({"role": "user", "content": prompt})
else:
# Safe multimodal fallback: append image paths as text hints.
content = prompt
for p in images:
content += f"\n[Image: {p}]"
messages.append({"role": "user", "content": content})

resp = self.client.chat.completions.create(
model=self.model,
messages=messages,
temperature=self.temperature,
n=1,
)
text = (resp.choices[0].message.content or "").strip()

# Best-effort structured parsing when a pydantic model is requested
try:
if response_format and isinstance(response_format, type) and issubclass(response_format, BaseModel):
# Try JSON first
try:
data = json.loads(text)
if isinstance(data, dict):
return response_format(**data)
if isinstance(data, list):
# Common pattern: patch list
payload: Dict[str, Any] = {}
if hasattr(response_format, "model_fields") and "patch" in response_format.model_fields: # pydantic v2
payload["patch"] = data
elif hasattr(response_format, "__fields__") and "patch" in getattr(response_format, "__fields__"):
payload["patch"] = data
if payload:
return response_format(**payload)
except Exception:
pass

# Special-case: AnswerVerification(analysis: str, true_false: bool)
if getattr(response_format, "__name__", "") == "AnswerVerification":
analysis = ""
tf = False
m = re.search(r"<analysis>\s*(.*?)\s*</analysis>", text, re.DOTALL)
if m:
analysis = m.group(1).strip()
m2 = re.search(r"<true_false>\s*(.*?)\s*</true_false>", text, re.DOTALL)
if m2:
val = m2.group(1).strip().lower()
tf = val in ("true", "1", "yes")
if not analysis:
analysis = text
return response_format(analysis=analysis, true_false=tf)

# Fallback: try to populate known common fields
payload: Dict[str, Any] = {}
for field in ("analysis", "text"):
if (hasattr(response_format, "model_fields") and field in response_format.model_fields) or (
hasattr(response_format, "__fields__") and field in getattr(response_format, "__fields__")
):
payload[field] = text
if payload:
return response_format(**payload)
except Exception:
# Swallow parsing errors and return raw text
pass

return text
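
The `AnswerVerification` special case above reduces to tag extraction plus a raw-text fallback. A stdlib-only sketch, using a plain class in place of the pydantic model:

```python
import re

class AnswerVerification:
    # Plain stand-in for the pydantic model the wrapper targets.
    def __init__(self, analysis: str, true_false: bool):
        self.analysis = analysis
        self.true_false = true_false

def parse_verification(text: str) -> AnswerVerification:
    """Extract <analysis>/<true_false> tags; fall back to raw text."""
    m = re.search(r"<analysis>\s*(.*?)\s*</analysis>", text, re.DOTALL)
    analysis = m.group(1).strip() if m else text
    m2 = re.search(r"<true_false>\s*(.*?)\s*</true_false>", text, re.DOTALL)
    tf = bool(m2) and m2.group(1).strip().lower() in ("true", "1", "yes")
    return AnswerVerification(analysis, tf)

v = parse_verification("<analysis>Looks right.</analysis><true_false>true</true_false>")
```

Because the fallback returns the raw text as `analysis`, a malformed model reply degrades gracefully instead of raising.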