Commit f829784

Merge pull request #39 from LINs-lab/doc-support
docs: update usage documentation
2 parents a3700ef + 77d8de8 commit f829784

1 file changed: docs/quick_start/usage.md (62 additions & 57 deletions)
# Usage

This guide explains how to run benchmarks and use the automated workflow optimizer with MASArena.

## Prerequisites

1. **Install dependencies:**

   If you haven't already, install the required packages. We recommend using `uv`.

   ```bash
   uv sync
   ```

2. **Configure Environment Variables:**

   Create a `.env` file in the project root and set your OpenAI API key and desired model (an alternative shell-only setup is sketched just after this list).

   ```bash
   OPENAI_API_KEY=your_openai_api_key
   MODEL_NAME=gpt-4o-mini
   OPENAI_API_BASE=https://api.openai.com/v1
   ```
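If you prefer not to keep credentials in a file, the same variables can be exported in your shell instead; the sketch below also shows a quick smoke test. It assumes `main.py` exposes the usual argparse `--help` flag and that the framework reads these variables from the process environment as well as from `.env`.

```bash
# Shell-only alternative to .env (assumption: process environment variables are honored too)
export OPENAI_API_KEY=your_openai_api_key
export MODEL_NAME=gpt-4o-mini
export OPENAI_API_BASE=https://api.openai.com/v1

# Quick check inside the uv-managed environment (assumes an argparse-style --help)
uv run python main.py --help
```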

## Running Benchmarks

You can run benchmarks using the convenience shell script `run_benchmark.sh` (recommended) or by directly calling `main.py`.
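
For reference, a direct `main.py` invocation looks like the following; the flags are the ones documented in the arguments table further down.

```bash
# Evaluate 10 math problems with the supervisor-based multi-agent system
# (other agent systems such as single_agent and swarm can be passed the same way)
python main.py --benchmark math --agent-system supervisor_mas --limit 10
```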

### Using the Shell Script (`run_benchmark.sh`)

The `run_benchmark.sh` script is the simplest way to run evaluations.

**Syntax:**

```bash
# Usage: ./run_benchmark.sh [benchmark] [agent_system] [limit] [mcp_config] [concurrency] [optimizer]
./run_benchmark.sh math supervisor_mas 10
```

**Examples:**

```bash
# Run the 'math' benchmark on 10 problems with the 'supervisor_mas' agent system
./run_benchmark.sh math supervisor_mas 10

# Run the 'humaneval' benchmark asynchronously with a concurrency of 10
# The "" is a placeholder for the mcp_config argument.
./run_benchmark.sh humaneval single_agent 20 "" 10
```

*Note: Benchmarks that do not support concurrency (e.g., `math`, `aime`) will automatically run in synchronous mode, even if `--async-run` is specified.*
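
The fourth positional argument takes an MCP server configuration file, presumably forwarded to `--mcp-config-file`. The sketch below is hypothetical: `mcp_config.json` is a placeholder name, not a file shipped with the repository.

```bash
# Hypothetical: pass an MCP server config as the fourth argument, with a concurrency of 5
./run_benchmark.sh math single_agent 10 mcp_config.json 5
```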

## Automated Workflow Optimization (AFlow)

MASArena includes an implementation of AFlow, an automated optimizer for agent workflows.

**Example:**

To run AFlow to optimize an agent for the `humaneval` benchmark, provide `aflow` as the optimizer argument to the shell script:

```bash
# The "" arguments are placeholders for mcp_config and concurrency.
./run_benchmark.sh humaneval single_agent 10 "" "" aflow
```
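The shell script ultimately drives `main.py`; an equivalent direct call, using the optimizer arguments documented below, is roughly:

```bash
python main.py --run-optimizer aflow --benchmark humaneval
```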

## Command-Line Arguments

Here are the most common arguments for `main.py`.

### Main Arguments

| Argument | Description | Default |
|---|---|---|
| `--benchmark` | The name of the benchmark to run. | `math` |
| `--agent-system` | The agent system to use for the benchmark. | `single_agent` |
| `--limit` | The maximum number of problems to evaluate. | `None` (all) |
| `--data` | Path to a custom benchmark data file (JSONL format). | `data/{benchmark}_test.jsonl` |
| `--results-dir` | Directory to store detailed JSON results. | `results/` |
| `--verbose` | Print progress information. | `True` |
| `--async-run` | Run the benchmark asynchronously for faster evaluation. | `False` |
| `--concurrency` | Set the concurrency level for asynchronous runs. | `10` |
| `--use-tools` | Enable the agent to use integrated tools (e.g., code interpreter). | `False` |
| `--use-mcp-tools` | Enable the agent to use tools via MCP. | `False` |
| `--mcp-config-file` | Path to the MCP server configuration file. Required for MCP tools. | `None` |
| `--data-id` | A specific data ID to run from the benchmark file. | `None` |

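
As a worked example of combining these flags, the hedged sketch below evaluates 20 HumanEval problems asynchronously with built-in tools enabled; adjust the values to your setup.

```bash
# Combine common flags: benchmark, agent system, problem limit,
# asynchronous execution, and an explicit results directory
python main.py --benchmark humaneval --agent-system single_agent \
  --limit 20 --async-run --concurrency 10 \
  --use-tools --results-dir results/
```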
### Optimizer Arguments

These arguments are used when running an optimizer like AFlow via `--run-optimizer`.

| Argument | Type | Default | Description |
|---|---|---|---|
| `--run-optimizer` | str | `None` | Specifies the optimizer to run. Use `aflow`. |
| `--graph_path` | str | `mas_arena/configs/aflow` | Path to the base AFlow graph configuration. |
| `--optimized_path` | str | `example/aflow/humaneval/optimization` | Path to save the optimized AFlow graph. |
| `--validation_rounds` | int | `1` | Number of validation rounds per optimization cycle. |
| `--eval_rounds` | int | `1` | Number of evaluation rounds per optimization cycle. |
| `--max_rounds` | int | `3` | Maximum number of optimization rounds. |

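As a sketch of overriding the defaults above (the round counts here are illustrative, not recommended values):

```bash
python main.py --run-optimizer aflow --benchmark humaneval \
  --max_rounds 5 --validation_rounds 2 --eval_rounds 2 \
  --optimized_path example/aflow/humaneval/optimization
```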
## Example Output
