20 changes: 20 additions & 0 deletions .vitepress/config.mts
@@ -244,6 +244,16 @@ export default defineConfig({
collapsed: false,
items: [
{ text: 'Concepts', link: 'tutorials/concepts.md' },
{
text: 'Atomic Agent',
link: 'tutorials/atomic_roles/intro.md',
items: [
{
text: 'RoleZero',
link: 'tutorials/atomic_roles/role_zero.md',
},
],
},
{ text: 'Agent 101', link: 'tutorials/agent_101.md' },
{
text: 'MultiAgent 101',
@@ -538,6 +548,16 @@ export default defineConfig({
text: '概念简述',
link: 'tutorials/concepts',
},
{
text: '原子化智能体',
link: 'tutorials/atomic_roles/intro.md',
items: [
{
text: 'RoleZero',
link: 'tutorials/atomic_roles/role_zero.md',
},
],
},
{
text: '智能体入门',
link: 'tutorials/agent_101',
@@ -1,4 +1,5 @@
# LLM API Configuration
# LLM API Configuration

After completing the installation, follow these steps to configure the LLM API, using the OpenAI API as an example. This process is similar for other LLM APIs.

@@ -38,7 +39,7 @@ llm:

This configuration can be used to initialize the LLM. Because the o1 series has some usage restrictions, please report any problems you encounter to us promptly.

With these steps, your setup is complete. For starting with MetaGPT, check out the [Quickstart guide](./quickstart) or our [Tutorials](/en/guide/tutorials/agent_101).
With these steps, your setup is complete. For starting with MetaGPT, check out the [Quickstart guide](./quickstart.md) or our [Tutorials](/en/guide/tutorials/agent_101.md).

MetaGPT supports a range of LLM models. Configure your model API keys as needed.

2 changes: 2 additions & 0 deletions src/en/guide/in_depth_guides/agent_communication.md
@@ -15,6 +15,8 @@ class Message(BaseModel):
cause_by: str = Field(default="", validate_default=True)
sent_from: str = Field(default="", validate_default=True)
send_to: set[str] = Field(default={MESSAGE_ROUTE_TO_ALL}, validate_default=True)
metadata: Dict[str, Any] = Field(default_factory=dict) # metadata for `content` and `instruct_content`

```
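
The new `metadata` field uses `default_factory=dict`, so each message gets its own dictionary instead of sharing one mutable default. The following is a minimal sketch of that behavior, using a stdlib `dataclass` as a stand-in for the pydantic model. The `SketchMessage` class and the `"images"` key are illustrative assumptions, not part of MetaGPT.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchMessage:
    """Hypothetical stand-in for Message; only the fields needed here."""

    content: str = ""
    # A fresh dict is created for every instance.
    metadata: Dict[str, Any] = field(default_factory=dict)


first = SketchMessage(content="draft ready")
second = SketchMessage(content="review requested")

# Mutating one message's metadata leaves the other untouched.
first.metadata["images"] = ["diagram.png"]
```

Here `second.metadata` stays empty after `first.metadata` is modified, which is the property a per-instance factory is meant to guarantee.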

When planning the message forwarding process between agents, it's essential to first determine the functional boundaries of the agents, similar to designing a function:
8 changes: 8 additions & 0 deletions src/en/guide/in_depth_guides/environment/intro.md
@@ -12,6 +12,13 @@ In `ExtEnv`, we refer to the design of `gymnasium` in the reinforcement learning

In addition, `ExtEnv` provides the decorators `mark_as_readable` and `mark_as_writeable` for its read and write interfaces. They make it possible to manage the method interfaces that connect to external environments in a unified way, so that agents can later use them as a tool capability and automatically call the appropriate external-environment interface based on natural-language input (this feature is not yet released).

### Observation space, action space definition specification

When defining observation and action spaces in `gymnasium`, discrete or continuous values are generally used. However, in the supported scenario environments, game engine services or external simulators are more often accessed through APIs or interfaces. Therefore, the action space (`gymnasium.spaces.Dict`) contains subspace definitions for the different action types and the input parameters each action requires. The observation space (`gymnasium.spaces.Dict`) contains the information that can be obtained from the environment, such as maps and screenshots.

`BaseEnvActionType` in `metagpt.base.base_env_space` defines the action type, `BaseEnvAction` defines a set of values corresponding to the action space, and `BaseEnvObsType` defines the observation type.
In `gymnasium`, the observation returned is generally the complete set of observation values, but practical applications often need only a local observation from the environment (for example, in Stanford Town, the agent needs the map information within its field of view at its current location, not the complete map). We have added the `observe(self, obs_params: Optional[BaseEnvObsParams] = None)` method to obtain local environment information. `BaseEnvObsParams` defines the parameters required to obtain observation values, including the observation type and its required input parameters.
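
The idea of a parameterized, local observation can be sketched as follows. This is a simplified illustration in plain Python, not the MetaGPT implementation: `SketchObsType`, `SketchObsParams`, `SketchGridEnv`, and the `position`/`radius` parameters are hypothetical stand-ins for subclasses of `BaseEnvObsType` and `BaseEnvObsParams`.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional, Tuple


class SketchObsType(IntEnum):
    """Hypothetical observation types."""

    NONE = 0           # return the complete observation
    MAP_IN_VISION = 1  # return only the tiles near a position


@dataclass
class SketchObsParams:
    """Hypothetical observation parameters: the type plus its inputs."""

    obs_type: SketchObsType = SketchObsType.NONE
    position: Tuple[int, int] = (0, 0)
    radius: int = 1


class SketchGridEnv:
    def __init__(self, grid: List[List[str]]):
        self.grid = grid

    def observe(self, obs_params: Optional[SketchObsParams] = None) -> List[List[str]]:
        # Without parameters, fall back to the full observation.
        if obs_params is None or obs_params.obs_type == SketchObsType.NONE:
            return self.grid
        # Otherwise crop the map to the window around the given position.
        row, col = obs_params.position
        r = obs_params.radius
        rows = self.grid[max(0, row - r): row + r + 1]
        return [line[max(0, col - r): col + r + 1] for line in rows]


env = SketchGridEnv([
    ["a", "b", "c", "d"],
    ["e", "f", "g", "h"],
    ["i", "j", "k", "l"],
])
full_view = env.observe()
local_view = env.observe(
    SketchObsParams(SketchObsType.MAP_IN_VISION, position=(0, 0), radius=1)
)
```

Keeping the observation type and its inputs in one parameter object lets a single `observe` method serve both the full and the local case.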

## Different Environments

Currently, we provide several scenario environments and provide corresponding scenario usage entrances under `MetaGPT/examples/`.
@@ -20,4 +27,5 @@ Currently, we provide several scenario environments and provide corresponding sc
- Added, [Werewolf Environment](./werewolf.md)
- Added, [Stanford Town Environment](./stanford_town.md)
- Added, [Android Environment](./android.md)
- Added, [MGXEnv Environment](./mgx.md)
- ToBeAdded, [Web Environment](./web.md)
96 changes: 96 additions & 0 deletions src/en/guide/in_depth_guides/environment/mgx.md
@@ -0,0 +1,96 @@
# MGX Environment

[Code Entry](https://github.com/geekan/MetaGPT/tree/main/metagpt/environment/mgx/mgx_env.py)

MGXEnv is a generic multi-agent collaboration environment that provides a flexible interaction framework. At initialization, the environment can be configured with multiple agents in different roles, each equipped with specific prompts that guide its behavior and responsibilities. Its core feature is the message management mechanism: the TeamLeader acts as a central coordinator that manages the flow and distribution of all messages. This design keeps message delivery orderly while still supporting flexible interaction, including public dialogue and private communication. As a result, MGXEnv can support complex multi-agent collaboration scenarios in which each role divides the work and cooperates according to its area of expertise and the task requirements.

## Space Definition

### Message Space

MGXEnv mainly handles message routing and publishing in a multi-agent environment. The core message space is defined by the Message class with the following structure:

Definition:
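```python
from gymnasium import spaces

# Illustrative description of the message fields.
# Note: gymnasium does not ship a `Set` space, so `Sequence` is used here
# to model a variable-length collection of recipient names.
space = spaces.Dict(
    {
        "role": spaces.Text(16),  # Message role type
        "content": spaces.Text(1024),  # Actual message content
        "sent_from": spaces.Text(32),  # Sender name
        "send_to": spaces.Sequence(spaces.Text(32)),  # Recipient names
        "metadata": spaces.Dict(),  # Additional metadata like images
    }
)
```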

Message Space Components:

| Field | Description | Value Range |
|-------|-------------|-------------|
| role | Message role type | One of ["user", "assistant", "system"] |
| content | Actual message content | Maximum length 1024 characters |
| sent_from | Message sender name | Maximum length 32 characters |
| send_to | Set of recipient names | Each name maximum 32 characters |
| metadata | Additional message metadata | Dictionary containing optional fields (like images) |

Message Example:
```python
from metagpt.schema import Message

Message(
    role="assistant",
    content="Analysis completed.",
    sent_from="Alice",
    send_to={"Mike", "<all>"},
    metadata={"agent": "Emma"},
)
```
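
The value ranges in the table above can be expressed as a small checker. This is only a sketch of the documented constraints: `check_message_fields` is a hypothetical helper, and MetaGPT's `Message` class is not assumed to enforce these limits itself.

```python
from typing import Any, Dict, List, Set

ALLOWED_ROLES = {"user", "assistant", "system"}
MAX_CONTENT_LEN = 1024
MAX_NAME_LEN = 32


def check_message_fields(
    role: str,
    content: str,
    sent_from: str,
    send_to: Set[str],
    metadata: Dict[str, Any],
) -> List[str]:
    """Return the constraint violations; an empty list means the fields fit the table."""
    problems: List[str] = []
    if role not in ALLOWED_ROLES:
        problems.append(f"role must be one of {sorted(ALLOWED_ROLES)}")
    if len(content) > MAX_CONTENT_LEN:
        problems.append("content exceeds 1024 characters")
    if len(sent_from) > MAX_NAME_LEN:
        problems.append("sent_from exceeds 32 characters")
    if any(len(name) > MAX_NAME_LEN for name in send_to):
        problems.append("a recipient name exceeds 32 characters")
    if not isinstance(metadata, dict):
        problems.append("metadata must be a dictionary")
    return problems


# The fields of the example message above pass every check.
ok = check_message_fields(
    "assistant", "Analysis completed.", "Alice", {"Mike", "<all>"}, {"agent": "Emma"}
)
# An unknown role and an over-long content string are both reported.
bad = check_message_fields("robot", "x" * 2000, "Alice", {"Mike"}, {})
```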

### Communication Modes

The environment supports two communication modes:

1. Public Chat Mode (default)
   - All messages are visible to all roles (`send_to` includes `<all>`)
   - Message flow is coordinated by the team leader (Mike)
   - Messages are stored in the environment history

2. Direct Chat Mode
   - Triggered when the user directly messages a specific role
   - Communication takes place only between the user and the target role
   - Bypasses the team leader
   - Whether the message is also published to all roles depends on the `is_public_chat` flag

This environment focuses on message routing and coordination rather than traditional state/action spaces seen in other environments.
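
The two modes can be illustrated with a small routing sketch. It is a deliberately simplified model of the behavior described above, not the `MGXEnv` source: `SketchMsg`, `SketchRouter`, the `direct_recipient` argument, and the role name `Alex` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set

ALL = "<all>"


@dataclass
class SketchMsg:
    content: str
    sent_from: str
    send_to: Set[str] = field(default_factory=lambda: {ALL})


@dataclass
class SketchRouter:
    team_leader: str = "Mike"
    is_public_chat: bool = True
    history: List[SketchMsg] = field(default_factory=list)

    def publish(self, msg: SketchMsg, direct_recipient: str = "") -> Set[str]:
        """Return the names of the roles that should act on `msg` next."""
        if direct_recipient:
            # Direct chat: only the target role acts; the team leader is bypassed.
            if self.is_public_chat:
                # With the flag set, the message is still shared with everyone.
                msg.send_to.add(ALL)
                self.history.append(msg)
            else:
                msg.send_to = {direct_recipient}
            return {direct_recipient}
        # Public chat: visible to all, stored in history, coordinated by the leader.
        msg.send_to.add(ALL)
        self.history.append(msg)
        return {self.team_leader}


router = SketchRouter(is_public_chat=False)
public_next = router.publish(SketchMsg("create a 2048 game", sent_from="User"))
direct_next = router.publish(
    SketchMsg("fix the login bug", sent_from="User"), direct_recipient="Alex"
)
```

In this sketch the public message is handed to the team leader and recorded, while the direct message reaches only its target and, with `is_public_chat` disabled, stays out of the shared history.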

## Usage

```python
import asyncio

from metagpt.environment.mgx.mgx_env import MGXEnv
from metagpt.roles import (
    Architect,
    Engineer,
    ProductManager,
    ProjectManager,
    QaEngineer,
)
from metagpt.roles.di.team_leader import TeamLeader
from metagpt.schema import Message


async def main():
    env = MGXEnv()

    # Register the team leader together with the other roles
    env.add_roles(
        [
            TeamLeader(),
            ProductManager(),
            Architect(),
            ProjectManager(),
            Engineer(n_borg=5, use_code_review=True),
            QaEngineer(),
        ]
    )

    requirement = "create a 2048 game"
    # The team leader is named "Mike"
    tl = env.get_role("Mike")
    env.publish_message(Message(content=requirement, send_to=tl.name))
    await tl.run()


if __name__ == "__main__":
    asyncio.run(main())
```
2 changes: 1 addition & 1 deletion src/en/guide/in_depth_guides/environment/werewolf.md
@@ -12,7 +12,7 @@ Definition:

```python
from gymnasium import spaces
from metagpt.environment.werewolf.const import STEP_INSTRUCTIONS
from metagpt.environment.werewolf.werewolf_ext_env import STEP_INSTRUCTIONS

space = spaces.Dict(
{
28 changes: 21 additions & 7 deletions src/en/guide/tutorials/agent_101.md
@@ -12,20 +12,34 @@ Import any role, initialize it, run it with a starting message, done!
```python
import asyncio

from metagpt.context import Context
from metagpt.roles.product_manager import ProductManager
from metagpt.logs import logger
from metagpt.schema import Message

async def main():
msg = "Write a PRD for a snake game"
context = Context() # The session Context object is explicitly created, and the Role object implicitly shares it automatically with its own Action object
role = ProductManager(context=context)
while msg:
msg = await role.run(msg)
logger.info(str(msg))
# 1. Create ProductManager instance
pm = ProductManager(
name="Alice", # Use default name or customize
use_fixed_sop=True, # Enable fixed Standard Operating Procedure mode
)

# 2. Prepare user requirement
requirement = "Write a PRD for a snake game"

# 3. Create requirement message
requirement_msg = Message(
content=requirement,
role="user"
)

# 4. Run ProductManager to get PRD
result = await pm.run(with_message=requirement_msg)

logger.info(result)

if __name__ == '__main__':
asyncio.run(main())

```

## Develop your first agent
40 changes: 40 additions & 0 deletions src/en/guide/tutorials/atomic_roles/intro.md
@@ -0,0 +1,40 @@
# RoleZero Architecture Design Specification

## **Background: Evolution from SOPs to a General Agent Framework**

In traditional agent frameworks, **Standard Operating Procedures (SOPs)** serve as the core solution for addressing specific scenarios. For example, in a software development environment, SOPs strictly define the code directory structure, data interaction formats, and task execution sequences. However, these SOPs have significant drawbacks:

1. **Strong scenario dependency**: SOPs are highly coupled with specific business scenarios, making them difficult to adapt to other domains (e.g., healthcare, finance).
1. **Poor scalability**: Adding new business requirements necessitates custom development, leading to high development costs and low iteration efficiency.
1. **Weak fault tolerance**: If the process is interrupted, it cannot resume from the breakpoint and must restart from the beginning.

For example, in a software company, SOPs require agents to interact with data in a fixed directory structure. However, third-party projects may use different structures, rendering the agent incompatible. Therefore, a **modular and generalized** framework is needed to decouple processes from scenarios, enhancing the agent's adaptability.

## **Objective: Building Core Capabilities for a General Agent**

The goal of RoleZero is to **overcome the limitations of SOPs through atomic functional elements and dynamic process orchestration**, achieving the following capabilities:

1. **Flexible process orchestration**: Solve business problems dynamically using `think->action loops` or **chained atomic units** without custom development.
1. **Breakpoint recovery**: Resume tasks from the last successful node in case of an exception.
1. **Seamless business integration**: Support cross-domain collaboration (e.g., software company SOPs directly modifying third-party code) without additional development.
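
Breakpoint recovery can be pictured as recording each completed step so that a later run skips it. The sketch below shows the general idea under simple assumptions and is not RoleZero's actual mechanism: `run_with_checkpoint`, the step names, and the in-memory `checkpoint` dictionary are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], str]]


def run_with_checkpoint(steps: List[Step], checkpoint: Dict[str, str]) -> Dict[str, str]:
    """Run steps in order, skipping any whose result is already recorded."""
    for name, func in steps:
        if name in checkpoint:
            continue  # finished in an earlier run
        # If func raises, the results of completed steps stay in `checkpoint`.
        checkpoint[name] = func()
    return checkpoint


calls: List[str] = []
attempts = {"write_code": 0}


def design_api() -> str:
    calls.append("design_api")
    return "api spec"


def write_code() -> str:
    calls.append("write_code")
    attempts["write_code"] += 1
    if attempts["write_code"] == 1:
        raise RuntimeError("simulated interruption")
    return "code"


steps = [("design_api", design_api), ("write_code", write_code)]
checkpoint: Dict[str, str] = {}

try:
    run_with_checkpoint(steps, checkpoint)  # first run is interrupted
except RuntimeError:
    pass

run_with_checkpoint(steps, checkpoint)  # second run resumes at write_code
```

On the second run `design_api` is not called again, because its result was already recorded before the interruption.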

## **Core Capabilities of RoleZero**

As a general template for agents, RoleZero covers the entire lifecycle of intelligent agents:

1. **Data Understanding (ENV/IO)**: Dynamically parse the structure and semantics of environmental inputs (e.g., code, documents).

2. **Observation (Observe)**: Filter and format key data from the environment (ENV) for decision-making.

3. **Thinking (Think)**: Dynamically generate or adjust task plans, supporting four types of decision logic:

   - **Task decomposition**: Break down ambiguous goals into atomic subtasks (e.g., "Develop login feature" → Design API → Write code → Test).
   - **Task retry**: Adjust task constraints based on error feedback (e.g., add code format checks).
   - **Process progression**: Mark the current task as complete and trigger the next task.
   - **Human assistance**: Seek user clarification when unable to make decisions (e.g., asking for additional data or seeking user suggestions in case of errors or uncertainty).

4. **Execution (Act)**: Call tools to execute atomic tasks, supporting experience reuse and context injection.

5. **Memory (Memory)**: Store task states and historical data.

6. **Evaluation (Evaluate)**: Dynamically verify task results.
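
The lifecycle above can be sketched as a small loop that cycles through the four decision types. This is a toy illustration under stated assumptions, not the RoleZero implementation: `Decision`, `SketchAgent`, the tool names, and the retry limit are hypothetical, and each "tool" simply returns whether it succeeded.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Decision(Enum):
    DECOMPOSE = "task decomposition"
    RETRY = "task retry"
    PROGRESS = "process progression"
    ASK_HUMAN = "human assistance"


@dataclass
class SketchAgent:
    tools: Dict[str, Callable[[], bool]]  # task name -> tool returning success
    max_retries: int = 1
    memory: List[Tuple[Decision, str]] = field(default_factory=list)

    def run(self, goal: str, subtasks: List[str]) -> List[Tuple[Decision, str]]:
        plan = list(subtasks)  # Think: break the goal into atomic subtasks
        self.memory.append((Decision.DECOMPOSE, goal))
        retries: Dict[str, int] = {}
        while plan:
            task = plan[0]             # Observe: pick the current task
            ok = self.tools[task]()    # Act: call the tool for this task
            if ok:                     # Evaluate: check the result
                plan.pop(0)
                decision = Decision.PROGRESS
            elif retries.get(task, 0) < self.max_retries:
                retries[task] = retries.get(task, 0) + 1
                decision = Decision.RETRY
            else:
                decision = Decision.ASK_HUMAN
            self.memory.append((decision, task))  # Memory: record the state
            if decision is Decision.ASK_HUMAN:
                break  # stop and wait for user input
        return self.memory


code_results = iter([False, True])  # "code" fails once, then succeeds
agent = SketchAgent(
    tools={
        "design": lambda: True,
        "code": lambda: next(code_results),
        "test": lambda: False,  # always fails, so the agent asks for help
    }
)
trace = agent.run("Develop login feature", ["design", "code", "test"])
```

The recorded trace shows a decomposition, a progression for `design`, one retry followed by a progression for `code`, and a retry followed by a request for human assistance on `test`.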