Hugging Face Task Example

Run tasks from the mercor/apex-agents benchmark dataset, which contains 480 professional services tasks across investment banking, tax accounting, management consulting, and more.

Task

The default task is an Investment Banking challenge from World 221. The prompt is:

Calculate the accretion / dilution of both BBDC and TVPG shareholders, sensitized for different Cash consideration and Bid Premium.

Edit the existing merger model and add two sensitivity analyses: one showing BBDC accretion/dilution and one showing TVPG accretion/dilution, each sensitized to bid premium (10% and 20%) and cash consideration (10% and 15%).

Assume an increase of EBIT Synergies by 480bps and a 210bps decrease in post-deal bidder share price downside. All output values should be in %, rounded to 2 decimal places.

Quick Start

cd archipelago/examples/hugging_face_task

# Set your LLM provider's API key
export GOOGLE_API_KEY=...      # or
export ANTHROPIC_API_KEY=...   # or
export OPENAI_API_KEY=...

./run.sh

The script will:

1. Download task data from Hugging Face
2. Start the environment container
3. Populate the environment with the world snapshot
4. Configure all MCP servers
5. Run the agent
6. Save the final snapshot
7. Run grading and display results

Running Different Tasks

# Run default task (Investment Banking - BBDC/TVPG accretion/dilution)
./run.sh

# Run task at a specific index (0-479)
./run.sh 42

# Run task by ID
./run.sh task_9ba58a6197114140877a1df1754d2993
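
The three invocation forms above differ only in the shape of the argument: none, a number from 0 to 479, or a string beginning with `task_`. As a rough illustration of that distinction (not the actual logic in `run.sh`, which is not shown here), a hypothetical helper named `resolve_task_selector` could classify the argument like this:

```python
import re

TOTAL_TASKS = 480  # dataset size stated in this README


def resolve_task_selector(arg=None):
    """Classify a task selector as default, index, or ID (hypothetical helper)."""
    if arg is None or arg == "":
        # No argument: fall back to the default task.
        return ("default", None)
    if re.fullmatch(r"[0-9]+", arg):
        index = int(arg)
        if not 0 <= index < TOTAL_TASKS:
            raise ValueError(f"task index {index} is outside 0-{TOTAL_TASKS - 1}")
        return ("index", index)
    if arg.startswith("task_"):
        return ("id", arg)
    raise ValueError(f"unrecognized task selector: {arg!r}")
```

Rejecting out-of-range indices early gives a clearer error than a failed dataset lookup later in the run.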

Output

Results are saved to output/<task_id>/:

| File | Description |
| --- | --- |
| `trajectory.json` | Agent's conversation history and tool calls |
| `final_snapshot.zip` | Final state of the environment |
| `grades.json` | Grading results with scores and rationale |
| `initial_messages.json` | Task prompt (from Hugging Face) |
| `agent_config.json` | Agent configuration used |
| `verifiers.json` | Grading criteria (from the Hugging Face rubric) |
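
After a run, it can be useful to confirm that every file listed above was actually written, since a missing file may indicate which stage failed. The sketch below is one way to do that; the helper name `missing_outputs` is hypothetical and not part of this example's code, and it checks only for presence, not content.

```python
from pathlib import Path

# File names taken from the output table in this README.
EXPECTED_FILES = [
    "trajectory.json",
    "final_snapshot.zip",
    "grades.json",
    "initial_messages.json",
    "agent_config.json",
    "verifiers.json",
]


def missing_outputs(output_dir):
    """Return the expected file names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_outputs("output/<task_id>")` (with the real task ID substituted) returns an empty list when the run produced all six files.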

How It Works

Unlike simple_task, which uses static, predefined files, this example fetches the task prompt, world snapshot, and rubric criteria from Hugging Face at run time:

┌─────────────────────┐
│   HuggingFace       │
│   mercor/apex-agents│
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐    ┌─────────────────────┐    ┌─────────────────────┐
│  1. Download Task   │───▶│  2. Run Agent       │───▶│  3. Grade Results   │
│  - Task prompt      │    │  - All MCP servers  │    │  - Compare snapshots│
│  - World snapshot   │    │  - Execute task     │    │  - Evaluate rubric  │
│  - Rubric criteria  │    │  - Save trajectory  │    │  - Calculate score  │
└─────────────────────┘    └─────────────────────┘    └─────────────────────┘

Configuration

Changing the Model

Edit orchestrator_config.json:

{
  "model": "anthropic/claude-sonnet-4-20250514"
}
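
When switching models repeatedly, for instance to compare providers on the same task, the edit can be scripted. The following is a minimal sketch that assumes only what is shown above: `orchestrator_config.json` is a JSON object with a top-level `"model"` key. The function name `set_model` is hypothetical, and any other keys in the file are left untouched.

```python
import json
from pathlib import Path


def set_model(config_path, model):
    """Rewrite the "model" field of a JSON config file, preserving other keys."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["model"] = model
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `set_model("orchestrator_config.json", "anthropic/claude-sonnet-4-20250514")` reproduces the configuration shown above. Remember to export the API key that matches the chosen provider.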

Using Fewer MCP Servers

The default mcp_config_all_oss_servers.json starts all 9 servers. For faster startup, create a custom config that lists only the servers your task needs. The world description in the Hugging Face dataset shows which apps a task requires.
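
One way to derive a reduced config is to load the full one and keep only selected entries. The sketch below is illustrative only: the layout of `mcp_config_all_oss_servers.json` is not shown in this README, so the code assumes the servers sit in a dictionary under a top-level key (here `mcpServers`, a guess) keyed by server name. Inspect the real file and adjust `servers_key` before using it. The function name `filter_servers` is hypothetical.

```python
def filter_servers(config, keep, servers_key="mcpServers"):
    """Return a copy of config that retains only the named servers.

    ASSUMPTION: servers live in a dict under `servers_key`, keyed by server
    name. Check the real config file and adjust before relying on this.
    """
    servers = config[servers_key]
    unknown = sorted(set(keep) - set(servers))
    if unknown:
        # Fail loudly on typos instead of silently producing a smaller config.
        raise KeyError(f"servers not present in config: {unknown}")
    trimmed = dict(config)  # shallow copy; the input config is not modified
    trimmed[servers_key] = {name: servers[name] for name in servers if name in keep}
    return trimmed
```

For a spreadsheet-centric task such as the default one, a plausible selection might be `{"spreadsheets_server", "filesystem_server", "code_execution_server"}`, though the world description remains the authority on what is required.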

Available MCP Servers

| Server | Description |
| --- | --- |
| `calendar_server` | Calendar and scheduling |
| `chat_server` | Chat/messaging |
| `code_execution_server` | Python code execution |
| `spreadsheets_server` | Spreadsheets/spreadsheet manipulation |
| `filesystem_server` | File operations |
| `mail_server` | Email |
| `pdfs_server` | PDF reading and manipulation |
| `presentations_server` | Presentations/slides |
| `documents_server` | Documents/document editing |

Troubleshooting

Task not found

The dataset contains 480 tasks indexed 0-479. Use --task-index for numeric indices or --task-id for specific task IDs.

Environment fails to start

Check that Docker is running and that port 8080 is not already in use:

docker ps
lsof -i :8080
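
On systems without `lsof`, the same port check can be approximated from Python. This is a minimal sketch using only the standard library; the helper name `port_in_use` is hypothetical, and it detects only listeners that accept TCP connections on the given host, so it is a rough substitute rather than an exact equivalent of `lsof`.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(8080)` returns `True` before the environment container has started, another process is likely occupying the port.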

Agent timeout

Complex tasks can exhaust the agent's step budget before it finishes. Raise max_steps in main.py (the example below raises it from 50 to 100):

agent_config = {
    "agent_config_values": {"timeout": 3600, "max_steps": 100},  # Increase from 50
    ...
}
