
A virtual trajectory generation framework for mobile GUI agents. Uses Nano Banana 2 to create Android UI environments where agents interact and generate behavioral trajectories. Implements Mobile-Agent-V3 architecture for intelligent task execution. Eliminates physical devices - purely AI-driven GUI simulation for trajectory collection.

Futuresis/GUI-Trajectory-Virtual

Virtual GUI Agent Trajectory Synthesis

English | 简体中文

Workflow Overview

A streamlined implementation of Mobile-Agent-v3 that runs in a virtual GUI environment using Gemini AI to simulate Android interfaces, eliminating the need for physical devices or emulators.

🌟 Features

  • Virtual Android Environment: Uses Gemini-3 to generate realistic Android UI screenshots
  • No Device Required: Runs completely virtually without Android emulators or physical devices
  • GUI-Owl Integration: Leverages GUI-Owl-7B model for intelligent agent actions
  • Flexible Task Support: Supports both standard Android tasks and custom social app tasks
  • Automatic Trajectory Extraction: Captures and exports interaction trajectories for analysis

📋 Requirements

  • Python 3.11+
  • Gemini API access (via OpenRouter or compatible API)
  • GUI-Owl-7B model or compatible LLM for agent reasoning
  • GPU server with vLLM for GUI-Owl model (optional, can use remote API)

Note: This project uses the Mobile-Agent-V3 architecture, which employs a multi-agent system (Planning, Decision, and Reflection agents) for intelligent task execution.
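
The three-agent loop described in the note can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the function names, the `{"done": True}` stop signal, and the mapping of the Planning, Decision, and Reflection roles onto `manager`, `operator`, and `reflector` records are hypothetical and are not taken from the project's code.

```python
def run_episode(goal, plan, decide, reflect, max_steps=10):
    """Run a plan -> decide -> reflect loop and return the recorded steps.

    plan, decide and reflect are caller-supplied callables (hypothetical
    stand-ins for the three agents); plan signals completion with {"done": True}.
    """
    steps = []
    for _ in range(max_steps):
        manager = plan(goal, steps)
        if manager.get("done"):
            break
        operator = decide(goal, manager)
        reflector = reflect(goal, operator)
        steps.append(
            {"manager": manager, "operator": operator, "reflector": reflector}
        )
    return steps


# Stub agents for illustration only; real agents would call an LLM.
def stub_plan(goal, steps):
    return {"done": True} if len(steps) >= 2 else {"plan": f"step {len(steps) + 1}"}


def stub_decide(goal, manager):
    return {"action": "tap", "description": manager["plan"]}


def stub_reflect(goal, operator):
    return {"outcome": "ok", "error_description": ""}


episode = run_episode("Add a new contact", stub_plan, stub_decide, stub_reflect)
print(len(episode))
```

Passing the agents in as plain callables keeps the loop easy to test with stubs before any model endpoint is configured.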

Quick Start

1. Installation

Option A: Using Anaconda (Recommended for Windows)

# Create Anaconda environment
conda create -n GUI-V python=3.11 -y
conda activate GUI-V

# Install dependencies (run from the repository root, where requirements.txt lives)
pip install -r requirements.txt

Option B: Using pip directly

# Clone the repository
git clone https://github.com/Futuresis/GUI-Trajectory-Virtual.git
cd GUI-Trajectory-Virtual

# Install dependencies
pip install -r requirements.txt

2. Configuration

Create a .env file or set environment variables. The export syntax below is for bash; in Windows Command Prompt, use set NAME=value instead:

# Gemini API Configuration (for virtual environment)
export GEMINI_API_KEY="your-gemini-api-key"
export GEMINI_BASE_URL="your-gemini-api-url"
export GEMINI_MODEL="google/gemini-3-pro-image-preview"

# Agent LLM Configuration (GUI-Owl or compatible)
export AGENT_API_KEY="your-api-key"
export AGENT_BASE_URL="your-api-url"
export AGENT_MODEL="path/to/model"
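
Before a run, it can help to confirm that all six variables are actually set. The following is a minimal sketch, not part of the project: the helper name `missing_env_vars` is hypothetical, and the check only tests for presence, not for valid keys or reachable URLs.

```python
import os

# The six variables listed in the configuration section.
REQUIRED_VARS = (
    "GEMINI_API_KEY",
    "GEMINI_BASE_URL",
    "GEMINI_MODEL",
    "AGENT_API_KEY",
    "AGENT_BASE_URL",
    "AGENT_MODEL",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("[FAILED] Missing variables: " + ", ".join(missing))
    else:
        print("[OK] All required variables are set")
```

The function accepts any mapping, so it can be tested with a plain dict instead of the real environment.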

3. Running Tasks

On Linux/Mac (using bash script)

# Run default contact task
bash run_virtual.sh

# Run social app tasks
bash run_virtual.sh --tasks social --category message

On Windows (using Python directly)

# Activate Anaconda environment first
conda activate GUI-V

# Run default contact task (requires .env file configured)
# cmd.exe syntax: ^ continues a line, and each %VAR% must be set in the current session
python run_virtual.py ^
  --suite_family=agent_env ^
  --agent_name=mobile_agent_v3 ^
  --model=%AGENT_MODEL% ^
  --api_key=%AGENT_API_KEY% ^
  --base_url=%AGENT_BASE_URL% ^
  --gemini_api_key=%GEMINI_API_KEY% ^
  --gemini_base_url=%GEMINI_BASE_URL% ^
  --gemini_model=%GEMINI_MODEL% ^
  --tasks=ContactsAddContact ^
  --use_virtual_env=True

Advanced Usage

# Run with custom parameters (bash syntax; in Windows Command Prompt use ^ for line continuation and %VAR% for variables)
python run_virtual.py \
  --suite_family=agent_env \
  --agent_name=mobile_agent_v3 \
  --model=$AGENT_MODEL \
  --api_key=$AGENT_API_KEY \
  --base_url=$AGENT_BASE_URL \
  --gemini_api_key=$GEMINI_API_KEY \
  --gemini_base_url=$GEMINI_BASE_URL \
  --gemini_model=$GEMINI_MODEL \
  --tasks=ContactsAddContact \
  --use_virtual_env=True

Important Notes for Windows Users

Encoding Compatibility

This project has been updated to be fully compatible with Windows GBK encoding:

  • All Unicode special characters in output have been replaced with ASCII equivalents
  • Status messages use [OK], [WARNING], [FAILED] instead of Unicode symbols
  • Chinese comments in code are preserved (they don't affect runtime)

Running on Windows

  1. Use Anaconda environment (recommended)
  2. Run Python scripts directly instead of bash scripts
  3. Configure .env file with your API keys before running
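
Because line continuation and variable syntax differ between bash and Command Prompt, one option is to launch run_virtual.py from a small Python wrapper that behaves the same on every platform. This is a sketch under stated assumptions: `build_command` and `launch` are hypothetical helpers, not part of the repository, and the wrapper expects the six variables to be present in the process environment.

```python
import os
import subprocess
import sys

# Maps each run_virtual.py flag to the environment variable that supplies it.
FLAG_TO_ENV = {
    "--model": "AGENT_MODEL",
    "--api_key": "AGENT_API_KEY",
    "--base_url": "AGENT_BASE_URL",
    "--gemini_api_key": "GEMINI_API_KEY",
    "--gemini_base_url": "GEMINI_BASE_URL",
    "--gemini_model": "GEMINI_MODEL",
}


def build_command(env, task="ContactsAddContact"):
    """Build the argument list for run_virtual.py from a mapping of variables."""
    cmd = [
        sys.executable,
        "run_virtual.py",
        "--suite_family=agent_env",
        "--agent_name=mobile_agent_v3",
    ]
    for flag, var in FLAG_TO_ENV.items():
        cmd.append(f"{flag}={env[var]}")  # raises KeyError if a variable is missing
    cmd += [f"--tasks={task}", "--use_virtual_env=True"]
    return cmd


def launch(task="ContactsAddContact"):
    """Run the task; a list of arguments avoids shell-specific quoting."""
    subprocess.run(build_command(os.environ, task), check=True)
```

Calling `launch()` from the repository root starts the default task. No shell is involved, so neither `\` nor `^` continuations are needed.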

Project Structure

GUI-Trajectory-Virtual/
├── run_virtual.py           # Main execution script
├── run_virtual.sh           # Bash wrapper with presets
├── virtual_env_adapter.py   # Virtual environment adapter
├── virtual_env_gemini3.py   # Gemini-based virtual environment
├── start_Android.png        # Initial Android screenshot
├── agent_env/              # Agent framework
│   ├── agents/             # Agent implementations
│   ├── task_evals/         # Task evaluations
│   └── utils/              # Utility functions
├── requirements.txt         # Python dependencies
├── Dockerfile              # Container setup
├── logs/                   # Execution logs
├── trajectories/           # Generated trajectories
└── outputs/                # Other results

🎯 Task Categories

Social App Tasks

The system supports 10 categories of social app interactions:

  • Message: Forwarding, quoting, scheduled messages
  • Contact: Tag editing, duplicate merging, smart recommendations
  • Group: Member management, reorganization, customization
  • Media: Photo/video sharing, voice messages, location-based meetup
  • Call: Scheduled calls, video with screen share, group calls
  • Privacy: Notification rules, mute management, privacy controls
  • Emoji: Contextual reactions, batch reactions, custom emoji
  • File: Document editing, collection, collaborative review
  • Status: Scheduled posts, engagement chains, interactive polls
  • Search: Context search, cross-chat retrieval, history export

🔧 Configuration Options

Environment Variables

Variable          Description                     Required
GEMINI_API_KEY    Gemini API key                  Yes
GEMINI_BASE_URL   Gemini API endpoint URL         Yes
GEMINI_MODEL      Gemini model name               Yes
AGENT_MODEL       Agent LLM model path or name    Yes
AGENT_BASE_URL    Agent API endpoint URL          Yes
AGENT_API_KEY     Agent API key                   Yes

Command Line Arguments

python run_virtual.py --help

Key arguments:
  --suite_family         Task suite family (default: agent_env)
  --agent_name           Agent type (default: mobile_agent_v3)
  --model                Agent LLM model name
  --api_key              Agent API key
  --base_url             Agent API base URL
  --gemini_api_key       Gemini API key
  --gemini_base_url      Gemini API base URL (default: https://openrouter.ai/api/v1)
  --gemini_model         Gemini model name (default: google/gemini-2.0-flash-exp:free)
  --tasks                Task name list (e.g., ContactsAddContact)
  --n_task_combinations  Number of task parameter combinations (default: 1)
  --use_virtual_env      Use virtual environment (default: True)
  --initial_image_path   Initial Android screenshot (default: start_Android.png)
  --resolution           Virtual screen resolution (default: 1080x2400)
  --traj_output_path     Trajectory output directory (default: traj_output)
  --output_path          Checkpoint output directory (default: results)
  --task_random_seed     Random seed for task parameter sampling
  --fixed_task_seed      Whether to use same task seed across combinations
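
The --resolution value follows a WIDTHxHEIGHT pattern. As a sketch of how such a value could be used when inspecting results, the helpers below parse it and check whether a tap position falls on the screen. Both function names are hypothetical, and the check assumes that action coordinates are pixel positions in the same space as the resolution, which the README does not state.

```python
def parse_resolution(text):
    """Parse a 'WIDTHxHEIGHT' string such as '1080x2400' into (width, height)."""
    width_text, sep, height_text = text.lower().partition("x")
    if not sep:
        raise ValueError(f"expected WIDTHxHEIGHT, got {text!r}")
    width, height = int(width_text), int(height_text)  # ValueError if not numeric
    if width <= 0 or height <= 0:
        raise ValueError(f"resolution must be positive, got {text!r}")
    return width, height


def in_bounds(coordinates, resolution):
    """Return True if an (x, y) point lies inside the screen."""
    x, y = coordinates
    width, height = resolution
    return 0 <= x < width and 0 <= y < height
```

For example, `in_bounds([540, 1200], parse_resolution("1080x2400"))` checks a tap near the middle of the default screen.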

🐳 Docker Support

Build Image

docker build -t gui-trajectory-virtual .

Run Container

docker run -it \
  -e GEMINI_API_KEY="your-gemini-key" \
  -e GEMINI_BASE_URL="your-gemini-url" \
  -e GEMINI_MODEL="google/gemini-3-pro-image-preview" \
  -e AGENT_MODEL="GUI-Owl-7B" \
  -e AGENT_BASE_URL="http://host.docker.internal:4243/v1" \
  -e AGENT_API_KEY="EMPTY" \
  -v $(pwd)/logs:/app/logs \
  -v $(pwd)/trajectories:/app/trajectories \
  gui-trajectory-virtual \
  bash run_virtual.sh

📊 Output

Trajectories

Trajectories are automatically saved in the trajectories/ directory with timestamped folders:

trajectories/
└── traj_virtual_2024-12-04_20-15-30/
    └── traj.jsonl           # Trajectory in JSONL format (one JSON per line)

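Each line in traj.jsonl contains one complete step in Manager-Operator-Reflector format. The example below is pretty-printed for readability; in the file, each step occupies a single line: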

{
  "manager": {
    "response": "...",
    "thought": "Need to open contacts app",
    "plan": "1. Tap contacts icon\n2. Add new contact",
    "user_request": "Add a new contact named John Doe"
  },
  "operator": {
    "response": "...",
    "thought": "Tapping on contacts app icon",
    "action": "{\"action_type\": \"tap\", \"coordinates\": [540, 1200]}",
    "description": "Tap on Contacts app",
    "user_request": "Add a new contact named John Doe"
  },
  "reflector": {
    "response": "...",
    "outcome": "Successfully opened contacts app",
    "error_description": "",
    "user_request": "Add a new contact named John Doe"
  }
}
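
Note that the operator's action field is itself a JSON-encoded string, so it needs a second decoding pass. The following sketch reads steps in this format and extracts the action type and outcome of each. The helper names `load_steps` and `summarize` are hypothetical, and the field layout is assumed to match the example above.

```python
import json


def load_steps(lines):
    """Parse JSONL lines into step dicts, decoding the nested action string."""
    steps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        step = json.loads(line)
        operator = step.get("operator", {})
        raw_action = operator.get("action")
        if isinstance(raw_action, str):
            try:
                operator["action"] = json.loads(raw_action)
            except json.JSONDecodeError:
                pass  # leave non-JSON action text untouched
        steps.append(step)
    return steps


def summarize(steps):
    """Return (action_type, outcome) pairs, one per step."""
    summary = []
    for step in steps:
        action = step.get("operator", {}).get("action")
        action_type = action.get("action_type") if isinstance(action, dict) else None
        outcome = step.get("reflector", {}).get("outcome")
        summary.append((action_type, outcome))
    return summary
```

An open file object can be passed directly, since iterating over a file yields lines: open traj.jsonl with `encoding="utf-8"` and call `summarize(load_steps(f))`.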

Logs

Detailed execution logs are saved with timestamps in the logs/ directory:

logs/log_virtual_2024-12-04_20-15-30.log

🔬 Advanced Usage

Running the Default Task

The project currently includes one task: ContactsAddContact. Run it using:

# Using bash script (Linux/Mac)
bash run_virtual.sh

# Using Python directly
python run_virtual.py \
  --suite_family=agent_env \
  --agent_name=mobile_agent_v3 \
  --model=$AGENT_MODEL \
  --api_key=$AGENT_API_KEY \
  --base_url=$AGENT_BASE_URL \
  --gemini_api_key=$GEMINI_API_KEY \
  --gemini_base_url=$GEMINI_BASE_URL \
  --gemini_model=$GEMINI_MODEL \
  --tasks=ContactsAddContact \
  --n_task_combinations=1 \
  --use_virtual_env=True

Adding New Tasks

To add custom tasks to the project:

  1. Create a task class in agent_env/task_evals/single/ (e.g., your_app.py):
from agent_env.task_evals.single import contacts

class YourCustomTask(contacts.ContactsAddContact):
    """Your custom task description."""
    
    complexity = 1.5  # Task complexity coefficient
    
    @property
    def goal(self) -> str:
        return "Your task goal description here"
  2. Register the task in agent_env/registry.py:
# Import your task module
from agent_env.task_evals.single import your_app

# Add to _TASKS tuple
_TASKS = (
    contacts.ContactsAddContact,
    your_app.YourCustomTask,  # Add your task here
)
  3. Run your custom task:
python run_virtual.py \
  --suite_family=agent_env \
  --agent_name=mobile_agent_v3 \
  --tasks=YourCustomTask \
  --use_virtual_env=True \
  # ... other parameters

Output Files

After execution, check:

# Logs with timestamps
ls logs/log_virtual_*.log

# Trajectories with timestamps
ls trajectories/traj_virtual_*/
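
On platforms without ls, or inside a script, the most recent run can be located from the timestamp in the folder name. This sketch assumes the traj_virtual_YYYY-MM-DD_HH-MM-SS naming shown in the Output section; `parse_run_time` and `latest_run` are hypothetical helpers, not part of the repository.

```python
from datetime import datetime
from pathlib import Path

PREFIX = "traj_virtual_"
STAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def parse_run_time(folder_name):
    """Return the datetime encoded in a run folder name, or None if it does not match."""
    if not folder_name.startswith(PREFIX):
        return None
    try:
        return datetime.strptime(folder_name[len(PREFIX):], STAMP_FORMAT)
    except ValueError:
        return None


def latest_run(root="trajectories"):
    """Return the Path of the most recent run folder under root, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    runs = [(parse_run_time(p.name), p) for p in root.iterdir() if p.is_dir()]
    runs = [(stamp, p) for stamp, p in runs if stamp is not None]
    return max(runs, key=lambda item: item[0])[1] if runs else None
```

Sorting by the parsed timestamp rather than by file modification time keeps the result stable if folders are copied or archived.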

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📝 License

This project is licensed under the MIT License - see LICENSE file for details.

Note: The arXiv paper will be released soon. Please check back for the updated citation information.

🙏 Acknowledgments

📧 Contact

For questions or issues, please open an issue on GitHub.

🔗 Related Projects
