Virtual GUI Agent Trajectory Synthesis

A streamlined implementation of Mobile-Agent-v3 that runs in a virtual GUI environment using Gemini AI to simulate Android interfaces, eliminating the need for physical devices or emulators.

🌟 Features

Virtual Android Environment: Uses Gemini-3 to generate realistic Android UI screenshots
No Device Required: Runs completely virtually without Android emulators or physical devices
GUI-Owl Integration: Leverages GUI-Owl-7B model for intelligent agent actions
Flexible Task Support: Supports both standard Android tasks and custom social app tasks
Automatic Trajectory Extraction: Captures and exports interaction trajectories for analysis

📋 Requirements

Python 3.11+
Gemini API access (via OpenRouter or compatible API)
GUI-Owl-7B model or compatible LLM for agent reasoning
GPU server running vLLM to host the GUI-Owl model (optional; a remote API endpoint can be used instead)

Note: This project uses the Mobile-Agent-v3 architecture, a multi-agent system (Planning, Decision, and Reflection agents) for task execution.

Quick Start

1. Installation

Option A: Using Anaconda (Recommended for Windows)

# Create Anaconda environment
conda create -n GUI-V python=3.11 -y
conda activate GUI-V

# Install dependencies (run inside the cloned repository; the clone command is shown in Option B)
pip install -r requirements.txt

Option B: Using pip directly

# Clone the repository
git clone https://github.com/Futuresis/GUI-Trajectory-Virtual.git
cd GUI-Trajectory-Virtual

# Install dependencies
pip install -r requirements.txt

2. Configuration

Create a .env file (see .env.example in the repository) or export the variables in your shell:

# Gemini API Configuration (for virtual environment)
export GEMINI_API_KEY="your-gemini-api-key"
export GEMINI_BASE_URL="your-gemini-api-url"
export GEMINI_MODEL="google/gemini-3-pro-image-preview"

# Agent LLM Configuration (GUI-Owl or compatible)
export AGENT_API_KEY="your-api-key"
export AGENT_BASE_URL="your-api-url"
export AGENT_MODEL="path/to/model"
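Whether run_virtual.py loads .env on its own is not stated here, so a pre-flight check can save a failed run. The following is a minimal standard-library sketch, not part of the project: `parse_env_lines`, `load_env_file`, and `missing_vars` are hypothetical helpers that accept both the `.env` form and the `export` lines above, then report which of the six required variables are still unset.

```python
import os
from pathlib import Path

# The six variables this README lists as required.
REQUIRED_VARS = (
    "GEMINI_API_KEY", "GEMINI_BASE_URL", "GEMINI_MODEL",
    "AGENT_API_KEY", "AGENT_BASE_URL", "AGENT_MODEL",
)


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, tolerating a leading 'export ' and surrounding quotes."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop matching quotes
        values[key.strip()] = value
    return values


def load_env_file(path=".env"):
    """Copy values from a .env file into os.environ without overriding existing ones."""
    p = Path(path)
    if p.is_file():
        text = p.read_text(encoding="utf-8")
        for key, value in parse_env_lines(text.splitlines()).items():
            os.environ.setdefault(key, value)


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `load_env_file()` and then `missing_vars()` from the repository root returns an empty list when everything is configured.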

3. Running Tasks

On Linux/Mac (using bash script)

# Run default contact task
bash run_virtual.sh

# Run social app tasks
bash run_virtual.sh --tasks social --category message

On Windows (using Python directly)

REM Activate the Anaconda environment first
conda activate GUI-V

REM Run the default contact task from Anaconda Prompt (cmd syntax: ^ continues a line).
REM Each %VAR% must be set in this session (e.g. set AGENT_MODEL=...); cmd does not read .env.
python run_virtual.py ^
  --suite_family=agent_env ^
  --agent_name=mobile_agent_v3 ^
  --model=%AGENT_MODEL% ^
  --api_key=%AGENT_API_KEY% ^
  --base_url=%AGENT_BASE_URL% ^
  --gemini_api_key=%GEMINI_API_KEY% ^
  --gemini_base_url=%GEMINI_BASE_URL% ^
  --gemini_model=%GEMINI_MODEL% ^
  --tasks=ContactsAddContact ^
  --use_virtual_env=True

On Linux/Mac (using Python directly)

# Run with explicit parameters (bash/zsh syntax; on Windows use the cmd form above)
python run_virtual.py \
  --suite_family=agent_env \
  --agent_name=mobile_agent_v3 \
  --model=$AGENT_MODEL \
  --api_key=$AGENT_API_KEY \
  --base_url=$AGENT_BASE_URL \
  --gemini_api_key=$GEMINI_API_KEY \
  --gemini_base_url=$GEMINI_BASE_URL \
  --gemini_model=$GEMINI_MODEL \
  --tasks=ContactsAddContact \
  --use_virtual_env=True
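The bash and cmd invocations differ only in shell syntax. To avoid quoting and line-continuation differences altogether, a hypothetical Python launcher can build the same argument list from environment variables and pass it to `subprocess.run` without a shell. `build_command` and `launch` are illustrative names, not project functions; the flags are the ones used in the commands above.

```python
import os
import subprocess
import sys

# run_virtual.py flag -> environment variable, as in the commands above.
FLAG_TO_ENV = {
    "model": "AGENT_MODEL",
    "api_key": "AGENT_API_KEY",
    "base_url": "AGENT_BASE_URL",
    "gemini_api_key": "GEMINI_API_KEY",
    "gemini_base_url": "GEMINI_BASE_URL",
    "gemini_model": "GEMINI_MODEL",
}


def build_command(env, tasks="ContactsAddContact"):
    """Return the argv list for run_virtual.py; raises KeyError if a variable is unset."""
    argv = [
        sys.executable, "run_virtual.py",
        "--suite_family=agent_env",
        "--agent_name=mobile_agent_v3",
    ]
    for flag, var in FLAG_TO_ENV.items():
        argv.append(f"--{flag}={env[var]}")
    argv += [f"--tasks={tasks}", "--use_virtual_env=True"]
    return argv


def launch(env=None):
    """Run from the repository root; a list argv needs no shell quoting on any OS."""
    return subprocess.run(build_command(os.environ if env is None else env), check=True)
```

Using `sys.executable` keeps the command on the interpreter of the active conda environment; `check=True` turns a non-zero exit into a `CalledProcessError`.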

Important Notes for Windows Users

Encoding Compatibility

Console output has been made compatible with Windows consoles that use GBK encoding:

All Unicode special characters in output have been replaced with ASCII equivalents
Status messages use [OK], [WARNING], [FAILED] instead of Unicode symbols
Chinese comments in code are preserved (they don't affect runtime)
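The underlying problem is that a GBK console cannot encode most emoji and dingbats, so printing them raises `UnicodeEncodeError`, while Chinese text encodes fine. A minimal sketch of the replacement idea, assuming a hypothetical symbol table (the project's actual substitutions may differ), maps status symbols to the ASCII tags above and turns anything else unencodable into `?`:

```python
# Hypothetical symbol table; the project's own replacements may differ.
STATUS_REPLACEMENTS = {
    "\u2705": "[OK]",       # white heavy check mark
    "\u2713": "[OK]",       # check mark
    "\u26a0": "[WARNING]",  # warning sign
    "\u274c": "[FAILED]",   # cross mark
    "\u2717": "[FAILED]",   # ballot x
}


def console_safe(text, encoding="gbk"):
    """Swap status symbols for ASCII tags, then make the rest encodable."""
    text = text.replace("\ufe0f", "")  # drop emoji variation selectors
    for symbol, tag in STATUS_REPLACEMENTS.items():
        text = text.replace(symbol, tag)
    # errors="replace" substitutes '?' for characters the codec cannot encode
    return text.encode(encoding, errors="replace").decode(encoding)
```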

Running on Windows

Use Anaconda environment (recommended)
Run Python scripts directly instead of bash scripts
Configure .env file with your API keys before running

Project Structure

GUI-Trajectory-Virtual/
├── run_virtual.py           # Main execution script
├── run_virtual.sh           # Bash wrapper with presets
├── virtual_env_adapter.py   # Virtual environment adapter
├── virtual_env_gemini3.py   # Gemini-based virtual environment
├── start_Android.png        # Initial Android screenshot
├── agent_env/              # Agent framework
│   ├── agents/             # Agent implementations
│   ├── task_evals/         # Task evaluations
│   └── utils/              # Utility functions
├── requirements.txt         # Python dependencies
├── Dockerfile              # Container setup
├── logs/                   # Execution logs
├── trajectories/           # Generated trajectories
└── outputs/                # Other results

🎯 Task Categories

Social App Tasks

The system supports 10 categories of social app interactions:

Message: Forwarding, quoting, scheduled messages
Contact: Tag editing, duplicate merging, smart recommendations
Group: Member management, reorganization, customization
Media: Photo/video sharing, voice messages, location-based meetup
Call: Scheduled calls, video with screen share, group calls
Privacy: Notification rules, mute management, privacy controls
Emoji: Contextual reactions, batch reactions, custom emoji
File: Document editing, collection, collaborative review
Status: Scheduled posts, engagement chains, interactive polls
Search: Context search, cross-chat retrieval, history export
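When selecting a category from the command line (as in `bash run_virtual.sh --tasks social --category message`), checking the name up front avoids a wasted run on a typo. The sketch below is hypothetical: it assumes every category key is the lowercase form of the names listed above, which only the `message` example confirms.

```python
# The ten categories listed above. Keys are assumed to be the lowercase
# names accepted by `run_virtual.sh --category`; only "message" is confirmed.
SOCIAL_CATEGORIES = {
    "message": "Forwarding, quoting, scheduled messages",
    "contact": "Tag editing, duplicate merging, smart recommendations",
    "group": "Member management, reorganization, customization",
    "media": "Photo/video sharing, voice messages, location-based meetup",
    "call": "Scheduled calls, video with screen share, group calls",
    "privacy": "Notification rules, mute management, privacy controls",
    "emoji": "Contextual reactions, batch reactions, custom emoji",
    "file": "Document editing, collection, collaborative review",
    "status": "Scheduled posts, engagement chains, interactive polls",
    "search": "Context search, cross-chat retrieval, history export",
}


def check_category(name):
    """Normalize a category name and fail early on unknown values."""
    key = name.strip().lower()
    if key not in SOCIAL_CATEGORIES:
        raise ValueError(f"unknown category {name!r}; choose from {sorted(SOCIAL_CATEGORIES)}")
    return key
```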

🔧 Configuration Options

Environment Variables

| Variable | Description | Required |
|---|---|---|
| `GEMINI_API_KEY` | Gemini API key | Yes |
| `GEMINI_BASE_URL` | Gemini API endpoint URL | Yes |
| `GEMINI_MODEL` | Gemini model name | Yes |
| `AGENT_MODEL` | Agent LLM model path or name | Yes |
| `AGENT_BASE_URL` | Agent API endpoint URL | Yes |
| `AGENT_API_KEY` | Agent API key | Yes |

Command Line Arguments

python run_virtual.py --help

Key arguments:
  --suite_family         Task suite family (default: agent_env)
  --agent_name           Agent type (default: mobile_agent_v3)
  --model                Agent LLM model name
  --api_key              Agent API key
  --base_url             Agent API base URL
  --gemini_api_key       Gemini API key
  --gemini_base_url      Gemini API base URL (default: https://openrouter.ai/api/v1)
  --gemini_model         Gemini model name (default: google/gemini-2.0-flash-exp:free)
  --tasks                Task name list (e.g., ContactsAddContact)
  --n_task_combinations  Number of task parameter combinations (default: 1)
  --use_virtual_env      Use virtual environment (default: True)
  --initial_image_path   Initial Android screenshot (default: start_Android.png)
  --resolution           Virtual screen resolution (default: 1080x2400)
  --traj_output_path     Trajectory output directory (default: traj_output)
  --output_path          Checkpoint output directory (default: results)
  --task_random_seed     Random seed for task parameter sampling
  --fixed_task_seed      Whether to use the same task seed across combinations
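The listing above can be mirrored with `argparse` to show how values such as `--use_virtual_env=True` and `--resolution=1080x2400` are interpreted. This is an approximation for illustration, not the script's actual parser: `str2bool`, `parse_resolution`, and comma-separated `--tasks` are assumptions, and options whose defaults the README does not state are left without one.

```python
import argparse


def str2bool(value):
    """Accept True/False style strings such as --use_virtual_env=True."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_resolution(value):
    """Turn '1080x2400' into (1080, 2400)."""
    width, sep, height = value.lower().partition("x")
    if not sep or not width.isdigit() or not height.isdigit():
        raise argparse.ArgumentTypeError(f"expected WIDTHxHEIGHT, got {value!r}")
    return int(width), int(height)


def build_parser():
    p = argparse.ArgumentParser(description="Approximation of run_virtual.py options")
    p.add_argument("--suite_family", default="agent_env")
    p.add_argument("--agent_name", default="mobile_agent_v3")
    p.add_argument("--model")
    p.add_argument("--api_key")
    p.add_argument("--base_url")
    p.add_argument("--gemini_api_key")
    p.add_argument("--gemini_base_url", default="https://openrouter.ai/api/v1")
    p.add_argument("--gemini_model", default="google/gemini-2.0-flash-exp:free")
    # Assumption: several tasks are passed as a comma-separated list.
    p.add_argument("--tasks", type=lambda s: [t for t in s.split(",") if t])
    p.add_argument("--n_task_combinations", type=int, default=1)
    p.add_argument("--use_virtual_env", type=str2bool, default=True)
    p.add_argument("--initial_image_path", default="start_Android.png")
    # String defaults go through `type`, so this becomes (1080, 2400).
    p.add_argument("--resolution", type=parse_resolution, default="1080x2400")
    p.add_argument("--traj_output_path", default="traj_output")
    p.add_argument("--output_path", default="results")
    p.add_argument("--task_random_seed", type=int)  # default not documented
    p.add_argument("--fixed_task_seed", type=str2bool)  # default not documented
    return p
```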

🐳 Docker Support

Build Image

docker build -t gui-trajectory-virtual .

Run Container

# --add-host makes host.docker.internal resolve on Linux hosts (Docker 20.10+); Docker Desktop already provides it
docker run -it \
  --add-host=host.docker.internal:host-gateway \
  -e GEMINI_API_KEY="your-gemini-key" \
  -e GEMINI_BASE_URL="your-gemini-url" \
  -e GEMINI_MODEL="google/gemini-3-pro-image-preview" \
  -e AGENT_MODEL="GUI-Owl-7B" \
  -e AGENT_BASE_URL="http://host.docker.internal:4243/v1" \
  -e AGENT_API_KEY="EMPTY" \
  -v $(pwd)/logs:/app/logs \
  -v $(pwd)/trajectories:/app/trajectories \
  gui-trajectory-virtual \
  bash run_virtual.sh

📊 Output

Trajectories

Trajectories are automatically saved in the trajectories/ directory with timestamped folders:

trajectories/
└── traj_virtual_2024-12-04_20-15-30/
    └── traj.jsonl           # Trajectory in JSONL format (one JSON object per line)

Each line in traj.jsonl holds one complete step in Manager-Operator-Reflector format (pretty-printed here for readability):

{
  "manager": {
    "response": "...",
    "thought": "Need to open contacts app",
    "plan": "1. Tap contacts icon\n2. Add new contact",
    "user_request": "Add a new contact named John Doe"
  },
  "operator": {
    "response": "...",
    "thought": "Tapping on contacts app icon",
    "action": "{\"action_type\": \"tap\", \"coordinates\": [540, 1200]}",
    "description": "Tap on Contacts app",
    "user_request": "Add a new contact named John Doe"
  },
  "reflector": {
    "response": "...",
    "outcome": "Successfully opened contacts app",
    "error_description": "",
    "user_request": "Add a new contact named John Doe"
  }
}
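Note that the operator's `action` is itself a JSON string nested inside the record, so it needs a second decode. A minimal sketch for loading a trajectory, using only the field names shown in the example (`read_steps` and `operator_actions` are hypothetical helpers):

```python
import json
from pathlib import Path


def read_steps(path):
    """Yield one step dict per non-empty line of a traj.jsonl file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def operator_actions(steps):
    """Decode each operator action, which is stored as a JSON string."""
    actions = []
    for step in steps:
        action = step.get("operator", {}).get("action")
        if isinstance(action, str):
            action = json.loads(action) if action else None
        actions.append(action)
    return actions
```

For the step above, `operator_actions` yields `{"action_type": "tap", "coordinates": [540, 1200]}`.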

Logs

Detailed execution logs are saved with timestamps in the logs/ directory:

logs/log_virtual_2024-12-04_20-15-30.log
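Both trajectory folders and log files share the timestamp pattern `YYYY-MM-DD_HH-MM-SS`, so the most recent run can be found by parsing names rather than relying on file modification times. A small sketch under that assumption; `run_time` and `latest` are hypothetical helpers:

```python
from datetime import datetime
from pathlib import Path

# Matches names such as traj_virtual_2024-12-04_20-15-30
STAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def run_time(name, prefix):
    """Parse the timestamp from e.g. 'traj_virtual_...' or 'log_virtual_....log'."""
    if not name.startswith(prefix):
        return None
    stem = name[len(prefix):].removesuffix(".log")
    try:
        return datetime.strptime(stem, STAMP_FORMAT)
    except ValueError:
        return None  # name does not follow the timestamp pattern


def latest(directory, prefix):
    """Return the newest matching entry in directory, or None if there is none."""
    dated = [(run_time(p.name, prefix), p) for p in Path(directory).glob(prefix + "*")]
    dated = [(t, p) for t, p in dated if t is not None]
    return max(dated, key=lambda tp: tp[0])[1] if dated else None
```

For example, `latest("trajectories", "traj_virtual_")` or `latest("logs", "log_virtual_")`.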

🔬 Advanced Usage

Running the Default Task

The project currently includes one task: ContactsAddContact. Run it using:

# Using bash script (Linux/Mac)
bash run_virtual.sh

# Using Python directly
python run_virtual.py \
  --suite_family=agent_env \
  --agent_name=mobile_agent_v3 \
  --model=$AGENT_MODEL \
  --api_key=$AGENT_API_KEY \
  --base_url=$AGENT_BASE_URL \
  --gemini_api_key=$GEMINI_API_KEY \
  --gemini_base_url=$GEMINI_BASE_URL \
  --gemini_model=$GEMINI_MODEL \
  --tasks=ContactsAddContact \
  --n_task_combinations=1 \
  --use_virtual_env=True

Adding New Tasks

To add custom tasks to the project:

Create a task class in agent_env/task_evals/single/ (e.g., your_app.py):

from agent_env.task_evals.single import contacts

class YourCustomTask(contacts.ContactsAddContact):
    """Your custom task description."""
    
    complexity = 1.5  # Task complexity coefficient
    
    @property
    def goal(self) -> str:
        return "Your task goal description here"

Register the task in agent_env/registry.py:

# Import your task module
from agent_env.task_evals.single import your_app

# Add to _TASKS tuple
_TASKS = (
    contacts.ContactsAddContact,
    your_app.YourCustomTask,  # Add your task here
)
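The registry code itself is not shown here. Assuming `--tasks` names are matched against the class names in `_TASKS` (as `--tasks=ContactsAddContact` suggests), the hypothetical sketch below uses stand-in classes to show why registration matters: a task class that is defined but not added to `_TASKS` cannot be selected by name.

```python
class ContactsAddContact:
    """Stand-in for contacts.ContactsAddContact (hypothetical, for illustration)."""


class YourCustomTask(ContactsAddContact):
    """Stand-in for your_app.YourCustomTask."""

    complexity = 1.5

    @property
    def goal(self) -> str:
        return "Your task goal description here"


_TASKS = (ContactsAddContact, YourCustomTask)


def build_task_index(tasks):
    """Map class name -> class, rejecting duplicate names."""
    index = {}
    for cls in tasks:
        if cls.__name__ in index:
            raise ValueError(f"duplicate task name: {cls.__name__}")
        index[cls.__name__] = cls
    return index


def resolve_tasks(names, tasks=_TASKS):
    """Look up task classes by name; unregistered names raise KeyError."""
    index = build_task_index(tasks)
    unknown = [n for n in names if n not in index]
    if unknown:
        raise KeyError(f"unregistered task(s): {', '.join(unknown)}")
    return [index[n] for n in names]
```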

Run your custom task:

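python run_virtual.py \
  --suite_family=agent_env \
  --agent_name=mobile_agent_v3 \
  --model=$AGENT_MODEL \
  --api_key=$AGENT_API_KEY \
  --base_url=$AGENT_BASE_URL \
  --gemini_api_key=$GEMINI_API_KEY \
  --gemini_base_url=$GEMINI_BASE_URL \
  --gemini_model=$GEMINI_MODEL \
  --tasks=YourCustomTask \
  --use_virtual_env=True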

Output Files

After execution, check:

# Logs with timestamps
ls logs/log_virtual_*.log

# Trajectories with timestamps
ls trajectories/traj_virtual_*/

🤝 Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

📝 License

This project is licensed under the MIT License - see LICENSE file for details.

Note: The arXiv paper will be released soon. Please check back for the updated citation information.

🙏 Acknowledgments

Based on Mobile-Agent-v3
Virtual environment powered by Gemini AI

📧 Contact

For questions or issues, please open an issue on GitHub.

🔗 Related Projects

Mobile-Agent - Original Mobile-Agent implementation
