A streamlined implementation of Mobile-Agent-v3 that runs in a virtual GUI environment using Gemini AI to simulate Android interfaces, eliminating the need for physical devices or emulators.
- Virtual Android Environment: Uses Gemini-3 to generate realistic Android UI screenshots
- No Device Required: Runs completely virtually without Android emulators or physical devices
- GUI-Owl Integration: Leverages GUI-Owl-7B model for intelligent agent actions
- Flexible Task Support: Supports both standard Android tasks and custom social app tasks
- Automatic Trajectory Extraction: Captures and exports interaction trajectories for analysis
- Python 3.11+
- Gemini API access (via OpenRouter or compatible API)
- GUI-Owl-7B model or compatible LLM for agent reasoning
- GPU server with vLLM for GUI-Owl model (optional, can use remote API)
Note: This project uses the Mobile-Agent-V3 architecture, which employs a multi-agent system (Planning, Decision, and Reflection agents) for intelligent task execution.
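The three-agent cycle mentioned in the note can be pictured roughly as follows. This is only a minimal sketch, not the project's actual code: `run_step` and the four callables are hypothetical names, and the mapping of the Planning, Decision, and Reflection agents onto the `manager`, `operator`, and `reflector` keys is an assumption based on the trajectory format this README documents.

```python
from typing import Callable, Dict, Tuple

AgentOutput = Dict[str, str]


def run_step(
    user_request: str,
    screenshot: bytes,
    plan_fn: Callable[[str, bytes], AgentOutput],
    decide_fn: Callable[[str, AgentOutput, bytes], AgentOutput],
    execute_fn: Callable[[AgentOutput, bytes], bytes],
    reflect_fn: Callable[[str, AgentOutput, bytes, bytes], AgentOutput],
) -> Tuple[Dict[str, AgentOutput], bytes]:
    """Run one plan -> decide -> execute -> reflect cycle (hypothetical helper)."""
    # Planning agent: turn the request and current screen into a plan.
    manager = plan_fn(user_request, screenshot)
    # Decision agent: pick the next concrete action from the plan.
    operator = decide_fn(user_request, manager, screenshot)
    # Environment: apply the action and obtain the next screen.
    next_screenshot = execute_fn(operator, screenshot)
    # Reflection agent: compare the screens before and after the action.
    reflector = reflect_fn(user_request, operator, screenshot, next_screenshot)
    record = {"manager": manager, "operator": operator, "reflector": reflector}
    return record, next_screenshot


if __name__ == "__main__":
    # Stub agents standing in for real model calls.
    record, _ = run_step(
        "Add a new contact named John Doe",
        b"screen-0",
        plan_fn=lambda req, shot: {"plan": "1. Tap contacts icon"},
        decide_fn=lambda req, plan, shot: {"action": '{"action_type": "tap"}'},
        execute_fn=lambda action, shot: b"screen-1",
        reflect_fn=lambda req, action, before, after: {"outcome": "screen changed"},
    )
    print(sorted(record))
```

Passing the agents in as callables keeps the sketch independent of any particular model client; in the real project these roles are filled by the configured agent LLM and the Gemini-based virtual environment.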
# Create Anaconda environment
conda create -n GUI-V python=3.11 -y
conda activate GUI-V
# Install dependencies
pip install -r requirements.txt
# Clone the repository
git clone https://github.com/Futuresis/GUI-Trajectory-Virtual.git
cd GUI-Trajectory-Virtual
# Install dependencies
pip install -r requirements.txt

Create a .env file or set environment variables:
# Gemini API Configuration (for virtual environment)
export GEMINI_API_KEY="your-gemini-api-key"
export GEMINI_BASE_URL="your-gemini-api-url"
export GEMINI_MODEL="google/gemini-3-pro-image-preview"
# Agent LLM Configuration (GUI-Owl or compatible)
export AGENT_API_KEY="your-api-key"
export AGENT_BASE_URL="your-api-url"
export AGENT_MODEL="path/to/model"
# Run default contact task
bash run_virtual.sh
# Run social app tasks
bash run_virtual.sh --tasks social --category message
# Activate Anaconda environment first
conda activate GUI-V
# Run default contact task from Windows Command Prompt (requires .env file configured)
# Note: cmd.exe continues a line with ^, not with a backslash
python run_virtual.py ^
  --suite_family=agent_env ^
  --agent_name=mobile_agent_v3 ^
  --model=%AGENT_MODEL% ^
  --api_key=%AGENT_API_KEY% ^
  --base_url=%AGENT_BASE_URL% ^
  --gemini_api_key=%GEMINI_API_KEY% ^
  --gemini_base_url=%GEMINI_BASE_URL% ^
  --gemini_model=%GEMINI_MODEL% ^
  --tasks=ContactsAddContact ^
  --use_virtual_env=True
# Run with custom parameters (bash-compatible shells; $VAR expansion and backslash continuation do not work in cmd.exe)
python run_virtual.py \
--suite_family=agent_env \
--agent_name=mobile_agent_v3 \
--model=$AGENT_MODEL \
--api_key=$AGENT_API_KEY \
--base_url=$AGENT_BASE_URL \
--gemini_api_key=$GEMINI_API_KEY \
--gemini_base_url=$GEMINI_BASE_URL \
--gemini_model=$GEMINI_MODEL \
--tasks=ContactsAddContact \
--use_virtual_env=True

This project has been updated to be fully compatible with Windows GBK encoding:
- All Unicode special characters in output have been replaced with ASCII equivalents
- Status messages use [OK], [WARNING], [FAILED] instead of Unicode symbols
- Chinese comments in code are preserved (they don't affect runtime)
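The idea behind the ASCII status tags can be sketched as below. This is an illustration under stated assumptions rather than the project's own helper: `to_gbk_safe` and `STATUS_TAGS` are hypothetical names, and the particular Unicode symbols being mapped are guesses, since the README does not say which symbols were replaced.

```python
# Hypothetical mapping from Unicode status symbols to ASCII tags.
# The warning sign with its variation selector is listed before the bare
# sign so the longer sequence is replaced first.
STATUS_TAGS = {
    "\u2705": "[OK]",              # white heavy check mark
    "\u2713": "[OK]",              # check mark
    "\u26a0\ufe0f": "[WARNING]",   # warning sign + variation selector
    "\u26a0": "[WARNING]",         # warning sign
    "\u274c": "[FAILED]",          # cross mark
}


def to_gbk_safe(message: str) -> str:
    """Return a copy of `message` that can always be encoded as GBK."""
    for symbol, tag in STATUS_TAGS.items():
        message = message.replace(symbol, tag)
    # Anything GBK still cannot represent becomes "?" instead of raising
    # UnicodeEncodeError on a GBK console; Chinese text passes through.
    return message.encode("gbk", errors="replace").decode("gbk")


if __name__ == "__main__":
    print(to_gbk_safe("\u2705 Task finished"))
```

Encoding with `errors="replace"` is a conservative last step: it keeps text that GBK supports and only substitutes characters that would otherwise crash a `print` call on a GBK terminal.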
- Use Anaconda environment (recommended)
- Run Python scripts directly instead of bash scripts
- Configure the .env file with your API keys before running
GUI-Trajectory-Virtual/
├── run_virtual.py # Main execution script
├── run_virtual.sh # Bash wrapper with presets
├── virtual_env_adapter.py # Virtual environment adapter
├── virtual_env_gemini3.py # Gemini-based virtual environment
├── start_Android.png # Initial Android screenshot
├── agent_env/ # Agent framework
│ ├── agents/ # Agent implementations
│ ├── task_evals/ # Task evaluations
│ └── utils/ # Utility functions
├── requirements.txt # Python dependencies
├── Dockerfile # Container setup
├── logs/ # Execution logs
├── trajectories/ # Generated trajectories
└── outputs/ # Other results
The system supports 10 categories of social app interactions:
- Message: Forwarding, quoting, scheduled messages
- Contact: Tag editing, duplicate merging, smart recommendations
- Group: Member management, reorganization, customization
- Media: Photo/video sharing, voice messages, location-based meetup
- Call: Scheduled calls, video with screen share, group calls
- Privacy: Notification rules, mute management, privacy controls
- Emoji: Contextual reactions, batch reactions, custom emoji
- File: Document editing, collection, collaborative review
- Status: Scheduled posts, engagement chains, interactive polls
- Search: Context search, cross-chat retrieval, history export
| Variable | Description | Required |
|---|---|---|
| GEMINI_API_KEY | Gemini API key | Yes |
| GEMINI_BASE_URL | Gemini API endpoint URL | Yes |
| GEMINI_MODEL | Gemini model name | Yes |
| AGENT_MODEL | Agent LLM model path or name | Yes |
| AGENT_BASE_URL | Agent API endpoint URL | Yes |
| AGENT_API_KEY | Agent API key | Yes |
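Since every variable in the table is required, a quick check before launching a run can save a failed start. The following is a minimal sketch; `missing_vars` and `REQUIRED_VARS` are hypothetical names, and the project itself may validate its configuration differently.

```python
import os
from typing import List, Mapping

# The six variables listed as required in the table above.
REQUIRED_VARS = (
    "GEMINI_API_KEY",
    "GEMINI_BASE_URL",
    "GEMINI_MODEL",
    "AGENT_MODEL",
    "AGENT_BASE_URL",
    "AGENT_API_KEY",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("[FAILED] Missing environment variables: " + ", ".join(missing))
    else:
        print("[OK] All required environment variables are set")
```

Note that this only inspects the current process environment; values stored in a .env file are visible to it only if they have been exported into the shell or loaded beforehand.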
python run_virtual.py --help
Key arguments:
--suite_family Task suite family (default: agent_env)
--agent_name Agent type (default: mobile_agent_v3)
--model Agent LLM model name
--api_key Agent API key
--base_url Agent API base URL
--gemini_api_key Gemini API key
--gemini_base_url Gemini API base URL (default: https://openrouter.ai/api/v1)
--gemini_model Gemini model name (default: google/gemini-2.0-flash-exp:free)
--tasks Task name list (e.g., ContactsAddContact)
--n_task_combinations Number of task parameter combinations (default: 1)
--use_virtual_env Use virtual environment (default: True)
--initial_image_path Initial Android screenshot (default: start_Android.png)
--resolution Virtual screen resolution (default: 1080x2400)
--traj_output_path Trajectory output directory (default: traj_output)
--output_path Checkpoint output directory (default: results)
--task_random_seed Random seed for task parameter sampling
--fixed_task_seed Whether to use same task seed across combinations

docker build -t gui-trajectory-virtual .
docker run -it \
-e GEMINI_API_KEY="your-gemini-key" \
-e GEMINI_BASE_URL="your-gemini-url" \
-e GEMINI_MODEL="google/gemini-3-pro-image-preview" \
-e AGENT_MODEL="GUI-Owl-7B" \
-e AGENT_BASE_URL="http://host.docker.internal:4243/v1" \
-e AGENT_API_KEY="EMPTY" \
-v $(pwd)/logs:/app/logs \
-v $(pwd)/trajectories:/app/trajectories \
gui-trajectory-virtual \
  bash run_virtual.sh

Trajectories are automatically saved in the trajectories/ directory with timestamped folders:
trajectories/
└── traj_virtual_2024-12-04_20-15-30/
        └── traj.jsonl # Trajectory in JSONL format (one JSON per line)

Each line in traj.jsonl contains a complete step in Manager-Operator-Reflector format:
{
"manager": {
"response": "...",
"thought": "Need to open contacts app",
"plan": "1. Tap contacts icon\n2. Add new contact",
"user_request": "Add a new contact named John Doe"
},
"operator": {
"response": "...",
"thought": "Tapping on contacts app icon",
"action": "{\"action_type\": \"tap\", \"coordinates\": [540, 1200]}",
"description": "Tap on Contacts app",
"user_request": "Add a new contact named John Doe"
},
"reflector": {
"response": "...",
"outcome": "Successfully opened contacts app",
"error_description": "",
"user_request": "Add a new contact named John Doe"
}
}

Detailed execution logs are saved with timestamps in the logs/ directory:
logs/log_virtual_2024-12-04_20-15-30.log
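For analysis, a traj.jsonl file in the format described above can be read one line at a time. The sketch below assumes only what the sample record shows: one JSON object per line, with `operator.action` stored as a JSON-encoded string that needs a second decode. `iter_steps` and `decode_action` are hypothetical helper names, not part of the project.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_steps(traj_path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one step record per non-empty line of a traj.jsonl file."""
    with traj_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def decode_action(step: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the JSON string stored under operator.action.

    Returns an empty dict when the field is missing or is not valid JSON.
    """
    raw = step.get("operator", {}).get("action", "")
    if not raw:
        return {}
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {}
```

A typical use would be `for step in iter_steps(path): print(decode_action(step).get("action_type"))`, where `path` points at a traj.jsonl file inside one of the timestamped trajectory folders.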
The project currently includes one task: ContactsAddContact. Run it using:
# Using bash script (Linux/Mac)
bash run_virtual.sh
# Using Python directly
python run_virtual.py \
--suite_family=agent_env \
--agent_name=mobile_agent_v3 \
--model=$AGENT_MODEL \
--api_key=$AGENT_API_KEY \
--base_url=$AGENT_BASE_URL \
--gemini_api_key=$GEMINI_API_KEY \
--gemini_base_url=$GEMINI_BASE_URL \
--gemini_model=$GEMINI_MODEL \
--tasks=ContactsAddContact \
--n_task_combinations=1 \
--use_virtual_env=True

To add custom tasks to the project:
- Create a task class in agent_env/task_evals/single/ (e.g., your_app.py):
from agent_env.task_evals.single import contacts


class YourCustomTask(contacts.ContactsAddContact):
    """Your custom task description."""

    complexity = 1.5  # Task complexity coefficient

    @property
    def goal(self) -> str:
        return "Your task goal description here"

- Register the task in agent_env/registry.py:
# Import your task module
from agent_env.task_evals.single import your_app
# Add to _TASKS tuple
_TASKS = (
    contacts.ContactsAddContact,
    your_app.YourCustomTask,  # Add your task here
)

- Run your custom task:
python run_virtual.py \
--suite_family=agent_env \
--agent_name=mobile_agent_v3 \
--tasks=YourCustomTask \
--use_virtual_env=True \
  # ... other parameters

After execution, check:
# Logs with timestamps
ls logs/log_virtual_*.log
# Trajectories with timestamps
ls trajectories/traj_virtual_*/

Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License; see the LICENSE file for details.
Note: The arXiv paper will be released soon. Please check back for the updated citation information.
- Based on Mobile-Agent-v3
- Virtual environment powered by Gemini AI
For questions or issues, please open an issue on GitHub.
- Mobile-Agent - Original Mobile-Agent implementation
