This project is an autonomous AI agent capable of understanding high-level goals and translating them into a sequence of actions on an Android device. It can see and interpret the device's screen, make decisions, and interact with UI elements to accomplish complex tasks without human intervention.
The agent operates on a continuous see-think-act cycle, powered by a local Large Language Model (LLM) running via Ollama.
- Decomposition: When given a high-level goal, the agent first uses the LLM to break down the goal into a series of smaller, verifiable sub-tasks.
- Observation: For each sub-task, the agent captures the device's current UI hierarchy as an XML file. This gives it "vision," allowing it to see all the buttons, text, and other elements on the screen.
- Decision: The agent sends the current UI layout and the sub-task description to the LLM, asking it to determine the most logical next action (e.g., `tap` on an element, `swipe` down, or `type` text into a field).
- Execution: The agent executes the chosen action on the device using the Android Debug Bridge (ADB).
This loop repeats until all sub-tasks are completed and the main goal is achieved. The entire process is driven by local AI, ensuring privacy and full control over the agent's operations.
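The see-think-act cycle above can be sketched as follows. All class and function names here are illustrative, not the project's actual API:

```python
def decompose(goal, llm):
    """Decomposition: ask the LLM to split the goal into sub-tasks."""
    return llm(f"Decompose into sub-tasks: {goal}")

def run_agent(goal, llm, device):
    """Run the see-think-act loop until every sub-task is done."""
    for subtask in decompose(goal, llm):
        done = False
        while not done:
            ui_xml = device.dump_ui()        # see: capture the UI hierarchy XML
            action = llm(f"UI: {ui_xml}\nTask: {subtask}\nNext action?")  # think
            device.execute(action)           # act: issue the chosen ADB action
            done = device.task_complete(subtask)
```

Here `llm` is any callable that sends a prompt to the local model, and `device` wraps the ADB connection.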
- Goal-Oriented: Operates based on natural language objectives.
- Vision-Capable: Analyzes the screen's UI hierarchy to make informed decisions.
- Local & Private: Runs entirely on your local machine using Ollama. No data ever leaves your computer.
- Modular Architecture: Built with a clean separation between device interaction, AI planning, and action execution, making it easy to extend and modify.
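To make the "vision" step concrete, here is a minimal sketch of pulling clickable elements out of a uiautomator-style UI dump. The sample XML is illustrative; real dumps carry many more attributes per node:

```python
import xml.etree.ElementTree as ET

SAMPLE_UI = """<hierarchy>
  <node text="Settings" class="android.widget.TextView" clickable="false" bounds="[0,0][200,50]"/>
  <node text="Wi-Fi" class="android.widget.TextView" clickable="true" bounds="[0,60][200,110]"/>
</hierarchy>"""

def clickable_elements(ui_xml):
    """Return (text, bounds) pairs for every clickable node in the dump."""
    root = ET.fromstring(ui_xml)
    return [(n.get("text"), n.get("bounds"))
            for n in root.iter("node")
            if n.get("clickable") == "true"]
```

The agent can feed a list like this to the LLM instead of the raw XML to keep prompts short.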
- Android Studio & ADB: You will need to have the Android SDK Platform-Tools installed and know how to connect your device via ADB (Android Debug Bridge).
- Python: Python 3.10 or higher is required.
- Ollama: Ollama is required to run the language model. The setup script will attempt to install it for you.
- Connect Your Android Device:
- The agent connects to an Android device using an ADB TCP connection (IP address and port). This is standard for emulators and for physical devices using Wi-Fi debugging.
- For Emulators (Android Studio, Genymotion, etc.):
- Ensure your emulator is running.
- The agent will attempt to auto-detect the device from the `adb devices` command. If no TCP device is listed, it will default to `127.0.0.1:5555`.
- For Physical Devices (via Wi-Fi):
- Enable Developer Options and Wireless Debugging on your device.
- Connect to the same Wi-Fi network as your computer.
- Use the IP address and port provided in the Wireless Debugging settings to connect: `adb connect YOUR_DEVICE_IP:PORT`
- Verify Connection:
- Run `adb devices` in your terminal. You should see your device listed with an IP address (e.g., `192.168.1.5:5555 device`).
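The auto-detection described above can be approximated like this. `pick_tcp_device` is a hypothetical helper, not the project's actual code:

```python
import re
import subprocess

DEFAULT_TCP = "127.0.0.1:5555"

def pick_tcp_device(adb_output):
    """Return the first ip:port serial in `adb devices` output, else the default."""
    for line in adb_output.splitlines()[1:]:        # skip the header line
        parts = line.split()
        if (len(parts) == 2 and parts[1] == "device"
                and re.match(r"^[\d.]+:\d+$", parts[0])):
            return parts[0]
    return DEFAULT_TCP

def detect_device():
    """Run `adb devices` and pick a TCP-connected device from its output."""
    out = subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout
    return pick_tcp_device(out)
```

USB-only serials (e.g. `emulator-5554`) are skipped because tapping and typing are sent over the TCP transport.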
- Run the Setup Script:
- Open a command prompt and navigate to the project directory.
- Run the `setup.bat` script to automatically install all dependencies.
- The script will check for ADB, install the required Python packages, and download/install Ollama if it's not already present.
- Pull the Ollama Model:
- The setup script will automatically pull the required `llama3.2` model. If it fails, you can do it manually by running: `ollama pull llama3.2`
- Start the Ollama Service:
- Before running the agent, you must start the Ollama service. You can do this by running: `ollama serve`
- Leave this service running in a separate terminal.
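Once `ollama serve` is running, the agent can talk to it over Ollama's local HTTP API (default port 11434). A minimal sketch using only the standard library; the helper names are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(prompt, model="llama3.2"):
    """Build a non-streaming generate request for the local model."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt):
    """POST the prompt to the local Ollama service and return its response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because everything goes to `localhost`, no prompt or screen data leaves your machine.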
- Run the Demo:
- Open a new terminal, navigate to the project directory, and run the demo with: `python demo.py`
- You can change the goal for the agent by modifying the `goal` variable in the `demo.py` file.
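For example, the top of `demo.py` might look like this (illustrative; check the file for its exact structure):

```python
# demo.py (excerpt): edit the goal string to change what the agent does.
goal = "Open Settings and enable Wi-Fi"
```

Any natural-language objective works here; the agent decomposes it into sub-tasks at startup.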