An Android application that uses LiteRT-LM to host an OpenAI-compatible API server, letting you run LLMs on your phone as a web service.
Warning: It's still alpha!
- OpenAI-compatible API endpoints
- Multimodal support - Send images and audio in chat messages (OpenAI format)
- Native Android app with Material Design UI
- Foreground service for reliable server operation
- Local network access via WiFi
- Compatible with OpenAI client libraries
- Optimized for ARM-based Android devices using LiteRT with GPU acceleration
The server implements the following OpenAI-compatible endpoints:
- `GET /v1/models` - List available models
- `POST /v1/chat/completions` - Chat completions (ChatGPT-style) with multimodal support
- `POST /v1/completions` - Text completions
- `GET /health` - Health check endpoint
- `GET /` - Web interface with API documentation
- `GET /chat` - Web-based chat UI (powered by AI-QL/chat-ui)
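As a quick sketch, the read-only endpoints can be queried with Python's standard library. The IP address below is a placeholder; substitute the address shown in the app:

```python
import json
import urllib.request

def api_url(base: str, path: str) -> str:
    """Join the server base address with an endpoint path."""
    return base.rstrip("/") + path

def get_json(url: str):
    """GET an endpoint and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

BASE = "http://192.168.1.50:8080"  # placeholder: use your phone's IP
# get_json(api_url(BASE, "/health"))     # server status
# get_json(api_url(BASE, "/v1/models"))  # available models
```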
- Android Studio (2022.3 or later)
- Android SDK (API level 26+)
- JDK 8 or higher
Note: With the LiteRT library integration, you no longer need to manually build llama.cpp or configure NDK/CMake.
1. Clone the repository:

   ```shell
   git clone https://github.com/wannaphong/android-hostai.git
   cd android-hostai
   ```

2. Open the project in Android Studio

3. Build the project:

   ```shell
   ./gradlew assembleDebug
   ```

4. Install on device:

   ```shell
   ./gradlew installDebug
   ```
The repository includes a GitHub Actions workflow that automatically builds APK and AAB (Android App Bundle) files when a release is published or when manually triggered.
The workflow builds:
- Debug APK: Always built for testing
- Release APK: Unsigned or signed (based on keystore availability)
- Release AAB: Unsigned or signed (based on keystore availability)
All artifacts are uploaded to the workflow run and attached to GitHub releases when triggered by a release event.
To build signed release artifacts, configure the following repository secrets in GitHub:
- `KEYSTORE_FILE`: Base64-encoded keystore file (`base64 -w 0 your-keystore.jks`)
- `KEYSTORE_PASSWORD`: Password for the keystore
- `KEY_ALIAS`: Alias of the signing key
- `KEY_PASSWORD`: Password for the signing key
If these secrets are not configured, the workflow will build unsigned release artifacts.
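To sanity-check the encoding step, the round-trip can be demonstrated with a throwaway file (with a real keystore you would encode `your-keystore.jks` and paste the one-line output into the `KEYSTORE_FILE` secret). Note that `base64 -w 0` is the GNU coreutils syntax:

```shell
# Throwaway stand-in for a real keystore file
printf 'dummy-keystore-bytes' > demo.jks
base64 -w 0 demo.jks > demo.jks.b64       # -w 0 keeps the output on one line
base64 -d demo.jks.b64 > demo.decoded.jks
cmp -s demo.jks demo.decoded.jks && echo "round-trip OK"
```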
You can manually trigger the workflow from the Actions tab in GitHub to build artifacts without creating a release.
- Install and launch the app on your Android device
- Select a LiteRT model file (.litertlm) from your device storage (optional, for testing you can start without a model)
- Tap "Start Server" to begin the API server
- The server will start on port 8080 by default
- Use the displayed IP address to access the API from other devices on the same network
The easiest way to interact with your model is through the built-in web chat interface:
- After starting the server, open a web browser on any device on the same network
- Navigate to `http://<phone-ip>:8080/chat`
- The chat interface will automatically connect to your local API
- Start chatting with your model!
The chat UI is powered by AI-QL/chat-ui and comes pre-configured to work with your local API endpoint. It supports:
- Real-time streaming responses
- Markdown rendering
- Chat history management
- Multimodal inputs (when using vision models)
You'll need a LiteRT model file to use this app. You can:
- Download pre-converted LiteRT models from HuggingFace LiteRT Community
- Popular LiteRT models include:
- Gemma3-1B-IT (557 MB, 4-bit quantized)
- Phi-4-mini (3.7 GB, 8-bit quantized)
- Qwen2.5-1.5B (1.5 GB, 8-bit quantized)
LiteRT models (.litertlm) are optimized for mobile devices with GPU acceleration support.
HostAI efficiently handles multiple concurrent requests. For detailed information, see CONCURRENT_REQUESTS.md.
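As a sketch of fanning out several requests at once (the server address and model name are placeholders; the pool call is left commented so the snippet runs without a device):

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def chat(base: str, prompt: str) -> str:
    """POST one chat completion and return the reply text."""
    body = json.dumps({
        "model": "llama-model",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        base + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]

prompts = ["Hello!", "Name three colors.", "What is LiteRT?"]
# with ThreadPoolExecutor(max_workers=3) as pool:
#     replies = list(pool.map(lambda p: chat("http://192.168.1.50:8080", p), prompts))
```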
HostAI now supports native multimodal inputs (images and audio) using LiteRT-LM 0.8.0's vision and audio backends. You can include images and audio in your chat messages following the OpenAI API format:
```shell
# Example with a base64-encoded image
curl http://<phone-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3n-model",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
            }
          }
        ]
      }
    ]
  }'
```

Requirements:
- Use a multimodal model like Gemma-3N-E2B or Gemma-3N-E4B
- Images must be base64 encoded (URLs not yet supported)
- Vision processing uses GPU, audio processing uses CPU
See API_USAGE.md for detailed multimodal examples including audio inputs and Python code with base64 encoding.
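For reference, a small helper can build the message payload above from raw image bytes; the data-URL prefix and field names follow the OpenAI format shown, and the model name is a placeholder:

```python
import base64

def image_message(text: str, image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build one OpenAI-format user message carrying text plus an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# POST this as the request body to /v1/chat/completions
msg = image_message("What is in this image?", b"\xff\xd8\xff\xe0fake-jpeg-bytes")
payload = {"model": "gemma-3n-model", "messages": [msg]}
```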
```shell
curl http://<phone-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://<phone-ip>:8080/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="llama-model",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```

- MainActivity - User interface for controlling the server and selecting models
- ApiServerService - Foreground service that runs the HTTP server
- OpenAIApiServer - Javalin-based web server with OpenAI-compatible endpoints and SSE streaming support
- LlamaModel - Model interface using LiteRT library for native LLM inference
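The SSE streaming mentioned above follows the OpenAI streaming format: `data:` lines carrying JSON chunks, terminated by a `[DONE]` sentinel. A minimal parser sketch (the sample body is fabricated for illustration):

```python
import json

# A fabricated two-chunk stream body in the OpenAI SSE format
SAMPLE = """\
data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo!"}}]}

data: [DONE]
"""

def sse_deltas(body: str):
    """Yield the text deltas from an OpenAI-style SSE chat-completion stream."""
    for line in body.splitlines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":  # end-of-stream sentinel
            return
        delta = json.loads(payload)["choices"][0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]

print("".join(sse_deltas(SAMPLE)))  # -> Hello!
```

With the OpenAI client, the same stream is consumed by passing `stream=True` to `client.chat.completions.create` and iterating over the returned chunks.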
This app uses the LiteRT-LM library, which provides:
- Native LLM inference optimized for Android/ARM devices
- GPU acceleration support for faster inference on supported devices
- CPU fallback for universal device compatibility
- Efficient model loading and context management
- Easy-to-use Kotlin API with synchronous and asynchronous inference
The library handles all native code compilation and optimization, so you don't need to configure the NDK or CMake, or build native code yourself.
- Android 8.0 (API level 26) or higher
- ARM64 or x86_64 processor (64-bit architectures)
- Permissions: INTERNET, FOREGROUND_SERVICE, ACCESS_NETWORK_STATE, READ_EXTERNAL_STORAGE
Apache License 2.0 - See LICENSE file for details
- LiteRT-LM - Language model runtime for edge devices
- LiteRT - TensorFlow Lite runtime
- Javalin - Simple and modern web framework for Java and Kotlin
Contributions are welcome! Please feel free to submit a Pull Request.
We don't collect any of your private data in this application. The application doesn't need internet access to work; it serves requests only on your local network.
You can get the source code at https://github.com/wannaphong/android-hostai.
contact: wannaphong@yahoo.com