
HostAI - Android LLM API Server

An Android application that uses LiteRT-LM to host an OpenAI-compatible API server, letting you run LLMs on your phone as a web service.

Warning: This project is still in alpha!

Features

  • πŸš€ OpenAI-compatible API endpoints
  • 🎨 Multimodal support - Send images and audio in chat messages (OpenAI format)
  • πŸ“± Native Android app with Material Design UI
  • πŸ”„ Foreground service for reliable server operation
  • 🌐 Local network access via WiFi
  • πŸ”Œ Compatible with OpenAI client libraries
  • ⚑ Optimized for ARM-based Android devices using LiteRT with GPU acceleration

API Endpoints

The server implements the following OpenAI-compatible endpoints:

  • GET /v1/models - List available models
  • POST /v1/chat/completions - Chat completions (ChatGPT-style) with multimodal support
  • POST /v1/completions - Text completions
  • GET /health - Health check endpoint
  • GET / - Web interface with API documentation
  • GET /chat - Web-based chat UI (powered by AI-QL/chat-ui)
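As a minimal sketch, the endpoints above can be reached from any machine on the same WiFi network; the IP address below is an example (use the address the app displays), and `check_health` is a hypothetical helper, not part of the app:

```python
# Sketch of reaching the server's endpoints from another machine on the
# same WiFi network. The host below is an example; use your phone's IP.
import urllib.request

HOST = "192.168.1.42"  # example phone IP shown in the app
PORT = 8080            # HostAI's default port

def endpoint(path: str) -> str:
    """Build a full URL for one of the server's endpoints."""
    return f"http://{HOST}:{PORT}{path}"

def check_health() -> str:
    """GET /health and return the raw body (requires the server running)."""
    with urllib.request.urlopen(endpoint("/health")) as resp:
        return resp.read().decode()

print(endpoint("/v1/models"))
```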

Building

Prerequisites

  • Android Studio (2022.3 or later)
  • Android SDK (API level 26+)
  • JDK 8 or higher

Note: With the LiteRT library integration, you no longer need to manually build llama.cpp or configure NDK/CMake.

Build Instructions

  1. Clone the repository:

    git clone https://github.com/wannaphong/android-hostai.git
    cd android-hostai
  2. Open the project in Android Studio

  3. Build the project:

    ./gradlew assembleDebug
  4. Install on device:

    ./gradlew installDebug

GitHub Actions Release Builds

The repository includes a GitHub Actions workflow that automatically builds APK and AAB (Android App Bundle) files when a release is published or when manually triggered.

Automated Builds

The workflow builds:

  • Debug APK: Always built for testing
  • Release APK: Unsigned or signed (based on keystore availability)
  • Release AAB: Unsigned or signed (based on keystore availability)

All artifacts are uploaded to the workflow run and attached to GitHub releases when triggered by a release event.

Setting Up Signed Releases

To build signed release artifacts, configure the following repository secrets in GitHub:

  • KEYSTORE_FILE: Base64-encoded keystore file (base64 -w 0 your-keystore.jks)
  • KEYSTORE_PASSWORD: Password for the keystore
  • KEY_ALIAS: Alias of the signing key
  • KEY_PASSWORD: Password for the signing key

If these secrets are not configured, the workflow will build unsigned release artifacts.
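If `base64 -w 0` is not available (on macOS, for example, the flag differs), a short Python snippet produces the same single-line encoding; the keystore file written below is a placeholder so the sketch runs end to end:

```python
# Base64-encode a keystore file into a single line, equivalent to
# `base64 -w 0 your-keystore.jks` with GNU coreutils.
import base64

def encode_keystore(path: str) -> str:
    """Read a binary keystore and return its base64 text on one line."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# Write placeholder bytes so this sketch is runnable without a real keystore:
with open("your-keystore.jks", "wb") as f:
    f.write(b"\x00\x01binary keystore bytes")

encoded = encode_keystore("your-keystore.jks")
print(encoded)  # paste this value into the KEYSTORE_FILE secret
```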

Manual Workflow Trigger

You can manually trigger the workflow from the Actions tab in GitHub to build artifacts without creating a release.

Usage

  1. Install and launch the app on your Android device
  2. Select a LiteRT model file (.litertlm) from your device storage (optional; for testing, you can start the server without a model)
  3. Tap "Start Server" to begin the API server
  4. The server will start on port 8080 by default
  5. Use the displayed IP address to access the API from other devices on the same network

Using the Chat UI

The easiest way to interact with your model is through the built-in web chat interface:

  1. After starting the server, open a web browser on any device on the same network
  2. Navigate to http://<phone-ip>:8080/chat
  3. The chat interface will automatically connect to your local API
  4. Start chatting with your model!

The chat UI is powered by AI-QL/chat-ui and comes pre-configured to work with your local API endpoint. It supports:

  • Real-time streaming responses
  • Markdown rendering
  • Chat history management
  • Multimodal inputs (when using vision models)

Getting LiteRT Models

You'll need a LiteRT model file (.litertlm) to use this app. LiteRT models are optimized for mobile devices with GPU acceleration support.

Concurrent Request Handling

HostAI efficiently handles multiple concurrent requests. For detailed information, see CONCURRENT_REQUESTS.md.
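From the client side, issuing requests in parallel is straightforward with a thread pool; in this sketch, `send` is a stand-in stub (a real client would POST each prompt to /v1/chat/completions):

```python
# Fire several chat requests concurrently with a thread pool.
# `send` is a stub standing in for a real HTTP POST to the server.
from concurrent.futures import ThreadPoolExecutor

def send(prompt: str) -> str:
    # Placeholder: a real implementation would POST the prompt to
    # http://<phone-ip>:8080/v1/chat/completions and return the reply.
    return f"echo: {prompt}"

prompts = ["Hello!", "What is LiteRT?", "Summarize this README."]

# pool.map preserves input order even though requests run in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    replies = list(pool.map(send, prompts))

print(replies)
```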

Multimodal Support

HostAI now supports native multimodal inputs (images and audio) using LiteRT-LM 0.8.0's vision and audio backends. You can include images and audio in your chat messages following the OpenAI API format:

# Example with base64-encoded image
curl http://<phone-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3n-model",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
            }
          }
        ]
      }
    ]
  }'
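The same multimodal payload can be assembled in Python; the image bytes below are a placeholder, and the model name is copied from the curl example above:

```python
# Build an OpenAI-format multimodal chat payload with a base64 image.
import base64
import json

image_bytes = b"\xff\xd8\xff\xe0fake-jpeg-bytes"  # placeholder image data
b64 = base64.b64encode(image_bytes).decode("ascii")

payload = {
    "model": "gemma-3n-model",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                },
            ],
        }
    ],
}

body = json.dumps(payload)  # POST this to /v1/chat/completions
```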

Requirements:

  • Use a multimodal model like Gemma-3N-E2B or Gemma-3N-E4B
  • Images must be base64 encoded (URLs not yet supported)
  • Vision processing uses GPU, audio processing uses CPU

See API_USAGE.md for detailed multimodal examples including audio inputs and Python code with base64 encoding.

Example API Call

curl http://<phone-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7
  }'

Using with OpenAI Python Client

from openai import OpenAI

client = OpenAI(
    base_url="http://<phone-ip>:8080/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="llama-model",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
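If you prefer the standard library over the OpenAI client, the response body is plain JSON and can be parsed directly; the sample below is a hand-written response in the OpenAI chat-completion schema, not real server output:

```python
import json

# A sample chat-completion response in the OpenAI schema (illustrative only).
sample_body = json.dumps({
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "model": "llama-model",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
})

def extract_reply(body: str) -> str:
    """Pull the assistant text out of a chat-completion response body."""
    return json.loads(body)["choices"][0]["message"]["content"]

print(extract_reply(sample_body))
```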

Architecture

  • MainActivity - User interface for controlling the server and selecting models
  • ApiServerService - Foreground service that runs the HTTP server
  • OpenAIApiServer - Javalin-based web server with OpenAI-compatible endpoints and SSE streaming support
  • LlamaModel - Model interface using LiteRT library for native LLM inference

Implementation

This app uses the LiteRT-LM library, which provides:

  • Native LLM inference optimized for Android/ARM devices
  • GPU acceleration support for faster inference on supported devices
  • CPU fallback for universal device compatibility
  • Efficient model loading and context management
  • Easy-to-use Kotlin API with synchronous and asynchronous inference

The library handles all native code compilation and optimization, so you don't need to manually configure NDK, CMake, or build native code yourself.

Requirements

  • Android 8.0 (API level 26) or higher
  • ARM64 or x86_64 processor (64-bit architectures)
  • Permissions: INTERNET, FOREGROUND_SERVICE, ACCESS_NETWORK_STATE, READ_EXTERNAL_STORAGE

License

Apache License 2.0 - See LICENSE file for details

Acknowledgments

  • LiteRT-LM - Language model runtime for edge devices
  • LiteRT - TensorFlow Lite runtime
  • Javalin - Simple and modern web framework for Java and Kotlin

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Privacy Policy

This application does not collect any personal data, and it does not require internet access to work.

Open source

You can get the source code at https://github.com/wannaphong/android-hostai.

Contact: wannaphong@yahoo.com