
HostAI - Android LLM API Server

An Android application that uses LiteRT-LM to host an OpenAI-compatible API server, letting you run LLMs on your phone as a web service.

Warning: This project is still in alpha!

Features

  • πŸš€ OpenAI-compatible API endpoints
  • 🎨 Multimodal support - Send images and audio in chat messages (OpenAI format)
  • πŸ“± Native Android app with Material Design UI
  • πŸ”„ Foreground service for reliable server operation
  • 🌐 Local network access via WiFi
  • πŸ”Œ Compatible with OpenAI client libraries
  • ⚑ Optimized for ARM-based Android devices using LiteRT with GPU acceleration

API Endpoints

The server implements the following OpenAI-compatible endpoints:

  • GET /v1/models - List available models
  • POST /v1/chat/completions - Chat completions (ChatGPT-style) with multimodal support
  • POST /v1/completions - Text completions
  • GET /health - Health check endpoint
  • GET / - Web interface with API documentation
  • GET /chat - Web-based chat UI (powered by AI-QL/chat-ui)
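As a minimal sketch, the endpoints above can be reached from any machine on the same WiFi network; the IP address below is an example (use the address the app displays), and `check_health` is a hypothetical helper, not part of the app:

```python
# Sketch of reaching the server's endpoints from another machine on the
# same WiFi network. The host below is an example; use your phone's IP.
import urllib.request

HOST = "192.168.1.42"  # example phone IP shown in the app
PORT = 8080            # HostAI's default port

def endpoint(path: str) -> str:
    """Build a full URL for one of the server's endpoints."""
    return f"http://{HOST}:{PORT}{path}"

def check_health() -> str:
    """GET /health and return the raw body (requires the server running)."""
    with urllib.request.urlopen(endpoint("/health")) as resp:
        return resp.read().decode()

print(endpoint("/v1/models"))
```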

Building

Prerequisites

  • Android Studio (2022.3 or later)
  • Android SDK (API level 26+)
  • JDK 8 or higher

Note: With the LiteRT library integration, you no longer need to manually build llama.cpp or configure NDK/CMake.

Build Instructions

  1. Clone the repository:

    git clone https://github.com/wannaphong/android-hostai.git
    cd android-hostai
  2. Open the project in Android Studio

  3. Build the project:

    ./gradlew assembleDebug
  4. Install on device:

    ./gradlew installDebug

GitHub Actions Release Builds

The repository includes a GitHub Actions workflow that automatically builds APK and AAB (Android App Bundle) files when a release is published or when manually triggered.

Automated Builds

The workflow builds:

  • Debug APK: Always built for testing
  • Release APK: Unsigned or signed (based on keystore availability)
  • Release AAB: Unsigned or signed (based on keystore availability)

All artifacts are uploaded to the workflow run and attached to GitHub releases when triggered by a release event.

Setting Up Signed Releases

To build signed release artifacts, configure the following repository secrets in GitHub:

  • KEYSTORE_FILE: Base64-encoded keystore file (base64 -w 0 your-keystore.jks)
  • KEYSTORE_PASSWORD: Password for the keystore
  • KEY_ALIAS: Alias of the signing key
  • KEY_PASSWORD: Password for the signing key

If these secrets are not configured, the workflow will build unsigned release artifacts.
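If `base64 -w 0` is not available (on macOS, for example, the flag differs), a short Python snippet produces the same single-line encoding; the keystore file written below is a placeholder so the sketch runs end to end:

```python
# Base64-encode a keystore file into a single line, equivalent to
# `base64 -w 0 your-keystore.jks` with GNU coreutils.
import base64

def encode_keystore(path: str) -> str:
    """Read a binary keystore and return its base64 text on one line."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# Write placeholder bytes so this sketch is runnable without a real keystore:
with open("your-keystore.jks", "wb") as f:
    f.write(b"\x00\x01binary keystore bytes")

encoded = encode_keystore("your-keystore.jks")
print(encoded)  # paste this value into the KEYSTORE_FILE secret
```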

Manual Workflow Trigger

You can manually trigger the workflow from the Actions tab in GitHub to build artifacts without creating a release.

Usage

  1. Install and launch the app on your Android device
  2. Select a LiteRT model file (.litertlm) from your device storage (optional; for testing, you can start the server without a model)
  3. Tap "Start Server" to begin the API server
  4. The server will start on port 8080 by default
  5. Use the displayed IP address to access the API from other devices on the same network

Using the Chat UI

The easiest way to interact with your model is through the built-in web chat interface:

  1. After starting the server, open a web browser on any device on the same network
  2. Navigate to http://<phone-ip>:8080/chat
  3. The chat interface will automatically connect to your local API
  4. Start chatting with your model!

The chat UI is powered by AI-QL/chat-ui and comes pre-configured to work with your local API endpoint. It supports:

  • Real-time streaming responses
  • Markdown rendering
  • Chat history management
  • Multimodal inputs (when using vision models)

Getting LiteRT Models

You'll need a LiteRT model file (.litertlm) to use this app. LiteRT models are optimized for mobile devices with GPU acceleration support.

Concurrent Request Handling

HostAI efficiently handles multiple concurrent requests. For detailed information, see CONCURRENT_REQUESTS.md.
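From the client side, issuing requests in parallel is straightforward with a thread pool; in this sketch, `send` is a stand-in stub (a real client would POST each prompt to /v1/chat/completions):

```python
# Fire several chat requests concurrently with a thread pool.
# `send` is a stub standing in for a real HTTP POST to the server.
from concurrent.futures import ThreadPoolExecutor

def send(prompt: str) -> str:
    # Placeholder: a real implementation would POST the prompt to
    # http://<phone-ip>:8080/v1/chat/completions and return the reply.
    return f"echo: {prompt}"

prompts = ["Hello!", "What is LiteRT?", "Summarize this README."]

# pool.map preserves input order even though requests run in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    replies = list(pool.map(send, prompts))

print(replies)
```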

Multimodal Support

HostAI now supports native multimodal inputs (images and audio) using LiteRT-LM 0.8.0's vision and audio backends. You can include images and audio in your chat messages following the OpenAI API format:

# Example with base64-encoded image
curl http://<phone-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3n-model",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
            }
          }
        ]
      }
    ]
  }'
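The same multimodal payload can be assembled in Python; the image bytes below are a placeholder, and the model name is copied from the curl example above:

```python
# Build an OpenAI-format multimodal chat payload with a base64 image.
import base64
import json

image_bytes = b"\xff\xd8\xff\xe0fake-jpeg-bytes"  # placeholder image data
b64 = base64.b64encode(image_bytes).decode("ascii")

payload = {
    "model": "gemma-3n-model",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                },
            ],
        }
    ],
}

body = json.dumps(payload)  # POST this to /v1/chat/completions
```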

Requirements:

  • Use a multimodal model like Gemma-3N-E2B or Gemma-3N-E4B
  • Images must be base64 encoded (URLs not yet supported)
  • Vision processing uses GPU, audio processing uses CPU

See API_USAGE.md for detailed multimodal examples including audio inputs and Python code with base64 encoding.

Example API Call

curl http://<phone-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7
  }'

Using with OpenAI Python Client

from openai import OpenAI

client = OpenAI(
    base_url="http://<phone-ip>:8080/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="llama-model",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
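If you prefer the standard library over the OpenAI client, the response body is plain JSON and can be parsed directly; the sample below is a hand-written response in the OpenAI chat-completion schema, not real server output:

```python
import json

# A sample chat-completion response in the OpenAI schema (illustrative only).
sample_body = json.dumps({
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "model": "llama-model",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
})

def extract_reply(body: str) -> str:
    """Pull the assistant text out of a chat-completion response body."""
    return json.loads(body)["choices"][0]["message"]["content"]

print(extract_reply(sample_body))
```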

Architecture

  • MainActivity - User interface for controlling the server and selecting models
  • ApiServerService - Foreground service that runs the HTTP server
  • OpenAIApiServer - Javalin-based web server with OpenAI-compatible endpoints and SSE streaming support
  • LlamaModel - Model interface using LiteRT library for native LLM inference

Implementation

This app uses the LiteRT-LM library, which provides:

  • Native LLM inference optimized for Android/ARM devices
  • GPU acceleration support for faster inference on supported devices
  • CPU fallback for universal device compatibility
  • Efficient model loading and context management
  • Easy-to-use Kotlin API with synchronous and asynchronous inference

The library handles all native code compilation and optimization, so you don't need to manually configure NDK, CMake, or build native code yourself.

Requirements

  • Android 8.0 (API level 26) or higher
  • ARM64 or x86_64 processor (64-bit architectures)
  • Permissions: INTERNET, FOREGROUND_SERVICE, ACCESS_NETWORK_STATE, READ_EXTERNAL_STORAGE

License

Apache License 2.0 - See LICENSE file for details

Acknowledgments

  • LiteRT-LM - Language model runtime for edge devices
  • LiteRT - TensorFlow Lite runtime
  • Javalin - Simple and modern web framework for Java and Kotlin

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Privacy Policy

This application does not collect any personal data, and it does not require internet access to work.

Open source

You can get the source code at https://github.com/wannaphong/android-hostai.

Contact: wannaphong@yahoo.com