Hark Architecture

Hark is the reference OACP voice assistant. It discovers app capabilities at runtime, resolves voice commands to actions using a two-stage on-device pipeline, dispatches Android intents, and handles async results -- all without the user leaving the assistant.

Discovery

OacpDiscoveryHandler.kt scans for exported .oacp ContentProviders via PackageManager. For each provider it reads two paths:

/manifest -- the machine-readable oacp.json capability file.
/context -- the LLM-readable OACP.md semantic context file.

OACP.md is validated for presence but not currently consumed by the LLM. It is reserved for future BYOK (bring-your-own-key) cloud models that have larger context windows.

Capability Registry

All discovered manifests are parsed into AssistantAction objects. Each action carries:

description, aliases, examples, keywords
disambiguationHints for overlapping capabilities
parameters with types, constraints, and entity snapshots

The registry is the single source of truth for what the device can do.
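A minimal sketch of how the action shape above could be modeled. Only `AssistantAction` and the listed field names come from this document; `ActionParameter` and its fields, the `id` key, and `CapabilityRegistry` are hypothetical names used for illustration.

```kotlin
// Hypothetical parameter shape; the real schema is whatever oacp.json defines.
data class ActionParameter(
    val name: String,
    val type: String,                           // e.g. "integer", "string", "enum"
    val required: Boolean = false,
    val enumValues: List<String> = emptyList()  // one possible kind of constraint
)

data class AssistantAction(
    val id: String,                             // assumed unique key, not named in this document
    val description: String,
    val aliases: List<String> = emptyList(),
    val examples: List<String> = emptyList(),
    val keywords: List<String> = emptyList(),
    val disambiguationHints: List<String> = emptyList(),
    val parameters: List<ActionParameter> = emptyList()
)

// Minimal in-memory registry: the latest registration for an id wins.
class CapabilityRegistry {
    private val actions = LinkedHashMap<String, AssistantAction>()

    fun register(action: AssistantAction) {
        actions[action.id] = action
    }

    fun find(id: String): AssistantAction? = actions[id]

    fun all(): List<AssistantAction> = actions.values.toList()
}
```

Keying by a single id keeps lookups cheap after intent selection; how the real registry handles two apps exposing the same capability is not covered here.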

Two-Stage Resolution

Stage 1 -- Intent Selection (EmbeddingGemma 308M)

The transcript is embedded using EmbeddingGemma and compared via cosine similarity against pre-embedded action descriptions (built from each action's description, aliases, examples, keywords, and parameter metadata).

Actions are ranked by semantic score. A floor of 0.30 rejects garbage matches, and a confidence gate at 0.35 filters ambiguous cases. Only the top-ranked action proceeds to slot filling.
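The ranking and thresholds above can be sketched roughly as follows. The 0.30 and 0.35 values come from this document; the `Selection` type, the function names, and the reading of the gate (scores between the floor and the gate are reported as ambiguous rather than dispatched) are assumptions.

```kotlin
import kotlin.math.sqrt

const val SCORE_FLOOR = 0.30f      // below this, the match is rejected outright
const val CONFIDENCE_GATE = 0.35f  // below this, the match is treated as ambiguous (assumed behavior)

sealed class Selection {
    data class Match(val actionId: String, val score: Float) : Selection()
    data class Ambiguous(val actionId: String, val score: Float) : Selection()
    object NoMatch : Selection()
}

fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Embedding dimensions must match" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    // A zero vector has no direction, so treat it as dissimilar to everything.
    if (normA == 0f || normB == 0f) return 0f
    return dot / (sqrt(normA) * sqrt(normB))
}

// Scores every pre-embedded action against the query and keeps only the best one.
fun selectAction(query: FloatArray, actionEmbeddings: Map<String, FloatArray>): Selection {
    val best = actionEmbeddings
        .map { (id, vector) -> id to cosineSimilarity(query, vector) }
        .maxByOrNull { it.second }
        ?: return Selection.NoMatch
    val (id, score) = best
    return when {
        score < SCORE_FLOOR -> Selection.NoMatch
        score < CONFIDENCE_GATE -> Selection.Ambiguous(id, score)
        else -> Selection.Match(id, score)
    }
}
```

In this sketch only a `Match` would proceed to slot filling; an `Ambiguous` result could instead trigger a clarifying question.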

Stage 2 -- Slot Filling (Qwen3 0.6B)

The selected action's description and parameter definitions are sent to Qwen3. The model extracts parameter values (numbers, names, durations, enum choices) from the transcript.
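One way the slot-filling input could be assembled, shown as a sketch only. The prompt wording, the `SlotSpec` type, and `buildSlotPrompt` are all hypothetical; this document does not specify the actual prompt format or how the model's reply is parsed.

```kotlin
// Hypothetical, trimmed-down view of one parameter definition.
data class SlotSpec(val name: String, val type: String, val description: String)

// Builds a compact prompt that contains only the selected action's schema.
fun buildSlotPrompt(
    actionDescription: String,
    slots: List<SlotSpec>,
    transcript: String
): String = buildString {
    appendLine("Action: $actionDescription")
    appendLine("Parameters:")
    for (slot in slots) {
        appendLine("- ${slot.name} (${slot.type}): ${slot.description}")
    }
    appendLine("Transcript: $transcript")
    append("Reply with a JSON object mapping parameter names to values.")
}
```

Sending one action's schema rather than the whole registry is what keeps the prompt small enough for a compact on-device model.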

Intent Dispatch

Actions are dispatched as Android intents via android_intent_plus:

Broadcast for background-safe actions (e.g., set alarm, toggle flashlight).
Activity launch for foreground-required actions (e.g., open a playlist).

Every dispatch includes org.oacp.extra.REQUEST_ID for result correlation.
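The dispatch rules above can be illustrated with a platform-neutral sketch. The extra key `org.oacp.extra.REQUEST_ID` is taken from this document; `DispatchRequest`, `DispatchMode`, `buildDispatch`, and the example action strings are hypothetical. A real implementation would pass the result to `android_intent_plus` rather than stop at a data class.

```kotlin
import java.util.UUID

const val EXTRA_REQUEST_ID = "org.oacp.extra.REQUEST_ID"

enum class DispatchMode { BROADCAST, ACTIVITY }

// Plain description of an intent, kept free of Android types for illustration.
data class DispatchRequest(
    val action: String,
    val mode: DispatchMode,
    val extras: Map<String, String>
)

fun buildDispatch(
    action: String,
    requiresForeground: Boolean,
    params: Map<String, String> = emptyMap(),
    requestId: String = UUID.randomUUID().toString()
): DispatchRequest = DispatchRequest(
    action = action,
    // Foreground-required actions launch an Activity; everything else is a broadcast.
    mode = if (requiresForeground) DispatchMode.ACTIVITY else DispatchMode.BROADCAST,
    // The request ID always rides along so the result can be matched later.
    extras = params + (EXTRA_REQUEST_ID to requestId)
)
```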

Async Result Handling

OacpResultReceiver.kt (native BroadcastReceiver) listens for org.oacp.ACTION_RESULT broadcasts from target apps. Results flow back through:

OacpResultReceiver (Kotlin)
  -> EventChannel (platform channel)
    -> OacpResultService (Dart)
      -> AssistantScreen (chat bubble + TTS)

The user sees the result in the conversation and hears it spoken aloud, all without leaving Hark.
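Correlating a result with its originating request could look like the sketch below. `PendingRequests` and its methods are hypothetical; the real flow crosses the Kotlin/Dart boundary through the EventChannel, which is left out here.

```kotlin
// Tracks dispatched requests until a result with the same request ID arrives.
class PendingRequests {
    private val pending = mutableMapOf<String, String>() // requestId -> action label

    fun track(requestId: String, actionLabel: String) {
        pending[requestId] = actionLabel
    }

    // Returns the text to show and speak, or null for unknown or duplicate results.
    fun resolve(requestId: String, message: String): String? {
        val label = pending.remove(requestId) ?: return null
        return "$label: $message"
    }
}
```

Removing the entry on first resolution means a repeated broadcast for the same request ID is ignored instead of being spoken twice.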

STT / TTS

STT: speech_to_text package, backed by the Android-native SpeechRecognizer (cloud recognition by default). Supports tap-to-cancel and a 10-second silence timeout.
TTS: flutter_tts for spoken confirmations and result readback.

Context Budget

EmbeddingGemma handles intent selection via vector similarity -- no token budget concerns for that stage. Qwen3 receives only the selected action's description and parameter schema (~400 tokens).

OACP.md content is not currently consumed by the on-device pipeline. It is reserved for future BYOK cloud models with larger context windows.

Android Assistant Integration

Hark registers as a system-level digital assistant via Android's VoiceInteractionService framework. When the user long-presses Home (or uses the device's assistant gesture), Android launches Hark and auto-starts listening.

Required components

Android validates all of these before allowing an app to be selected as the default assistant:

| Component | File | Purpose |
| --- | --- | --- |
| `HarkVoiceInteractionService` | Kotlin | Background service Android binds to. Must declare `BIND_VOICE_INTERACTION` permission. |
| `HarkSessionService` | Kotlin | Creates `HarkSession` instances when assistant is invoked. |
| `HarkSession` | Kotlin | `onShow()` launches MainActivity with `EXTRA_LAUNCHED_FROM_ASSIST`. |
| `HarkRecognitionService` | Kotlin | Stub `RecognitionService` -- required by Android to qualify as assistant. Actual STT handled by Flutter's `speech_to_text`. |
| `voice_interaction_service.xml` | `res/xml/` | Metadata referencing sessionService, recognitionService, and `supportsAssist="true"`. |
| `ACTION_ASSIST` intent filter | Manifest | On MainActivity -- required for assistant role qualification. |
| `ACTION_VOICE_ASSIST` intent filter | Manifest | On MainActivity -- used by some devices for voice-specific activation. |

Role management

On Android 10+ (API 29), Hark uses RoleManager to request ROLE_ASSISTANT on first launch. If the role isn't held, a system dialog prompts the user. A banner in the Flutter UI also links to android.settings.VOICE_INPUT_SETTINGS for manual configuration.

Continuous listening

When launched via the assistant gesture, Hark enters continuous listening mode: the mic auto-restarts after each command completes and TTS finishes speaking. Tapping the mic button exits continuous mode.
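The continuous-mode behavior can be modeled as a small state machine. This is a sketch under assumptions: `MicState`, `ListeningController`, and the event names are hypothetical, as is the choice that tapping the mic in continuous mode also stops the current listening session.

```kotlin
enum class MicState { IDLE, LISTENING, PROCESSING, SPEAKING }

class ListeningController(launchedFromAssist: Boolean) {
    var continuous: Boolean = launchedFromAssist
        private set
    var state: MicState = if (launchedFromAssist) MicState.LISTENING else MicState.IDLE
        private set

    fun onTranscriptFinal() {
        if (state == MicState.LISTENING) state = MicState.PROCESSING
    }

    fun onCommandCompleted() {
        if (state == MicState.PROCESSING) state = MicState.SPEAKING
    }

    // The mic restarts only after TTS has finished, and only in continuous mode.
    fun onTtsFinished() {
        if (state == MicState.SPEAKING) {
            state = if (continuous) MicState.LISTENING else MicState.IDLE
        }
    }

    // Tapping the mic exits continuous mode (assumed to stop listening as well);
    // outside continuous mode, a tap from idle starts a single-shot session.
    fun onMicTapped() {
        if (continuous) {
            continuous = false
            state = MicState.IDLE
        } else if (state == MicState.IDLE) {
            state = MicState.LISTENING
        }
    }
}
```

Waiting for `onTtsFinished` before restarting the mic keeps the assistant from transcribing its own spoken reply.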

How other assistants do it (reverse-engineered)

| | Claude | ChatGPT | Google Assistant |
| --- | --- | --- | --- |
| ASSIST Activity | `AssistantOverlayActivity` (dedicated overlay) | `AssistantProxyActivity` → `AssistantActivity` | System-integrated |
| VoiceInteractionService | `ClaudeVoiceInteractionService` | `AssistantVoiceInteractionService` | `GsaVoiceInteractionService` |
| RecognitionService | `ClaudeRecognitionService` | None | `GoogleRecognitionService` |
| VOICE_ASSIST | Yes | No | N/A |

Claude and ChatGPT both use a dedicated overlay/proxy Activity (separate from their main chat UI) to show a lightweight assistant panel on top of the current app. Hark currently launches its full MainActivity; a dedicated lightweight overlay Activity would be a future improvement that gives a more assistant-like experience.
