Hark is the reference OACP voice assistant. It discovers app capabilities at runtime, resolves voice commands to actions using a two-tier on-device pipeline, dispatches Android intents, and handles async results, all without the user leaving the assistant.
`OacpDiscoveryHandler.kt` scans for exported `.oacp` ContentProviders via
`PackageManager`. For each provider it reads two paths:

- `/manifest`: the machine-readable `oacp.json` capability file.
- `/context`: the LLM-readable `OACP.md` semantic context file.

OACP.md is validated for presence but not currently consumed by the LLM.
It is reserved for future BYOK (bring-your-own-key) cloud models that have
larger context windows.
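The sketch below illustrates the discovery step. It filters provider entries and derives the two content URIs that each provider serves. Because it runs on a plain JVM, `ProviderEntry` is a hypothetical stand-in for the provider info that `PackageManager` reports. Treating an authority ending in `.oacp` as the OACP marker is an assumption about how these providers are named.

```kotlin
// Hypothetical stand-in for the provider info PackageManager reports.
data class ProviderEntry(
    val packageName: String,
    val authority: String,
    val exported: Boolean,
)

// The two paths read from each OACP provider.
data class OacpEndpoints(
    val manifestUri: String, // oacp.json capability file
    val contextUri: String,  // OACP.md semantic context file
)

// Assumption: OACP providers are recognized by an authority ending in ".oacp".
fun isOacpProvider(entry: ProviderEntry): Boolean =
    entry.exported && entry.authority.endsWith(".oacp")

fun endpointsFor(entry: ProviderEntry): OacpEndpoints {
    val base = "content://${entry.authority}"
    return OacpEndpoints(manifestUri = "$base/manifest", contextUri = "$base/context")
}

// Map each OACP-capable package to the URIs the discovery step would query.
fun discover(entries: List<ProviderEntry>): Map<String, OacpEndpoints> =
    entries.filter { isOacpProvider(it) }
        .associate { it.packageName to endpointsFor(it) }
```

Unexported providers and providers without the marker are skipped, so only apps that deliberately publish OACP metadata are queried.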
All discovered manifests are parsed into `AssistantAction` objects. Each action
carries:

- `description`, `aliases`, `examples`, `keywords`
- `disambiguationHints` for overlapping capabilities
- `parameters` with types, constraints, and entity snapshots

The resulting registry of actions is the single source of truth for what the device can do.
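Here is a minimal sketch of what the parsed actions and the registry could look like. Field names follow the manifest fields listed above. `packageName`, `actionId`, `ActionParameter`, and `ActionRegistry` are hypothetical names. JSON parsing is left out so the sketch needs no libraries.

```kotlin
// Hypothetical parameter shape; the exact types Hark uses are assumptions.
data class ActionParameter(
    val name: String,
    val type: String,                               // e.g. "integer", "string", "enum"
    val enumValues: List<String> = emptyList(),
    val minimum: Double? = null,
    val maximum: Double? = null,
    val entitySnapshot: List<String> = emptyList(), // e.g. known playlist names
)

data class AssistantAction(
    val packageName: String,
    val actionId: String,
    val description: String,
    val aliases: List<String> = emptyList(),
    val examples: List<String> = emptyList(),
    val keywords: List<String> = emptyList(),
    val disambiguationHints: List<String> = emptyList(),
    val parameters: List<ActionParameter> = emptyList(),
)

// One registry keyed by "package/actionId", rebuilt whenever discovery runs.
class ActionRegistry {
    private val actions = linkedMapOf<String, AssistantAction>()

    fun replaceAll(discovered: List<AssistantAction>) {
        actions.clear()
        discovered.forEach { actions["${it.packageName}/${it.actionId}"] = it }
    }

    fun all(): List<AssistantAction> = actions.values.toList()

    fun find(packageName: String, actionId: String): AssistantAction? =
        actions["$packageName/$actionId"]
}
```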
In the first tier (intent selection), the transcript is embedded using EmbeddingGemma and compared via cosine
similarity against pre-embedded action descriptions (built from each action's
description, aliases, examples, keywords, and parameter metadata).
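One way to build the text embedded for each action is to flatten all fields into a single labeled document. The sketch below assumes that approach. The doc says which fields are used but not how they are combined, so the labels, ordering, and the `ActionText`/`Param` types are illustrative only.

```kotlin
// Hypothetical minimal types for this sketch.
data class Param(val name: String, val type: String, val enumValues: List<String> = emptyList())

data class ActionText(
    val description: String,
    val aliases: List<String> = emptyList(),
    val examples: List<String> = emptyList(),
    val keywords: List<String> = emptyList(),
    val parameters: List<Param> = emptyList(),
)

// Flatten an action into one document to embed. Labels and ordering are an
// illustrative choice, not Hark's actual template.
fun embeddingText(action: ActionText): String = buildString {
    appendLine(action.description)
    if (action.aliases.isNotEmpty()) appendLine("Aliases: " + action.aliases.joinToString(", "))
    if (action.examples.isNotEmpty()) appendLine("Examples: " + action.examples.joinToString("; "))
    if (action.keywords.isNotEmpty()) appendLine("Keywords: " + action.keywords.joinToString(", "))
    for (p in action.parameters) {
        val choices = if (p.enumValues.isEmpty()) "" else " (one of ${p.enumValues.joinToString(", ")})"
        appendLine("Parameter ${p.name}: ${p.type}$choices")
    }
}.trimEnd()
```

Including aliases and examples gives the embedding more phrasings to match against than the description alone.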
Actions are ranked by semantic score. A floor of 0.30 rejects garbage matches, and a confidence gate at 0.35 filters ambiguous cases. Only the top-ranked action proceeds to slot filling.
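The sketch below shows cosine ranking with the two thresholds, assuming the embeddings are already computed (EmbeddingGemma itself is not shown). It reads the 0.35 confidence gate as a minimum score for the top candidate. When the best match clears the floor but not the gate, it returns `Ambiguous`. Both that interpretation and the `MatchResult` type are assumptions. A gate could also be defined as a margin between the first and second candidates.

```kotlin
import kotlin.math.sqrt

// cos(a, b) = (a · b) / (|a| |b|); returns 0.0 for a zero vector.
fun cosine(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "embedding dimensions differ" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        val x = a[i].toDouble()
        val y = b[i].toDouble()
        dot += x * y
        normA += x * x
        normB += y * y
    }
    if (normA == 0.0 || normB == 0.0) return 0.0
    return dot / (sqrt(normA) * sqrt(normB))
}

sealed class MatchResult {
    data class Matched(val actionId: String, val score: Double) : MatchResult()
    data class Ambiguous(val ranked: List<Pair<String, Double>>) : MatchResult()
    object NoMatch : MatchResult()
}

// Rank actions by similarity, drop anything under the floor, then apply the gate.
fun selectAction(
    query: FloatArray,
    actionEmbeddings: Map<String, FloatArray>,
    floor: Double = 0.30,
    gate: Double = 0.35,
): MatchResult {
    val ranked = actionEmbeddings
        .map { (id, vector) -> id to cosine(query, vector) }
        .filter { (_, score) -> score >= floor }
        .sortedByDescending { it.second }
    if (ranked.isEmpty()) return MatchResult.NoMatch
    val (topId, topScore) = ranked.first()
    return if (topScore >= gate) MatchResult.Matched(topId, topScore) else MatchResult.Ambiguous(ranked)
}
```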
In the second tier (slot filling), the selected action's description and parameter definitions are sent to Qwen3, which extracts parameter values (numbers, names, durations, enum choices) from the transcript.
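The sketch below illustrates the slot-filling round trip. It builds a compact prompt from the selected action only, then keeps only the model output that satisfies the parameter schema. The prompt wording, the `name=value` reply format, and `SlotSpec` are all assumptions. Qwen3 is not called here, so the model's reply is passed in as a string.

```kotlin
// Hypothetical parameter schema: a name, a type, and optional constraints.
data class SlotSpec(
    val name: String,
    val type: String, // "integer", "enum", or free "string" in this sketch
    val required: Boolean = false,
    val enumValues: List<String> = emptyList(),
    val min: Long? = null,
    val max: Long? = null,
)

// Compact prompt containing only the selected action. Wording and reply format are assumptions.
fun buildSlotPrompt(description: String, specs: List<SlotSpec>, transcript: String): String =
    buildString {
        appendLine("Action: $description")
        appendLine("Parameters:")
        for (s in specs) {
            val req = if (s.required) ", required" else ""
            val choices = if (s.enumValues.isEmpty()) "" else " one of: ${s.enumValues.joinToString(", ")}"
            appendLine("- ${s.name} (${s.type}$req)$choices")
        }
        appendLine("Transcript: \"$transcript\"")
        append("Reply with one name=value line per parameter you can fill.")
    }

// Parse name=value lines and keep only values that satisfy the schema.
fun parseSlots(reply: String, specs: List<SlotSpec>): Map<String, Any> {
    val byName = specs.associateBy { it.name }
    val slots = mutableMapOf<String, Any>()
    for (line in reply.lines()) {
        val spec = byName[line.substringBefore('=', "").trim()] ?: continue
        val raw = line.substringAfter('=', "").trim()
        val value: Any? = when (spec.type) {
            "integer" -> {
                val lo = spec.min
                val hi = spec.max
                raw.toLongOrNull()?.takeIf { v -> (lo == null || v >= lo) && (hi == null || v <= hi) }
            }
            "enum" -> spec.enumValues.firstOrNull { it.equals(raw, ignoreCase = true) }
            else -> raw.takeIf { it.isNotEmpty() }
        }
        if (value != null) slots[spec.name] = value
    }
    return slots
}
```

Validating against the schema after generation means a hallucinated parameter name or an out-of-range number is dropped rather than dispatched.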
Actions are dispatched as Android intents via android_intent_plus:
- Broadcast for background-safe actions (e.g., set alarm, toggle flashlight).
- Activity launch for foreground-required actions (e.g., open a playlist).
Every dispatch includes org.oacp.extra.REQUEST_ID for result correlation.
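The routing rule above can be modeled as a plain function. It picks broadcast or activity delivery and attaches a fresh request ID. `DispatchPlan`, `planDispatch`, and the `requiresForeground` flag are hypothetical. Only the `org.oacp.extra.REQUEST_ID` extra name comes from this document. On a device, the plan would be turned into an intent through `android_intent_plus`.

```kotlin
import java.util.UUID

// Extra name from this document; everything else in this sketch is hypothetical.
val REQUEST_ID_EXTRA = "org.oacp.extra.REQUEST_ID"

enum class DispatchKind { BROADCAST, ACTIVITY }

data class DispatchPlan(
    val kind: DispatchKind,
    val targetPackage: String,
    val intentAction: String,
    val extras: Map<String, Any>,
)

fun planDispatch(
    targetPackage: String,
    intentAction: String,
    requiresForeground: Boolean, // hypothetical per-action flag
    slots: Map<String, Any>,
    requestId: String = UUID.randomUUID().toString(),
): DispatchPlan = DispatchPlan(
    kind = if (requiresForeground) DispatchKind.ACTIVITY else DispatchKind.BROADCAST,
    targetPackage = targetPackage,
    intentAction = intentAction,
    // The request ID travels with the slot values so the result can be matched later.
    extras = slots + (REQUEST_ID_EXTRA to requestId),
)
```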
OacpResultReceiver.kt (native BroadcastReceiver) listens for
org.oacp.ACTION_RESULT broadcasts from target apps. Results flow back through:

1. `OacpResultReceiver` (Kotlin)
2. `EventChannel` (platform channel)
3. `OacpResultService` (Dart)
4. `AssistantScreen` (chat bubble + TTS)

The user sees the result in the conversation and hears it spoken aloud, all without leaving Hark.
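Correlating results by request ID amounts to looking up pending requests. The sketch below is a simplified, single-threaded model of that lookup, wherever in the pipeline it happens. The `OacpResult` fields and the 30-second timeout are assumptions, since this document specifies neither the result payload nor how long Hark waits.

```kotlin
// Hypothetical result payload; only the request ID comes from this document.
data class OacpResult(val requestId: String, val status: String, val message: String?)

data class PendingRequest(val actionId: String, val sentAtMillis: Long)

class ResultCorrelator(private val timeoutMillis: Long = 30_000) { // assumed timeout
    private val pending = mutableMapOf<String, PendingRequest>()

    fun register(requestId: String, actionId: String, nowMillis: Long) {
        pending[requestId] = PendingRequest(actionId, nowMillis)
    }

    // Returns the matching request, or null for unknown, duplicate, or late results.
    fun onResult(result: OacpResult, nowMillis: Long): PendingRequest? {
        val req = pending.remove(result.requestId) ?: return null
        return if (nowMillis - req.sentAtMillis <= timeoutMillis) req else null
    }

    // Drop requests whose result never arrived and report their IDs.
    fun expire(nowMillis: Long): List<String> {
        val expired = pending.filterValues { nowMillis - it.sentAtMillis > timeoutMillis }.keys.toList()
        expired.forEach { pending.remove(it) }
        return expired
    }
}
```

Removing an entry on first match means a duplicate broadcast for the same request is ignored instead of producing a second chat bubble.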
- STT: the `speech_to_text` package, backed by Android's native `SpeechRecognizer` (cloud recognition by default), with tap-to-cancel and a 10-second silence timeout.
- TTS: `flutter_tts` for spoken confirmations and result readback.
EmbeddingGemma handles intent selection via vector similarity, so that stage has no token-budget concerns. Qwen3 receives only the selected action's description and parameter schema (about 400 tokens).
OACP.md content is not currently consumed by the on-device pipeline.
It is reserved for future BYOK cloud models with larger context windows.
Hark registers as a system-level digital assistant via Android's
VoiceInteractionService framework. When the user long-presses Home (or uses
the device's assistant gesture), Android launches Hark and auto-starts listening.
Android validates all of these before allowing an app to be selected as the default assistant:
| Component | Defined in | Purpose |
|---|---|---|
| `HarkVoiceInteractionService` | Kotlin | Background service that Android binds to. Its manifest entry must require the `BIND_VOICE_INTERACTION` permission. |
| `HarkSessionService` | Kotlin | Creates `HarkSession` instances when the assistant is invoked. |
| `HarkSession` | Kotlin | `onShow()` launches `MainActivity` with `EXTRA_LAUNCHED_FROM_ASSIST`. |
| `HarkRecognitionService` | Kotlin | Stub `RecognitionService`, required by Android for an app to qualify as an assistant. Actual STT is handled by Flutter's `speech_to_text`. |
| `voice_interaction_service.xml` | `res/xml/` | Metadata referencing `sessionService`, `recognitionService`, and `supportsAssist="true"`. |
| `ACTION_ASSIST` intent filter | Manifest | On `MainActivity`; required for assistant role qualification. |
| `ACTION_VOICE_ASSIST` intent filter | Manifest | On `MainActivity`; used by some devices for voice-specific activation. |
On Android 10+ (API 29), Hark uses RoleManager to request ROLE_ASSISTANT
on first launch. If the role isn't held, a system dialog prompts the user.
A banner in the Flutter UI also links to android.settings.VOICE_INPUT_SETTINGS
for manual configuration.
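The setup logic can be summarized as a small decision function. On a device, the inputs would come from `Build.VERSION.SDK_INT` and `RoleManager.isRoleHeld(RoleManager.ROLE_ASSISTANT)`, and the system dialog would be started from `RoleManager.createRequestRoleIntent(RoleManager.ROLE_ASSISTANT)`. Here they are plain parameters so the logic runs on the JVM. `SetupStep` and `nextSetupStep` are hypothetical names. The rules for falling back to the settings banner (below API 29, or after first launch) are this sketch's reading of the text above.

```kotlin
enum class SetupStep { NONE, REQUEST_ROLE_DIALOG, SHOW_SETTINGS_BANNER }

// Inputs mirror Build.VERSION.SDK_INT, RoleManager.isRoleHeld(ROLE_ASSISTANT),
// and whether this is the first launch. The fallback rules are assumptions.
fun nextSetupStep(sdkInt: Int, roleHeld: Boolean, firstLaunch: Boolean): SetupStep = when {
    roleHeld -> SetupStep.NONE
    // RoleManager is only available on API 29 (Android 10) and later.
    sdkInt >= 29 && firstLaunch -> SetupStep.REQUEST_ROLE_DIALOG
    // Banner links to android.settings.VOICE_INPUT_SETTINGS for manual setup.
    else -> SetupStep.SHOW_SETTINGS_BANNER
}
```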
When launched via the assistant gesture, Hark enters continuous listening mode: the mic auto-restarts after each command completes and TTS finishes speaking. Tapping the mic button exits continuous mode.
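Continuous mode behaves like a small state machine. The sketch below is a simplified model with hypothetical state and event names. It assumes the mic restarts only after the command has completed and TTS has finished, and it treats any mic tap as leaving continuous mode.

```kotlin
enum class MicState { IDLE, LISTENING, PROCESSING, SPEAKING }

enum class MicEvent { LAUNCHED_FROM_ASSIST, TRANSCRIPT_READY, COMMAND_DONE, TTS_FINISHED, MIC_TAPPED }

class ListeningController {
    var state = MicState.IDLE
        private set
    var continuous = false
        private set

    fun handle(event: MicEvent) {
        when (event) {
            // Assistant gesture: enter continuous mode and start listening immediately.
            MicEvent.LAUNCHED_FROM_ASSIST -> {
                continuous = true
                state = MicState.LISTENING
            }
            MicEvent.TRANSCRIPT_READY -> {
                if (state == MicState.LISTENING) state = MicState.PROCESSING
            }
            MicEvent.COMMAND_DONE -> {
                if (state == MicState.PROCESSING) state = MicState.SPEAKING
            }
            // Only after TTS finishes does the mic restart, and only in continuous mode.
            MicEvent.TTS_FINISHED -> {
                if (state == MicState.SPEAKING) {
                    state = if (continuous) MicState.LISTENING else MicState.IDLE
                }
            }
            // In this sketch, any tap cancels the current turn and leaves continuous mode.
            MicEvent.MIC_TAPPED -> {
                continuous = false
                state = MicState.IDLE
            }
        }
    }
}
```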
| Component | Claude | ChatGPT | Google Assistant |
|---|---|---|---|
| ASSIST Activity | `AssistantOverlayActivity` (dedicated overlay) | `AssistantProxyActivity` → `AssistantActivity` | System-integrated |
| VoiceInteractionService | `ClaudeVoiceInteractionService` | `AssistantVoiceInteractionService` | `GsaVoiceInteractionService` |
| RecognitionService | `ClaudeRecognitionService` | None | `GoogleRecognitionService` |
| VOICE_ASSIST | Yes | No | N/A |
Claude and ChatGPT both use a dedicated overlay/proxy Activity (separate from their main chat UI) to show a lightweight assistant panel on top of the current app. Hark currently launches its full `MainActivity`; a dedicated lightweight overlay Activity would be a future improvement that gives a more assistant-like experience.