AI that lives on your iPhone.
Good for notes, writing, questions, and image help. Private by design, offline after setup, and honest about what local AI does best.
What it's good for · Screenshots · Structure · Gemma 4 MLX Port · Model Bundle · Build
This repo contains the iOS app, landing page, and brand assets.
Yemma runs Gemma 4 E2B locally through a Swift-native MLX multimodal runtime. One model bundle handles both text and image flows. After a one-time download (~4.2 GB), the app works entirely offline — no cloud inference, no accounts, no telemetry.
- Quick rewrites and everyday writing help
- Personal notes and thinking out loud
- Everyday questions answered on-device
- Image explanations and visual help
- Offline use — planes, commutes, anywhere without signal
- Low-friction, no-account AI when you just need a hand
Yemma is not trying to replace frontier cloud models. Where you need deep reasoning, broad world knowledge, or giant workflows, cloud AI is still better. Where you want something local, private, and always available, Yemma is a good fit.
- Streaming chat with markdown rendering, image attachments, and conversation history
- Resumable background model bundle download (~4.2 GB first-time setup)
- On-device multimodal text and image inference via `MLXVLM`
- Local model-bundle validation before the app marks setup complete
- Configurable response style, temperature, and response limits
- Light / Dark / System appearance modes
- Built-in diagnostics, debug probes, and simulator mock mode

Screenshots:
- Advanced controls: temperature, context window, flash attention, response length.
- Debug probes: markdown and renderer test scenarios.
- Diagnostics: event log, copyable logs, runtime metadata.

Structure:
- `ContentView.swift`: root state machine (onboarding vs chat)
- `LLMService.swift`: MLX multimodal load, generation, streaming, and runtime lifecycle
- `MLXModelSupport.swift`: model directory validation and Gemma 4 asset contract checks
- `ModelDownloader.swift`: single-repository download, resume, cleanup, and local validation
- `ConversationStore.swift`: chat history persistence
- `YemmaPromptPlanner.swift`: prompt shaping for the chat experience
- `Gemma4SmokeAutomation.swift`: smoke checks for the shipped model path
- `SettingsView.swift` / `AdvancedSettingsView.swift`: runtime tuning, diagnostics, debug probes
- `Appearance.swift`: theme system
- `website/`: landing page and brand assets

Yemma originally ran Gemma 4 through two separate GGUF assets: a text model plus a standalone mmproj vision projector. The current MLX integration replaces that with one Swift-native multimodal bundle and one runtime container.
The important distinction is that MLX Swift already provided the general model-loading, tokenizer, and VLM infrastructure. The missing work was Gemma 4 support on the Swift side, plus Yemma-specific integration around download, validation, prompt shaping, and runtime lifecycle.
Validated upstream baseline:
- `mlx-swift-lm` at `3.31.3` for Gemma 4 model, processor, and parity fixes
- `mlx-swift-examples` at `31b6cf6` for app-side smoke validation and request-shaping patterns

How the current Yemma integration works:
- `Package.swift` pulls in `MLX`, `MLXLMCommon`, `MLXVLM`, `Hub`, and `Tokenizers`, so the runtime stays inside Swift instead of bridging through `llama.cpp` or Objective-C++ vision code.
- `ModelDownloader` pulls one MLX model repository, currently `mlx-community/gemma-4-e2b-it-4bit`, using `*.safetensors`, `*.json`, and `*.jinja` patterns instead of downloading a text GGUF and a second `mmproj` file. Yemma also recognizes legacy local bundles from `EZCon/gemma-4-E2B-it-4bit-mlx`.
- `ModelDirectoryValidator` proves the downloaded bundle is structurally usable by checking required metadata files, processor config, tokenizer files, weight shards, and safetensors index references before the app accepts setup as complete.
- `Gemma4MLXSupport` enforces the Gemma 4 multimodal asset contract in Swift by cross-checking processor and model values like soft-token budgets, patch size, and pooling kernel size. It also normalizes a known compatibility gap when a bundle is missing a top-level `pad_token_id`.
- `LLMService` converts each conversation turn into structured `Chat.Message` and `UserInput` values with optional image URLs, then calls `context.processor.prepare(input:)` so MLX performs the image and text preprocessing directly inside the same runtime path as inference.
- The current implementation uses `VLMModelFactory.shared._load(...)` to load the entire Gemma 4 VLM from one local directory, so text generation and image understanding live in one `ModelContainer` instead of separate GGUF and projector runtimes.
- Yemma still adds app-side stability logic around the MLX runtime, including prompt shaping, smoke checks, and output filtering for noisy hidden-channel and control-token responses.

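
As a rough illustration of the pattern-based selection described for `ModelDownloader`, the sketch below filters a repository file listing by the three download patterns. The names `matches` and `filesToDownload` and the sample listing are hypothetical, and the matcher only handles the leading-wildcard form used here; it is not a general glob and is not taken from the Yemma source.

```swift
import Foundation

// Patterns named in the description above.
let downloadPatterns = ["*.safetensors", "*.json", "*.jinja"]

/// Returns true when the file name of `path` matches a simple `*.ext` pattern.
/// Hypothetical helper: supports only a single leading wildcard.
func matches(_ path: String, pattern: String) -> Bool {
    // Compare against the last path component so nested paths also work.
    let name = path.split(separator: "/").last.map(String.init) ?? path
    guard pattern.hasPrefix("*") else { return name == pattern }
    return name.hasSuffix(String(pattern.dropFirst()))
}

/// Keeps only the files that match at least one pattern, preserving order.
func filesToDownload(from repoFiles: [String],
                     patterns: [String] = downloadPatterns) -> [String] {
    repoFiles.filter { file in
        patterns.contains { matches(file, pattern: $0) }
    }
}

// Made-up repository listing for illustration.
let listing = ["config.json", "model.safetensors", "chat_template.jinja",
               "README.md", ".gitattributes"]
let selected = filesToDownload(from: listing)
print(selected)
```

With this listing, the README and git metadata are skipped while the weights, JSON configs, and chat template are kept, which is the practical effect of the single-repository download.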
What that buys us:
- no standalone `mmproj` download
- no Objective-C++ multimodal bridge
- one model bundle to download, validate, load, unload, and delete
- one Swift runtime path for both text-only and image-assisted turns

Model Bundle:
- Current default download source: `mlx-community/gemma-4-e2b-it-4bit`
- Legacy-compatible local bundle ID: `EZCon/gemma-4-E2B-it-4bit-mlx`
- Approximate first-download size: `4.2 GB`
- Downloaded file classes: safetensors weights, tokenizer/config JSON, processor config, and chat template files
- Runtime contract: `config.json`, `tokenizer.json`, `tokenizer_config.json`, `processor_config.json` or `preprocessor_config.json`, plus one or more readable `.safetensors` weight files and any referenced safetensors index entries

After the bundle is downloaded, Yemma can load, unload, and run it entirely on device.
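
The runtime contract above can be sketched as a small directory check. This is a minimal illustration using only Foundation, not the actual `ModelDirectoryValidator`: the `BundleIssue` type and `validateBundle(at:)` function are hypothetical, and the index file name `model.safetensors.index.json` with its `weight_map` key follows the common Hugging Face sharding convention, which is an assumption about what Yemma inspects.

```swift
import Foundation

/// Hypothetical result type for the illustration.
enum BundleIssue: Equatable {
    case missingFile(String)
    case noWeights
    case missingShard(String)
}

/// Checks a local bundle directory against the contract listed above.
func validateBundle(at dir: URL) -> [BundleIssue] {
    let fm = FileManager.default
    var issues: [BundleIssue] = []

    func exists(_ name: String) -> Bool {
        fm.fileExists(atPath: dir.appendingPathComponent(name).path)
    }

    // Required metadata and tokenizer files.
    for name in ["config.json", "tokenizer.json", "tokenizer_config.json"]
    where !exists(name) {
        issues.append(.missingFile(name))
    }
    // Either processor config name is accepted.
    if !exists("processor_config.json") && !exists("preprocessor_config.json") {
        issues.append(.missingFile("processor_config.json or preprocessor_config.json"))
    }

    // At least one readable .safetensors weight file.
    let names = (try? fm.contentsOfDirectory(atPath: dir.path)) ?? []
    let hasWeights = names.contains { name in
        name.hasSuffix(".safetensors")
            && fm.isReadableFile(atPath: dir.appendingPathComponent(name).path)
    }
    if !hasWeights { issues.append(.noWeights) }

    // If an index is present, every shard it references must exist.
    let indexURL = dir.appendingPathComponent("model.safetensors.index.json")
    if let data = try? Data(contentsOf: indexURL),
       let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
       let weightMap = json["weight_map"] as? [String: String] {
        for shard in Set(weightMap.values).sorted() where !exists(shard) {
            issues.append(.missingShard(shard))
        }
    }
    return issues
}

// Usage: a temporary directory with configs but no weights yet.
let dir = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString)
try FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
for name in ["config.json", "tokenizer.json", "tokenizer_config.json",
             "processor_config.json"] {
    try Data("{}".utf8).write(to: dir.appendingPathComponent(name))
}
let issues = validateBundle(at: dir)
print(issues)
```

Returning a list of issues rather than a single Boolean makes it easier to surface a specific reason in diagnostics when setup cannot be marked complete.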
- Open `Yemma4.xcodeproj` in a recent Xcode with Swift 6.1 support.
- Run on a physical iPhone with iOS 17+ for real MLX inference.
- Use `./scripts/sim_run.sh` for simulator testing with mocked replies.
- Use `./scripts/device_startup_probe.sh` when you need a clean first-launch timing probe on device.

App Store Connect deployment via asc-cli.
MIT. See LICENSE.


