A human-in-the-loop Flutter macOS desktop app (NativePlatformKit-testharness/) that boots an
Android emulator, builds & installs the NativePlatformKit-TestLabs-Android-Playground app, then
guides a human through every one of its ~78 UI screens, capturing a screenshot of each into
NativePlatformKit-testharness/screenshots/.
This roadmap breaks the work into Epics → Stories → Tasks, with a one-to-one mapping between
Epics and Milestones (Epic Ek ⇔ Milestone Mk).
Status legend: ⬜ not started · 🟡 in progress · ✅ done · ⏸️ blocked/deferred · 🟢 verified on-device
Captured during the design interview. These are fixed unless explicitly revisited.
| # | Area | Decision |
|---|---|---|
| D1 | Harness platform | Flutter macOS desktop app |
| D2 | Architecture | Provider + MVVM, feature-first layering |
| D3 | Native OS integration | Swift layer: MethodChannel (commands) + EventChannel (live stdout/stderr/progress/exit), per-process handles for cancel/kill |
| D4 | macOS build toolchain | CocoaPods (pod-based), build scripts pin chruby Ruby 3.4.4 (NOT Swift Package Manager) |
| D5 | Visual design | Native macOS look via macos_ui, follows system light/dark |
| D6 | Emulator image | API 34 · google_apis · arm64-v8a |
| D7 | AVD profile | Pixel 6 · 4 GB RAM · hardware GPU |
| D8 | SDK bootstrap | In-app: Environment screen installs cmdline-tools + system image + accepts licenses |
| D9 | App acquisition | Harness runs ./gradlew :app:assembleDebug → adb install -r (JDK 17) |
| D10 | Step-through | Guided manual + 1-click capture |
| D11 | Live view | Native emulator window + harness side panel |
| D12 | Flow catalog | Auto-enumerated from nav graphs + main-page index maps, then curated instructions |
| D13 | Capture granularity | One capture per destination fragment (~78); transient states ad hoc |
| D14 | Flow Runner layout | Three-pane: tree · detail · preview |
| D15 | App navigation | Guided wizard (Environment→Emulator→Build→Run) then free sidebar |
| D16 | Run persistence | Resumable + per-screen status/notes + exported Markdown + CSV report |
| D17 | Screenshot capture | adb exec-out screencap -p, full-screen PNG as-is |
| D18 | Output layout | screenshots/runs/<timestamp>/<category>/<id>.png + canonical latest/ + per-run manifest.json |
| D19 | Config/state store | Project-local .harness/ (config.json + resumable run-state) |
| D20 | Git policy | Commit everything incl. all timestamped runs and .harness/ |

| Epic | Milestone | Title | Status |
|---|---|---|---|
| E0 | M0 | Project Scaffold & App Shell | ✅ |
| E1 | M1 | Native Bridge (Swift) | ✅ |
| E2 | M2 | Environment Detection & SDK Bootstrap | 🟢 |
| E3 | M3 | Emulator Manager | 🟢 |
| E4 | M4 | Build & Install Pipeline | 🟢 |
| E5 | M5 | Flow Catalog | ✅ |
| E6 | M6 | Flow Runner (Guided Capture) | 🟢 |
| E7 | M7 | Gallery, Manifest & Reports | ✅ |
| E8 | M8 | Polish, Docs & Git Hygiene | ✅ |
ID scheme: E<epic> → S<epic>.<story> → T<epic>.<story>.<task>. Each item is tagged with a status marker from the legend at the top of this document.
Goal: A buildable, analyzable Flutter macOS app with the macos_ui shell, Provider wiring, theme, and the wizard-then-sidebar navigation containing placeholder screens for all six sections.
- [✅] T0.1.1 `flutter create` a macOS-enabled project at `NativePlatformKit-testharness/` (org `ai.offside`, no iOS/web/win/linux targets).
- [✅] T0.1.2 Pin Flutter 3.44.2; add `pubspec.yaml` deps: `provider`, `macos_ui`, `path`, `path_provider`, `collection`; dev: `flutter_lints`. (also added `intl`)
- [✅] T0.1.3 Add `analysis_options.yaml` (strict lints) and confirm `flutter pub get` + `flutter analyze` are clean.
- [✅] T0.1.4 Verify a baseline `flutter build macos` succeeds (exercises the chruby Ruby 3.4.4 / CocoaPods path; see S8.x).
- [✅] T0.2.1 `MacosApp` root with `macos_ui` theme (light/dark following system), app accent + typography.
- [✅] T0.2.2 `MacosWindow` + `Sidebar` with six items: Environment · Emulator · Build & Install · Flow Runner · Gallery · Settings.
- [✅] T0.2.3 First-run guided wizard mode (Environment→Emulator→Build→Run) that, once readiness is met, unlocks free sidebar nav.
- [✅] T0.2.4 Placeholder `ContentArea` page per section with title + "not yet implemented" state.
- [✅] T0.3.1 Establish folders: `lib/{app,core,features,data,services}` per the spec.
- [✅] T0.3.2 Root `MultiProvider`; an `AppState` (ChangeNotifier) holding readiness + current section.
- [✅] T0.3.3 MVVM base conventions doc-comment (View ⇄ ViewModel ⇄ Service) and a `Result<T>` helper type.
Acceptance: app launches on macOS, shows the wizard then sidebar, navigates between six placeholder
screens; flutter analyze clean.
Goal: A robust Swift↔Dart bridge that can run arbitrary CLIs with live streaming output and cancellation, plus native file dialogs.
- [✅] T1.1.1 Swift `NpkBridge`: register `MethodChannel("npk/commands")` + `EventChannel("npk/events")` in `MainFlutterWindow`.
- [✅] T1.1.2 Dart `NativeBridge` service wrapping invoke + a broadcast `Stream<ProcessEvent>` keyed by handle.
- [✅] T1.1.3 Define payload contracts (`ProcessEvent{handle,type:stdout|stderr|exit,data,code}`) and error mapping.
- [✅] T1.2.1 Swift `ProcessRunner` using `Foundation.Process`/`Pipe`, streaming stdout/stderr lines over EventChannel.
- [✅] T1.2.2 Per-process `handle` registry; `process.cancel(handle)` → terminate (SIGTERM→SIGKILL).
- [✅] T1.2.3 Environment injection (inherits process env + per-call overrides) and configurable working dir.
- [✅] T1.2.4 Dart-side `CommandRunner` with `run()` (await exit) and `stream()` (live) helpers + timeouts.
- [✅] T1.3.1 `chooseDirectory`/`chooseFile` via `NSOpenPanel`.
- [✅] T1.3.2 macOS entitlements review for subprocess + file access (app-sandbox disabled in Debug + Release per D4).
- [✅] T1.3.3 A "Developer Console" drawer in the UI that runs commands and tails the process stream.
Acceptance: from the UI, run adb version (or echo) and see live streamed output; cancel a
long sleep; pick a directory via native panel. (Code + unit tests done; interactive UI pass pending your manual verification.)
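The streaming and cancellation behavior described for E1 can be illustrated outside the app. The following is a minimal Python sketch, not the harness's Swift `ProcessRunner` or Dart `CommandRunner`: it mirrors the `ProcessEvent{handle,type,data,code}` contract from T1.1.3 and the SIGTERM→SIGKILL escalation from T1.2.2. The function names `stream` and `cancel`, the two-second grace period, and the merging of stderr into stdout are assumptions made for brevity.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence


@dataclass(frozen=True)
class ProcessEvent:
    """Mirrors the ProcessEvent{handle,type,data,code} contract (T1.1.3)."""
    handle: int
    type: str                  # "stdout" | "stderr" | "exit"
    data: str = ""
    code: Optional[int] = None


def stream(handle: int, argv: Sequence[str]) -> Iterator[ProcessEvent]:
    """Run argv, yield one event per output line, then a final exit event.

    Simplification (assumption): stderr is merged into stdout so that a
    single reader loop is enough for this sketch.
    """
    proc = subprocess.Popen(
        list(argv),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        yield ProcessEvent(handle, "stdout", line.rstrip("\n"))
    yield ProcessEvent(handle, "exit", code=proc.wait())


def cancel(proc: subprocess.Popen, grace_seconds: float = 2.0) -> int:
    """Ask the process to stop (SIGTERM), escalate to SIGKILL after a grace period."""
    proc.terminate()
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()
```

A caller would iterate `stream(handle, argv)` and forward each event to the UI; keying events by `handle` is what lets several commands share one event channel.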
Goal: The Environment screen detects all tooling and can install the missing Android SDK pieces needed to create an emulator, with live progress.
- [✅] T2.1.1 `EnvironmentService.detect()` → JDK, adb, emulator, sdkmanager, avdmanager, cmdline-tools, installed system image, AVDs, paths.
- [✅] T2.1.2 Environment screen: per-tool status rows (found/version/path or missing) with an overall readiness summary.
- [🟢] T2.2.1 Install cmdline-tools (fetch + unzip into `$ANDROID_HOME/cmdline-tools/latest`) via the bridge. (script run live: 137 MB installed)
- [🟢] T2.2.2 `sdkmanager` install of `system-images;android-34;google_apis;arm64-v8a` with license acceptance, streaming progress. (image installed)
- [✅] T2.2.3 Idempotent re-detection after each step; disable actions already satisfied.
- [✅] T2.2.4 Error/retry UX for network/license failures.
Acceptance: on a machine missing cmdline-tools + the image, the Environment screen installs both and flips to "ready" without leaving the app.
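The detect-then-install loop in E2 can be sketched as follows. This is an illustrative Python sketch rather than the actual `EnvironmentService`: it resolves tools only through `PATH` (the real detection presumably also inspects the SDK directories), and the helper names `system_image_package`, `detect`, and `pending_installs` are hypothetical. The package id format matches the one quoted in T2.2.2.

```python
import shutil
from typing import Callable, Dict, Iterable, List, Optional


def system_image_package(api: int, tag: str, abi: str) -> str:
    """Build the sdkmanager package id for a system image (D6: 34, google_apis, arm64-v8a)."""
    return f"system-images;android-{api};{tag};{abi}"


def detect(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None when it is missing.

    `which` is injectable so the lookup can be faked in tests.
    """
    return {tool: which(tool) for tool in tools}


def pending_installs(status: Dict[str, Optional[str]]) -> List[str]:
    """Tools that still need installing; an empty list means 'ready'."""
    return sorted(tool for tool, path in status.items() if path is None)
```

Re-running `detect` after every install step and recomputing `pending_installs` is one way to get the idempotent behavior of T2.2.3: actions whose tool already resolves simply drop out of the list.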
Goal: Create, boot, and stop the Pixel 6 / API 34 AVD; detect boot completion reliably.
- [🟢] T3.1.1 `createAvd` (Pixel 6 device, API 34 google_apis arm64 image, 4 GB RAM, hardware GPU); `listAvds`; `deleteAvd`. (create + list verified on the real SDK)
- [✅] T3.1.2 Emulator Manager screen: AVD list, Create button, status (cold/booting/online).
- [🟢] T3.2.1 `boot` (windowed, hardware GPU); capture serial; `stop`. (boot + `adb emu kill` verified on `emulator-5554`)
- [🟢] T3.2.2 Readiness: `adb` serial detection + poll `sys.boot_completed` + dismiss/skip lockscreen; surface progress. (reached `sys.boot_completed=1`)
- [✅] T3.2.3 Cold-boot / wipe-data options; single-instance guard.
Acceptance: one click creates (if needed) and boots the emulator to home screen; status reflects online; stop terminates it.
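The readiness check in T3.2.2 amounts to polling `sys.boot_completed` until it reads `1` or a deadline passes. A minimal Python sketch of that loop is shown below; the timeout and interval defaults are assumptions, and the function names are hypothetical rather than taken from the harness. The property reader, sleep, and clock are injectable so the loop can be exercised without a device.

```python
import subprocess
import time
from typing import Callable


def adb_getprop(serial: str, prop: str) -> str:
    """Read one system property from the device via `adb shell getprop`."""
    result = subprocess.run(
        ["adb", "-s", serial, "shell", "getprop", prop],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def wait_for_boot(
    serial: str,
    getprop: Callable[[str, str], str] = adb_getprop,
    timeout_s: float = 180.0,      # assumed default, not from the roadmap
    interval_s: float = 2.0,       # assumed default, not from the roadmap
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll sys.boot_completed until it is "1"; return False on timeout."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if getprop(serial, "sys.boot_completed") == "1":
            return True
        sleep(interval_s)
    return False
```

For example, `wait_for_boot("emulator-5554")` would block until the emulator reports a completed boot or three minutes elapse.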
Goal: Build the playground debug APK and install it to the booted emulator, with live logs.
- [🟢] T4.1.1 `assembleDebug` runs `./gradlew :app:assembleDebug` in the playground dir (JDK 17), streaming output. (playground builds → app-debug.apk 8.8 MB)
- [🟢] T4.1.2 Locate the produced APK; surface success/failure. (`findApk` verified)
- [🟢] T4.2.1 `adb install -r` the APK to the target serial. (install → Success on emulator-5554)
- [🟢] T4.2.2 `launch` the main activity via `am start -n ai.offside.mobile.android.testlabs/...TestlabsMainActivity` (applicationId ≠ namespace). (verified: topResumedActivity is the playground)
- [✅] T4.2.3 Build & Install screen: Build → Install → Launch action with state + log pane. (built; interactive UI pass pending)
Acceptance: from a clean emulator, the harness builds, installs, and launches the playground to its home screen.
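Locating the APK (T4.1.2) and building the install command (T4.2.1) might look like the Python sketch below. It assumes the conventional Gradle output directory `app/build/outputs/apk/<variant>/` and picks the most recently modified `.apk`; the harness's real `findApk` may use different rules. `find_apk` and `install_command` are hypothetical names.

```python
from pathlib import Path
from typing import List, Optional


def find_apk(project_dir: Path, variant: str = "debug") -> Optional[Path]:
    """Return the newest APK in the conventional Gradle output dir, or None.

    Assumption: the app module is named "app", as in `:app:assembleDebug`.
    """
    out_dir = project_dir / "app" / "build" / "outputs" / "apk" / variant
    if not out_dir.is_dir():
        return None
    candidates = sorted(
        out_dir.glob("*.apk"),
        key=lambda path: path.stat().st_mtime,
        reverse=True,
    )
    return candidates[0] if candidates else None


def install_command(serial: str, apk: Path) -> List[str]:
    """argv for `adb install -r` against a specific device serial."""
    return ["adb", "-s", serial, "install", "-r", str(apk)]
```

Returning `None` instead of raising keeps the "surface success/failure" decision with the caller, which can then show a build-failed state in the UI.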
Goal: A curated catalog of all ~78 destinations with human navigation instructions, derived from the app's navigation structure.
- [✅] T5.1.1 Enumerate destinations + category grouping from the fragment inventory (package structure). (generator script → 82 screens, 19 categories)
- [✅] T5.1.2 Generate `assets/catalog.json` (`{categories:[{id,title,screens:[{id,title,navInstructions,description,screenshot}]}]}`).
- [✅] T5.2.1 Human `navInstructions[]` + `description` per screen (structural baseline; refine against the real home grid as needed).
- [✅] T5.2.2 Dart models (`FlowCategory`, `FlowScreen`) + `CatalogService` loader + validation (unique ids, non-empty instructions).
- [✅] T5.2.3 Coverage check: catalog screen count (82) reconciles with the fragment inventory (~78); the test asserts ≥70.
Acceptance: catalog.json loads, validates, and lists every category/screen with usable
step-by-step instructions.
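The validation rules named in T5.2.2 (unique ids, non-empty instructions) and the count check in T5.2.3 can be sketched against the `catalog.json` shape from T5.1.2. The Python below is an illustration, not the Dart `CatalogService`; it assumes screen ids must be unique across the whole catalog, which the roadmap does not state explicitly, and the function names are hypothetical.

```python
from typing import List


def validate_catalog(catalog: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    seen = set()
    for category in catalog.get("categories", []):
        for screen in category.get("screens", []):
            screen_id = screen.get("id", "")
            if not screen_id:
                errors.append(f"{category.get('id')}: screen without id")
            elif screen_id in seen:
                errors.append(f"duplicate screen id: {screen_id}")
            seen.add(screen_id)
            # Blank strings do not count as usable instructions.
            steps = [s for s in screen.get("navInstructions", []) if s.strip()]
            if not steps:
                errors.append(f"{screen_id}: empty navInstructions")
    return errors


def screen_count(catalog: dict) -> int:
    """Total number of screens, for a coverage assertion such as count >= 70."""
    return sum(len(c.get("screens", [])) for c in catalog.get("categories", []))
```

Collecting all problems rather than failing on the first one makes a generated catalog easier to curate, since every gap shows up in a single pass.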
Goal: The core three-pane guided experience: step through screens, capture each, track status.
- [✅] T6.1.1 Left: category→screen tree with per-item status dots + overall progress bar.
- [✅] T6.1.2 Center: current screen detail (nav instructions, description, status, Capture / Skip / Note).
- [✅] T6.1.3 Right: preview of the captured image (`Image.file`).
- [🟢] T6.2.1 Capture (`screencap` → `adb pull`, binary-safe) → write PNG to `runs/<id>/<category>/<id>.png` + mirror to `latest/`. (verified: 1080×2400 PNG of the playground)
- [✅] T6.2.2 Per-screen status (`captured`/`skipped`/`failed`) + free-text notes; advance to next on capture.
- [✅] T6.2.3 Resumable run-state persisted to `.harness/run-state.json` (round-trip unit-tested).
- [✅] T6.2.4 Run lifecycle: start/resume run, device/app metadata (serial, api, git SHA), completion via Finish.
- [🟢] T6.2.5 Auto Capture (extends D10): a toolbar button runs an automated UI-Automator sweep (Home → section → screen via `uiautomator dump` + tap-by-text + Back), auto-capturing each screen to `runs/<id>/auto/` + `auto-manifest.json`; cancellable. (navigation + capture mechanism verified on-device; parser unit-tested)
Acceptance: a full guided pass over all screens produces one PNG per captured screen, with statuses/notes saved and resumable across app restarts.
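The tap-by-text step of the Auto Capture sweep (T6.2.5) depends on parsing the XML produced by `uiautomator dump`, where each `node` carries a `text` attribute and a `bounds` attribute of the form `[left,top][right,bottom]`. The Python sketch below shows one way to turn a label into tap coordinates; it is not the harness's unit-tested parser, it matches on exact text only, and the label `Buttons` used in the usage line is invented for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# uiautomator encodes bounds as "[left,top][right,bottom]".
_BOUNDS = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_tap_point(dump_xml: str, label: str) -> Optional[Tuple[int, int]]:
    """Return the center of the first node whose text equals label, else None."""
    root = ET.fromstring(dump_xml)
    for node in root.iter("node"):
        if node.get("text") != label:
            continue
        match = _BOUNDS.fullmatch(node.get("bounds", ""))
        if match:
            left, top, right, bottom = map(int, match.groups())
            return ((left + right) // 2, (top + bottom) // 2)
    return None


def tap_command(serial: str, point: Tuple[int, int]) -> List[str]:
    """argv for `adb shell input tap x y` on a specific device."""
    x, y = point
    return ["adb", "-s", serial, "shell", "input", "tap", str(x), str(y)]
```

For instance, `find_tap_point(dump, "Buttons")` on a node with bounds `[40,300][520,460]` yields `(280, 380)`, which `tap_command` turns into an `input tap` invocation.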
Goal: Persist run metadata and make captured runs browsable and reportable.
- [✅] T7.1.1 Write per-run `manifest.json` (runId, device {avd,api}, app {package,gitSha,versionName}, harnessVersion, per-screen records, summary). (RunReporter, unit-tested)
- [✅] T7.1.2 Capture the playground git SHA + versionName at run start. (`adb dumpsys package` + `git rev-parse`)
- [✅] T7.2.1 Gallery screen: pick a run, grid of thumbnails by category, click to view full-size.
- [✅] T7.2.2 Export Markdown + CSV report alongside the manifest on run completion.
- [✅] T7.2.3 Show capture coverage (captured/skipped/failed) per run.
Acceptance: completing a run writes manifest + reports; the Gallery browses any run's screenshots with metadata.
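The coverage summary (T7.2.3), the CSV export (T7.2.2), and the D18 path layout can be illustrated with a short Python sketch. It is not the `RunReporter` implementation: the per-screen record fields and CSV columns (`id`, `category`, `status`, `notes`) are assumptions, since the roadmap does not list them.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List

STATUSES = ("captured", "skipped", "failed")
CSV_COLUMNS = ["id", "category", "status", "notes"]  # assumed column set


def screenshot_path(run_id: str, category: str, screen_id: str) -> str:
    """Path relative to screenshots/, following the D18 layout."""
    return f"runs/{run_id}/{category}/{screen_id}.png"


def coverage(records: Iterable[dict]) -> Dict[str, int]:
    """Count records per status, always reporting all three statuses."""
    counts = Counter(record["status"] for record in records)
    return {status: counts.get(status, 0) for status in STATUSES}


def to_csv(records: List[dict]) -> str:
    """Render per-screen records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({column: record.get(column, "") for column in CSV_COLUMNS})
    return buffer.getvalue()
```

Using the `csv` module instead of joining strings by hand means free-text notes that contain commas or quotes are escaped correctly.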
Goal: Production-quality finish: reliable build, documentation, and committed outputs.
- [✅] T8.1.1 `Makefile` activates chruby Ruby 3.4.4 before `flutter build/run macos` (run/build/test/analyze/verify/catalog targets).
- [✅] T8.1.2 Documented the toolchain requirement (README); clean `flutter build macos` verified repeatedly.
- [✅] T8.2.1 `README.md` (prereqs, first-run wizard, capture pass, output layout, make targets).
- [✅] T8.2.2 Architecture notes (channel API, services, schemas) in README + `docs/adding-a-screen.md` recipe.
- [✅] T8.3.1 `.gitignore` ignores `.dart_tool/`/`build/`/`Pods/`/ephemeral; per D20 `screenshots/` + `.harness/` are committed (verified not-ignored).
- [✅] T8.3.2 No secrets/keystores tracked; repo-root `.gitignore` reconciled.
- [🟢] T8.4.1 Pipeline verified end-to-end on-device: SDK bootstrap → AVD boot → build → install → launch → capture (full GUI guided pass pending user verification).
- [✅] T8.4.2 Empty/error states handled (no emulator, build/install/launch failure, capture failure surface messages).
Acceptance: a fresh checkout can, with documented steps, run the full pipeline and produce a committed run with screenshots, manifest, and reports.
- `flutter analyze` clean; app builds & runs on macOS.
- Every long-running op streams live output and is cancellable.
- All decisions D1–D20 honored.
- Screenshots land in `NativePlatformKit-testharness/screenshots/` per D18.
- CocoaPods/chruby build env (D4) — mitigated by S8.1; primary integration risk.
- Emulator image download size/network (E2) and arm64 boot performance (E3).
- Catalog drift if the playground's navigation changes — coverage check (T5.2.3) guards this.
- Repo size from committing all runs (D20) — accepted by decision.