A Go-based, generic, self-healing UI automation framework with multi-tier model routing, OmniParser visual grounding, and natural-language scenarios.
Traditional UI tests break the moment a selector changes. uiauto-framework
adds three layers between your scenario and the browser:
- Light tier -- a fingerprint-cached executor that replays known-good selectors instantly.
- Smart tier -- an LLM-powered discovery loop that re-resolves selectors when the cached pattern fails (structural match, then DOM-aware reasoning).
- VLM tier -- visual scoring against an OmniParser-annotated screenshot when the DOM is ambiguous.
A SelfHealer orchestrates the tiers automatically. Scenarios are written in
plain English with parallel action_types (click | type | verify | wait | frame | evaluate | read), so non-engineers can author and review them.
flowchart TB
cli["ui-agent CLI"] --> member["MemberAgent"]
member --> light["Light: cached patterns"]
member --> smart["Smart: LLM discovery"]
member --> vlm["VLM: OmniParser visual"]
member --> heal["SelfHealer fingerprint -> structural -> LLM -> VLM"]
member --> browser["Browser (chromedp/CDP)"]
member --> omni["OmniParser client"]
plug["plugin seams"] -.- member
omniSrv["OmniParser server"] -.->|HTTP| omni
gateway["AI Gateway (OpenAI-compatible)"] -.->|HTTPS| smart
The pkg/uiauto/plugin package exposes four extension points so the framework
adapts to any web target:
- ActionRegistry -- register custom action types beyond the built-ins.
- ScenarioLoader -- parse scenarios from JSON, YAML, browser test specs, end-to-end test files, or any other source.
- AuthProvider -- run target-specific auth (OAuth, API keys, password manager autofill via CDP, SSO redirects).
- VisualVerifier -- pluggable visual scoring (OmniParser is the default; GPT-4V or any VLM can substitute).
See docs/plugin-guide.md for example implementations.
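
To give a feel for what registering a custom action type might involve, here is a minimal, self-contained sketch of a registry keyed by action type. The `Registry` type, the `ActionHandler` signature, and the `hover` action are assumptions made up for this illustration; the real interfaces live in pkg/uiauto/plugin and may look quite different.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ActionHandler is a hypothetical handler signature: it receives the
// resolved selector and the step's action value.
type ActionHandler func(selector, value string) error

// Registry is an illustrative action registry, not the framework's type.
type Registry struct {
	handlers map[string]ActionHandler
}

func NewRegistry() *Registry {
	return &Registry{handlers: make(map[string]ActionHandler)}
}

// Register adds a handler and rejects duplicates, so a plugin cannot
// silently override an action that is already registered.
func (r *Registry) Register(actionType string, h ActionHandler) error {
	key := strings.ToLower(actionType)
	if _, exists := r.handlers[key]; exists {
		return fmt.Errorf("action type %q already registered", key)
	}
	r.handlers[key] = h
	return nil
}

// Dispatch looks up the handler for actionType and runs it.
func (r *Registry) Dispatch(actionType, selector, value string) error {
	h, ok := r.handlers[strings.ToLower(actionType)]
	if !ok {
		return fmt.Errorf("unknown action type %q", actionType)
	}
	return h(selector, value)
}

// Types returns the registered action types in sorted order.
func (r *Registry) Types() []string {
	out := make([]string, 0, len(r.handlers))
	for k := range r.handlers {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	reg := NewRegistry()
	// "click" stands in for a built-in; "hover" is a made-up custom action.
	_ = reg.Register("click", func(sel, _ string) error {
		fmt.Println("clicking", sel)
		return nil
	})
	_ = reg.Register("hover", func(sel, _ string) error {
		fmt.Println("hovering over", sel)
		return nil
	})
	fmt.Println("registered:", reg.Types())
	if err := reg.Dispatch("hover", "nav .menu", ""); err != nil {
		fmt.Println("error:", err)
	}
}
```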
# 1. Build the CLI
make build
# 2. Run example.com smoke
make smoke

The smoke target launches Chrome on :9222, runs the
examples/example-com-smoke/scenario.json
scenario, and writes annotated screenshots to ~/uiauto/tests/.
For a richer local demo that types into a form and waits for an async result:
make form-smoke

See examples/form-flow for the exact HTML page and scenario JSON.
Run artifacts and telemetry fields are documented in docs/observability.md.
go install github.com/nfsarch33/uiauto-framework/cmd/ui-agent@latest

Tagged releases also publish checksummed ui-agent binaries through GitHub
Releases.
[{
  "id": "smoke-001",
  "name": "Example.com smoke",
  "natural_language": ["Verify heading", "Click more info"],
  "selectors_used": ["h1", "a[href*=\"iana.org\"]"],
  "action_types": ["verify", "click"],
  "action_values": ["", ""]
}]

Full reference: docs/scenario-format.md.
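
As a rough illustration of how the example scenario above could be consumed, the sketch below decodes it with encoding/json and checks that the arrays line up. It assumes the four arrays are parallel (one entry per step), which the example suggests but which docs/scenario-format.md defines authoritatively; the `Scenario` struct, `Validate`, and `parseScenarios` are hypothetical names, not the framework's loader.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Scenario mirrors the JSON fields shown in the example scenario.
type Scenario struct {
	ID              string   `json:"id"`
	Name            string   `json:"name"`
	NaturalLanguage []string `json:"natural_language"`
	SelectorsUsed   []string `json:"selectors_used"`
	ActionTypes     []string `json:"action_types"`
	ActionValues    []string `json:"action_values"`
}

// Validate checks the assumed invariant: one entry per step in each array.
func (s Scenario) Validate() error {
	n := len(s.NaturalLanguage)
	if len(s.SelectorsUsed) != n || len(s.ActionTypes) != n || len(s.ActionValues) != n {
		return fmt.Errorf("scenario %q: parallel arrays differ in length", s.ID)
	}
	return nil
}

// parseScenarios decodes a JSON array of scenarios and validates each one.
func parseScenarios(data []byte) ([]Scenario, error) {
	var out []Scenario
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	for _, s := range out {
		if err := s.Validate(); err != nil {
			return nil, err
		}
	}
	return out, nil
}

const sample = `[{
  "id": "smoke-001",
  "name": "Example.com smoke",
  "natural_language": ["Verify heading", "Click more info"],
  "selectors_used": ["h1", "a[href*=\"iana.org\"]"],
  "action_types": ["verify", "click"],
  "action_values": ["", ""]
}]`

func main() {
	scenarios, err := parseScenarios([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range scenarios {
		for i, step := range s.NaturalLanguage {
			fmt.Printf("%s step %d: %s [%s %s]\n", s.ID, i+1, step, s.ActionTypes[i], s.SelectorsUsed[i])
		}
	}
}
```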
The pkg/llm package supports multiple provider backends and routing strategies:
Providers (all implement llm.Provider):
- Client -- OpenAI-compatible APIs (GPT-4o, Qwen, any /v1/chat/completions endpoint)
- OllamaClient -- Ollama native API with think parameter support
- BedrockClient -- Anthropic Bedrock Messages API via gateway proxy
- ClaudeCLIClient -- Claude Code CLI headless mode
Routers:
- TieredRouter -- ordered failover with per-provider health tracking and cooldown
- SemanticRouter -- task-type-aware routing (discovery, pattern replay, visual, synthesis, evaluation)
- MQRouter -- MQ-style fair-share scheduler with per-user FIFO queues, weighted upstream selection, and background health checks
max_queue_depth: 10
max_concurrency: 2
request_timeout: "5m"
nodes:
  - name: local-vllm
    url: http://127.0.0.1:8001
    tier: agent
    weight: 4
    models: ["qwen3.5-27b"]
  - name: openai-gpt4
    url: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
    tier: powerful
    weight: 3
    models: ["gpt-4o"]
  - name: bedrock-claude
    url: http://localhost:8767/bedrock
    api_key_env: BEDROCK_API_KEY
    tier: powerful
    weight: 2
    models: ["anthropic.claude-sonnet-4-20250514-v1:0"]
health_check:
  interval: 15s
  timeout: 5s
  path: /health
  unhealthy_threshold: 3
  healthy_threshold: 1

cfg, _ := llm.LoadMQConfig("llm-config.yaml")
router, _ := llm.NewMQRouter(*cfg, slog.Default())
defer router.Close()
ctx := llm.WithUserID(ctx, "user-123")
resp, err := router.Complete(ctx, llm.CompletionRequest{...})

pkg/
  uiauto/               # core framework: agents, tiers, healer, browser, plugin seams
  llm/                  # LLM providers, routers (tiered, semantic, MQ fair-share)
  evolver/              # capability mutation + auto-promotion
  domheal/              # DOM-level drift detection helpers
cmd/
  ui-agent/             # CLI binary for running scenarios
examples/
  example-com-smoke/    # public smoke scenario
docs/
  architecture.md
  observability.md
  plugin-guide.md
  scenario-format.md
- Go 1.24+
- Chrome / Chromium (for chromedp). For visible demos, launch with
  --remote-debugging-port=9222.
- Optional: an OmniParser V2 server for visual annotations and verifications.
- Optional: any OpenAI-compatible LLM endpoint for the Smart tier; use
  MQRouter for multi-node routing with fair-share scheduling.
make lint
make test
make test-integration
make ossready

make test is the fast, short-mode unit suite. make test-integration starts
the Docker Compose stack for browser, Postgres, and OmniParser-compatible
integration coverage.
- Stabilize the public Go API before v1.0.0.
- SSE streaming passthrough for the MQ router (first-token latency parity).
- Prometheus metrics endpoint for MQ router queue depth and upstream health.
- Expand visual verification adapters beyond the default OmniParser client.
- Add more browser-backend adapters behind the existing Browser interface.
- Publish curated example suites for forms, dashboards, iframes, and visual drift recovery.
Apache-2.0. See LICENSE.
Early-access reference implementation. APIs may change before the first tagged release. Issues and PRs welcome.