uiauto-framework

A Go-based, generic, self-healing UI automation framework with multi-tier model routing, OmniParser visual grounding, and natural-language scenarios.

Why

Traditional UI tests break the moment a selector changes. uiauto-framework adds three layers between your scenario and the browser:

Light tier -- a fingerprint-cached executor that replays known-good selectors instantly.
Smart tier -- an LLM-powered discovery loop that re-resolves selectors when the cached pattern fails (structural match, then DOM-aware reasoning).
VLM tier -- visual scoring against an OmniParser-annotated screenshot when the DOM is ambiguous.

Architecture

flowchart TB
    cli["ui-agent CLI"] --> member["MemberAgent"]
    member --> light["Light: cached patterns"]
    member --> smart["Smart: LLM discovery"]
    member --> vlm["VLM: OmniParser visual"]
    member --> heal["SelfHealer fingerprint -> structural -> LLM -> VLM"]
    member --> browser["Browser (chromedp/CDP)"]
    member --> omni["OmniParser client"]
    plug["plugin seams"] -.- member
    omniSrv["OmniParser server"] -.->|HTTP| omni
    gateway["AI Gateway (OpenAI-compatible)"] -.->|HTTPS| smart

Plugin seams

The pkg/uiauto/plugin package exposes four extension points so the framework adapts to any web target:

ActionRegistry -- register custom action types beyond the built-ins.
ScenarioLoader -- parse scenarios from JSON, YAML, browser test specs, end-to-end test files, or any other source.
AuthProvider -- run target-specific auth (OAuth, API keys, password manager autofill via CDP, SSO redirects).
VisualVerifier -- pluggable visual scoring (OmniParser is the default; GPT-4V or any VLM can substitute).

See docs/plugin-guide.md for example implementations.

Quick start

# 1. Build the CLI
make build

# 2. Run example.com smoke
make smoke

The smoke target launches Chrome on :9222, runs the examples/example-com-smoke/scenario.json scenario, and writes annotated screenshots to ~/uiauto/tests/.

For a richer local demo that types into a form and waits for an async result:

make form-smoke

See examples/form-flow for the exact HTML page and scenario JSON.

Run artifacts and telemetry fields are documented in docs/observability.md.

Install from source

go install github.com/nfsarch33/uiauto-framework/cmd/ui-agent@latest

Tagged releases also publish checksummed ui-agent binaries through GitHub Releases.

Scenario format

[{
  "id": "smoke-001",
  "name": "Example.com smoke",
  "natural_language": ["Verify heading", "Click more info"],
  "selectors_used": ["h1", "a[href*=\"iana.org\"]"],
  "action_types": ["verify", "click"],
  "action_values": ["", ""]
}]

Full reference: docs/scenario-format.md.

LLM provider routing

The pkg/llm package supports multiple provider backends and routing strategies:

Providers (all implement llm.Provider):

Client -- OpenAI-compatible APIs (GPT-4o, Qwen, any /v1/chat/completions endpoint)
OllamaClient -- Ollama native API with think parameter support
BedrockClient -- Anthropic Bedrock Messages API via gateway proxy
ClaudeCLIClient -- Claude Code CLI headless mode

Routers:

TieredRouter -- ordered failover with per-provider health tracking and cooldown
SemanticRouter -- task-type-aware routing (discovery, pattern replay, visual, synthesis, evaluation)
MQRouter -- MQ-style fair-share scheduler with per-user FIFO queues, weighted upstream selection, and background health checks

MQ Router configuration (YAML)

max_queue_depth: 10
max_concurrency: 2
request_timeout: "5m"
nodes:
  - name: local-vllm
    url: http://127.0.0.1:8001
    tier: agent
    weight: 4
    models: ["qwen3.5-27b"]
  - name: openai-gpt4
    url: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
    tier: powerful
    weight: 3
    models: ["gpt-4o"]
  - name: bedrock-claude
    url: http://localhost:8767/bedrock
    api_key_env: BEDROCK_API_KEY
    tier: powerful
    weight: 2
    models: ["anthropic.claude-sonnet-4-20250514-v1:0"]
health_check:
  interval: 15s
  timeout: 5s
  path: /health
  unhealthy_threshold: 3
  healthy_threshold: 1

cfg, _ := llm.LoadMQConfig("llm-config.yaml")
router, _ := llm.NewMQRouter(*cfg, slog.Default())
defer router.Close()

ctx := llm.WithUserID(ctx, "user-123")
resp, err := router.Complete(ctx, llm.CompletionRequest{...})

Repository layout

pkg/
  uiauto/      # core framework: agents, tiers, healer, browser, plugin seams
  llm/         # LLM providers, routers (tiered, semantic, MQ fair-share)
  evolver/     # capability mutation + auto-promotion
  domheal/     # DOM-level drift detection helpers
cmd/
  ui-agent/    # CLI binary for running scenarios
examples/
  example-com-smoke/   # public smoke scenario
docs/
  architecture.md
  observability.md
  plugin-guide.md
  scenario-format.md

Requirements

Go 1.24+
Chrome / Chromium (for chromedp). For visible demos, launch with --remote-debugging-port=9222.
Optional: an OmniParser V2 server for visual annotations and verifications.
Optional: any OpenAI-compatible LLM endpoint for the Smart tier; use MQRouter for multi-node routing with fair-share scheduling.

Quality gates

make lint
make test
make test-integration
make ossready

make test is the fast, short-mode unit suite. make test-integration starts the Docker Compose stack for browser, Postgres, and OmniParser-compatible integration coverage.

Roadmap

Stabilize the public Go API before v1.0.0.
SSE streaming passthrough for the MQ router (first-token latency parity).
Prometheus metrics endpoint for MQ router queue depth and upstream health.
Expand visual verification adapters beyond the default OmniParser client.
Add more browser-backend adapters behind the existing Browser interface.
Publish curated example suites for forms, dashboards, iframes, and visual drift recovery.

License

Apache-2.0. See LICENSE.

Project status

Early-access reference implementation. APIs may change before the first tagged release. Issues and PRs welcome.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.cursor/rules		.cursor/rules
.github		.github
cmd/ui-agent		cmd/ui-agent
docs		docs
examples		examples
internal/doctor		internal/doctor
pkg		pkg
scripts		scripts
test/fixtures/omniparser-stub/mappings		test/fixtures/omniparser-stub/mappings
.dockerignore		.dockerignore
.gitignore		.gitignore
.golangci.yml		.golangci.yml
.goreleaser.yml		.goreleaser.yml
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
docker-compose.integration.yml		docker-compose.integration.yml
go.mod		go.mod
go.sum		go.sum
llms.txt		llms.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

uiauto-framework

Why

Architecture

Plugin seams

Quick start

Install from source

Scenario format

LLM provider routing

MQ Router configuration (YAML)

Repository layout

Requirements

Quality gates

Roadmap

License

Project status

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

uiauto-framework

Why

Architecture

Plugin seams

Quick start

Install from source

Scenario format

LLM provider routing

MQ Router configuration (YAML)

Repository layout

Requirements

Quality gates

Roadmap

License

Project status

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages