Clickweave is an open-source desktop automation platform that combines a visual workflow builder with AI-powered planning and execution. Build automations by adding nodes from a palette, describe what you want in natural language, record a walkthrough of the task you want to automate, or mix all three approaches. Clickweave acts as a computer use agent (CUA) — it can see your screen, click, type, and make decisions — while giving you full control over every step.
Keywords: desktop automation, UI automation, RPA, computer use agent, CUA, agentic workflow, visual workflow builder, AI automation, no-code automation, record and replay, macro recorder, workflow recording, desktop recording, programming by demonstration, Model Context Protocol, MCP
Status: Clickweave is in early development. Expect rapid changes, incomplete features, and breaking updates as core functionality is being built.
- Features
- How It Works
- Architecture
- Node Types
- Use Cases
- Getting Started
- Configuration
- Logs
- FAQ
- For AI Agents
- License
A node-based graph editor for designing automation workflows. Add nodes from a categorized palette, connect them with edges, and configure each step through a detail modal with setup, checks, trace, and run history tabs. Supports multi-select, rubber-band selection, undo/redo, and visual loop grouping.
Describe your automation goal in natural language and Clickweave generates a complete workflow graph. The planner:
- Converts intent into a sequence of typed nodes using available MCP tool schemas
- Generates deterministic Tool steps by default, with optional AiStep and AiTransform nodes
- Auto-repairs malformed LLM output with one-shot retry and error feedback
- Validates generated workflows against structural graph rules
- Supports lenient parsing — partial corruption doesn't discard the entire response
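The one-shot repair behaviour listed above can be pictured roughly as follows. This is a minimal sketch, not the planner's actual code: `plan_with_repair`, `generate`, and `parse` are hypothetical names, and the LLM call is stood in for by a closure that optionally receives the previous parse error as feedback.

```rust
/// Calls `generate` once. If `parse` rejects the output, calls `generate`
/// exactly one more time with the parse error as feedback and returns
/// whatever the second parse yields. (Hypothetical helper for illustration.)
pub fn plan_with_repair<T>(
    mut generate: impl FnMut(Option<&str>) -> String,
    parse: impl Fn(&str) -> Result<T, String>,
) -> Result<T, String> {
    let first = generate(None);
    match parse(&first) {
        Ok(plan) => Ok(plan),
        Err(err) => {
            // One-shot retry: pass the error back so the model can correct itself.
            let second = generate(Some(err.as_str()));
            parse(&second)
        }
    }
}
```

Limiting the repair to a single retry keeps planning latency bounded; a second failure is returned to the caller rather than retried again.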
Modify existing workflows through conversation. The assistant panel lets you describe changes in natural language, and Clickweave generates patches that add, remove, or update nodes. Patches are staged for review before applying. The assistant validates patches against the workflow graph and retries with feedback when the result would be invalid.
Record a demonstration of the task you want to automate and Clickweave synthesizes it into a replayable workflow graph — like a macro recorder that produces an editable, AI-enhanced automation:
- OS-level event capture records mouse clicks, key presses, and app focus changes
- Action normalization deduplicates, merges, and classifies raw events into meaningful steps
- Draft synthesis converts the normalized action sequence into a typed workflow graph with nodes and edges
- Review panel lets you rename, delete, and annotate steps before applying the draft to the canvas
- CDP support — Electron and Chrome-family apps are detected automatically and can be recorded with element-level precision via Chrome DevTools Protocol
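To make the action normalization step above more concrete, here is a small sketch under stated assumptions: the `RawEvent` and `Action` types and the `normalize` function are hypothetical, and the only rules modelled are merging consecutive key presses into one text step and dropping repeated focus events for the same app. The real normalizer may classify events differently.

```rust
/// Raw OS-level events as a recorder might capture them (hypothetical shape).
#[derive(Clone, Debug, PartialEq)]
pub enum RawEvent {
    Click { x: i32, y: i32 },
    KeyPress(char),
    AppFocus(String),
}

/// Normalized, meaningful steps (hypothetical shape).
#[derive(Clone, Debug, PartialEq)]
pub enum Action {
    Click { x: i32, y: i32 },
    TypeText(String),
    FocusApp(String),
}

pub fn normalize(events: &[RawEvent]) -> Vec<Action> {
    let mut actions: Vec<Action> = Vec::new();
    for event in events {
        match event {
            RawEvent::KeyPress(c) => {
                // Merge consecutive key presses into a single text-entry step.
                if let Some(Action::TypeText(text)) = actions.last_mut() {
                    text.push(*c);
                } else {
                    actions.push(Action::TypeText(c.to_string()));
                }
            }
            RawEvent::AppFocus(app) => {
                // Deduplicate: skip a focus event that repeats the previous step.
                let repeated =
                    matches!(actions.last(), Some(Action::FocusApp(prev)) if prev == app);
                if !repeated {
                    actions.push(Action::FocusApp(app.clone()));
                }
            }
            RawEvent::Click { x, y } => actions.push(Action::Click { x: *x, y: *y }),
        }
    }
    actions
}
```

Under these rules, two focus events for the same app followed by a click and the key presses `h`, `i` collapse into three steps: focus, click, and a single `TypeText("hi")`.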
Workflows support branching and looping without scripting:
- If nodes evaluate conditions and take true/false paths
- Switch nodes match against multiple cases with a default fallback
- Loop nodes use do-while semantics — the body runs at least once before checking exit criteria
- EndLoop nodes jump back to their paired Loop node
- Node outputs are extracted into runtime variables that downstream nodes can reference in conditions
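The do-while semantics of Loop nodes can be sketched as below. `run_do_while` is a hypothetical helper, not the engine's API; it assumes the exit condition is checked after each pass and that the max iteration bound (see the Loop node description) caps the number of passes.

```rust
/// Runs `body` at least once, then repeats while `should_continue` returns
/// true, stopping once `max_iterations` passes have run.
/// Returns the number of passes executed.
pub fn run_do_while(
    max_iterations: u32,
    mut body: impl FnMut(u32),
    mut should_continue: impl FnMut(u32) -> bool,
) -> u32 {
    let mut iteration = 0;
    loop {
        // The body always runs before the exit criteria are evaluated.
        body(iteration);
        iteration += 1;
        if iteration >= max_iterations || !should_continue(iteration) {
            return iteration;
        }
    }
}
```

Note that in this sketch the body still runs once even when the condition is false from the start, which is the defining difference from a plain while loop.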
Clickweave sees and interacts with the desktop via native-devtools-mcp, communicating over the Model Context Protocol (MCP):
- Visual perception: screenshots, OCR text detection, image template matching
- Interaction: mouse clicks, keyboard input, scrolling, window management
- Smart resolution: LLM-assisted app-name and element-name resolution with caching and retry
Clickweave has two execution modes: Test and Run. In Test mode, the engine verifies each step after execution — it captures a screenshot, sends it to a VLM for description, then asks an LLM judge whether the step succeeded. Steps that pass continue automatically; on failure the UI presents Retry, Skip, or Abort options. Read-only nodes (FindText, FindImage, ListWindows, TakeScreenshot) are exempt from verification since they have no observable side-effect. Nodes inside loops are verified in aggregate when the loop exits. LLM decisions made during Test (element resolution, click disambiguation, app-name resolution) are saved to a decision cache. In Run mode, the cache replays those decisions deterministically — skipping redundant LLM calls and running without supervision for faster, unattended execution.
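The record-then-replay idea behind the decision cache can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `DecisionCache` shape, the string keys, and the choice to fall back to the resolver on a Run-mode cache miss without storing the result. The `resolve` closure stands in for an LLM call.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Test,
    Run,
}

/// Maps a decision key (for example "click:Save") to the recorded choice.
#[derive(Default)]
pub struct DecisionCache {
    entries: HashMap<String, String>,
}

impl DecisionCache {
    /// Test mode: always resolves and records the decision.
    /// Run mode: replays a recorded decision if present; on a miss it calls
    /// `resolve` but does not store the result (an assumption of this sketch).
    pub fn decide(
        &mut self,
        mode: Mode,
        key: &str,
        resolve: impl FnOnce() -> String,
    ) -> String {
        if mode == Mode::Run {
            if let Some(cached) = self.entries.get(key) {
                return cached.clone();
            }
        }
        let decision = resolve();
        if mode == Mode::Test {
            self.entries.insert(key.to_string(), decision.clone());
        }
        decision
    }
}
```

Because a Run-mode hit returns before `resolve` is invoked, previously resolved decisions cost no model call, which is what makes unattended runs faster and repeatable.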
Any read-only Tool step (FindText, FindImage, ListWindows, TakeScreenshot) can be marked with a Verification role, turning it into an inline test assertion that runs during execution:
- Fail-fast: a failed verification verdict stops the workflow immediately
- Expected outcome: `TakeScreenshot` verification nodes require a free-text expected outcome for VLM evaluation
- Verdicts: pass, fail, or warn, displayed in a verdict bar with expandable per-node breakdowns and VLM reasoning
- Failure policy: each check can be set to `FailNode` (hard stop) or `WarnOnly` (soft warning)
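How the failure policy and fail-fast behaviour combine can be sketched as follows. The `Verdict` and `FailurePolicy` names mirror the terms used above, but the functions and the input shape (a list of pass/fail results with their policies) are hypothetical simplifications.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Warn,
    Fail,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FailurePolicy {
    FailNode, // hard stop
    WarnOnly, // soft warning
}

/// Maps a single check result to a verdict under its failure policy.
pub fn check_verdict(passed: bool, policy: FailurePolicy) -> Verdict {
    match (passed, policy) {
        (true, _) => Verdict::Pass,
        (false, FailurePolicy::FailNode) => Verdict::Fail,
        (false, FailurePolicy::WarnOnly) => Verdict::Warn,
    }
}

/// Evaluates checks in order and stops at the first `Fail` (fail-fast).
/// Returns the verdicts produced up to and including that point.
pub fn run_checks(checks: &[(bool, FailurePolicy)]) -> Vec<Verdict> {
    let mut verdicts = Vec::new();
    for &(passed, policy) in checks {
        let verdict = check_verdict(passed, policy);
        verdicts.push(verdict);
        if verdict == Verdict::Fail {
            break;
        }
    }
    verdicts
}
```

In this model a failed `WarnOnly` check is recorded as a warning and execution continues, while a failed `FailNode` check ends the sequence.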
Every workflow execution is persisted with full traceability:
.clickweave/runs/<workflow_name>/<execution_dir>/<node_name>/
├── run.json        # Execution metadata and status
├── events.jsonl    # Newline-delimited trace events
├── artifacts/      # Screenshots, OCR results, template matches
└── verdict.json    # Check evaluation results
Browse past runs in the UI, inspect trace events, and preview captured artifacts.
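For scripts that need to locate a run's files, the layout above can be assembled with `std::path`. This is only a sketch: `node_run_dir` is a hypothetical helper, and it assumes `.clickweave/` sits directly under a project root directory that the caller supplies.

```rust
use std::path::{Path, PathBuf};

/// Builds `<root>/.clickweave/runs/<workflow>/<execution_dir>/<node>`.
/// The `project_root` parameter is an assumption of this sketch.
pub fn node_run_dir(
    project_root: &Path,
    workflow_name: &str,
    execution_dir: &str,
    node_name: &str,
) -> PathBuf {
    project_root
        .join(".clickweave")
        .join("runs")
        .join(workflow_name)
        .join(execution_dir)
        .join(node_name)
}
```

Joining `events.jsonl` or `verdict.json` onto the returned path then points at the trace or the check results for that node.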
Three independently configurable LLM endpoints (set in Settings):
- Planner — generates and patches workflows; also used as the supervision judge in Test mode
- Agent — powers AI step execution during runs
- VLM (optional) — dedicated vision model for image analysis and check evaluation
All endpoints use the OpenAI-compatible /v1/chat/completions format. Works with local servers (LM Studio, vLLM, Ollama) or hosted providers (OpenRouter, OpenAI). No API key required for local endpoints.
For Electron and Chrome-family apps, Clickweave uses the built-in CDP tools in native-devtools-mcp to get element-level precision:
- Automatic detection: Electron apps are detected by framework directory checks; Chrome-family browsers are matched by bundle ID
- Single server: CDP tools (`cdp_connect`, `cdp_click`, `cdp_take_snapshot`, etc.) run on the same `native-devtools-mcp` server; no separate process is needed
- Walkthrough integration: during recording, detected CDP apps trigger a selection modal for the user to choose the target page
Desktop actions run locally. LLM/VLM requests go only to the endpoints you configure (defaults to localhost). Built with Tauri v2, producing lightweight native apps for macOS, Windows, and Linux.
- Describe, build, or record — Type your automation goal in the planner, add nodes from the palette, or record a walkthrough of the task
- Review the graph — Inspect the generated workflow, tweak node parameters, add verification nodes
- Test — Run in Test mode with per-step supervision: after each step, a VLM + LLM judge verifies the screen and pauses on failures so you can Retry, Skip, or Abort. LLM decisions (element resolution, click disambiguation) are recorded to a decision cache.
- Run — Switch to Run mode for unattended execution. The decision cache replays recorded choices deterministically — no LLM calls for previously resolved decisions, no supervision pauses.
- Iterate — Use the assistant to patch workflows conversationally, or edit nodes directly
Clickweave is a Tauri v2 hybrid app with a Rust backend and a React frontend.
/
├── crates/ # Rust backend workspace
│ ├── clickweave-core/ # Workflow model, validation, runtime context, storage, tool mapping
│ ├── clickweave-engine/ # Workflow execution engine (graph walk, retries, AI steps, checks)
│ ├── clickweave-llm/ # LLM client, planning, patching, assistant, repair logic
│ └── clickweave-mcp/ # MCP JSON-RPC client (subprocess lifecycle, tool calls)
├── src-tauri/ # Tauri app shell & IPC commands
├── ui/ # React frontend (Vite + Tailwind CSS v4)
│ ├── src/
│ │ ├── components/ # Graph canvas, nodes, modals, assistant, verdict bar
│ │ ├── store/ # Zustand state (9 slices: project, execution, assistant, history, settings, log, verdict, walkthrough, UI)
│ │ └── hooks/ # React hooks (undo/redo, loop grouping, node/edge sync, workflow actions, executor events)
├── docs/ # Reference & conceptual documentation
└── assets/ # Static assets
| Layer | Technology |
|---|---|
| Framework | React 19 |
| Build | Vite 6 |
| Styling | Tailwind CSS v4 |
| Graph Editor | React Flow (@xyflow/react) |
| State | Zustand (slice composition) |
| Desktop Bridge | Tauri v2 (@tauri-apps/api) |
| Type Safety | Auto-generated TypeScript bindings via Specta |
| Tests | Vitest + Testing Library |
Planning: UI → Tauri command → spawn MCP for tool discovery → LLM call → parse + validate → Workflow back to UI
Execution: UI → Tauri command → WorkflowExecutor::run() → spawn MCP server → walk graph (deterministic tools, AI loops, control flow) → per-step supervision in Test mode (screenshot → VLM → LLM judge → Retry/Skip/Abort) → record/replay LLM decisions via decision cache → stream executor:// events to UI
| Category | Node | Description |
|---|---|---|
| AI | AiStep | Agentic LLM + tool loop with configurable tool access and max calls |
| Vision | TakeScreenshot | Capture screen, window, or region with optional OCR |
| Vision | FindText | OCR-based text search (contains or exact match) |
| Vision | FindImage | Template matching with threshold and max results |
| Input | Click | Mouse click at coordinates or by target text (LLM-resolved) |
| Input | Hover | Move mouse to position with optional dwell time |
| Input | TypeText | Keyboard text input |
| Input | PressKey | Key press with modifiers (shift, control, option, command) |
| Input | Scroll | Scroll at position with delta |
| Window | ListWindows | Enumerate visible windows |
| Window | FocusWindow | Bring window to front by app name, window ID, or PID |
| Control Flow | If | Conditional branch (evaluates runtime variables) |
| Control Flow | Switch | Multi-way branch with named cases and default |
| Control Flow | Loop | Do-while loop with max iteration bound |
| Control Flow | EndLoop | Jump back to paired Loop node |
| AppDebugKit | McpToolCall | Generic invocation of any MCP tool by name |
| AppDebugKit | AppDebugKitOp | Invoke an AppDebugKit operation on a connected app |
Each node supports:
- Retries (0–10) with automatic re-execution and cache eviction on failure
- Timeout (ms) for bounded execution
- Settle delay (ms) for waiting after execution
- Trace level (Off / Minimal / Full) for controlling artifact capture
- Expected outcome — human-readable description for VLM verification
- Enabled toggle — skip nodes without removing them
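A rough sketch of how the retry count, settle delay, and enabled toggle could interact is shown below. `NodeSettings` and `execute_node` are hypothetical names; timeout handling, trace levels, and cache eviction on failure are deliberately left out.

```rust
use std::{thread, time::Duration};

/// A subset of the per-node settings listed above (hypothetical struct).
pub struct NodeSettings {
    pub retries: u32, // documented range is 0 to 10
    pub settle_delay_ms: u64,
    pub enabled: bool,
}

/// Runs a node action with retries.
/// Returns `Ok(None)` when the node is disabled, `Ok(Some(value))` on
/// success, and the last error once all retries are used up.
pub fn execute_node<T, E>(
    settings: &NodeSettings,
    mut run: impl FnMut(u32) -> Result<T, E>,
) -> Result<Option<T>, E> {
    if !settings.enabled {
        return Ok(None); // disabled nodes are skipped, not removed
    }
    let max_retries = settings.retries.min(10);
    let mut attempt = 0;
    loop {
        match run(attempt) {
            Ok(value) => {
                // Give the UI time to settle before the next node runs.
                thread::sleep(Duration::from_millis(settings.settle_delay_ms));
                return Ok(Some(value));
            }
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}
```

With `retries: 2`, the action is attempted at most three times: the initial attempt plus two re-executions.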
- Record & Replay — Demonstrate a task once by recording your clicks and keystrokes, then replay it reliably as an editable workflow
- Desktop RPA — Automate repetitive back-office workflows across multiple desktop applications
- QA & Regression Testing — Build smoke and regression test flows with screenshots, OCR verification, and run traces for debugging
- AI-Assisted Automation — Let AI plan workflows from natural language, then keep deterministic execution where precision matters
- Visual Verification — Attach checks to nodes and let a VLM verify expected outcomes from screenshots
- Local-First Automation — Run against local LLM/VLM endpoints for privacy-sensitive workflows
- Cross-App Workflows — Chain actions across multiple desktop applications with window management and conditional branching
- Rust >= 1.85 — Install Rust
- Node.js (LTS) — Install Node.js
- Tauri CLI: `cargo install tauri-cli --locked`
- OS dependencies:
  - macOS: Xcode Command Line Tools (`xcode-select --install`)
  - Windows: Visual Studio C++ Build Tools + WebView2 Runtime
  - Linux (Ubuntu/Debian): `sudo apt-get update && sudo apt-get install libwebkit2gtk-4.1-dev build-essential curl wget file libxdo-dev libssl-dev libayatana-appindicator3-dev librsvg2-dev`
# Clone the repository
git clone https://github.com/sh3ll3x3c/clickweave.git
cd clickweave
# Install frontend dependencies
npm install --prefix ui
# Run in development mode
cargo tauri dev
# Build for production
cargo tauri build
Output bundles will be in target/release/bundle/.
# Rust tests
cargo test
# Frontend tests
npm test --prefix ui
Configure the LLM endpoints in Settings within the app. Each endpoint takes:
- Base URL (default: `http://localhost:1234/v1`)
- Model name (default: `local`)
- API key (optional, empty for local endpoints)
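The endpoint settings and their defaults can be modelled with a small struct. This is an illustrative sketch only: `EndpointConfig` and its methods are hypothetical names, and sending the key as a `Bearer` token is the usual convention for OpenAI-compatible servers rather than something this document specifies.

```rust
/// One LLM endpoint as configured in Settings (hypothetical struct).
#[derive(Clone, Debug, PartialEq)]
pub struct EndpointConfig {
    pub base_url: String,
    pub model: String,
    pub api_key: Option<String>,
}

impl Default for EndpointConfig {
    fn default() -> Self {
        Self {
            base_url: "http://localhost:1234/v1".to_string(),
            model: "local".to_string(),
            api_key: None, // local endpoints need no key
        }
    }
}

impl EndpointConfig {
    /// Full URL of the chat completions route under the base URL.
    pub fn chat_completions_url(&self) -> String {
        format!("{}/chat/completions", self.base_url.trim_end_matches('/'))
    }

    /// `Authorization` header value, or `None` when the key is unset or empty.
    pub fn authorization_header(&self) -> Option<String> {
        self.api_key
            .as_deref()
            .filter(|key| !key.is_empty())
            .map(|key| format!("Bearer {key}"))
    }
}
```

With the defaults, requests go to `http://localhost:1234/v1/chat/completions` and carry no authorization header.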
The MCP command can be set to:
- `"npx"` (default): runs `npx -y native-devtools-mcp`
- A custom binary path for self-hosted MCP servers
At workflow startup, Clickweave queries the inference provider for model metadata (context length, architecture, quantization).
| Provider | Context Length Field | Endpoint |
|---|---|---|
| LM Studio | max_context_length, loaded_context_length | /api/v0/models |
| vLLM | max_model_len | /v1/models |
| OpenRouter | context_length | /v1/models |
| Ollama | Not supported yet | — |
| OpenAI | Not available via API | — |
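The table above translates naturally into a lookup. The sketch below is hypothetical (`Provider`, `MetadataSource`, and `metadata_source` are illustrative names) and only encodes which endpoint and field names to consult; it does not perform any HTTP request.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Provider {
    LmStudio,
    Vllm,
    OpenRouter,
    Ollama,
    OpenAi,
}

/// Where to read context-length metadata for a provider.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MetadataSource {
    pub endpoint: &'static str,
    pub context_fields: &'static [&'static str],
}

/// Returns `None` for providers whose context length cannot be queried.
pub fn metadata_source(provider: Provider) -> Option<MetadataSource> {
    match provider {
        Provider::LmStudio => Some(MetadataSource {
            endpoint: "/api/v0/models",
            context_fields: &["max_context_length", "loaded_context_length"],
        }),
        Provider::Vllm => Some(MetadataSource {
            endpoint: "/v1/models",
            context_fields: &["max_model_len"],
        }),
        Provider::OpenRouter => Some(MetadataSource {
            endpoint: "/v1/models",
            context_fields: &["context_length"],
        }),
        // Ollama: not supported yet. OpenAI: not available via API.
        Provider::Ollama | Provider::OpenAi => None,
    }
}
```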
Toggle in Settings:
- Allow AI Transforms (default: on) — enables AiTransform in planner output
- Allow Agent Steps (default: off) — enables full agentic loops with tool access
JSON-formatted structured logs.
| Platform | Location |
|---|---|
| macOS | ~/Library/Logs/Clickweave/ |
| Windows / Linux | ./logs/ (relative to working directory) |
Log files: clickweave.YYYY-MM-DD.txt. Console level defaults to info (override with RUST_LOG). File layer captures at info with debug for clickweave_* crates. Full LLM request/response bodies are logged at trace level — set RUST_LOG=clickweave_llm=trace to include them.
Clickweave is an open-source desktop automation platform that acts as a computer use agent. It combines a visual node-based workflow builder, AI-powered planning, and record-and-replay workflow recording for automating UI tasks on macOS, Windows, and Linux.
Clickweave drives desktop interactions through the Model Context Protocol (MCP) by spawning native-devtools-mcp as a subprocess. This provides screenshots, OCR, mouse/keyboard control, and window management — all without injecting code into target applications.
No coding is required. You can build workflows visually by adding nodes from a categorized palette, use natural-language prompts for AI-assisted planning and patching, record a walkthrough of the task to generate a workflow automatically, or combine all three approaches.
A computer use agent is software that can see and interact with a computer's graphical interface — taking screenshots, reading text via OCR, clicking buttons, and typing — to accomplish tasks autonomously or semi-autonomously. Clickweave is a CUA that gives you a visual editor to control what the agent does.
Any provider with an OpenAI-compatible /v1/chat/completions endpoint: LM Studio, vLLM, Ollama (OpenAI-compatible mode), OpenRouter, OpenAI, and others. You can use different models for planning, execution, and vision tasks.
Yes, Clickweave is local-first. Desktop automation actions run entirely on your machine. LLM/VLM requests are sent only to the endpoints you configure — which default to localhost. No telemetry, no cloud dependency.
Traditional RPA tools rely on selectors, DOM scraping, or accessibility APIs tied to specific applications. Clickweave uses visual perception (screenshots + OCR + template matching) and can automate any application with a visible UI. The AI planner means you can describe what you want rather than manually scripting every click.
Yes. Clickweave supports If, Switch, and Loop control-flow nodes. Node outputs are extracted into runtime variables that conditions and loops can reference, so workflows can branch and repeat without scripting.
Clickweave has two layers of verification. Per-step supervision (Test mode only) checks each step as it runs: a VLM describes the screen and an LLM judge decides pass/fail, pausing for user action on failure. Inline verification uses Verification-role nodes (any read-only tool step: FindText, FindImage, ListWindows, TakeScreenshot) as test assertions that produce pass/fail/warn verdicts during execution and fail-fast on failure.
Record a walkthrough by performing the task while Clickweave captures OS-level events (clicks, key presses, app focus). The recording is normalized — deduplicating and merging raw events — then synthesized into a typed workflow graph. A review panel lets you rename, delete, and annotate steps before applying the draft. The resulting workflow is fully editable and replayable, just like one built by hand or generated by AI.
The Model Context Protocol (MCP) is a standard for connecting AI models to external tools. Clickweave uses MCP to decouple its orchestration engine from specific automation backends — the same workflow graph can drive different MCP servers without changing the workflow definition.
This section helps AI agents navigate and understand the codebase.
Key Entry Points:
- Planner: `crates/clickweave-llm/src/planner/plan.rs` (`plan_workflow` / `plan_workflow_with_backend`)
- Patcher: `crates/clickweave-llm/src/planner/patch.rs` (workflow patch generation)
- Assistant: `crates/clickweave-llm/src/planner/assistant.rs` (conversational assistant with patch validation retry)
- Execution Loop: `crates/clickweave-engine/src/executor/run_loop.rs` (core graph walk)
- Supervision: `crates/clickweave-engine/src/executor/supervision.rs` (per-step VLM + LLM verification pipeline)
- MCP Protocol: `crates/clickweave-mcp/src/protocol.rs` (JSON-RPC implementation)
- Frontend State: `ui/src/store/useAppStore.ts` (main Zustand store)
- Tauri Commands: `src-tauri/src/commands/` (IPC bridge: `run_workflow`, `stop_workflow`, `supervision_respond`)
Conventions:
- Error handling: internal crates use `anyhow::Result`; Tauri boundaries return `Result<_, String>`
- Async: `tokio` runtime
- Tracing: `tracing` crate for structured logging
- Types: Specta + tauri-specta for auto-generated TypeScript bindings
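The error-handling convention can be illustrated with a std-only sketch. To keep the example dependency-free, `Box<dyn Error>` stands in for `anyhow::Result`, and both functions are hypothetical; a real command would also carry the `#[tauri::command]` attribute.

```rust
use std::error::Error;

/// Stand-in for `anyhow::Result` so the sketch needs no external crates.
type InternalResult<T> = Result<T, Box<dyn Error>>;

/// Internal-crate style: propagate rich errors with `?`.
fn parse_step_count(raw: &str) -> InternalResult<usize> {
    Ok(raw.trim().parse::<usize>()?)
}

/// Boundary style: flatten the error to a `String` before it crosses IPC.
pub fn step_count_command(raw: String) -> Result<usize, String> {
    parse_step_count(&raw).map_err(|err| err.to_string())
}
```

Converting to `String` at the boundary keeps the frontend contract simple, since the error only has to serialize as plain text.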
Documentation: See docs/ for reference docs (code-coupled, agent-oriented) and conceptual docs (architecture mental models).
Distributed under the MIT License. See LICENSE for more information.
