Skip to content

zhubert/long-running-agent

Repository files navigation

OpenClaw Architecture Analysis

Comprehensive deep-dive into how the viral "always-on personal AI assistant" actually works under the hood.

OpenClaw is a personal AI assistant that runs locally and bridges AI models to daily communication channels (WhatsApp, Telegram, Slack, Discord, iMessage, Signal, etc.) with always-on automation capabilities.

What makes it viral:

  • Real automation that works - Grocery shopping, school meal booking, air quality control, 3D printer management
  • "Couch potato dev" experience - Build entire websites via Telegram while watching Netflix
  • Self-improving agent - Can write its own skills (wine cellar skill built in minutes from CSV)
  • Multi-channel everywhere - Works on channels people already use
  • Browser automation - No APIs needed for many sites

πŸ“– Table of Contents


Integration Points

User Message β†’ Channel Plugin
    ↓
Gateway (WebSocket RPC server)
    ↓
Inbound Handler
    β”œβ”€ System Event Enqueue
    └─ Heartbeat Wake Request
         ↓
    Cron Scheduler Timer
    β”œβ”€ Finds due jobs
    └─ Enqueues job execution
         ↓
    Command Lane (Serializes)
    β”œβ”€ Runs Pi Agent (main or isolated)
    β”œβ”€ Tool execution (via Skills)
    └─ Result handling
         ↓
    Delivery System
    β”œβ”€ Channel-specific formatting
    └─ Outbound send (WhatsApp, Slack, etc)

Component Documentation

Click any component below to read the full analysis. Each file is a self-contained deep-dive with code references.

1️⃣ Cron Scheduler

15 KB Β· ⏰ Scheduling Β· πŸ“¦ In-Process

Custom in-process job scheduler (not system cron) with sophisticated job management.

Key Features:

  • 3 schedule types: at, every, cron
  • Exponential backoff: 30s β†’ 60min
  • Session isolation: main vs isolated
  • Smart wakeup timer (max 60s intervals)
  • Persistent state with crash recovery
  • Automatic session pruning

Architecture Highlights:

  • Timer-based wakeup with drift prevention
  • Main session jobs β†’ system events queue β†’ heartbeat
  • Isolated session jobs β†’ ephemeral agent turns β†’ delivery
  • Missed job catchup on startup
  • Stuck job detection (2h timeout)

2️⃣ Pi Agent Runtime

20 KB Β· πŸ€– AI Execution Β· πŸ”„ Multi-Model

Execution engine wrapping Anthropic's Pi agent framework.

Key Features:

  • Multi-model support (Anthropic, OpenAI, Google, local)
  • Auth profile rotation with cooldown
  • Multi-tier context recovery (compaction β†’ truncation β†’ graceful failure)
  • Real-time streaming with interrupts
  • Tool execution with sandbox support

Architecture Highlights:

  • Layered queueing (per-session + global lanes)
  • 3-tier context overflow recovery
  • Failover error classification (auth, rate_limit, billing, timeout)
  • Tool result truncation (30% max context share)
  • Active run tracking with message queueing

3️⃣ Skills System

17 KB Β· πŸ”Œ Plugins Β· πŸ“š 54+ Skills

Progressive disclosure system for modular skill packages.

Key Features:

  • 54+ built-in skills across 10 categories
  • 3-level loading: metadata β†’ SKILL.md β†’ resources
  • Plugin SDK with 100+ runtime methods
  • 14 lifecycle hooks for extensibility
  • Tool factory pattern for context-aware tools

Architecture Highlights:

  • Level 1: Metadata (~100 words, always loaded)
  • Level 2: SKILL.md body (1-5K words, on-demand)
  • Level 3: Scripts/references (loaded as needed)
  • skill-creator: Agent can write new skills
  • Memory slot system for exclusive plugins

4️⃣ Heartbeat System

17 KB Β· πŸ’“ Always-On Β· ⚑ Event-Driven

Always-on polling system keeping agents responsive.

Key Features:

  • Event-driven triggering (cron, exec, manual)
  • 250ms coalesce window
  • Active hours enforcement (timezone-aware)
  • Per-session event queues (max 20, deduplicated)
  • Multi-agent support (independent intervals)

Architecture Highlights:

  • System events queue (in-memory, per-session)
  • Wake handler with intelligent coalescing
  • Heartbeat runner with interval scheduling
  • Visibility controls (per-account/channel)
  • Duplicate suppression (24h window)

5️⃣ Gateway WebSocket

18 KB Β· 🌐 RPC Server Β· πŸ” Auth

Central control plane exposing 40+ RPC methods.

Key Features:

  • JSON-RPC-like protocol (req/res/event)
  • 40+ methods across 10 categories
  • Multiple auth mechanisms (token, password, Tailscale, device identity)
  • Node communication (macOS/iOS/Android)
  • Streaming support (intermediate responses)

Architecture Highlights:

  • Request/response/event frame types
  • Role-based + scope-based authorization
  • Local connection bypass (localhost skip auth)
  • Node registry for remote execution
  • Control UI with CSRF protection

6️⃣ Daemon Service

13 KB Β· βš™οΈ System Service Β· πŸ–₯️ Cross-Platform

Platform-specific daemon management for always-on operation.

Key Features:

  • macOS (launchd), Linux (systemd), Windows (schtasks)
  • Auto-start on boot/login
  • Automatic restart on crash (macOS, Linux)
  • Profile support for multiple instances
  • Comprehensive logging

Architecture Highlights:

  • macOS: RunAtLoad + KeepAlive in plist
  • Linux: WantedBy=default.target + user linger
  • Windows: ONLOGON trigger with LIMITED privileges
  • Legacy cleanup for smooth upgrades
  • CLI commands for lifecycle management

23 KB Β· πŸ”€ Concurrency Β· πŸ’Ύ State Management

Concurrency control via lanes + persistent conversation contexts via sessions.

Key Features:

  • Lane-per-session pattern prevents race conditions
  • File-based locking for concurrent processes
  • TTL-based caching (45s) reduces disk I/O
  • Atomic writes with stale lock eviction
  • Session auto-creation, auto-persist, auto-prune

Architecture Highlights:

  • Command lanes: main, cron, subagent, nested, session:{key}
  • Pump-and-drain pattern with re-entrancy guard
  • Session store with lock + cache + maintenance
  • Main vs isolated vs per-peer vs per-channel-peer sessions
  • Identity linking across channels

Key Architectural Insights

1. Progressive Disclosure

Skills use 3-level loading to keep context lean:

  • Metadata (always) β†’ SKILL.md (on trigger) β†’ Resources (as needed)

2. Lane-based Concurrency

Per-session lanes serialize operations while allowing parallelism across sessions.

3. Event-Driven Responsiveness

System events queue + heartbeat polling enables proactive behavior without blocking.

4. Multi-Tier Recovery

Context overflow handled gracefully:

  • Compaction (summarize history)
  • Truncation (cap tool results)
  • Graceful failure (user-friendly message)

5. Platform-Native Integration

Uses native service managers for true always-on:

  • macOS: launchd LaunchAgent
  • Linux: systemd user service + linger
  • Windows: Task Scheduler ONLOGON

6. Auth Profile Rotation

Failover between API keys/OAuth profiles with cooldown prevents rate limits.

7. Session Isolation

Cron jobs can run in isolated ephemeral sessions, separate from main conversation.


Why It's Going Viral

1. Real Automation That Works

  • Grocery shopping (Tesco autopilot)
  • School meal booking (ParentPay automation)
  • Air purifier control (Winix)
  • 3D printer management (BambuLab)
  • Court booking (never miss a padel slot)

2. "Couch Potato Dev" Experience

Real testimonial: Rebuilt entire website via Telegram while watching Netflix

  • Notion β†’ Astro migration
  • 18 posts migrated
  • DNS to Cloudflare
  • "Never opened a laptop"

3. Self-Improving Agent

  • Wine cellar skill built in minutes from CSV (962 bottles)
  • Jira skill generated on the fly
  • Todoist skill created directly in chat

4. Browser Automation Without APIs

  • TradingView analysis (screenshots + technical analysis)
  • Tesco shopping (no API)
  • ParentPay (mouse coordinates for table clicks)

5. Phone Bridge Integration

Clawdia phone bridge: Near real-time phone calls with your agent via Vapi


File References

All analyses reference actual source files from the OpenClaw repository:

  • /src/cron/ - Cron scheduler (38 files)
  • /src/agents/pi-embedded-runner/ - Pi agent runtime (27 files)
  • /src/plugins/ & /skills/ - Skills platform (54+ skills)
  • /src/infra/ - Heartbeat & system events
  • /src/gateway/ - WebSocket server
  • /src/daemon/ - Service management
  • /src/process/ & /src/infra/session*.ts - Lanes & sessions

Performance Summary

Component Memory CPU Latency Throughput
Cron Scheduler O(jobs) Minimal (single timer) Timer precision Parallel job execution
Pi Agent ~50-100 MB Varies with model Streaming (<100ms) Limited by model API
Skills ~5MB cached Lazy loading <1ms metadata Unlimited (progressive)
Heartbeat O(agents) <1% idle 250ms coalesce Unlimited events
Gateway O(connections) Event-driven <10ms RPC Concurrent unlimited
Daemon ~50-100 MB <1% idle 1-3s startup N/A
Lanes/Sessions O(sessions) Event-driven <1ms enqueue Limited by lanes

Credits

Analysis created by exploring the OpenClaw repository structure and comprehensive code examination. All architectural insights derived from reading actual implementation code, not speculation.

Repository: https://github.com/openclaw/openclaw License: MIT Version Analyzed: 2026.2.9


Next Steps:

  • Read individual component files for deep dives
  • Cross-reference with actual source code
  • Explore channel integrations (Slack, Discord, WhatsApp, etc.)
  • Study security model and sandbox architecture
  • Review hook system for custom automations

About

how do long running agents (like OpenClaw) work?

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors