AgentShell

Local shell for AI agents to control a Windows PC.

Onboarding (agent)

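On session start, call help --json to receive the full command schema. Every command returns {"ok": bool, "data": {...}, "error": str|null, "state_delta": {...}}. Since v0.5, null or empty fields are removed from the output, so a field may be absent rather than null.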
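
As an illustration of consuming the response envelope described above, here is a minimal Python sketch. The ShellResponse class and parse_response function are hypothetical names and not part of AgentShell; the sketch only assumes the four documented fields and tolerates fields that are missing because null or empty values are removed from output (v0.5).

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ShellResponse:
    """Parsed form of the uniform response envelope (hypothetical helper)."""
    ok: bool
    data: dict = field(default_factory=dict)
    error: Optional[str] = None
    state_delta: dict = field(default_factory=dict)


def parse_response(raw: str) -> ShellResponse:
    """Parse one JSON response, tolerating omitted optional fields."""
    payload = json.loads(raw)
    return ShellResponse(
        ok=bool(payload.get("ok", False)),
        data=payload.get("data") or {},
        error=payload.get("error"),
        state_delta=payload.get("state_delta") or {},
    )
```

Reading fields with .get() rather than indexing keeps the caller working whether a field arrives as null or is left out entirely.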

Session flow

help --json           → receive full schema (do this once per session)
context get           → get current mandatory context (active window, last action, result, state, session goal)
listener status       → verify listener and indexer are active before trusting context
screen active         → check active window title
screen elements       → get UI elements via Accessibility Tree (preferred over capture)
  if complete=false   → description via OCR returned automatically (v0.3+)
  if description insufficient → screen capture --region active (last resort)
screen find           → find a specific element by text instead of parsing all elements
screen wait           → wait for an element to appear after triggering an action
<execute action>      → mouse click / keyboard type / app launch / etc.
context get           → verify state updated after action
index query --last 5  → retrieve recent activity history if needed
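
The flow above could be driven from Python roughly as follows. This is a sketch under assumptions: run is any callable that takes a command string and returns the parsed JSON response as a dict, and start_session and act_and_verify are hypothetical helper names. Treating the ok field of listener status as the signal for whether context can be trusted is also an assumption; the real payload may carry more detail.

```python
from typing import Callable, Dict, Tuple

# Any function that sends one command and returns the parsed JSON response.
Run = Callable[[str], Dict]


def start_session(run: Run) -> Dict:
    """Run the opening steps of the session flow, in order."""
    schema = run("help --json")        # once per session
    context = run("context get")       # mandatory context
    listener = run("listener status")  # verify before trusting context
    return {
        "schema": schema,
        "context": context,
        # Assumption: ok=false means the listener or indexer is not active.
        "context_trusted": bool(listener.get("ok")),
    }


def act_and_verify(run: Run, action: str) -> Tuple[Dict, Dict]:
    """Execute one action, then re-read context to confirm the state change."""
    result = run(action)
    context = run("context get")
    return result, context
```

Passing run in as a parameter keeps the flow independent of the transport, so the same logic works with the subprocess mode or with a stub in tests.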

Perception priority

screen elements — structured UI tree, no vision model needed, fastest
screen find — when you know what you're looking for, skip full element list
OCR description — auto-attached when tree is incomplete, text-based, no vision-model token cost
screen capture --region active — active window only, use when OCR is insufficient
screen capture --region full — full desktop, last resort
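
The priority order above amounts to a fallback chain. The sketch below shows one way an agent might walk it, assuming that complete and description live under the data field of the screen elements response (the exact location is not stated here) and that a missing complete flag means the tree is complete. The perceive function and the ocr_is_enough callback are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple


def perceive(
    run: Callable[[str], Dict],
    ocr_is_enough: Callable[[str], bool],
) -> Tuple[str, Any]:
    """Return (layer, payload) from the cheapest layer that suffices."""
    elements = run("screen elements")
    data = elements.get("data", {})
    if data.get("complete", True):
        return "elements", data  # structured tree, no vision model needed

    # Tree incomplete: the OCR description is attached automatically.
    description = data.get("description")
    if description and ocr_is_enough(description):
        return "ocr", description

    # OCR insufficient: capture the active window only.
    capture = run("screen capture --region active")
    return "capture", capture.get("data", {})
```

A full-desktop capture is left out of the sketch on purpose; per the list above it is the last resort and would follow the same pattern as the final step.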

Memory system

context get — mandatory context, always current (5 fields: active_window, last_action, result, state, session_goal)
index query — pull-based history, call only when you need past context
index logs — query internal error log (WARNING/ERROR level)
listener status — check if C# listener and indexer are active
Context updates automatically after every command via state_delta
Indexer runs in background, does not block commands

Command reference

Run help --json for full schema with parameter types and descriptions.

Running

Interactive REPL:

python main.py

Subprocess (single command):

python main.py audio volume --set 60

Installed (after pip install):

agentshell
agentshell audio volume --set 60
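
From Python, the single-command mode can be wrapped with the standard subprocess module. This is a sketch that assumes the command prints exactly one JSON object to stdout; build_argv and run_command are hypothetical helpers, not part of the package.

```python
import json
import subprocess
from typing import Dict, List, Sequence


def build_argv(args: Sequence[str], executable: str = "agentshell") -> List[str]:
    """Build the argument vector for one shell command."""
    return [executable, *args]


def run_command(
    args: Sequence[str],
    executable: str = "agentshell",
    timeout: float = 30.0,
) -> Dict:
    """Run one command as a subprocess and return its parsed JSON response."""
    completed = subprocess.run(
        build_argv(args, executable),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    try:
        return json.loads(completed.stdout)
    except json.JSONDecodeError:
        # Assumption: anything that is not JSON is reported as a failure.
        message = completed.stderr.strip() or "no JSON on stdout"
        return {"ok": False, "error": message}
```

For example, run_command(["audio", "volume", "--set", "60"]) corresponds to the installed invocation shown above. Passing arguments as a list avoids shell quoting problems with Windows paths.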

Environment

GROQ_API_KEY — enables memory indexing (optional, shell works without it)
Or place key in config/api_keys.json as {"groq_api_key": "..."}
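
A key lookup covering both options might look like the sketch below. The precedence (environment variable first, then the config file) is an assumption, and load_groq_key is a hypothetical helper name.

```python
import json
import os
from pathlib import Path
from typing import Optional


def load_groq_key(config_path: Path = Path("config/api_keys.json")) -> Optional[str]:
    """Return the Groq key from the environment or the config file, if any."""
    key = os.environ.get("GROQ_API_KEY")
    if key:
        return key  # assumption: the environment variable takes precedence
    try:
        with config_path.open(encoding="utf-8") as fh:
            return json.load(fh).get("groq_api_key") or None
    except (FileNotFoundError, json.JSONDecodeError):
        # The key is optional: the shell works without memory indexing.
        return None
```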

Version history

v0.1 — Core foundation

REPL with auto-discovery of command modules
Executor with full command set: screen, mouse, keyboard, audio, window, app, files
Uniform JSON response schema with state_delta
Machine-readable docs via help --json
Memory system: Listener (C#) → Aggregator → Indexer (Groq llama-3.1-8b) → SQLite store
Mandatory context (5 fields, always flat)
Pull-based index queries
Accessibility Tree as primary perception layer, screenshot as fallback
Subprocess mode (agentshell <command>)

v0.2 — Consolidation and fixes

Fix: Program.cs — stopWatcher resource leak
Fix: indexer.py — context validated and merged before writing, prevents partial overwrites
Fix: ListenerClient.py — pipe read now uses overlapped I/O with 1s timeout, no more thread hangs
Fix: screen.py — element deduplication no longer drops elements at close positions
New: clipboard read / clipboard write commands
New: screen region — screenshot of arbitrary coordinates
New: Structured logging to SQLite via core/logger.py, queryable with index logs

v0.3 — Perception improvement

New: OCR as intermediate perception layer between Accessibility Tree and raw screenshot
When screen elements returns complete: false, a text description via OCR is automatically attached to the response — no extra command needed
Agent only requests raw screenshot when OCR description is insufficient
Avoids vision-model token cost in every fallback case where the OCR description is sufficient

v0.4 — Observability and environment awareness

New: screen monitors — enumerate all connected monitors with bounds and resolution
Multi-monitor support in screen capture, screen region, and OCR — active window captured correctly regardless of which monitor it's on
New: listener status — check if C# listener is running via heartbeat file, includes indexer state
C# listener writes heartbeat every 5s to data/listener_heartbeat.json
New: Clipboard watch in C# listener — emits clipboard_change events when user copies content
Clipboard changes flow through Aggregator → Indexer → context, agent sees them via index query
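
The heartbeat check can be approximated with a file-age test. Because the contents of data/listener_heartbeat.json are not described here, this sketch relies only on the file's modification time; listener_alive and the max_missed tolerance are assumptions, not the behavior of listener status.

```python
import time
from pathlib import Path
from typing import Optional

HEARTBEAT_INTERVAL_S = 5.0  # the listener writes a heartbeat every 5s


def listener_alive(
    path: Path = Path("data/listener_heartbeat.json"),
    max_missed: int = 3,
    now: Optional[float] = None,
) -> bool:
    """Treat the listener as alive if the heartbeat file is recent enough."""
    current = time.time() if now is None else now
    try:
        age = current - path.stat().st_mtime
    except FileNotFoundError:
        return False  # no heartbeat has ever been written
    # Allow a few missed beats before declaring the listener dead.
    return age <= HEARTBEAT_INTERVAL_S * max_missed
```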

v0.5 — Command depth

audio: audio devices — list input/output devices; audio device — switch default device; audio app — per-app volume and mute control
screen: screen find — locate element by text, returns position directly; screen text — extract all visible text in reading order; screen wait — wait for element to appear, with timeout
window: window resize; window move; window snap (left/right/maximize/restore); window info — detailed window metadata including process and PID
response: removed null/empty fields from JSON output — cleaner signal for the agent
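
Removing null and empty fields from a response can be done with a small recursive filter. The sketch below is one possible approach, not the project's actual code; which values AgentShell treats as empty is an assumption (here: None, empty strings, empty lists, and empty dicts, while 0 and false are kept).

```python
from typing import Any

_EMPTY = (None, "", [], {})


def prune(value: Any) -> Any:
    """Recursively drop None, empty strings, and empty containers."""
    if isinstance(value, dict):
        cleaned = {key: prune(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if item not in _EMPTY}
    if isinstance(value, list):
        cleaned_items = [prune(item) for item in value]
        return [item for item in cleaned_items if item not in _EMPTY]
    return value
```

Children are pruned before the parent is checked, so a dict that contains only empty values disappears as well.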

v0.6 — Gap coverage

mouse: mouse position — current cursor coordinates
files: files exists — check if path exists without listing directory; files info — file/directory metadata (size, dates, extension)
app: app focus — bring app to foreground by process name
screen: screen waitgone — wait for element to disappear, complements screen wait
fix: pyautogui.FAILSAFE = False — prevents cursor corner exception from killing the agent

v0.7 — App resolution

app: app launch rewritten — now searches Program Files, Program Files (x86), Windows registry, and system PATH before falling back to Start Menu
Depth-limited directory search (4 levels) via os.walk — faster than rglob
_APP_ALIASES now used as search accelerator, not hard requirement — unlisted apps resolve automatically
Start Menu fallback (Win → type → Enter) as last resort with note in response
app: _launch_via_start_menu now verifies the process actually started by comparing process list before and after — returns ok: false instead of false positive if launch failed
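
A depth-limited search with os.walk, as mentioned in the notes above, can be sketched as follows. The function names walk_limited and find_executable are hypothetical; the technique is the standard one of clearing dirnames in place so that os.walk stops descending.

```python
import os
from typing import Iterator, Optional


def walk_limited(root: str, max_depth: int = 4) -> Iterator[str]:
    """Yield file paths under root, descending at most max_depth levels."""
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirnames[:] = []  # prune in place: os.walk will not go deeper
        for name in filenames:
            yield os.path.join(dirpath, name)


def find_executable(root: str, exe_name: str, max_depth: int = 4) -> Optional[str]:
    """Return the first file under root whose name matches, ignoring case."""
    target = exe_name.lower()
    for path in walk_limited(root, max_depth):
        if os.path.basename(path).lower() == target:
            return path
    return None
```

Pruning stops the walk before it enters deep directory trees, whereas a recursive glob would enumerate everything and filter afterwards.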

v0.8 — System awareness and notifications

New group system: system info, system cpu, system ram, system disk, system battery — full hardware and OS state queryable by the agent
New group network: network status, network ip, network connections — connectivity state and active connections, filterable by process
New group notify: notify send — Windows toast notifications with configurable duration

v0.9 — REPL improvements

New: Command history — up/down arrows navigate previous commands, persists across sessions in data/.shell_history
New: Tab autocompletion — completes groups and commands, double Tab shows all options
New: Verbose mode — --verbose on/off shows command timing and state delta after each response

v1.0 — Agent integration

New: AgentShellClient — Python SDK to connect any LLM to the framework without managing subprocess or JSON parsing manually
New: examples/ — reference integration scripts for Groq, OpenAI, and Anthropic showing the recommended agent loop pattern
Agent sends commands wrapped in ```shell blocks, shell returns JSON, agent continues — no framework-specific training needed
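
On the host side, the agent loop needs to pull commands out of the model's reply. The sketch below shows one way to do that with a regular expression; it is not the AgentShellClient implementation, extract_commands is a hypothetical name, and treating each non-empty line in a block as one command is an assumption.

```python
import re
from typing import List

# Built indirectly so the fence marker does not appear literally in this example.
FENCE = "`" * 3

_SHELL_BLOCK = re.compile(
    re.escape(FENCE) + r"shell[ \t]*\r?\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def extract_commands(reply: str) -> List[str]:
    """Return every non-empty line found inside shell-tagged fenced blocks."""
    commands: List[str] = []
    for block in _SHELL_BLOCK.findall(reply):
        for line in block.splitlines():
            line = line.strip()
            if line:
                commands.append(line)
    return commands
```

Each extracted command would then be executed, and its JSON response appended to the conversation before the model is called again.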

v1.1 — User memory

New group user: persistent key-value memory about the user, organized in fixed categories
Categories: preferences, environment, schedule, identity, projects, misc
Commands: user set, user get, user list, user delete
Stored in SQLite alongside session memory — persists across sessions
Agent writes to memory as it learns about the user, reads it to personalize behavior
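
A category-restricted key-value store on SQLite could look like the sketch below. The table name, column layout, and the UserMemory class are assumptions made for illustration; only the category names and the set/get/list/delete operations come from the notes above.

```python
import sqlite3
from typing import Dict, Optional

CATEGORIES = ("preferences", "environment", "schedule", "identity", "projects", "misc")


class UserMemory:
    """Hypothetical key-value store restricted to the fixed categories."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS user_memory ("
            " category TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL,"
            " PRIMARY KEY (category, key))"
        )

    def set(self, category: str, key: str, value: str) -> None:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.conn.execute(
            "INSERT OR REPLACE INTO user_memory VALUES (?, ?, ?)",
            (category, key, value),
        )
        self.conn.commit()

    def get(self, category: str, key: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT value FROM user_memory WHERE category = ? AND key = ?",
            (category, key),
        ).fetchone()
        return row[0] if row else None

    def list_keys(self, category: str) -> Dict[str, str]:
        rows = self.conn.execute(
            "SELECT key, value FROM user_memory WHERE category = ? ORDER BY key",
            (category,),
        ).fetchall()
        return dict(rows)

    def delete(self, category: str, key: str) -> bool:
        cursor = self.conn.execute(
            "DELETE FROM user_memory WHERE category = ? AND key = ?",
            (category, key),
        )
        self.conn.commit()
        return cursor.rowcount > 0
```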

v1.2 — Bug fixes

Fix: app focus — now collects all PIDs matching the process name before searching for a visible window, resolves multi-instance apps like Brave and Spotify
Fix: screen text/elements/find/wait/waitgone — _resolve_window now excludes the shell process itself when no --window is specified, avoids capturing the terminal instead of the target app
Fix: audio volume — fallback to IMMDeviceEnumerator when GetSpeakers().Activate() fails, resolves pycaw version compatibility issue

v1.2 — Routines

New group routine: routine set, routine get, routine list, routine delete, routine run
User describes a routine in natural language, agent translates to shell commands and saves them
routine run executes saved commands directly — deterministic, zero token cost, no model interpretation needed
Routines stored in SQLite under the routines user memory category, persist across sessions
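
To illustrate why routine run needs no model in the loop, here is a sketch of saving and replaying commands. The schema, the JSON encoding of the command list, the function names, and the stop-on-first-failure policy are all assumptions; run is any callable that executes one command string and returns its parsed response.

```python
import json
import sqlite3
from typing import Callable, Dict, List


def _ensure_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS routines (name TEXT PRIMARY KEY, commands TEXT NOT NULL)"
    )


def save_routine(conn: sqlite3.Connection, name: str, commands: List[str]) -> None:
    """Store the already-translated shell commands under a routine name."""
    _ensure_table(conn)
    conn.execute(
        "INSERT OR REPLACE INTO routines VALUES (?, ?)",
        (name, json.dumps(commands)),
    )
    conn.commit()


def run_routine(
    conn: sqlite3.Connection,
    name: str,
    run: Callable[[str], Dict],
) -> List[Dict]:
    """Replay saved commands in order; stop at the first failed command."""
    _ensure_table(conn)
    row = conn.execute(
        "SELECT commands FROM routines WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    results: List[Dict] = []
    for command in json.loads(row[0]):
        response = run(command)
        results.append(response)
        if not response.get("ok", False):
            break  # assumption: later steps depend on earlier ones
    return results
```

Because the stored commands are replayed verbatim, the same routine produces the same command sequence on every run.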

v1.3 — Credentials store

New group credentials: credentials set, credentials get, credentials list, credentials delete
Values encrypted at rest using Fernet symmetric encryption (cryptography library)
Encryption key stored in data/.secret.key — excluded from git, auto-generated on first run
credentials list shows services and keys but never values
Agent can retrieve credentials to authenticate with external services without exposing secrets in conversation history

v1.4 — Overlay system (current)

New group overlay: overlay notify, overlay status, overlay confirm, overlay chat
overlay notify — floating notification, auto-dismisses after configurable duration
overlay status — persistent indicator at top of screen showing agent activity; calling overlay status with no message clears it
overlay confirm — blocking Yes/No dialog, returns {"confirmed": true/false}
overlay chat — floating chat window with message history, supports agent/user sender roles
All visual constants extracted to core/overlay_theme.py — developers replace this file to apply custom themes without touching core logic
