The system exposes a set of operations that can be called via structured JSON. The LLM acts as a client that matches user intent to these operations and provides specific parameters.
Core Principle: LLM = Smart Matcher, System = Dumb Executor
- LLM Responsibilities: Understand natural language, match user intent against context, extract specific parameters, return structured API calls
- System Responsibilities: Validate inputs, execute operations reliably, handle errors, manage state, optimize performance
┌─────────────────┐
│   User Input    │
│  (Natural Lang) │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   LLM Client    │ ← Smart Matcher
│                 │
│  - Understands  │
│  - Matches      │
│  - Extracts     │
│  - Returns API  │
│    calls        │
└────────┬────────┘
         │
         │ JSON API Calls
         ▼
┌─────────────────┐
│   API Server    │ ← Dumb Executor
│    (System)     │
│                 │
│  - Validates    │
│  - Executes     │
│  - Returns      │
└─────────────────┘
Description: List all running applications
Parameters: None
Example:
{
  "type": "list_apps"
}
Description: Bring an application to the front (launches if not running)
Parameters:
app_name (string, required): Exact application name from running/installed apps (non-empty string)
Type Definitions:
string: Non-empty string
Example:
{
  "type": "focus_app",
  "app_name": "Google Chrome"
}
Description: Move an application window to a specific monitor or position
Parameters:
app_name (string, required): Exact application name (non-empty string)
monitor (enum, optional): One of "main", "right", "left". Optional if bounds provided.
bounds (array, optional): Exact window bounds [left, top, right, bottom] in absolute screen coordinates. The AI calculates these based on monitor dimensions and user intent.
Type Definitions:
string: Non-empty string
enum: One of the specified values
array<integer>: Array of 4 integers representing [left, top, right, bottom] coordinates
Example (monitor-based placement):
{
  "type": "place_app",
  "app_name": "Google Chrome",
  "monitor": "left"
}
Example (bounds-based placement - left half of the right monitor):
{
  "type": "place_app",
  "app_name": "Google Chrome",
  "monitor": "right",
  "bounds": [1920, 0, 2880, 1080]
}
Example (bounds-based placement - maximize):
{
  "type": "place_app",
  "app_name": "Google Chrome",
  "bounds": [0, 0, 1920, 1080]
}
Example (bounds-based placement - specific size, centered):
{
  "type": "place_app",
  "app_name": "Terminal",
  "monitor": "left",
  "bounds": [360, 140, 1560, 940]
}
Note: The AI receives monitor context (dimensions) and calculates bounds based on user intent. For example:
- "left half" → bounds: [monitor_x, monitor_y, monitor_x + monitor_w/2, monitor_y + monitor_h]
- "right half" → bounds: [monitor_x + monitor_w/2, monitor_y, monitor_x + monitor_w, monitor_y + monitor_h]
- "1200x800" → bounds: [monitor_x + (monitor_w-1200)/2, monitor_y + (monitor_h-800)/2, ...] (centered)
- "maximize" → bounds: [monitor_x, monitor_y, monitor_x + monitor_w, monitor_y + monitor_h]
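The bounds formulas above can be sketched in Python. This is a minimal illustration rather than the system's actual code: the `Monitor` type and the function names are hypothetical, integer division is an assumption for odd dimensions, and the right and bottom edges of the centered case (elided in the note) are assumed to be `left + width` and `top + height`, which agrees with the `[360, 140, 1560, 940]` example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Monitor:
    """Hypothetical monitor description: origin and size in absolute screen coordinates."""
    x: int
    y: int
    w: int
    h: int


def left_half(m: Monitor) -> List[int]:
    # [left, top, right, bottom] covering the left half of the monitor
    return [m.x, m.y, m.x + m.w // 2, m.y + m.h]


def right_half(m: Monitor) -> List[int]:
    return [m.x + m.w // 2, m.y, m.x + m.w, m.y + m.h]


def maximize(m: Monitor) -> List[int]:
    return [m.x, m.y, m.x + m.w, m.y + m.h]


def centered(m: Monitor, width: int, height: int) -> List[int]:
    # Assumption: right/bottom are derived from the centered top-left corner
    left = m.x + (m.w - width) // 2
    top = m.y + (m.h - height) // 2
    return [left, top, left + width, top + height]


if __name__ == "__main__":
    main_monitor = Monitor(0, 0, 1920, 1080)
    print(centered(main_monitor, 1200, 800))  # [360, 140, 1560, 940]
```

For a 1920x1080 monitor whose origin is at x=1920, `left_half` yields `[1920, 0, 2880, 1080]`, matching the bounds-based example above.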
Description: Quit/close an application completely
Parameters:
app_name (string, required): Exact application name (non-empty string)
Type Definitions:
string: Non-empty string
Example:
{
  "type": "close_app",
  "app_name": "Google Chrome"
}
Description: List all open Chrome tabs
Parameters: None
Example:
{
  "type": "list_tabs"
}
Description: Switch to a specific Chrome tab
Parameters:
tab_index (integer, required): Global tab index (positive integer, 1-based, counted across all windows)
Type Definitions:
integer: Positive integer (1-based indexing)
Example:
{
  "type": "switch_tab",
  "tab_index": 3
}
Note: The LLM must match user intent ("reddit", "github", etc.) to a specific tab index using the available tab data.
Description: Open a URL in Chrome by creating a new tab
Parameters:
url (string, required): URL to open (non-empty string). The system normalizes the URL if needed (e.g., adds https:// if missing and maps common site names such as "chatgpt" to "chatgpt.com").
Type Definitions:
string: Non-empty string
Example:
{
  "type": "open_url",
  "url": "https://chatgpt.com"
}
Example (site name, will be normalized):
{
  "type": "open_url",
  "url": "chatgpt"
}
Note:
- This command always creates a new tab. The AI should decide between switch_tab (for existing tabs) and open_url (for new tabs) based on user intent.
- Use switch_tab when the user wants to go to an existing tab (e.g., "go to github", "switch to reddit tab").
- Use open_url when the user explicitly wants to open a new tab (e.g., "open chatgpt in chrome", "open a new tab for github").
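The URL normalization described for open_url might look like the sketch below. The exact rules are not specified in this document, so the alias table, the function name `normalize_url`, and the choice to treat any input without `://` as missing a scheme are all assumptions made for illustration.

```python
# Hypothetical alias table; the system's real list of known site names is not documented here.
SITE_ALIASES = {"chatgpt": "chatgpt.com"}


def normalize_url(raw: str) -> str:
    """Turn a bare site name or scheme-less address into a full https URL."""
    url = raw.strip()
    if not url:
        raise ValueError("url must be a non-empty string")
    # Map a bare site name such as "chatgpt" to its domain
    url = SITE_ALIASES.get(url.lower(), url)
    # Add the scheme if the caller left it out
    if "://" not in url:
        url = "https://" + url
    return url


if __name__ == "__main__":
    print(normalize_url("chatgpt"))              # https://chatgpt.com
    print(normalize_url("https://chatgpt.com"))  # https://chatgpt.com
```

Keeping this step on the system side means the LLM can pass through whatever the user said, and the executor stays responsible for producing a valid URL.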
Description: Close one or more Chrome tabs
Parameters:
tab_indices (array, required): Array of global tab indices (1-based, across all windows). For a single tab, use an array with one element: [3]
Type Definitions:
array<integer>: Array of integers, non-empty, all values must be positive integers (1-based)
Example (single tab):
{
  "type": "close_tab",
  "tab_indices": [3]
}
Example (bulk operation):
{
  "type": "close_tab",
  "tab_indices": [2, 5, 8]
}
Note: The LLM must match user intent ("all reddit tabs", "tabs 1, 3, and 5", etc.) to specific tab indices. The system optimizes execution by closing tabs from the highest index to the lowest, so that closing one tab does not shift the indices of the tabs still to be closed.
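The highest-to-lowest closing order can be illustrated with a short sketch. The function name `close_tabs` and the `close_one` callback are hypothetical stand-ins for whatever the system uses to close a single tab; the tab strip is simulated with a plain list.

```python
from typing import Callable, List


def close_tabs(tab_indices: List[int], close_one: Callable[[int], object]) -> List[int]:
    """Close 1-based tab indices in descending order and return the order used."""
    valid = all(isinstance(i, int) and not isinstance(i, bool) and i >= 1 for i in tab_indices)
    if not tab_indices or not valid:
        raise ValueError("tab_indices must be a non-empty list of positive integers")
    # Descending order: closing a tab only shifts the tabs after it,
    # so the lower indices still to be closed remain correct.
    order = sorted(set(tab_indices), reverse=True)
    for index in order:
        close_one(index)
    return order


if __name__ == "__main__":
    tabs = list("abcdefgh")  # simulated tab strip, tab 1 is "a"
    close_tabs([2, 5, 8], lambda i: tabs.pop(i - 1))
    print(tabs)  # ['a', 'c', 'd', 'f', 'g']
```

Closing in ascending order instead would remove tab 2 first, after which the original tab 5 sits at index 4 and the wrong tab would be closed.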
Description: Activate a named preset window layout
Parameters:
preset_name (string, required): Exact preset name from available presets (non-empty string)
Type Definitions:
string: Non-empty string
Example:
{
  "type": "activate_preset",
  "preset_name": "code space"
}
Description: Answer general questions about system state (tabs, apps, files, projects, history) using available context. Returns natural-language answers; does not execute any command.
Parameters:
question (string, required): The user's question
Example:
{
  "type": "query",
  "question": "What are my oldest tabs right now?"
}
Notes:
- The system answers based on current context (running apps, installed apps, tabs, recent files, projects).
- Recent query Q&A are kept in memory for follow-up context within the current run.
All operations are called via a JSON object with:
commands (array): Array of operation objects
needs_clarification (boolean): Whether clarification is needed
clarification_reason (string, optional): Reason for clarification
Example (single operation):
{
  "commands": [
    {
      "type": "switch_tab",
      "tab_index": 3
    }
  ],
  "needs_clarification": false,
  "clarification_reason": null
}
Example (multiple operations):
{
  "commands": [
    {
      "type": "place_app",
      "app_name": "Google Chrome",
      "monitor": "left"
    },
    {
      "type": "place_app",
      "app_name": "Cursor",
      "monitor": "right"
    }
  ],
  "needs_clarification": false,
  "clarification_reason": null
}
Example (needs clarification):
{
  "commands": [
    {
      "type": "switch_tab"
    }
  ],
  "needs_clarification": true,
  "clarification_reason": "Could not find a tab matching 'xyz'. Did you mean one of these tabs?"
}
The LLM receives the following context to make matching decisions:
- Running Applications: List of currently running apps
- Installed Applications: List of installed apps (for fuzzy matching)
- Chrome Tabs (Raw): Raw AppleScript output with all tab data
- Chrome Tabs (Parsed): Parsed tab data with:
  - index: Global tab index
  - title: Tab title
  - url: Full URL
  - domain: Extracted domain
  - content_summary: Page content summary
  - is_active: Whether tab is active
  - window_index: Window index
  - local_index: Local tab index within window
- Available Presets: List of available preset names
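As a rough illustration of the parsed tab data, the sketch below models one record per tab and numbers tabs globally. Several details are assumptions not stated in this document: that global indices run in window order, that `window_index` and `local_index` are 1-based, that a leading `www.` is dropped from the domain, and the names `ParsedTab` and `flatten_windows`.

```python
from dataclasses import dataclass
from typing import Dict, List
from urllib.parse import urlparse


@dataclass
class ParsedTab:
    index: int            # global tab index, 1-based across all windows
    title: str
    url: str
    domain: str
    content_summary: str
    is_active: bool
    window_index: int     # assumed 1-based
    local_index: int      # position within its window, assumed 1-based


def flatten_windows(windows: List[List[Dict[str, object]]]) -> List[ParsedTab]:
    """Number tabs globally in window order (an assumed ordering)."""
    tabs: List[ParsedTab] = []
    for window_index, window in enumerate(windows, start=1):
        for local_index, raw in enumerate(window, start=1):
            url = str(raw["url"])
            host = urlparse(url).hostname or ""
            tabs.append(ParsedTab(
                index=len(tabs) + 1,
                title=str(raw["title"]),
                url=url,
                domain=host[4:] if host.startswith("www.") else host,
                content_summary=str(raw.get("content_summary", "")),
                is_active=bool(raw.get("is_active", False)),
                window_index=window_index,
                local_index=local_index,
            ))
    return tabs
```

Carrying both the global index and the window/local pair lets the LLM refer to a tab by a single number while the system can still locate it inside a specific window.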
The LLM acts as an intelligent API client:
- Understand Natural Language: Parse user intent from natural language
- Match Context: Match user intent against available context (apps, tabs, presets)
- Extract Parameters: Extract specific parameters (app names, tab indices, etc.)
- Call API: Return structured API calls with specific parameters
- Handle Ambiguity: Detect when clarification is needed
Example Flow:
User: "switch to reddit"
↓
LLM:
- Parses "reddit"
- Matches against tab data
- Finds: Tab 3 has domain "reddit.com"
- Returns: {"type": "switch_tab", "tab_index": 3}
↓
System:
- Receives API call
- Validates tab_index: 3 exists
- Executes switch_to_chrome_tab(tab_index=3)
Example Flow (Bulk Operation):
User: "close all reddit tabs"
↓
LLM:
- Parses "all reddit tabs"
- Matches against tab data
- Finds: Tabs 2, 5, 8 have domain "reddit.com"
- Returns: {"type": "close_tab", "tab_indices": [2, 5, 8]}
↓
System:
- Receives API call
- Validates tab_indices exist
- Executes close_chrome_tabs_by_indices([2, 5, 8])
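The shape of the matching step in the flows above can be sketched as follows. In the real system the LLM performs the matching; the case-insensitive substring check here is only a deterministic stand-in to show what goes in (parsed tab records) and what comes out (a response with concrete indices). The function names are hypothetical, and the no-match branch mirrors the clarification example, which keeps a command with only its type.

```python
from typing import Any, Dict, List


def match_tabs(query: str, tabs: List[Dict[str, Any]]) -> List[int]:
    """Return global indices of tabs whose domain or title contains the query."""
    needle = query.strip().lower()
    if not needle:
        return []
    return [
        int(tab["index"])
        for tab in tabs
        if needle in str(tab.get("domain", "")).lower()
        or needle in str(tab.get("title", "")).lower()
    ]


def build_close_response(query: str, tabs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build the JSON envelope for a 'close all <query> tabs' request."""
    indices = match_tabs(query, tabs)
    if not indices:
        return {
            "commands": [{"type": "close_tab"}],
            "needs_clarification": True,
            "clarification_reason": f"Could not find a tab matching '{query}'.",
        }
    return {
        "commands": [{"type": "close_tab", "tab_indices": indices}],
        "needs_clarification": False,
        "clarification_reason": None,
    }
```

Note that the response carries specific indices such as `[2, 5, 8]`, never a filter like "domain contains reddit"; resolving the filter is the matcher's job.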
The system acts as a reliable API server:
- Validate Inputs: Validate all parameters before execution
- Execute Operations: Execute operations reliably
- Handle Errors: Provide clear error messages
- Manage State: Handle state changes (tab indices shifting, etc.)
- Optimize Performance: Optimize execution (bulk operations, etc.)
- LLM = Smart Matcher: LLM matches user intent to specific identifiers
- System = Dumb Executor: System executes operations with specific parameters
- No Filters: LLM should return specific indices/names, not filters
- Clear Contract: API contract is well-defined and documented
- Separation of Concerns: LLM handles matching, system handles execution
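One way to picture the "dumb executor" side of this split is a dispatcher that looks up a handler by the command's type and runs it with the given parameters. The handlers below are hypothetical placeholders that only report what they would do, and skipping execution when `needs_clarification` is true is an assumption about the system's behavior.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], str]


def _switch_tab(cmd: Dict[str, Any]) -> str:
    return f"switched to tab {cmd['tab_index']}"


def _close_tab(cmd: Dict[str, Any]) -> str:
    return f"closed tabs {sorted(cmd['tab_indices'], reverse=True)}"


# Only two operations are wired up in this sketch.
HANDLERS: Dict[str, Handler] = {
    "switch_tab": _switch_tab,
    "close_tab": _close_tab,
}


def execute(response: Dict[str, Any]) -> List[str]:
    """Run each command in the envelope; no intent matching happens here."""
    if response.get("needs_clarification"):
        return [f"clarification needed: {response.get('clarification_reason')}"]
    results: List[str] = []
    for cmd in response.get("commands", []):
        handler = HANDLERS.get(cmd.get("type"))
        if handler is None:
            results.append(f"error: unknown operation {cmd.get('type')!r}")
            continue
        results.append(handler(cmd))
    return results
```

Because the dispatcher never interprets natural language, adding an operation amounts to registering one more handler under a new type name.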
If the LLM can't find a match:
{
  "commands": [{"type": "switch_tab"}],
  "needs_clarification": true,
  "clarification_reason": "Could not find a tab matching 'xyz'. Did you mean one of these tabs?"
}
If the system receives invalid parameters:
- System validates before execution
- Returns error if validation fails
- Provides clear error message to user
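A validation pass of this kind could be sketched as below, covering three of the operations. The function names, the error wording, and the rule that place_app needs at least one of monitor or bounds are assumptions drawn from the parameter descriptions, not the system's actual validator.

```python
from typing import Any, Dict, List

MONITORS = ("main", "right", "left")


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def _check_index(value: Any, tab_count: int, label: str) -> List[str]:
    if not _is_int(value) or value < 1:
        return [f"{label} must be a positive integer (1-based), got {value!r}"]
    if value > tab_count:
        return [f"{label} {value} does not exist ({tab_count} tabs open)"]
    return []


def validate_command(cmd: Dict[str, Any], tab_count: int) -> List[str]:
    """Return error messages for one command; an empty list means it is valid."""
    errors: List[str] = []
    kind = cmd.get("type")
    if kind == "switch_tab":
        errors += _check_index(cmd.get("tab_index"), tab_count, "tab_index")
    elif kind == "close_tab":
        indices = cmd.get("tab_indices")
        if not isinstance(indices, list) or not indices:
            errors.append("tab_indices must be a non-empty array")
        else:
            for idx in indices:
                errors += _check_index(idx, tab_count, "tab_indices entry")
    elif kind == "place_app":
        name, monitor, bounds = cmd.get("app_name"), cmd.get("monitor"), cmd.get("bounds")
        if not isinstance(name, str) or not name.strip():
            errors.append("app_name must be a non-empty string")
        if monitor is None and bounds is None:
            errors.append("place_app needs monitor or bounds")
        if monitor is not None and monitor not in MONITORS:
            errors.append(f"monitor must be one of {MONITORS}, got {monitor!r}")
        if bounds is not None and not (
            isinstance(bounds, list) and len(bounds) == 4 and all(_is_int(b) for b in bounds)
        ):
            errors.append("bounds must be an array of 4 integers [left, top, right, bottom]")
    else:
        errors.append(f"unknown operation type: {kind!r}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets the system report every invalid parameter to the user in one pass.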
- Clear Contract: Well-defined operations and parameters
- Separation of Concerns: LLM matches, system executes
- Testability: Can test API operations independently
- Extensibility: Easy to add new operations
- Reliability: System execution is deterministic
- Debugging: Clear boundaries for debugging
This architecture follows the Model Context Protocol (MCP) pattern:
- Tools/Functions: System operations (like MCP tools)
- Tool Descriptions: API contract documentation
- Tool Calls: LLM returns structured calls
- Tool Execution: System executes tools
- Tool Results: System returns results
In this analogy, the system plays the role of an MCP server, exposing tools that the LLM calls with specific parameters.