🤖 Jroid

📘 Project Description

Jroid is a browser-integrated automation framework that allows you to operate any web application through natural-language commands.
It combines a Tampermonkey client script (for DOM control and UI overlays) with a local Node.js proxy server (for planning, reasoning, and logging) powered by OpenAI’s API.

The system observes the browser DOM, plans actions using ReAct-style reasoning, executes them safely, learns from previous runs through a Reflexion loop, and logs everything locally — ensuring complete privacy and transparency.


🧱 System Overview

Core Architecture

| Layer | Component | Description |
| --- | --- | --- |
| 1. Agent Runtime | OpenAI API | Performs reasoning, planning, and reflection using ReAct + Plan-Execute + Reflexion frameworks. |
| 2. Communication Layer | WebSocket (primary), HTTP (fallback) | Bi-directional channel between Tampermonkey and the local proxy. |
| 3. Data Format | JSON | Standardized schemas for context, plan, and feedback. |
| 4. State Management | JSONL files | Append-only local logs for steps, memory, and reflections. |

MCP Toolchain

| MCP Server | Tool Names | Responsibility |
| --- | --- | --- |
| messagingServer | composeMessage, sendEmail, replyEmail, writeLinkedInPost | Drafts outreach, validates email content, and prepares social posts. |
| analysisServer | summarize, aggregateCompose, searchWeb, formAutofill, extractTable | Research and data-extraction tasks; converts DOM text into structured summaries or table instructions. |
| notesServer | saveToNotes, describeControls, describeForm | Persists notes and provides quick descriptions of visible controls/forms. |
| browserServer | navigate | Validates navigation requests and emits safe navigation plans. |
| storageServer | logEvent, readLog, recordMemory, queryMemory | Centralized logging and memory persistence. All proxy-side logging (plans, execution steps, reflections) flows through this MCP server, so every record shares a single schema. |

Each server is described in config/mcp.config.json and auto-routed by server/mcp/serverManager.js, making it easy to add future domains (CRM, research APIs, etc.) without touching the planner.
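An entry in config/mcp.config.json might look like the following sketch (the field names here are assumptions for illustration — consult the actual file for the exact schema):

```json
{
  "servers": {
    "storageServer": {
      "command": "node",
      "args": ["server/mcp/storageServer.js"],
      "tools": ["logEvent", "readLog", "recordMemory", "queryMemory"]
    },
    "browserServer": {
      "command": "node",
      "args": ["server/mcp/browserServer.js"],
      "tools": ["navigate"]
    }
  }
}
```

Keeping tool routing declarative like this is what lets serverManager.js dispatch to a new domain server without planner changes.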


⚙️ End-to-End System Integration Flow

Phase 0 – Environment Setup

  1. Set up the local Node.js proxy server with WebSocket + HTTP endpoints.
  2. Install Tampermonkey and load the client script into your browser.
  3. Confirm that the browser connects to the proxy on startup (Agent ready / WS connected).

Phase 1 – User Interaction

  1. User opens any page and presses Ctrl + . or clicks the 🦾 icon.
  2. A floating chat-style input bar appears (top-right corner).
  3. User types a natural-language command (e.g. “Reply to the most recent email I received” or “Search LinkedIn for AI jobs in San Francisco”).
  4. Tampermonkey collects the DOM context and sends a JSON payload to the proxy.
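The JSON payload in step 4 could take roughly this shape (field names are illustrative assumptions, not the project's actual schema):

```json
{
  "sessionId": "abc123",
  "command": "Reply to the most recent email I received",
  "page": {
    "url": "https://mail.example.com/inbox",
    "title": "Inbox"
  },
  "dom": [
    { "selector": "button.reply", "text": "Reply", "visible": true }
  ]
}
```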

Phase 2 – Agent Intelligence & Planning

  1. Proxy formats the input for GPT using the ReAct + Plan-Execute + Reflexion prompt.
  2. LLM performs step-by-step reasoning and returns a structured JSON action plan.
  3. Proxy validates schema and sends the plan back to Tampermonkey.
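A structured action plan returned in step 2 might look like this sketch (step fields are assumptions; the {{DRAFT_EMAIL_BODY}} placeholder is resolved later from cached values, as described under Memory & Logging Flow):

```json
{
  "plan": [
    { "step": 1, "action": "click", "selector": "button.reply", "description": "Open the reply composer" },
    { "step": 2, "action": "type", "selector": "div[role='textbox']", "text": "{{DRAFT_EMAIL_BODY}}" },
    { "step": 3, "action": "wait", "ms": 500 }
  ],
  "requiresConfirmation": true
}
```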

Phase 3 – Sandbox Confirmation

  1. Tampermonkey displays a popup overlay showing:
    • Planned steps
    • Preview of any generated text (e.g. email drafts)
  2. User chooses:
    • Confirm → Execute actions live.
    • Decline → Abort task and log status.

Phase 4 – Execution Engine

  1. Tampermonkey runs each action sequentially:
    • Deterministic selector → Fuzzy fallback if needed.
    • Clicks, typing, waiting, extracting.
  2. Random delays (300–800 ms) simulate human pacing.
  3. Step results stream via WebSocket to the proxy and console.
  4. Proxy writes JSONL entries for every action in /data/logs/.

Phase 5 – Reflexion & Memory

  1. When a task finishes, the proxy sends the full log back to GPT for reflection.
  2. The model summarizes successes/failures and outputs a short “lesson.”
  3. Lesson appended to /data/memory/reflections.jsonl.
  4. On next run, the last few lessons are injected into the system prompt to guide improved planning.

Phase 6 – Safety, Rate-Limiting, and Recovery

| Category | Rule |
| --- | --- |
| Action Approval | All write/send/delete actions require popup confirmation. |
| Rate Limiting | 300–800 ms random delay per action; 2–5 s between task groups. |
| Privacy | No data leaves the local environment unless explicitly authorized. |
| Recovery | Automatic retry → fallback to fuzzy selectors → replan → safe abort on repeated failures. |
| Manual Stop | A global “Stop Task” button immediately halts execution. |

Phase 7 – Reporting & Closure

  • Proxy compiles run summary (start/end time, success %, error count).
  • Toast message shown:

    ✅ Task Complete – 17 steps executed, 0 errors

  • Logs and reflections remain locally stored for review or future learning.

🧩 Subsystem Responsibilities

| Subsystem | Responsibility |
| --- | --- |
| Tampermonkey Client | DOM observation, UI overlays, confirmation popup, and live execution. |
| Node.js Proxy Server | OpenAI API communication, plan validation, memory and log storage (via MCP), error recovery. |
| LLM (GPT-4.1) | Natural-language understanding, reasoning, planning, and reflexive learning. |

Memory & Logging Flow

  • Browser snapshots stream to the proxy → recordPageSnapshot → storageServer.recordMemory, ensuring every snapshot and graph update is persisted with the same schema.
  • All plan + execution logs are appended through storageServer.logEvent, so JSONL files under data/logs/ always contain a timestamp, sessionId, type, and tool-specific metadata.
  • Reflections, notes, and cached values are exposed back to Tampermonkey so placeholder resolution ({{DRAFT_EMAIL_BODY}}, etc.) never stalls execution.
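Given that envelope, a single JSONL line written through storageServer.logEvent might look like this (the step-specific metadata fields are assumptions):

```json
{ "timestamp": "2025-01-01T12:00:00.000Z", "sessionId": "abc123", "type": "execution_step", "step": 3, "action": "click", "status": "success" }
```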

🔒 Safety & Ethical Guardrails

  • Human-in-the-loop: All impactful actions require explicit approval.
  • Privacy-first: No credentials or sensitive data ever leave the user’s machine.
  • Rate-limiting: Mimics natural behavior to avoid detection or system strain.
  • Auditability: Every run is fully logged in JSONL for transparency and debugging.

🧭 Development Roadmap

| Milestone | Goal |
| --- | --- |
| MVP (v1) | Single-tab automation loop with ReAct reasoning and confirmation popup. |
| v2 | Multi-tab orchestration using a BroadcastChannel Tab Manager. |
| v3 | Local dashboard for viewing logs, lessons, and task metrics. |
| v4 | Optional local LLM runtime (Ollama or Hugging Face) for full offline mode. |

🧠 Copilot Guidance

This README provides the full system specification for GitHub Copilot and AI assistants.
Use it as context for generating functions, data models, prompts, and integration logic while maintaining:

  • Layered modularity: keep Tampermonkey, proxy, and memory independent.
  • Schema integrity: preserve JSON formats for context, plan, and feedback.
  • Safety compliance: always include confirmation, rate limits, and error recovery hooks.

🪪 License

MIT License — open for modification and learning use.


🧪 Testing & Developer Workflow

  1. Install deps: npm install
  2. Run the proxy: npm run dev
  3. Execute the automated suite: npm test
    • tests/planValidation.test.js ensures the planner schema validator rejects malformed steps.
    • tests/mcpIntegration.test.js spins up a sample JSON-RPC server to verify the shared JSONRPCClient contract.
  4. Manual loop – load the Tampermonkey script, trigger a command (Ctrl + .), approve the generated plan, and watch the HUD stream tool results + cached placeholders.

Add new tests under tests/*.test.js and keep them self-contained; the suite runs with node --test so no extra harness is required.
