GitHub - Aditya060806/WebScope: Stop sending screenshots to vision models. WebScope renders websites into structured text grids so LLMs can understand layout, content, and interactions natively.

Give your AI agent eyes - without the vision model.

WebScope turns any web page into a lightweight, structured text grid that LLMs can read, understand, and interact with — all without screenshots, vision APIs, or pixel parsing.

Full JavaScript execution. Spatial layout preserved. Every interactive element annotated and clickable by reference.

Docs · npm · GitHub

What's New in v1.0.1

Feature	Description
Custom Headers & Auth	Pass `Authorization`, cookies, or any custom headers with every request
Scoped API Security	Enforce `read`, `write`, and `admin` scopes with bearer key auth on HTTP endpoints
Admin Key Management API	Create, list, and revoke keys via `/auth/keys` with masked token previews
Persistent API Key Store	Store keys in `memory`, `file`, or `redis` backends for single-node or shared deployments
Distributed Rate Limiting	Use Redis-backed rate limits across instances with `X-RateLimit-*` response headers
Device Emulation	Render as iPhone, Pixel, iPad — 9 built-in profiles via `--device` flag
JavaScript Evaluation	Run arbitrary JS in the page with `webscope_evaluate`
Batch Operations	Chain multiple actions in a single call with `webscope_batch`
Change Detection	Diff snapshots to see what elements appeared, disappeared, or changed
Semantic Search	Find elements by natural language: "login button", "email input"
Proxy Support	Route through HTTP/SOCKS proxies via `--proxy` or `WEBSCOPE_PROXY`
Session Recording	Record, export, and replay action sequences
Network Inspector	Capture all HTTP requests/responses for debugging
Async Python Tools	Production-ready async LangChain and CrewAI integrations with `httpx`
Production Docker Runtime	Multi-stage Docker build, healthcheck, non-root runtime user, and Redis compose profile
OpenAPI Spec	Full OpenAPI 3.1 spec at `/openapi.json`
Prometheus Metrics	`/metrics` endpoint for monitoring

What WebScope Can Do Now

Render modern JS-heavy pages into compact, structured text grids suitable for LLM reasoning.
Execute full interaction loops (navigate, click, type, scroll, waitFor, evaluate, batch, replay).
Run secure multi-key API access with scopes, admin key lifecycle endpoints, and per-key limits.
Persist auth and traffic controls with file or Redis stores for restart-safe and multi-instance operation.
Deploy in containers with Compose and optional Redis-backed distributed operation out of the box.

Release Notes

Latest release: webscope@1.0.1
Changelog: CHANGELOG.md
Release tag notes template: .github/RELEASE_TEMPLATE.md
Upgrade in one command:

npm install -g webscope@latest

The Problem

Every existing approach to giving LLMs web access has a tradeoff that hurts:

Approach	Payload Size	External Dependency	Latency	Layout Fidelity	Token Cost
Screenshot + Vision	~1 MB	Vision model	High	Pixel-level	~1,000+
Accessibility Tree	~5 KB	None	Low	❌ Lost	~50–200
Raw HTML	~100 KB+	None	Low	❌ Lost	~2,000+
WebScope	~2–5 KB	None	Low	✅ Preserved	~50–150

Screenshots are bulky and need expensive vision models to interpret. Accessibility trees and raw HTML are fast but throw away where things are on the page — layout, proximity, visual grouping. WebScope keeps the spatial structure intact, in a format that's native to how LLMs already think: text.

Get Started

npm install -g webscope

Chromium downloads automatically on install. If the download is skipped or fails (corporate proxy, CI, etc.), trigger it manually:

webscope install

You're ready. Try it out:

# Render any page as a text grid
webscope https://news.ycombinator.com

# Drop into interactive mode — click, type, scroll in real time
webscope --interactive https://github.com

# Pipe structured JSON directly to your agent
webscope --json https://example.com

Docker Quick Start

Run WebScope in a container with Playwright preconfigured:

docker compose up --build -d

Run with the Redis profile (distributed rate limits + Redis key store):

docker compose -f docker-compose.yml -f docker-compose.redis.yml --profile redis up --build -d
# or
npm run docker:up:redis

Check health:

curl http://localhost:3000/health

Call the API:

curl -X POST http://localhost:3000/navigate \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com"}'

Stop everything:

docker compose down

Optional env configuration:

cp .env.docker.example .env
# then edit .env for API keys, store backends, CORS origin, proxy, timeout

Or build and run with the npm helpers:

npm run docker:build
npm run docker:run

What Your Agent Sees

[0]Hacker News [1]new | [2]past | [3]comments | [4]ask | [5]show | [6]jobs | [7]submit      [8]login

 1. [9]Show HN: WebScope – text-grid browser for AI agents (github.com)
    142 points by adityapandey 3 hours ago | [10]89 comments
 2. [11]Why LLMs don't need screenshots to browse the web
    87 points by somebody 5 hours ago | [12]34 comments

[13:______________________] [14 Search]

That's roughly 500 bytes. Your LLM reads this, understands the layout, and says "click ref 9" to open the first link. No vision model. No base64 images. Just text.
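To connect a reply like "click ref 9" to the HTTP API, a thin translation layer is usually enough. The sketch below is illustrative and not part of WebScope: `parseAgentCommand` is a hypothetical helper, the command phrasing follows the examples in this README ("click 3", "type 7 hello", "scroll down"), and the paths and body fields mirror the curl examples in the HTTP API section. Accepting "scroll up" is an assumption.

```javascript
// Hypothetical helper: turn a short agent command into an HTTP API request.
// Paths and body fields follow the curl examples in this README.
function parseAgentCommand(command) {
  const text = command.trim();
  let m;

  // "click 9" or "click ref 9"
  if ((m = /^click\s+(?:ref\s+)?(\d+)$/i.exec(text))) {
    return { path: '/click', body: { ref: Number(m[1]) } };
  }
  // "type 7 hello world": everything after the ref is the text to type
  if ((m = /^type\s+(?:ref\s+)?(\d+)\s+([\s\S]+)$/i.exec(text))) {
    return { path: '/type', body: { ref: Number(m[1]), text: m[2] } };
  }
  // "scroll down" ("up" is assumed to be accepted as well)
  if ((m = /^scroll\s+(up|down)$/i.exec(text))) {
    return { path: '/scroll', body: { direction: m[1].toLowerCase() } };
  }
  throw new Error(`Unrecognized agent command: ${command}`);
}

console.log(parseAgentCommand('click ref 9'));
console.log(parseAgentCommand('type 7 hello'));
console.log(parseAgentCommand('scroll down'));
```

Rejecting anything that does not match keeps a malformed model reply from turning into an unintended request.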

Integrations

WebScope slots into whatever stack you're already using.

MCP Server — Claude Desktop, Cursor, Windsurf, Cline

The zero-config path. Install once, and any MCP-compatible client gets full web browsing.

npm install -g webscope
# or run directly:
npx webscope-mcp

Claude Desktop — add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "webscope": {
      "command": "webscope-mcp"
    }
  }
}

Cursor — add to .cursor/mcp.json:

{
  "mcpServers": {
    "webscope": {
      "command": "webscope-mcp"
    }
  }
}

Now just ask your agent: "Go to Hacker News and summarize the top posts about AI." It handles the rest.

What the MCP server gives you:

session_id on every tool call — run isolated parallel workflows without stepping on each other
webscope_storage_save / webscope_storage_load — persist cookies, localStorage, and session state across runs
webscope_wait_for — pause until a selector appears, text loads, or a URL changes (essential for SPAs)
webscope_assert_field — guard your multi-step flows: verify field values before clicking submit
webscope_evaluate — run JavaScript in the page for advanced extraction or manipulation
webscope_batch — chain multiple actions in a single call for efficiency
webscope_diff — see what changed between snapshots (elements added, removed, modified)
webscope_find — semantic search: find elements by description ("login button", "email input")
webscope_network — inspect all HTTP requests/responses made by the page
webscope_record_start/stop/export + webscope_replay — record and replay action sequences
webscope_devices — list available device profiles for mobile/tablet emulation
Custom headers — pass headers to webscope_navigate for auth tokens, cookies, etc.
Device emulation — pass device: "iphone14" to render as mobile

OpenAI / Anthropic Function Calling

Ready-made tool definitions you can plug directly into any function-calling model. See tools/tool_definitions.json.

Pair it with the system prompt so the model knows how to read and navigate the grid:

import json

import openai  # pip install openai; reads OPENAI_API_KEY from the environment

with open("tools/tool_definitions.json") as f:
    webscope_tools = json.load(f)["tools"]

with open("tools/system_prompt.md") as f:
    system_prompt = f.read()

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Go to example.com and click the first link"},
    ],
    tools=webscope_tools,
)

LangChain

from langchain.agents import initialize_agent  # deprecated in recent LangChain releases

from tools.langchain import get_webscope_tools

# Start the server first: webscope --serve 3000
tools = get_webscope_tools(base_url="http://localhost:3000")

# `llm` is any LangChain-compatible model you have already constructed
agent = initialize_agent(tools, llm, agent="zero-shot-react-description")
agent.run("Find the top story on Hacker News")

Async version (recommended for production):

from tools.langchain_async import get_webscope_tools_async

tools = get_webscope_tools_async(base_url="http://localhost:3000")
# Works with async agents, includes evaluate, find, and header support

CrewAI

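from crewai import Agent

from tools.crewai import WebScopeBrowseTool, WebScopeClickTool, WebScopeTypeTool

# Start the server first: webscope --serve 3000
# `llm` is a model you have already configured for CrewAI
researcher = Agent(
    role="Web Researcher",
    goal="Browse the web and report what you find",  # illustrative; Agent requires a goal
    backstory="A careful researcher who reads pages as text grids.",  # illustrative
    tools=[WebScopeBrowseTool(), WebScopeClickTool(), WebScopeTypeTool()],
    llm=llm,
)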

Async version:

from tools.crewai_async import AsyncWebScopeBrowseTool, AsyncWebScopeClickTool
# Includes evaluate, find, and device emulation support

HTTP API

Spin up the REST server and call it from anything — Python, curl, your own orchestrator.

webscope --serve 3000

# Navigate to a page
curl -X POST http://localhost:3000/navigate \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com"}'

# Navigate with auth headers and device emulation
curl -X POST http://localhost:3000/navigate \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com", "headers": {"Authorization": "Bearer token"}, "device": "iphone14"}'

# Interact
curl -X POST http://localhost:3000/click -d '{"ref": 3}'
curl -X POST http://localhost:3000/type -d '{"ref": 7, "text": "hello"}'
curl -X POST http://localhost:3000/scroll -d '{"direction": "down"}'
curl -X POST http://localhost:3000/press -d '{"key": "Enter"}'
curl -X POST http://localhost:3000/waitFor -d '{"selector": ".results"}'
curl -X POST http://localhost:3000/assertField -d '{"ref": 7, "expected": "hello"}'

# New in v1.0.0
curl -X POST http://localhost:3000/evaluate -d '{"script": "document.title"}'
curl -X POST http://localhost:3000/batch -d '{"actions": [{"action": "click", "params": {"ref": 3}}]}'
curl -X POST http://localhost:3000/find -d '{"query": "submit button"}'
curl -X POST http://localhost:3000/headers -d '{"headers": {"X-Custom": "value"}}'
curl http://localhost:3000/diff
curl http://localhost:3000/devices
curl http://localhost:3000/network
curl http://localhost:3000/metrics
curl http://localhost:3000/openapi.json

# Recording
curl -X POST http://localhost:3000/record/start
curl -X POST http://localhost:3000/record/stop
curl http://localhost:3000/record/export
curl -X POST http://localhost:3000/replay

# Admin key management (requires admin scope)
curl -H 'Authorization: Bearer admin-token' http://localhost:3000/auth/keys
curl -X POST http://localhost:3000/auth/keys \
  -H 'Authorization: Bearer admin-token' \
  -d '{"token":"reader-token","id":"reader","scopes":["read"],"rate_limit":60}'
curl -X DELETE http://localhost:3000/auth/keys \
  -H 'Authorization: Bearer admin-token' \
  -d '{"id":"reader"}'

# State management
curl -X POST http://localhost:3000/saveState -d '{"path": "/tmp/state.json"}'
curl -X POST http://localhost:3000/loadState -d '{"path": "/tmp/state.json"}'

Security: Set WEBSCOPE_API_KEY to require Authorization: Bearer <key> on all requests. Set WEBSCOPE_CORS_ORIGIN to lock down cross-origin access.

Advanced auth and traffic controls:

WEBSCOPE_API_KEYS_JSON enables scoped keys and per-key limits.
WEBSCOPE_API_KEY_STORE=file persists keys to disk (WEBSCOPE_API_KEYS_FILE) so they survive restarts.
WEBSCOPE_API_KEY_STORE=redis stores keys in Redis for shared multi-instance setups.
File/Redis key stores are reloaded per request so key updates propagate across running instances.
WEBSCOPE_RATE_LIMIT_WINDOW_MS + WEBSCOPE_RATE_LIMIT_MAX enforce request budgets.
WEBSCOPE_RATE_LIMIT_STORE=redis enables distributed rate limiting across instances.
WEBSCOPE_AUDIT_LOG=true emits structured JSON audit logs per request.
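For intuition about how a window size and a request maximum combine into a budget, here is a minimal in-memory fixed-window limiter. It is a sketch, not WebScope's implementation: `createRateLimiter` is a hypothetical name, the fixed-window algorithm is an assumption, and the three header names are the conventional `X-RateLimit-*` trio rather than names confirmed by this README. The defaults match the configuration table (60000 ms, 120 requests).

```javascript
// Hypothetical fixed-window limiter keyed by identity (for example a key id).
function createRateLimiter({ windowMs = 60000, max = 120 } = {}) {
  const windows = new Map(); // identity -> { start, count }

  return function check(identity, now = Date.now()) {
    let w = windows.get(identity);
    // Open a fresh window on first use or once the old one has expired.
    if (!w || now - w.start >= windowMs) {
      w = { start: now, count: 0 };
      windows.set(identity, w);
    }
    w.count += 1;
    return {
      allowed: w.count <= max,
      headers: {
        'X-RateLimit-Limit': String(max), // assumed header names
        'X-RateLimit-Remaining': String(Math.max(0, max - w.count)),
        'X-RateLimit-Reset': String(Math.ceil((w.start + windowMs) / 1000)),
      },
    };
  };
}

const check = createRateLimiter({ windowMs: 60000, max: 2 });
console.log(check('reader', 0).allowed);     // true
console.log(check('reader', 1000).allowed);  // true
console.log(check('reader', 2000).allowed);  // false (budget of 2 used up)
console.log(check('reader', 60000).allowed); // true (new window)
```

A Redis-backed store would replace the `Map` with shared counters so that every instance sees the same budget.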

Example scoped keys:

export WEBSCOPE_API_KEYS_JSON='{"admin-token":{"id":"admin","scopes":["admin","read","write"],"rate_limit":300},"reader-token":{"id":"reader","scopes":["read"],"rate_limit":60}}'
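To make the scope model concrete, the sketch below checks a bearer token against a key map of the same shape. It is an illustration under stated assumptions: `authorize` is a hypothetical helper, scopes are matched literally (`admin` does not imply `read` or `write`, which is why the example key lists all three), and the `UNAUTHORIZED` and `FORBIDDEN_SCOPE` codes are taken from the Error Handling table.

```javascript
// Same shape as the WEBSCOPE_API_KEYS_JSON example: token -> { id, scopes, rate_limit }.
const apiKeys = JSON.parse(
  '{"admin-token":{"id":"admin","scopes":["admin","read","write"],"rate_limit":300},' +
  '"reader-token":{"id":"reader","scopes":["read"],"rate_limit":60}}'
);

// Hypothetical check: resolve the bearer token, then require one scope.
function authorize(keys, authorizationHeader, requiredScope) {
  const match = /^Bearer\s+(\S+)$/.exec(authorizationHeader || '');
  const token = match ? match[1] : null;
  const known = token !== null && Object.prototype.hasOwnProperty.call(keys, token);
  if (!known) return { ok: false, status: 401, code: 'UNAUTHORIZED' };

  const entry = keys[token];
  if (!entry.scopes.includes(requiredScope)) {
    return { ok: false, status: 403, code: 'FORBIDDEN_SCOPE' };
  }
  return { ok: true, id: entry.id, rateLimit: entry.rate_limit };
}

console.log(authorize(apiKeys, 'Bearer reader-token', 'read'));
console.log(authorize(apiKeys, 'Bearer reader-token', 'write').code); // FORBIDDEN_SCOPE
console.log(authorize(apiKeys, undefined, 'read').code);              // UNAUTHORIZED
```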

Node.js Library

Use it directly in your own code — no server required.

const { AgentBrowser } = require('webscope');

// CommonJS has no top-level await, so run everything inside an async function.
async function main() {
  const browser = new AgentBrowser({ cols: 120 });
  const { view, elements, meta } = await browser.navigate('https://example.com');

  console.log(view);        // The text grid
  console.log(elements);    // { 0: { selector, tag, text, href }, ... }
  console.log(meta.stats);  // { totalElements, interactiveElements, renderMs }

  await browser.click(3);              // Click element [3]
  await browser.type(7, 'hello');      // Type into element [7]
  await browser.scroll('down');        // Scroll down
  await browser.press('Enter');        // Press a key
  await browser.waitFor({ selector: '.step-2.active' });
  await browser.assertField(7, 'hello', { comparator: 'equals' });
  await browser.saveStorageState('/tmp/webscope-state.json');
  await browser.loadStorageState('/tmp/webscope-state.json');
  await browser.query('nav a');        // CSS selector search
  await browser.screenshot();          // PNG buffer (debugging)
  console.log(browser.getCurrentUrl());

  // v1.0.0 features
  await browser.evaluate('document.title');                 // Run JS in page
  await browser.batch([                                     // Multi-step batch
    { action: 'type', params: { ref: 3, text: 'user@example.com' } },
    { action: 'click', params: { ref: 7 } },
  ]);
  browser.find('submit button');                            // Semantic search
  browser.diff();                                           // Change detection
  browser.setHeaders({ 'Authorization': 'Bearer token' });  // Session headers
  browser.startRecording();                                 // Record actions
  browser.getNetworkLog();                                  // Network capture

  await browser.close();
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
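To show what change detection over two `elements` maps could look like, here is a small standalone sketch. It does not describe how `browser.diff()` works internally: `diffSnapshots` is a hypothetical function, and matching elements by `selector` (on the assumption that refs may be renumbered between renders) is a design choice of this example only.

```javascript
// Hypothetical diff over two `elements` maps ({ ref: { selector, tag, text, href } }).
function diffSnapshots(before, after) {
  const bySelector = (els) => new Map(Object.values(els).map((el) => [el.selector, el]));
  const prev = bySelector(before);
  const next = bySelector(after);
  const added = [];
  const removed = [];
  const changed = [];

  for (const [selector, el] of next) {
    const old = prev.get(selector);
    if (!old) added.push(selector);
    else if (old.text !== el.text || old.href !== el.href) changed.push(selector);
  }
  for (const selector of prev.keys()) {
    if (!next.has(selector)) removed.push(selector);
  }
  return { added, removed, changed };
}

const before = {
  0: { selector: '#login', tag: 'a', text: 'login', href: '/login' },
  1: { selector: '#cart', tag: 'a', text: 'Cart (0)', href: '/cart' },
};
const after = {
  0: { selector: '#cart', tag: 'a', text: 'Cart (1)', href: '/cart' },
  1: { selector: '#logout', tag: 'a', text: 'logout', href: '/logout' },
};
console.log(diffSnapshots(before, after));
// { added: [ '#logout' ], removed: [ '#login' ], changed: [ '#cart' ] }
```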

Configuration

Everything can be configured via CLI flags or environment variables. CLI flags always take priority.

Flag	Environment Variable	Default	Type	Description
`--port, -p`	`WEBSCOPE_PORT`	`3000`	`int`	HTTP server port
`--cols, -c`	`WEBSCOPE_COLS`	`100`	`int`	Grid width in characters
`--timeout, -t`	`WEBSCOPE_TIMEOUT`	`30000`	`int`	Navigation timeout in milliseconds
—	`WEBSCOPE_NETWORK_LOG_LIMIT`	`2000`	`int`	Max network log entries retained per session
—	`WEBSCOPE_MAX_SESSIONS`	`20`	`int`	Max concurrent MCP sessions before LRU eviction
—	`WEBSCOPE_SESSION_TTL_MS`	`1800000`	`int`	MCP idle session TTL in milliseconds
—	`WEBSCOPE_RATE_LIMIT_WINDOW_MS`	`60000`	`int`	Rate-limit window size in milliseconds
—	`WEBSCOPE_RATE_LIMIT_MAX`	`120`	`int`	Max requests per window (default limit per identity)
—	`WEBSCOPE_RATE_LIMIT_STORE`	`memory`	`string`	Rate-limit backend (`memory`, `redis`)
—	`WEBSCOPE_REDIS_URL`	—	`string`	Redis URL used by Redis-backed rate limits and key store
—	`WEBSCOPE_API_KEY_STORE`	`memory`	`string`	API key backend (`memory`, `file`, `redis`)
—	`WEBSCOPE_API_KEYS_FILE`	`./state/api-keys.json`	`string`	File path for key persistence when using `file` backend
—	`WEBSCOPE_AUDIT_LOG`	`true`	`bool`	Emit structured request audit logs to stdout
`--device, -d`	—	—	`string`	Device profile (iphone14, pixel7, ipadpro, etc.)
`--proxy`	`WEBSCOPE_PROXY`	—	`string`	HTTP/SOCKS proxy URL
`--record`	—	`false`	`bool`	Record actions in interactive mode
—	`WEBSCOPE_API_KEY`	—	`string`	API key required on all HTTP requests
—	`WEBSCOPE_API_KEYS_JSON`	—	`json`	Scoped API key map (`id`, `scopes`, `rate_limit`)
—	`WEBSCOPE_CORS_ORIGIN`	`*`	`string`	Allowed CORS origin
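The precedence rule (CLI flag, then environment variable, then default) can be sketched as a single lookup. `resolveSetting` is a hypothetical helper, and the plain `flags` and `env` objects stand in for parsed CLI arguments and `process.env`; the defaults come from the table above.

```javascript
// Hypothetical resolver: flag value wins, then the environment, then the default.
function resolveSetting(flagValue, envValue, defaultValue, parse = (v) => v) {
  if (flagValue !== undefined) return parse(flagValue);
  if (envValue !== undefined && envValue !== '') return parse(envValue);
  return defaultValue;
}

const env = { WEBSCOPE_PORT: '8080', WEBSCOPE_COLS: '120' }; // stand-in for process.env
const flags = { port: '4000' };                              // stand-in for parsed CLI flags

const config = {
  port: resolveSetting(flags.port, env.WEBSCOPE_PORT, 3000, Number),
  cols: resolveSetting(flags.cols, env.WEBSCOPE_COLS, 100, Number),
  timeout: resolveSetting(flags.timeout, env.WEBSCOPE_TIMEOUT, 30000, Number),
};
console.log(config); // { port: 4000, cols: 120, timeout: 30000 }
```

Here `--port` overrides `WEBSCOPE_PORT`, `WEBSCOPE_COLS` is used because no flag was given, and the timeout falls back to its default.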

Grid Conventions

Each element type has a consistent visual representation in the text grid:

Element	Grid Notation	Agent Action
Link	`[ref]link text`	`click(ref)`
Button	`[ref button text]`	`click(ref)`
Text input	`[ref:placeholder____]`	`type(ref, "text")`
Checkbox	`[ref:X]` / `[ref: ]`	`click(ref)`
Radio button	`[ref:●]` / `[ref:○]`	`click(ref)`
Dropdown	`[ref:▼ Selected]`	`select(ref, "value")`
File input	`[ref: Choose file]`	`upload(ref, "/path")`
Heading	`═══ HEADING ═══`	Read-only
Separator	`────────────────`	Read-only
List item	`• Item text`	Read-only
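Because the notation is regular, an orchestrator can recover the list of actionable refs from the grid text alone. The following is a minimal sketch, not WebScope's parser: `extractRefs` is a hypothetical function that tells links, buttons, and fields apart purely by the character after the ref number. Page text that happens to contain bracketed numbers (footnote markers, for instance) would be misread, so the `elements` map returned by the library remains the more reliable source.

```javascript
// Hypothetical scanner for the bracket notations in the table above.
function extractRefs(grid) {
  // [9]            -> link   (no separator; link text follows the bracket)
  // [14 Search]    -> button (space separator)
  // [13:______]    -> field  (colon separator: inputs, checkboxes, dropdowns, ...)
  const pattern = /\[(\d+)(?:([: ])([^\]]*))?\]/g;
  const refs = [];
  for (const m of grid.matchAll(pattern)) {
    const kind = m[2] === undefined ? 'link' : m[2] === ' ' ? 'button' : 'field';
    refs.push({ ref: Number(m[1]), kind, label: (m[3] || '').trim() });
  }
  return refs;
}

const sample = '[8]login   [13:______] [14 Search]';
console.log(extractRefs(sample));
```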

Under the Hood

┌─────────────────────────────────────────────┐
│  Your Agent (any LLM)                        │
│  "click 3" / "type 7 hello" / "scroll down"  │
├─────────────────────────────────────────────┤
│  WebScope                                     │
│  Pixel positions → character grid            │
│  Interactive elements get [ref] annotations  │
├─────────────────────────────────────────────┤
│  Headless Chromium (Playwright)              │
│  Full JS/CSS execution                       │
│  getBoundingClientRect() for all elements    │
└─────────────────────────────────────────────┘

The pipeline is straightforward:

Render — A real Chromium instance loads the page with full JS/CSS execution
Extract — Every visible element's position, size, text, and interactivity is captured
Map — Pixel coordinates are converted to character grid positions, preserving spatial layout
Annotate — Interactive elements get [ref] numbers so agents can act on them
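The Map step can be pictured with a small standalone sketch. It is an approximation under stated assumptions, not WebScope's actual algorithm: `toGridCell` and `placeText` are hypothetical helpers, the viewport is assumed to be 1000 px wide, each text row is assumed to be 20 px tall, and overlapping elements simply overwrite each other.

```javascript
// Convert a pixel position (as from getBoundingClientRect) to a grid cell.
function toGridCell(rect, { viewportWidth, cols, rowHeight }) {
  const cellWidth = viewportWidth / cols;
  return {
    col: Math.min(cols - 1, Math.max(0, Math.floor(rect.x / cellWidth))),
    row: Math.max(0, Math.floor(rect.y / rowHeight)),
  };
}

// Write text into a list of fixed-width lines at the given cell.
function placeText(lines, cell, text, cols) {
  while (lines.length <= cell.row) lines.push(' '.repeat(cols));
  const line = lines[cell.row];
  const clipped = text.slice(0, cols - cell.col); // never overflow the row
  lines[cell.row] =
    line.slice(0, cell.col) + clipped + line.slice(cell.col + clipped.length);
}

const opts = { viewportWidth: 1000, cols: 100, rowHeight: 20 }; // assumed values
const lines = [];
placeText(lines, toGridCell({ x: 0, y: 0 }, opts), '[0]Home', opts.cols);
placeText(lines, toGridCell({ x: 900, y: 0 }, opts), '[1]login', opts.cols);
console.log(lines[0].trimEnd());
```

An element near the right edge of the viewport ends up near the right edge of the row, which is the spatial cue a flat element list would lose.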

Selector Strategy

Selectors need to survive between snapshots — if the DOM shifts slightly, your agent shouldn't lose track of the submit button. WebScope builds resilient CSS selectors with this priority:

Priority	Strategy	Example	Stability
1	`#id`	`#email`	Highest
2	`[data-testid]`	`[data-testid="submit-btn"]`	High
3	`[aria-label]`	`input[aria-label="Search"]`	High
4	`[role]`	`[role="navigation"]`	Medium
5	`[name]`	`input[name="email"]`	Medium
6	`a[href]`	`a[href="/about"]`	Medium
7	`nth-child`	`div > a:nth-child(3)`	Low

This stability is what makes multi-step workflows reliable — your agent can fill a form across several page transitions without selectors breaking between steps.
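The priority order in the table can be expressed as a simple cascade. This is a sketch of the idea rather than WebScope's selector builder: `pickSelector` and the `{ tag, attrs, parentTag, childIndex }` input shape are hypothetical, and ids are assumed to need no CSS escaping.

```javascript
// Escape a value for use inside a double-quoted CSS attribute selector.
const quote = (v) => String(v).replace(/\\/g, '\\\\').replace(/"/g, '\\"');

// Hypothetical cascade: return the most stable selector available.
function pickSelector(el) {
  const { tag, attrs = {}, parentTag, childIndex } = el;
  if (attrs.id) return `#${attrs.id}`;
  if (attrs['data-testid']) return `[data-testid="${quote(attrs['data-testid'])}"]`;
  if (attrs['aria-label']) return `${tag}[aria-label="${quote(attrs['aria-label'])}"]`;
  if (attrs.role) return `[role="${quote(attrs.role)}"]`;
  if (attrs.name) return `${tag}[name="${quote(attrs.name)}"]`;
  if (tag === 'a' && attrs.href) return `a[href="${quote(attrs.href)}"]`;
  // Last resort: structural position, which breaks if siblings are reordered.
  return `${parentTag} > ${tag}:nth-child(${childIndex})`;
}

console.log(pickSelector({ tag: 'input', attrs: { id: 'email', name: 'email' } })); // #email
console.log(pickSelector({ tag: 'input', attrs: { name: 'email' } }));              // input[name="email"]
console.log(pickSelector({ tag: 'a', parentTag: 'div', childIndex: 3 }));           // div > a:nth-child(3)
```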

Real-World Example: ATS Job Application

Multi-step application flows (Greenhouse, Lever, etc.) are where WebScope really shines. Here's how you'd automate one:

// Open the job posting — keep a stable session throughout
await webscope_navigate({ url: 'https://job-boards.greenhouse.io/acme/jobs/123', session_id: 'apply-acme' });

// Fill out the form
await webscope_type({ ref: 12, text: 'Aditya', session_id: 'apply-acme' });
await webscope_type({ ref: 15, text: 'Pandey', session_id: 'apply-acme' });
await webscope_click({ ref: 42, session_id: 'apply-acme', retries: 3, retry_delay_ms: 400 });

// Wait for the next step to load before continuing
await webscope_wait_for({ selector: '#step-2.active', timeout_ms: 8000, session_id: 'apply-acme', retries: 2 });

// Double-check a field value before submitting
await webscope_assert_field({ ref: 77, expected: 'San Francisco', comparator: 'includes', session_id: 'apply-acme' });

// Save the session so you can resume later
await webscope_storage_save({ path: '/tmp/ats-state.json', session_id: 'apply-acme' });

Handy session management:

webscope_session_list — see all active sessions
webscope_session_close — tear down one or all sessions

Error Handling

All HTTP errors return a structured JSON response with a machine-readable code:

{ "error": "URL scheme \"file:\" is not allowed", "code": "INVALID_URL_SCHEME" }

Code	HTTP Status	Description
`MISSING_PARAM`	`400`	Required field missing from the request body
`INVALID_URL`	`400`	URL could not be parsed
`INVALID_URL_SCHEME`	`400`	Blocked scheme (`file:`, `javascript:`, `data:`)
`INVALID_JSON`	`400`	Request body is not valid JSON
`BROWSER_NOT_READY`	`400`	No page loaded — call `/navigate` first
`UNAUTHORIZED`	`401`	Missing or invalid API key
`FORBIDDEN_SCOPE`	`403`	API key is valid but missing the required scope (`read`, `write`, or `admin`)
`NOT_FOUND`	`404`	Unknown endpoint
`METHOD_NOT_ALLOWED`	`405`	Incorrect HTTP method for this endpoint
`BODY_TOO_LARGE`	`413`	Request body exceeds 1 MB
`RATE_LIMITED`	`429`	Request budget exceeded for the current rate-limit window
`INTERNAL_ERROR`	`500`	Unexpected server error
`KEY_STORE_ERROR`	`500`	API key backend could not be loaded (file/redis issue)
`RATE_LIMIT_STORE_ERROR`	`500`	Rate-limit backend is unavailable
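Because every error carries a machine-readable `code`, a client can decide what to do without parsing the message. The sketch below shows one possible policy: `classifyError` is a hypothetical helper, and treating rate-limit and backend errors as retryable is an assumption of this example, not behavior the server prescribes.

```javascript
// Codes this sketch treats as transient (an assumption, not a server guarantee).
const RETRYABLE = new Set([
  'RATE_LIMITED',
  'KEY_STORE_ERROR',
  'RATE_LIMIT_STORE_ERROR',
  'INTERNAL_ERROR',
]);

// Hypothetical policy: map a parsed error body to a next step.
function classifyError(body) {
  const code = body && body.code;
  if (code === 'BROWSER_NOT_READY') return 'navigate-first'; // call /navigate, then retry
  if (RETRYABLE.has(code)) return 'retry';                   // back off and try again
  return 'fail';                                             // fix the request or credentials
}

const errorBody = { error: 'URL scheme "file:" is not allowed', code: 'INVALID_URL_SCHEME' };
console.log(classifyError(errorBody));                     // fail
console.log(classifyError({ code: 'RATE_LIMITED' }));      // retry
console.log(classifyError({ code: 'BROWSER_NOT_READY' })); // navigate-first
```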

Testing

# Run all tests
npm test

# Auth scopes + file key persistence + per-key rate limits
npm run test:form

# Backend outage paths (`KEY_STORE_ERROR`, `RATE_LIMIT_STORE_ERROR`)
npm run test:live

# Redis distributed integration (shared keys + cross-instance limits)
npm run test:ats

npm run test:ats requires a reachable Redis instance (redis://127.0.0.1:6379/15 by default) and is skipped automatically when Redis is unavailable.

Design Philosophy

Text is native to LLMs — no vision model middleman, no base64 encoding, no token-heavy image payloads
Spatial layout matters — a flat list of elements loses the where; WebScope preserves it
Cheap and fast — 2–5 KB per render vs. 1 MB+ screenshots
Full web support — real Chromium runs the JavaScript; SPAs, dynamic content, and auth flows all work
Interactive by design — numbered references map directly to real DOM elements; click, type, scroll

Author

Aditya Pandey
