GitHub - saifyxpro/HeadlessX: The undetected self-hosted browser automation platform. Powered by Camoufox (Firefox) for 0% detection rates. Built for speed, privacy, and scalability.

Self-hosted operators for website extraction, search, and agent workflows powered by Headfox JS and Camoufox

Overview

HeadlessX is a self-hosted scraping platform with a web dashboard, protected API, queue-backed workflows, and a remote MCP endpoint.

Current live operator surfaces:

Website operator: scrape, crawl, map, content extraction, screenshots
Google AI Search
Tavily
Exa
YouTube
Queue jobs, logs, API keys, proxy management, and config management
Remote MCP over /mcp

Important operator setup notes:

Google AI Search requires a one-time Build Cookies run in the dashboard before the first search
the saved Google session is kept in the shared persistent browser profile and reused later
the YouTube workspace is active only when YT_ENGINE_URL points at a healthy yt-engine service

What Changed In v2.1.2

Added the published HeadlessX CLI bootstrap flow with headlessx init, start, logs, stop, restart, status, and doctor
Upgraded the CLI prompt UX with guided modern setup and login prompts
Added Docker plus Caddy production domain scaffolding under infra/domain-setup
Moved local and Docker host defaults to less common ports to avoid conflicts with stacks that already use 3000 and 8000
Refreshed setup, CLI, and self-hosting docs around the current operator-first platform layout

Operators

Coming Soon

Operator	Description	Status
Google Maps	Extract business listings, reviews, categories, ratings, contact details, opening hours, and location metadata from Google Maps search results.	Planned
Twitter / X	Capture profiles, posts, engagement metrics, media, hashtags, and conversation threads from public X pages.	Planned
LinkedIn	Extract public company and profile data, role details, locations, website links, and business metadata from LinkedIn surfaces.	Planned
Instagram	Collect public profile data, captions, post metadata, media links, reels references, and engagement signals.	Planned
Amazon	Extract product listings, seller data, pricing, ratings, reviews, availability, and catalog metadata from Amazon pages.	Planned
Facebook	Capture public page data, posts, about fields, links, follower counts, and engagement metadata from Facebook pages.	Planned
Reddit	Extract subreddit, post, comment, author, score, flair, and discussion metadata from Reddit threads and listings.	Planned
ThomasNet Suppliers Real-Time Scraper	Extract 70+ ThomasNet supplier fields including emails, phone numbers, company data, products, locations, certifications, and more.	Planned
TLS Appointment Booker	Automate TLS appointment availability checks and booking workflows with support for high-frequency monitoring and retry-safe session handling.	Planned
GlobalSpec Suppliers Scraper	Extract 200,000+ industrial supplier profiles from GlobalSpec Engineering360 with contact data, business type, product catalogs, specs, and datasheets.	Planned
ImportYeti Scraper	Extract supplier profiles, shipment records, and trade data from ImportYeti with 60+ fields including HS codes, shipping lanes, carriers, bills of lading, trading partners, and contact info.	Planned
MakersRow Scraper	Extract 11,600+ US manufacturer profiles from MakersRow with email, phone, address, website, GPS coordinates, capabilities, ratings, gallery images, and business hours.	Planned

Agent Surfaces Coming Soon

Surface	Description	Status
Web AI Agent (`/web`)	Interactive AI agent workspace inside the dashboard that can use all HeadlessX operators and related workflow actions, including Website, Google AI Search, Tavily, Exa, and YouTube.	Planned

Agent Skills

You can add the HeadlessX CLI skill to AI coding agents such as Cursor, Claude Code, Warp, Windsurf, OpenCode, OpenClaw, Antigravity, and similar tools that support the skills installer flow.

npx skills add https://github.com/saifyxpro/HeadlessX --skill cli

This installs the HeadlessX CLI skill from this repository so the agent can use the published headlessx command and follow the packaged usage guidance.

UI Screenshots

[Screenshot: Google AI Search (recently tested with Arabic language and region)]

[Screenshot: Website]

Proof

[Screenshot: BrowserScan]

[Screenshot: Cloudflare Challenge]

[Screenshot: Pixelscan]

[Screenshot: Proxy Validation]

Quick Start

System Requirements

Item	Minimum	Recommended
OS	macOS, Linux, or Windows 11 with WSL2	Ubuntu 22.04+/24.04, Debian 12, or Windows 11 with WSL2
CPU	2 cores	4+ cores
RAM	4 GB	8-16 GB
Disk	10 GB free	20+ GB SSD
Network	outbound internet for installs, browser downloads, and APIs	stable broadband

Runtime Dependencies

Node.js 22+
pnpm 10.32.1+
Git
Docker + Compose v2 for self-host or production mode
PostgreSQL
Redis
Python/uv for yt-engine
Go for the HTML-to-Markdown sidecar

If your machine does not already use the pinned pnpm release, align it with:

corepack enable
corepack use pnpm@10.32.1
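
Before running the CLI it can help to confirm that the installed Node.js and pnpm versions meet the minimums listed under Runtime Dependencies. The following is a minimal sketch, not part of HeadlessX: the helper names are hypothetical, and it assumes `node --version` and `pnpm --version` print a version in x.y.z form.

```python
import re
import shutil
import subprocess

# Minimum versions taken from the Runtime Dependencies list.
MINIMUMS = {"node": (22, 0, 0), "pnpm": (10, 32, 1)}


def parse_version(text):
    """Extract the first x.y.z version from text such as 'v22.3.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(tool, version_text):
    """Compare numerically, so 10.9.0 is correctly older than 10.32.1."""
    return parse_version(version_text) >= MINIMUMS[tool]


def check_installed(tool):
    """Run '<tool> --version'; return None when the tool is not on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return meets_minimum(tool, result.stdout)
```

Calling `check_installed("node")` and `check_installed("pnpm")` returns `True`, `False`, or `None` (not installed) for each tool. Versions are compared as integer tuples because a plain string comparison would rank 10.9.0 above 10.32.1.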

Practical Sizing Notes

4 GB RAM is enough for light local testing
8 GB RAM is the better baseline for the web, API, worker, Redis, and browser runtime together
16 GB RAM is safer for heavier crawl jobs, YouTube flows, or multiple concurrent browser tasks

CLI Bootstrap

HeadlessX is now CLI-first for installation and local lifecycle management.

npm install -g @headlessx-cli/core
headlessx init
headlessx status
headlessx doctor

The CLI bootstraps HeadlessX into ~/.headlessx by default and supports three setup modes:

developer: clone the repo, keep app services local, and use Docker only where needed for infrastructure
self-host: run the full HeadlessX stack with Docker on uncommon localhost ports
production: run the Docker app stack plus the Caddy/domain layer for dashboard.yourdomain.com and api.yourdomain.com

Useful examples:

headlessx init --mode developer
headlessx init --mode self-host
headlessx init --mode production --api-domain api.example.com --web-domain dashboard.example.com --caddy-email ops@example.com
headlessx init update
headlessx init update --branch develop
headlessx start
headlessx logs
headlessx restart
headlessx stop

For existing VPS or Docker installs, use headlessx init update to pull the latest repo state into ~/.headlessx/repo, reconcile missing env keys for the saved mode, then run headlessx restart. For self-host and production, headlessx restart rebuilds Docker images before bringing the stack back up.
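
The update flow above is two CLI calls in a fixed order. As a rough sketch, a small wrapper could script it for unattended hosts. The function names are hypothetical, and only the `headlessx init update`, `--branch`, and `headlessx restart` invocations shown in this README are used.

```python
import subprocess


def update_commands(branch=None):
    """Return the two CLI invocations described above, in order."""
    update = ["headlessx", "init", "update"]
    if branch:
        update += ["--branch", branch]
    return [update, ["headlessx", "restart"]]


def run_update(branch=None, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in update_commands(branch):
        print(" ".join(command))
        if not dry_run:
            # check=True aborts before the restart if the update step fails.
            subprocess.run(command, check=True)
```

`run_update()` only prints the commands. `run_update(branch="develop", dry_run=False)` would execute them against the installed CLI.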

HeadlessX intentionally uses uncommon localhost defaults to avoid conflicts with other tools: web=34872, api=38473, postgres=35432, redis=36379, html-to-md=38081, yt-engine=38090.
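
If another process already listens on one of these defaults, startup may still fail. The sketch below probes each port on localhost before `headlessx start`. It is an illustration under the assumption that every service binds to 127.0.0.1, and the helper names are not part of HeadlessX.

```python
import socket

# Default localhost ports listed in this README.
DEFAULT_PORTS = {
    "web": 34872,
    "api": 38473,
    "postgres": 35432,
    "redis": 36379,
    "html-to-md": 38081,
    "yt-engine": 38090,
}


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports=DEFAULT_PORTS):
    """Return the names of services whose port is already taken."""
    return [name for name, port in ports.items() if not port_is_free(port)]
```

An empty list from `busy_ports()` suggests the defaults are available. After the stack is running, the same call should list every service.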

For deeper setup details, direct repo development, env files, Docker internals, and MCP/client notes, see docs/setup-guide.md.

Google AI Search First Run

The first Google AI Search run now uses a shared persistent browser profile instead of a seeded browser profile committed into the repo.

1. Open /playground/operators/google/ai-search
2. Click Build Cookies
3. Let the shared browser open Google
4. Browse normally and solve any Google or reCAPTCHA prompt once
5. Click Stop Browser to save the profile

After that, the saved shared profile is reused for later Google searches.

Docker and VPS installs persist it in the browser_profile volume
local repo runs persist it under apps/api/data/browser-profile/default
the old tracked apps/api/default-data/browser-profile bundle has been removed
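
For local repo runs, a quick way to see whether Build Cookies has been run is to check that the profile directory exists and contains files. This is only a heuristic sketch: it assumes a non-empty directory means a saved profile, and it does not verify that the Google session inside is still valid. The function name is hypothetical.

```python
from pathlib import Path

# Profile location for local repo runs, relative to the repository root.
LOCAL_PROFILE = Path("apps/api/data/browser-profile/default")


def profile_is_built(repo_root, profile=LOCAL_PROFILE):
    """Return True if the shared profile directory exists and is not empty."""
    path = Path(repo_root) / profile
    return path.is_dir() and any(path.iterdir())
```

For example, `profile_is_built(".")` can be run from the repository root before the first Google AI Search request.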

YouTube Workspace

The YouTube operator is live only when YT_ENGINE_URL is configured.

CLI self-host and production init flows write it automatically
custom local setups must point YT_ENGINE_URL at a reachable yt-engine instance
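
For custom local setups, a small pre-flight check can catch a missing or malformed YT_ENGINE_URL early. The sketch below only validates the shape of the value. It does not probe the service, because this README does not document a yt-engine health route. The function name is hypothetical.

```python
import os
from urllib.parse import urlparse


def yt_engine_configured(env=os.environ):
    """Return True if YT_ENGINE_URL is set to an http(s) URL with a host."""
    url = env.get("YT_ENGINE_URL", "").strip()
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

With the default port from this README, a value such as `http://localhost:38090` passes the check, while an empty or scheme-less value does not.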

API Summary

Every backend route except the health check requires an x-api-key header.

Core backend surfaces:

GET /api/health
GET/PATCH /api/config
GET /api/dashboard/stats
GET /api/logs
GET/POST/PATCH/DELETE /api/keys
proxy CRUD under /api/proxies
website operator routes under /api/operators/website/*
Google AI Search routes under /api/operators/google/ai-search/*
Tavily routes under /api/operators/tavily/*
Exa routes under /api/operators/exa/*
YouTube routes under /api/operators/youtube/*
queue job routes under /api/jobs/*
remote MCP endpoint at /mcp

See the full route reference in docs/api-endpoints.md.
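
As an illustration of the header scheme, the sketch below builds requests with Python's standard library. It assumes the default API port from this README and that the routes return JSON. The response shapes are not documented here, so none are shown, and the helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "http://localhost:38473"  # default API port from this README


def build_request(path, api_key=None, base=API_BASE):
    """Build a GET request; the key goes in the x-api-key header when given."""
    headers = {"Accept": "application/json"}
    if api_key:
        # urllib stores this as 'X-api-key'; HTTP header names are
        # case-insensitive, so the server sees the same header.
        headers["x-api-key"] = api_key
    return urllib.request.Request(base + path, headers=headers, method="GET")


def fetch_json(path, api_key=None):
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(build_request(path, api_key), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`fetch_json("/api/health")` needs no key, while a protected route such as `fetch_json("/api/config", api_key="hx_your_dashboard_created_key")` sends the header.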

MCP

HeadlessX exposes a remote MCP endpoint from the API:

http://localhost:38473/mcp

Use a normal API key created from the dashboard API Keys page.

Do not use DASHBOARD_INTERNAL_API_KEY for MCP clients.

Example client config:

{
  "mcpServers": {
    "headlessx": {
      "transport": "http",
      "url": "http://localhost:38473/mcp",
      "headers": {
        "x-api-key": "hx_your_dashboard_created_key"
      }
    }
  }
}
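
When the config is generated by a script, the same structure can be produced programmatically. The sketch below mirrors the example above and refuses the internal dashboard key if it happens to be present in the environment. The function names are hypothetical, and reading DASHBOARD_INTERNAL_API_KEY from the environment is an assumption about where that value lives.

```python
import json
import os


def mcp_client_config(api_key, base_url="http://localhost:38473"):
    """Return a client config dict shaped like the example above."""
    internal = os.environ.get("DASHBOARD_INTERNAL_API_KEY")
    if internal and api_key == internal:
        raise ValueError(
            "use a dashboard-created API key, not the internal dashboard key"
        )
    return {
        "mcpServers": {
            "headlessx": {
                "transport": "http",
                "url": base_url.rstrip("/") + "/mcp",
                "headers": {"x-api-key": api_key},
            }
        }
    }


def render_config(api_key, base_url="http://localhost:38473"):
    """Serialize the config as indented JSON for pasting into a client."""
    return json.dumps(mcp_client_config(api_key, base_url), indent=2)
```

`print(render_config("hx_your_dashboard_created_key"))` emits JSON equivalent to the example config.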

Monorepo Layout

apps/
  api/                    Express API + worker + MCP
  web/                    Next.js dashboard
  yt-engine/              Python YouTube engine
  go-html-to-md-service/  Go HTML-to-Markdown sidecar
docs/
  setup-guide.md
  api-endpoints.md
infra/docker/

Packages

Package	Description	Status
@headlessx-cli/core	Published CLI package for HeadlessX operators, jobs, and search workflows. Command: `headlessx`	Available
HeadlessX Agent Skills	Installable agent skill pack from this repository for Cursor, Claude Code, Warp, Windsurf, OpenCode, OpenClaw, Antigravity, and similar tools.	Available

Available

Package	Description	Status
headfox-js	Published TypeScript launcher and Playwright helper for Headfox, currently powered by Camoufox-compatible browser bundles.	Available

Coming Soon

Package	Description	Status
headfox	HeadlessX-maintained Firefox-based anti-detect browser engine that will power the platform's next-generation browser runtime.	Planned

Notes

The dashboard uses the internal dashboard key for server-side internal requests
MCP uses normal user-created API keys, not the dashboard internal key
Queue-backed features return degraded/unavailable behavior when Redis is missing
Docker support now covers the full runtime stack, including yt-engine
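
Since queue-backed features depend on Redis, a reachability probe can explain degraded behavior quickly. The sketch below sends an inline PING over a raw socket to the default Redis port from this README. It assumes Redis is reachable on localhost without authentication; a server that requires a password replies with an error and the function returns False. The function name is hypothetical.

```python
import socket


def redis_ping(host="127.0.0.1", port=36379, timeout=1.0):
    """Return True if the server answers an inline PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        # Covers refused connections and timeouts.
        return False
```

A `False` result from `redis_ping()` suggests checking the Redis container before debugging queue jobs.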

Contributing

See CONTRIBUTING.md for the current contribution workflow, local setup expectations, pull request guidance, and commit message conventions.

License

MIT
