A self-hosted captcha solver for browser automation. Give it a page and it finds the captcha, reads the image grid with a fine-tuned Qwen3.5-9B vision model, and clicks through to a token. Everything runs on your hardware — no third-party solving service involved.
⭐ Enjoying CaptchaKraken? Star both repos — and watch them for updates (smaller models, the hosted cloud API, new puzzle types): CaptchaKrakenJS (this solver) · CaptchaKraken-cli (the detection + grid-planner engine). On GitHub: Star = ⭐ top-right, Watch = 👁 Watch → All Activity for release notifications.
⚠️ v2 is a breaking change. It's a full rewrite. The old multi-provider setup (Gemini / OpenRouter / Ollama) is gone — v2 runs one purpose-built grid model on a local vLLM server. Upgrading from v1? See Migrating from v1.
CaptchaKraken handles image-grid captchas — the "select all squares with…" challenges. It detects the captcha, solves the grid, clicks, and verifies.
| Captcha type | Status |
|---|---|
| ✅ Checkbox / "I'm not a robot" | Works end-to-end |
| ✅ reCAPTCHA 3×3 (dynamic) | Works end-to-end |
| ✅ reCAPTCHA 4×4 (one-shot) | Works end-to-end |
| ✅ hCaptcha 3×3 image grid | Works end-to-end |
| ✅ Cloudflare Turnstile | Works via the checkbox flow |
How accurate is the model? On our hand-labeled real captcha set, the model picks the exactly-correct tiles 94.7% of the time for reCAPTCHA 3×3 (86.7% for hCaptcha, 76.2% for the harder 4×4 grid — 85.8% overall).
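To make "exactly-correct" concrete, here is a minimal sketch of an exact-match metric: a sample counts as a hit only when the predicted tile set equals the labeled tile set, with no partial credit. The types and function names are hypothetical, and this is an assumption about how such a figure could be computed, not the project's actual evaluation code.

```typescript
// Hypothetical sample shape: tile indices the model picked vs. the hand-labeled answer.
interface LabeledSample {
  predicted: number[];
  expected: number[];
}

// True only when both selections contain exactly the same tiles (order ignored).
function isExactMatch(predicted: readonly number[], expected: readonly number[]): boolean {
  const predictedSet = new Set(predicted);
  const expectedSet = new Set(expected);
  if (predictedSet.size !== expectedSet.size) return false;
  return predicted.every((tile) => expectedSet.has(tile));
}

// Fraction of samples solved exactly; one wrong or missing tile fails the whole sample.
function exactMatchAccuracy(samples: readonly LabeledSample[]): number {
  if (samples.length === 0) return 0;
  const hits = samples.filter((s) => isExactMatch(s.predicted, s.expected)).length;
  return hits / samples.length;
}

const demo: LabeledSample[] = [
  { predicted: [0, 4, 8], expected: [8, 4, 0] }, // same tiles, different order: hit
  { predicted: [0, 4], expected: [0, 4, 8] },    // one tile missed: miss
];
console.log(exactMatchAccuracy(demo)); // 0.5
```

Under this kind of metric a grid with one wrong tile scores zero, which is one reason the 4×4 grid (16 tiles) is harder to get exactly right than a 3×3.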
What about real solve rates in a browser? They vary a lot depending on your IP reputation and browser setup (see below). On a standard setup, expect roughly 50% end-to-end for reCAPTCHA. A clean IP and good stealth browser do better; a flagged IP does much worse — providers reject even correct answers once they distrust your IP.
Live solves, recorded straight from the browser:
reCAPTCHA 3×3 — fast solve (~14 s)
recaptcha_3x3_demo_1.mp4
reCAPTCHA 3×3 — multi-round refresh
recaptcha_3x3_demo_2.mp4
reCAPTCHA 4×4 — one-shot "select all"
recaptcha_4x4_demo_1.mp4
hCaptcha 3×3 grid — "select all" property puzzle (~15 s)
hcaptcha_solved_3.mp4
hCaptcha 3×3 grid — another property solve (~14 s)
hcaptcha_solved_2.mp4
hCaptcha 3×3 grid — property solve (~14 s)
hcaptcha_solved_1.mp4
hCaptcha also serves non-grid puzzles (drag, path, "choose the card", etc.). CaptchaKraken detects and skips these instead of guessing — they're on the roadmap.
💡 No GPU? A hosted cloud API (no model to run) is coming — star the repo to be notified. For now CaptchaKraken runs on your own GPU or Apple-silicon machine.
```bash
bash install.sh
```

This checks your available memory, picks the right model size, downloads it plus
the grid model, and writes a captchakraken.env config file.
| Your memory | Model it picks |
|---|---|
| ≥ 22 GB | Qwen3.5-9B-FP8-dynamic (8-bit) — best accuracy |
| 11–22 GB | Qwen3.5-9B-AWQ-4bit (4-bit) — lighter, slightly less accurate |
| < 11 GB | Too small to run — the installer stops and explains your options |
If your hardware is too small, you can still run `bash install.sh --download-only`
(e.g. to copy the model to a bigger server later), or watch the repo for smaller
models and the cloud API. Force a size with `bash install.sh --quant fp8|awq`.
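The selection rule in the table can be sketched as a small function. This only restates the thresholds listed above; it is not the installer's actual logic, the function and type names are hypothetical, and treating exactly 22 GB as the 8-bit tier is an assumption based on the "≥ 22 GB" row.

```typescript
// Hypothetical result type for illustration only.
interface ModelChoice {
  quant: 'fp8' | 'awq';
  model: string;
}

// Map available memory (GB) to the model size the table lists, or null if too small.
function pickModel(memoryGB: number): ModelChoice | null {
  if (memoryGB >= 22) return { quant: 'fp8', model: 'Qwen3.5-9B-FP8-dynamic' };
  if (memoryGB >= 11) return { quant: 'awq', model: 'Qwen3.5-9B-AWQ-4bit' };
  return null; // under 11 GB: the installer stops and explains your options
}

console.log(pickModel(24)?.quant); // fp8
console.log(pickModel(16)?.quant); // awq
console.log(pickModel(8));         // null
```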
Solve time is dominated by how fast your hardware generates tokens, and LLM
generation is memory-bandwidth bound — each token streams the model's weights
through memory once. So speed tracks your GPU/SoC memory bandwidth, not its
capacity: roughly tokens/sec ≈ bandwidth × ~50% ÷ bytes-per-token, where the
8-bit (FP8) model reads ~9 GB/token and the 4-bit (AWQ) ~4.5 GB/token. (Measured
on a 5090: ~100 tok/s on FP8, ~200 tok/s on AWQ — both match this formula.)
Estimated throughput for common devices:
| Device | Memory bandwidth | FP8 (8-bit) | AWQ (4-bit) |
|---|---|---|---|
| NVIDIA H100 | ~3.35 TB/s | ~186 | ~370 |
| NVIDIA A100 | ~2.0 TB/s | ~111 | ~222 |
| NVIDIA RTX 5090 | ~1.79 TB/s | ~100 | ~200 |
| NVIDIA RTX 4090 | ~1.0 TB/s | ~56 | ~112 |
| NVIDIA RTX 5080 / 3090 | ~0.95 TB/s | ~53 | ~105 |
| NVIDIA RTX 5070 Ti | ~896 GB/s | ~50 | ~100 |
| AMD RX 7900 XTX | ~960 GB/s | ~53 | ~106 |
| NVIDIA RTX 3080 / 4080 | ~0.7–0.76 TB/s | ~40 | ~80 |
| Apple M2 / M3 Ultra | ~800 GB/s | ~44 | ~88 |
| NVIDIA RTX 5070 | ~672 GB/s | ~37 | ~74 |
| Apple M5 Max | ~614 GB/s | ~34 | ~68 |
| Apple M4 Max | ~546 GB/s | ~30 | ~60 |
| NVIDIA RTX 4070 | ~504 GB/s | ~28 | ~56 |
| Apple M1–M3 Max | ~400 GB/s | ~22 | ~44 |
| Apple M5 Pro | ~307 GB/s | ~17 | ~34 |
| Apple M4 Pro | ~273 GB/s | ~15 | ~30 |
| Apple M5 (base) | ~154 GB/s | ~8 | ~17 |
(Rough estimates at ~50% bandwidth efficiency; real numbers vary with batching, KV-cache length, and driver. AWQ is faster but slightly less accurate.) Sources: NVIDIA / AMD / Apple spec sheets.
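The figures in the table follow from the rule of thumb above (tokens/sec ≈ bandwidth × ~50% ÷ GB read per token). A minimal sketch of that arithmetic, using the ~9 GB and ~4.5 GB per-token figures from the text; the function name and the shape of the lookup table are hypothetical, and this is not how install.sh is implemented.

```typescript
// Approximate GB of weights streamed per generated token (figures from the text).
const GB_PER_TOKEN = { fp8: 9, awq: 4.5 } as const;

type QuantKind = keyof typeof GB_PER_TOKEN;

// Rough tokens/sec from memory bandwidth in GB/s; efficiency defaults to the ~50% assumed above.
function estimateTokensPerSec(
  bandwidthGBps: number,
  quant: QuantKind,
  efficiency = 0.5,
): number {
  if (bandwidthGBps <= 0) throw new RangeError('bandwidth must be positive');
  return (bandwidthGBps * efficiency) / GB_PER_TOKEN[quant];
}

// RTX 5090 at ~1790 GB/s, matching the ~100 / ~200 tok/s row.
console.log(estimateTokensPerSec(1790, 'fp8').toFixed(1)); // 99.4
console.log(estimateTokensPerSec(1790, 'awq').toFixed(1)); // 198.9
```

To place a device that is not in the table, plug in its spec-sheet bandwidth and compare the result against the ~30 tok/s comfort threshold.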
Cards that comfortably self-host (≥ 30 tok/s): NVIDIA 5090 · 5080 · 5070 Ti · 5070 · 4090 · 4080(Ti) · 3090 · 3080; AMD 7900 XTX / 7900 XT; Apple Ultra chips and the M4/M5 Max. Most other cards fall under ~30 tok/s on the 8-bit model — 4070 and below, older Apple Max chips (fine on the 4-bit model), Apple Pro/base laptops, and older mid-range AMD.
⏳ Below ~30 tokens/sec, self-hosting feels sluggish. If that's your card, consider the upcoming hosted cloud API instead — it serves the 8-bit model at ~100 tokens/sec with no GPU to run. ⭐ Star the repo to be notified when it launches.
install.sh estimates your speed from your device's bandwidth (NVIDIA / AMD / Apple) and flags this automatically.
install.sh prints the exact vllm serve … command for your model. Keep the
--enable-tower-connector-lora flag — without it the vision part of the model is
dropped and accuracy falls apart.
```bash
source captchakraken.env
export VLLM_API_KEY="$CAPTCHA_KRAKEN_API_KEY"   # server bearer == solver key
vllm serve "RedHatAI/Qwen3.5-9B-FP8-dynamic" \
  --reasoning-parser qwen3 \
  --enable-lora --enable-tower-connector-lora \
  --max-lora-rank 64 --max-model-len 65536 \
  --gpu-memory-utilization 0.80 --trust-remote-code \
  --port 8000 \
  --lora-modules captcha-grid=JobHarvest/qwen3.5-9b-grid-lora
```

The solver only needs two environment variables (both written by
install.sh into captchakraken.env):
| Variable | Meaning |
|---|---|
| VLLM_BASE_URL | Inference endpoint: your local vLLM server, or the hosted cloud endpoint when it launches. |
| CAPTCHA_KRAKEN_API_KEY | Bearer token: your server's key today, your account key on the cloud API later. |
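If you want to fail fast when either variable is missing, a small pre-flight check like the following can run before you construct the solver. This is a sketch, not part of captcha-kraken-js: `readSolverEnv` and `SolverEnv` are hypothetical names, and the example URL is a placeholder, so use whatever value install.sh wrote to captchakraken.env.

```typescript
// Hypothetical shape for the two settings the solver reads.
interface SolverEnv {
  baseUrl: string;
  apiKey: string;
}

// Validate the two variables from any env-like object (e.g. process.env in Node).
function readSolverEnv(env: Record<string, string | undefined>): SolverEnv {
  const baseUrl = env['VLLM_BASE_URL']?.trim();
  const apiKey = env['CAPTCHA_KRAKEN_API_KEY']?.trim();

  const missing: string[] = [];
  if (!baseUrl) missing.push('VLLM_BASE_URL');
  if (!apiKey) missing.push('CAPTCHA_KRAKEN_API_KEY');
  if (!baseUrl || !apiKey) {
    throw new Error(
      `Missing ${missing.join(', ')}; run install.sh, then source captchakraken.env`,
    );
  }

  new URL(baseUrl); // throws a TypeError if the endpoint is not a valid absolute URL
  return { baseUrl, apiKey };
}

// Placeholder values for illustration only.
const checked = readSolverEnv({
  VLLM_BASE_URL: 'http://localhost:8000',
  CAPTCHA_KRAKEN_API_KEY: 'example-key',
});
console.log(checked.baseUrl); // http://localhost:8000
```

In a Node script you would call `readSolverEnv(process.env)` once at startup.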
```bash
npm install captcha-kraken-js
```

CaptchaKraken does not launch the browser for you: you bring your own browser and hand the solver a page. Install whichever automation framework you prefer alongside it. The solver itself ships no browser dependency.
| Framework | Install | Pass to solve() |
|---|---|---|
| Playwright (vanilla) | npm i playwright | the page directly |
| Patchright (stealth Chromium) | npm i patchright | the page directly |
| camoufox-js (Firefox stealth) | npm i camoufox-js | the page directly |
| Puppeteer | npm i puppeteer | fromPuppeteer(page) |
The first three are Playwright-compatible (they return a standard Playwright
Page), so you pass the page straight in. Puppeteer's API differs slightly, so
wrap its page once with fromPuppeteer(). All four are tested end-to-end against
the live reCAPTCHA demo.
In every example the solver reads VLLM_BASE_URL + CAPTCHA_KRAKEN_API_KEY from
the environment (run install.sh, then source captchakraken.env), defaults to
the published grid LoRA, and solve() does detect → grid → click → verify.
```typescript
import { chromium } from 'playwright';
import { CaptchaKrakenSolver } from 'captcha-kraken-js';

// Launch a headed Chromium and open the reCAPTCHA demo page.
const browser = await chromium.launch({ headless: false });
const page = await (await browser.newContext()).newPage();
await page.goto('https://www.google.com/recaptcha/api2/demo');

// No options needed: endpoint and key come from the environment.
const solver = new CaptchaKrakenSolver();
await solver.solve(page);

await browser.close();
```

Patchright is a drop-in for vanilla Playwright: same API, just a stealthier Chromium.
```typescript
import { chromium } from 'patchright';
import { CaptchaKrakenSolver } from 'captcha-kraken-js';

// Same flow as vanilla Playwright; only the import changes.
const browser = await chromium.launch({ headless: false });
const page = await (await browser.newContext()).newPage();
await page.goto('https://www.google.com/recaptcha/api2/demo');

const solver = new CaptchaKrakenSolver();
await solver.solve(page);

await browser.close();
```

camoufox-js exposes a Camoufox()
launcher that returns a standard Playwright Browser. (On first run it may prompt
you to fetch its Firefox build: npx camoufox-js fetch.)
```typescript
import { Camoufox } from 'camoufox-js';
import { CaptchaKrakenSolver } from 'captcha-kraken-js';

// Camoufox() returns a Playwright Browser, so the rest is unchanged.
const browser = await Camoufox({ headless: false });
const page = await (await browser.newContext()).newPage();
await page.goto('https://www.google.com/recaptcha/api2/demo');

const solver = new CaptchaKrakenSolver();
await solver.solve(page);

await browser.close();
```

Puppeteer isn't Playwright-API-compatible, so wrap its page with the bundled
fromPuppeteer() adapter. Drive the raw Puppeteer page as usual (navigate,
etc.); only the object you hand solve() needs wrapping.
```typescript
import puppeteer from 'puppeteer';
import { CaptchaKrakenSolver, fromPuppeteer } from 'captcha-kraken-js';

// Navigate with the raw Puppeteer page as usual.
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://www.google.com/recaptcha/api2/demo');

// Wrap the page only at the point where it is handed to solve().
const solver = new CaptchaKrakenSolver();
await solver.solve(fromPuppeteer(page));

await browser.close();
```

That's it: no model name to pass, no provider to choose, and no browser locked in. The solver defaults to the published grid LoRA and the endpoint from your env, and works with whatever browser you handed it.
If you're cloning instead of installing from npm, initialize the submodule that holds the detection/planner CLI:
```bash
git submodule update --init --recursive
npm install        # builds the solver + a local CLI venv (postinstall)
```
⚠️ Solving many captchas fast from one IP lowers your success rate. This is normal anti-abuse behavior — once a provider distrusts an IP, it rejects submissions even when the answer is correct and serves harder challenges.
CaptchaKraken only produces the answer. Managing your IP reputation is your job. In production you'll usually want to:
- Use rotating / residential proxies instead of one IP.
- Space out requests — avoid rapid bursts.
- Rotate the IP when you notice correct answers being rejected or challenges getting harder, rather than retrying on the same one.
This only affects whether the provider accepts a solve — it doesn't change the model's accuracy.
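The guidance above (space out requests, rotate when correct answers start being rejected) can be sketched as two small helpers. Everything here is hypothetical and lives in your own automation code, not in CaptchaKraken: the names, the policy shape, and the example thresholds are assumptions to tune for your setup.

```typescript
type SolveOutcome = 'accepted' | 'rejected';

// Hypothetical policy knobs; the values used below are illustrative, not recommendations.
interface IpPolicy {
  maxConsecutiveRejections: number;
  minDelayMs: number;
  maxDelayMs: number;
}

// Rotate once the most recent outcomes form a rejection streak of the configured length.
function shouldRotateIp(history: readonly SolveOutcome[], policy: IpPolicy): boolean {
  let streak = 0;
  for (let i = history.length - 1; i >= 0 && history[i] === 'rejected'; i--) {
    streak++;
  }
  return streak >= policy.maxConsecutiveRejections;
}

// Pick a randomized wait between requests so they do not arrive in rapid bursts.
function nextDelayMs(policy: IpPolicy, random: () => number = Math.random): number {
  return policy.minDelayMs + random() * (policy.maxDelayMs - policy.minDelayMs);
}

const policy: IpPolicy = { maxConsecutiveRejections: 2, minDelayMs: 5_000, maxDelayMs: 15_000 };
console.log(shouldRotateIp(['accepted', 'rejected', 'rejected'], policy)); // true
console.log(shouldRotateIp(['rejected', 'accepted'], policy));             // false
```

How you actually switch proxies depends on your browser framework and proxy provider, so that part is left out.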
```text
browser ─▶ detect captcha ─▶ screenshot frame
                                   │
             OpenCV find_grid (color-agnostic line tracer)
                                   │ tile boxes
                                   ▼
             Qwen3.5-9B grid LoRA on vLLM ─▶ tile selection
                                   │
             click plan ─▶ execute (human-like) ─▶ re-detect / verify
```
- find_grid finds the grid lines with plain OpenCV; no model needed. It's in the CaptchaKraken-cli submodule.
- The grid model runs on your local vLLM server and says which tiles to click.
- The solver (this repo) drives the browser: it clicks, waits for reCAPTCHA's refreshing tiles, and keeps going until the captcha is solved.
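To illustrate the "tile boxes → tile selection → click plan" step, here is a minimal sketch that turns a grid bounding box and a list of selected tiles into click coordinates. It assumes a uniform grid and zero-based, row-major tile indices; the real find_grid traces lines with OpenCV and may produce non-uniform boxes, and all names below are hypothetical.

```typescript
interface Box { x: number; y: number; width: number; height: number }
interface Point { x: number; y: number }

// Center point of every tile in a uniform rows × cols grid, in row-major order.
function tileCenters(grid: Box, rows: number, cols: number): Point[] {
  const tileW = grid.width / cols;
  const tileH = grid.height / rows;
  const centers: Point[] = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      centers.push({ x: grid.x + (c + 0.5) * tileW, y: grid.y + (r + 0.5) * tileH });
    }
  }
  return centers;
}

// Map the model's selected tile indices to the points a click step would target.
function clickPlan(grid: Box, rows: number, cols: number, selected: readonly number[]): Point[] {
  const centers = tileCenters(grid, rows, cols);
  return selected.map((index) => {
    const point = centers[index];
    if (!point) throw new RangeError(`tile index ${index} is outside a ${rows}x${cols} grid`);
    return point;
  });
}

// A 300×300 px 3×3 grid with the diagonal tiles selected.
const plan = clickPlan({ x: 0, y: 0, width: 300, height: 300 }, 3, 3, [0, 4, 8]);
console.log(plan); // centers at (50,50), (150,150), (250,250)
```

For a dynamic 3×3 challenge the same mapping would be re-run after each refresh, since the selected tiles change between rounds.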
- ☁️ Hosted cloud API — solve over HTTP, no GPU required.
- 🪶 Smaller / faster quantizations so lower-VRAM hardware can self-host.
- 🧩 Non-grid hCaptcha puzzles — drag-and-drop, path, tetris-fit, and the other types currently detected-and-skipped.
- 🎯 reCAPTCHA 4×4 robustness — our weakest grid type end-to-end.
- 📈 More real labeled data for under-represented prompts.
📣 Watch the repos to hear about these as they ship, and ⭐ star if the project is useful to you — it genuinely helps: CaptchaKrakenJS (the solver) and CaptchaKraken-cli (detection + grid planner). Use GitHub's Watch → All Activity for release notifications.
```text
install.sh                    hardware-gated one-command setup
LICENSE                       source-available (see "License")
CONTRIBUTING.md               how to contribute + dev setup
src/                          the browser solver (TypeScript)
tests/record_demos.spec.ts    live-solve recorder (numbers + demos)
CaptchaKraken-cli/            find_grid + vLLM grid planner (Python submodule)
```
- The `apiProvider` / `model` / `apiKey` options for Gemini/OpenRouter/Ollama are removed. v2 talks only to a vLLM server.
- Set `VLLM_BASE_URL` and `CAPTCHA_KRAKEN_API_KEY` (or run `install.sh`) instead of provider API keys.
- `new CaptchaKrakenSolver()` now needs no model/provider; it defaults to the grid LoRA.
- v1's `transformers` / `torch` / SAM3 dependencies are gone from the solver venv; they live on the `v1-old-architecture` branch if you need them.
PRs welcome — see CONTRIBUTING.md. CI runs hermetic grid-detection tests + a TypeScript build on every PR (no GPU/network). Release notes are in CHANGELOG.md.
Source-available under the CaptchaKraken Source-Available License — see LICENSE. In short:
- ✅ Allowed: personal use, research, and commercial use inside a larger product that adds value beyond captcha solving itself — web scrapers, stealth browsers, data-collection pipelines, QA tooling.
- ⛔ Not allowed: selling captcha-solving as a service, "thin wrapper" products (browser extensions, hosted endpoints, CLIs) whose main purpose is solving, or relaying the model's outputs through a paid solving API.
Build with it; don't sell the solve. For prohibited uses, contact us about a commercial license.
⚠️ Use responsibly and lawfully — respect the terms of service of any site you interact with. This project is for legitimate automation, research, and testing.