feat(browser): human-like delay system for anti-detection#297

Open
toolmanlab wants to merge 2 commits into jackwener:main from toolmanlab:feat/human-delay-jitter

Conversation

@toolmanlab

Summary

Adds a framework-level human-like delay/jitter system to reduce bot detection when scraping cookie-authenticated sites. Addresses #59 (P0: request interval jitter + rate limiting).

  • HumanDelay class with log-normal distribution — realistic delay variance, not uniform random
  • 5 configurable profiles: none / fast / moderate / cautious / stealth
  • Periodic breaks that simulate reading/thinking pauses (configurable interval + duration)
  • Auto-injected between page.goto() navigations (skips the first one)
  • OPENCLI_DELAY_PROFILE env var for profile selection (default: moderate)
  • Boss search adapter migrated from hardcoded 1000 + Math.random() * 2000 to framework delay
  • 10 unit tests covering all profiles, bounds, breaks, reset, and distribution variance

Delay profiles

| Profile  | Median delay | Range  | Breaks                      |
|----------|--------------|--------|-----------------------------|
| none     | 0            | 0      | none                        |
| fast     | ~1s          | 0.5-3s | none                        |
| moderate | ~2s          | 1-8s   | every 15-25 actions, 5-15s  |
| cautious | ~4s          | 3-25s  | every 8-12 actions, 15-60s  |
| stealth  | ~8s          | 5-40s  | every 6-10 actions, 30-90s  |
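The table above can be sketched as a config object. This is a hypothetical shape: the field names (`medianMs`, `sigma`, `breakEvery`) and the sigma values are assumptions, not the PR's actual `HumanDelay` API; only the medians, ranges, and break settings come from the table.

```typescript
// Hypothetical encoding of the five delay profiles. Field names and sigma
// values are assumed; medians, ranges, and break settings come from the
// PR description above.
interface DelayProfile {
  medianMs: number;                 // median of the log-normal delay
  sigma: number;                    // log-space spread (assumed values)
  minMs: number;                    // clamp lower bound
  maxMs: number;                    // clamp upper bound
  breakEvery?: [number, number];    // pause every N..M actions
  breakMs?: [number, number];       // pause duration range in ms
}

const PROFILES: Record<string, DelayProfile> = {
  none:     { medianMs: 0,    sigma: 0,   minMs: 0,    maxMs: 0 },
  fast:     { medianMs: 1000, sigma: 0.4, minMs: 500,  maxMs: 3000 },
  moderate: { medianMs: 2000, sigma: 0.5, minMs: 1000, maxMs: 8000,
              breakEvery: [15, 25], breakMs: [5_000, 15_000] },
  cautious: { medianMs: 4000, sigma: 0.6, minMs: 3000, maxMs: 25_000,
              breakEvery: [8, 12],  breakMs: [15_000, 60_000] },
  stealth:  { medianMs: 8000, sigma: 0.7, minMs: 5000, maxMs: 40_000,
              breakEvery: [6, 10],  breakMs: [30_000, 90_000] },
};
```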

Why log-normal?

Uniform random (Math.random() * N) produces easily detectable flat distributions. Log-normal clusters most delays near the median with occasional longer pauses — matching how humans actually browse.
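A minimal sketch of such a sampler (assumed implementation details, not the PR's exact `_lognormalDelay` code): draw a standard normal via the Box-Muller transform, exponentiate, and clamp to the profile's bounds.

```typescript
// Sample a human-like delay from a log-normal distribution.
// Assumed sketch: mu is the natural log of the profile's median delay,
// so exp(mu) recovers the median and sigma controls the right tail.
function lognormalDelayMs(
  medianMs: number, sigma: number, minMs: number, maxMs: number,
): number {
  // Box-Muller transform: two uniforms in (0, 1] -> one standard normal z
  const u1 = 1 - Math.random();
  const u2 = Math.random();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  const mu = Math.log(medianMs);
  const delay = Math.exp(mu + sigma * z); // median exp(mu), long right tail
  return Math.min(maxMs, Math.max(minMs, delay)); // clamp to profile bounds
}
```

Most samples cluster near `medianMs`, with occasional much longer pauses from large `z` values, a shape a flat `Math.random() * N` never produces.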

Real-world validation

Tested against a major job board with aggressive bot detection (cookie-authenticated, rate-limited API):

| Scenario              | Without jitter     | With jitter        |
|-----------------------|--------------------|--------------------|
| 50 detail pages       | ✅ OK              | ✅ OK              |
| 200 detail pages      | ❌ Banned (code 32) | ✅ OK              |
| 850 requests over 5h  | N/A (banned early) | ✅ Zero detection  |
| 4-day sustained crawl | N/A                | ✅ 1800+ records   |

Test plan

  • npx vitest run src/human-delay.test.ts — 10/10 passed
  • npx tsc --noEmit — zero errors
  • Multi-day production validation on cookie-authenticated site

🤖 Generated with Claude Code

Adds a framework-level delay/jitter system using log-normal distribution
to simulate natural browsing patterns, addressing issue jackwener#59 (P0).

- New `HumanDelay` class with configurable profiles (none/fast/moderate/cautious/stealth)
- Log-normal distribution for realistic delay variance (not uniform)
- Periodic "breaks" that simulate reading/thinking pauses
- Auto-injected between page.goto() navigations
- Configurable via OPENCLI_DELAY_PROFILE env var
- Boss search adapter migrated from hardcoded jitter to framework delay
- 10 unit tests covering all profiles and edge cases

Real-world validation against a major job board (cookie-authenticated,
aggressive bot detection):

| Scenario              | Without jitter     | With jitter        |
|-----------------------|--------------------|--------------------|
| 50 detail pages       | ✅ OK              | ✅ OK              |
| 200 detail pages      | ❌ Banned (code 32) | ✅ OK              |
| 850 requests over 5h  | N/A (banned early) | ✅ Zero detection   |
| 4-day sustained crawl | N/A                | ✅ 1800+ records    |

Closes jackwener#59

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@toolmanlab (Author)

CI note: The two failing checks are pre-existing issues on main, unrelated to this PR:

  • docs-build: Dead link /adapters/desktop/doubao-app in adapters/index.md (doubao-app adapter was added without a matching doc page)
  • e2e-headed: apple-podcasts top test timeout (flaky external dependency, 93/94 tests passed)

This PR only touches src/human-delay.ts, src/browser/page.ts, and src/clis/boss/search.ts. All build, unit-test (4 shards), and audit checks pass ✅.

In CI environments (CI=true), resolveProfile() now defaults to the
'none' profile instead of 'moderate'. This prevents the 1-8s per-
navigation delay from causing E2E test timeouts (30s limit).

Users can override this by setting OPENCLI_DELAY_PROFILE explicitly.
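A sketch of the behavior described above (the function body and validation are assumptions; only the `OPENCLI_DELAY_PROFILE` and `CI` env var names come from the PR):

```typescript
// CI-aware profile resolution, as described in the commit message above.
// The shape is assumed; OPENCLI_DELAY_PROFILE and CI are the real env vars.
type ProfileName = "none" | "fast" | "moderate" | "cautious" | "stealth";
const PROFILE_NAMES: ProfileName[] = ["none", "fast", "moderate", "cautious", "stealth"];

function resolveProfile(env: Record<string, string | undefined>): ProfileName {
  const explicit = env.OPENCLI_DELAY_PROFILE;
  if (explicit && (PROFILE_NAMES as string[]).includes(explicit)) {
    return explicit as ProfileName; // explicit override always wins
  }
  if (env.CI) return "none";        // skip delays under CI to avoid timeouts
  return "moderate";                // human-like default everywhere else
}
```

In practice this would be called once at startup as `resolveProfile(process.env)`.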
@Astro-Han (Contributor) left a comment

Thoughtful approach to a real problem (issue #59). The log-normal idea and profile system are well-designed concepts. One blocker and several design points:

Tests will fail in CI

resolveProfile() returns none when process.env.CI is truthy, but the test asserts the default is moderate. GitHub Actions sets CI=true, so this test will fail on merge.

Log-normal mu / 1000 — distribution doesn't match profiles

_lognormalDelay uses exp(mu / 1000 + sigma * z) * 1000. Dividing mu by 1000 compresses all profile medians to ~1008ms — the five profiles become nearly identical, with differences coming from minMs/maxMs clamping rather than the log-normal shape. Likely intended: exp(mu + sigma * z) without /1000.
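The compression is easy to verify numerically, assuming `mu` stores `ln(medianMs)` (which is what the ~1008ms figure implies):

```typescript
// Median of each form at z = 0. With mu = ln(medianMs), dividing mu by 1000
// collapses every profile's median to exp(ln(m)/1000) * 1000
// ≈ 1000 + ln(m) ms, i.e. ~1008ms regardless of profile.
function buggyMedianMs(medianMs: number): number {
  const mu = Math.log(medianMs);
  return Math.exp(mu / 1000) * 1000; // exp(mu / 1000 + sigma * z) * 1000 at z = 0
}
function fixedMedianMs(medianMs: number): number {
  const mu = Math.log(medianMs);
  return Math.exp(mu);               // exp(mu + sigma * z) at z = 0
}
```

`buggyMedianMs(2000)` and `buggyMedianMs(8000)` land within about 1.5ms of each other near 1008ms, while the fixed form recovers 2000 and 8000.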

Consider opt-in rather than global default

The delay injects in Page.goto() for all 83 browser adapters. Most sites (Wikipedia, HackerNews, YouTube) don't need anti-detection. A delayProfile field on CliCommand (opt-in per adapter) might be less surprising than a global moderate default that silently slows every command.
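The suggestion might look like this (the `CliCommand` shape and the `delayProfile` field name are hypothetical, following the comment rather than any merged code):

```typescript
// Hypothetical opt-in API: only adapters that scrape protected sites
// declare a profile; leaving the field undefined injects no delay at all.
interface CliCommand {
  name: string;
  run(args: string[]): Promise<void>;
  // Opt-in delay profile; undefined = no injected delay
  delayProfile?: "none" | "fast" | "moderate" | "cautious" | "stealth";
}

const bossSearch: CliCommand = {
  name: "boss-search",
  run: async () => { /* scrape protected site */ },
  delayProfile: "cautious", // only this adapter pays the delay cost
};
```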

Timeout interaction

cautious/stealth breaks (15-90s) execute inside the command timeout budget (default 60s) and can consume it entirely. Worth noting in docs or making the delay timeout-aware.
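One hypothetical way to make the delay timeout-aware: cap each planned sleep at the time remaining in the command's budget, minus a safety margin (all names and the margin value below are assumptions, not existing code).

```typescript
// Cap a planned delay so it never consumes the remaining timeout budget.
// Parameter names are hypothetical; marginMs reserves time for the actual
// page work that must happen after the sleep.
function cappedDelayMs(
  plannedMs: number, deadlineMs: number, nowMs: number, marginMs = 5000,
): number {
  const remaining = deadlineMs - nowMs - marginMs;
  return Math.max(0, Math.min(plannedMs, remaining));
}
```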

CDP mode gap

CDPPage.goto() doesn't have the delay — only Extension/Standalone mode does.
