chore: sync upstream firecrawl/main (2026-05-11)#7

Open
agenticassets wants to merge 33 commits into main from chore/sync-upstream-2026-05-11

Conversation

@agenticassets
Owner

Summary

  • merge upstream firecrawl/main into the Agentic Assets fork as of 2026-05-11
  • preserve fork-specific self-hosted ops overlays on top of the merged upstream baseline
  • isolate the sync in a fresh branch/worktree because the local checkout on main has uncommitted changes

Upstream highlights reviewed

  • adds the new monitoring orchestration stack in apps/api plus related queue, scheduler, diff, results, webhook, and notification support
  • adds deprecation warnings/proxy support work in the API and propagates new monitor/query/highlights/question types across SDKs
  • includes scrape/query and YouTube live handling fixes plus browser-session retry hardening
  • adds npm audit remediation workflow/scripts and dependency/security updates

Validation

  • merge completed cleanly with no manual conflict resolution
  • no repo-local validation commands were run because this PR is a pure upstream sync merge; CI should verify the integrated tree

Notes

mogery and others added 30 commits May 4, 2026 17:55
…m params

- Add Firecrawl.Error exception module for API error responses
- Add Req response step that converts HTTP 4xx/5xx to {:error, %Firecrawl.Error{}}
  for standard functions and raises for bang (!) variants
- Update enum params (model, sitemap) to accept both atoms and strings
  via {:or, [{:in, [atoms...]}, :string]} type
- Update generate.exs to produce these patterns on re-generation
- Add tests for error handling and string enum acceptance

Co-Authored-By: gaurav <gauravchadha1676@gmail.com>
…vements

- Add ScreenshotFormat model class for full-page screenshot options
  (fullPage, quality) alongside existing string format names
- Add minAge parameter to ScrapeOptions for cache-only lookups
- Add profile parameter to ScrapeOptions for persistent browser storage
- Add changeTracking parameter to ScrapeOptions
- Add prompt parameter to interact() method for AI-powered browser automation
- Add interactiveLiveViewUrl property to BrowserCreateResponse
- Add profile parameter to browser() session creation method
- Maintain backward compatibility for existing positional argument usage

Co-Authored-By: gaurav <gauravchadha1676@gmail.com>
…enum tests

- Guard error message interpolation against non-string values
- Enum string tests now assert the error is NOT a NimbleOptions.ValidationError,
  proving validation actually accepts strings

Co-Authored-By: gaurav <gauravchadha1676@gmail.com>
…xir-sdk-error-handling

fix(elixir-sdk): proper error handling for API responses and string enum support
…-sdk-fixes

fix(php-sdk): add ScreenshotFormat, missing API params, and browser improvements
The secrets context is not allowed in step-level if conditions.
Move FIRECRAWL_API_KEY to job-level env and check env.FIRECRAWL_API_KEY
in the E2E test step condition instead.

Co-Authored-By: gaurav <gauravchadha1676@gmail.com>
Use a boolean HAS_API_KEY env var at job level for the condition check,
and keep the actual FIRECRAWL_API_KEY secret scoped to just the E2E step.

Co-Authored-By: gaurav <gauravchadha1676@gmail.com>
…-php-sdk-ci

fix(ci): fix PHP SDK workflow - use env context instead of secrets in step if condition
…recrawl#3487)

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Adds multipart/form-data support to the OpenAPI generator and exposes
parse_file/parse_file! in the Elixir SDK, bringing it to parity with
the other SDKs.
The non-bang parse_file/3 now returns {:error, %ArgumentError{}} for
invalid file inputs (empty filename, nil data) instead of raising,
matching the documented {:error, exception} contract.

The bang variant parse_file!/3 continues to raise as expected.

Co-Authored-By: gaurav <gauravchadha1676@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…se errors (firecrawl#3489)

Co-authored-by: firecrawl-spring[bot] <254786068+firecrawl-spring[bot]@users.noreply.github.com>
Co-authored-by: mogery <mogery@sideguide.dev>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
The directQuote query mode uses a Fireworks model that is not available
in self-hosted CI environments (only OPENAI_API_KEY is set). This caused
all 4 self-hosted test matrix jobs to fail. Gate the test behind
TEST_PRODUCTION so it only runs where Fireworks is available.

Co-Authored-By: gaurav <gauravchadha1676@gmail.com>
Co-Authored-By: gaurav <gauravchadha1676@gmail.com>
feat(elixir-sdk): add parse_file for /parse endpoint
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…l#3490)

* feat(api): add proxy routes for support-agent ask endpoints

Forwards /v2/support/ask and /v2/support/docs-search to the
support-agent service (SUPPORT_AGENT_URL env var). No auth middleware
on the proxy — the support-agent validates the bearer itself.

This lets callers use a single api.firecrawl.dev base URL for all
Firecrawl endpoints instead of needing to know about ask.firecrawl.dev.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(env): add SUPPORT_AGENT_URL to .env.example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(proxy): forward content-type from upstream response

Without this, Express defaults to text/html for string bodies,
breaking JSON parsing in client libraries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(proxy): always set content-type to application/json on upstream request

Body is always JSON.stringify'd, so forwarding the client's original
content-type (e.g. application/x-www-form-urlencoded) would cause a
mismatch that breaks upstream parsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(proxy): add auth middleware to /support/ask route

Adds rate-limited auth check for consistency with other v2 endpoints.
docs-search remains public (no auth) as designed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(proxy): add dedicated Support rate limit (5 req/hour)

Adds RateLimiterMode.Support with a 3600s window and 5-request cap.
Applied to both /support/ask and /support/docs-search routes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(proxy): change support rate limit to 3 req/min

Simpler and consistent with other modes using the default 60s window.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(proxy): split support rate limits into independent buckets

Each endpoint gets its own 3 req/min limit so they don't share a bucket.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(rate-limiter): remove unnecessary duration override

Reverts createRateLimiter back to its original signature. Support
endpoints use the same 60s window as everything else.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(proxy): add Vercel deployment protection bypass header

Adds SUPPORT_AGENT_VERCEL_BYPASS_SECRET env var. When set, the proxy
includes the x-vercel-protection-bypass header on upstream requests to
ask.firecrawl.dev.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
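
The header handling described in the proxy commits above can be sketched roughly as follows. This is a Python illustration only, not the API's actual code: the function names `build_upstream_headers` and `response_content_type` are hypothetical, passing the `authorization` header through is an assumption, and so is the JSON fallback for a missing upstream content type. Only the env var `SUPPORT_AGENT_VERCEL_BYPASS_SECRET` and the header `x-vercel-protection-bypass` are taken from the commit messages.

```python
import os
from typing import Mapping, Optional


def build_upstream_headers(
    client_headers: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Headers for the request the proxy sends upstream (keys assumed lowercase)."""
    env = os.environ if env is None else env
    # The body is always re-serialized as JSON, so the content type is fixed
    # instead of being copied from the client's request.
    headers = {"content-type": "application/json"}
    # Assumption: the bearer token is passed through for upstream validation.
    auth = client_headers.get("authorization")
    if auth:
        headers["authorization"] = auth
    # The bypass header is attached only when the secret is configured.
    secret = env.get("SUPPORT_AGENT_VERCEL_BYPASS_SECRET")
    if secret:
        headers["x-vercel-protection-bypass"] = secret
    return headers


def response_content_type(upstream_headers: Mapping[str, str]) -> str:
    """Content type returned to the client, taken from the upstream response."""
    # Assumption: fall back to JSON when the upstream omits the header.
    return upstream_headers.get("content-type", "application/json")
```

Fixing the outgoing content type and echoing the upstream one back are two separate concerns, which is why the sketch keeps them in separate functions.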

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
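
The rate-limit commits in the squash above settle on 3 requests per 60-second window, with /support/ask and /support/docs-search counted in independent buckets. A minimal sketch of that behaviour, assuming a simple in-memory fixed-window counter: the class `FixedWindowLimiter` and the bucket names are hypothetical, and the real limiter may use a different algorithm or backing store.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """In-memory fixed-window limiter with one counter per (bucket, caller)."""

    def __init__(
        self,
        points: int = 3,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.points = points
        self.window_s = window_s
        self.clock = clock
        # (bucket, caller) -> (window start time, requests counted in window)
        self._state: Dict[Tuple[str, str], Tuple[float, int]] = {}

    def allow(self, bucket: str, caller: str) -> bool:
        """Return True and count the request if the caller is under the cap."""
        key = (bucket, caller)
        now = self.clock()
        start, count = self._state.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.points:
            self._state[key] = (start, count)
            return False
        self._state[key] = (start, count + 1)
        return True


limiter = FixedWindowLimiter()
ask_results = [limiter.allow("support-ask", "team-1") for _ in range(4)]
docs_result = limiter.allow("support-docs-search", "team-1")
```

Here the fourth `support-ask` call inside one window is rejected, while the `support-docs-search` call still passes because it is counted under its own key.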
* feat(api): deprecation warnings on legacy endpoints

Add a deprecation middleware driven by a typed registry that emits
Deprecation (RFC 9745) and Sunset (RFC 8594) headers and injects warning
and replacement fields into JSON responses. Wired into v1 extract,
deep-research, and llmstxt, and into v2 extract.

* chore(api): drop HTTP method prefix from deprecation messages

Path is unambiguous on its own.

* test(api): use crypto.randomUUID in deprecation test

uuid@13 is ESM-only and Jest cannot parse its export syntax under the
current ts-jest config, causing the test suite to fail to load. Replace
the direct uuid import with the global crypto.randomUUID(), matching
the convention used by other snips tests.

* feat(api): emit deprecation warnings via warnings[] and standard headers

Switch the deprecation middleware response shape from a single
warning string to a warnings array so multiple notices can coexist
with controller-emitted warnings. Surface the same notice in standard
HTTP headers: Warning: 299 (RFC 7234) for the human-readable text and
Link rel="successor-version" (RFC 5829 / RFC 8288) for the replacement
endpoint, alongside the existing Deprecation (RFC 9745) and Sunset
(RFC 8594) headers.

* feat(api): deprecation warnings on legacy v0 endpoints

Wire deprecationMiddleware to /v0/scrape, /v0/crawl,
/v0/crawl/status/:jobId, /v0/crawl/cancel/:jobId, and /v0/search.
Each points clients at its v2 successor via the standard Deprecation,
Warning, and Link rel="successor-version" headers and the warnings[]
body field. Health probes and /v0/keyAuth are left untouched.
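
As a rough illustration of the header set and the warnings[] merge described in the deprecation commits above, here is a Python sketch. The function names, the example message, and the treatment of Sunset as optional are assumptions; the middleware itself is not shown in this PR text and may format things differently. Datetimes are expected to be timezone-aware.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional


def deprecation_headers(
    message: str,
    successor_path: str,
    deprecated_at: datetime,
    sunset_at: Optional[datetime] = None,
) -> dict:
    """Build the standard headers that announce a deprecated endpoint."""
    # Escape backslashes and quotes so the message is a valid quoted-string.
    quoted = message.replace("\\", "\\\\").replace('"', '\\"')
    headers = {
        # RFC 9745: a structured-field Date, written as "@" plus Unix seconds.
        "Deprecation": f"@{int(deprecated_at.timestamp())}",
        # RFC 7234 warn-code 299; "-" stands in for an unknown warn-agent.
        "Warning": f'299 - "{quoted}"',
        # RFC 8288 Link header using the RFC 5829 relation type.
        "Link": f'<{successor_path}>; rel="successor-version"',
    }
    if sunset_at is not None:
        # RFC 8594: Sunset carries an HTTP-date in GMT.
        headers["Sunset"] = format_datetime(
            sunset_at.astimezone(timezone.utc), usegmt=True
        )
    return headers


def with_deprecation_warning(body: dict, message: str) -> dict:
    """Append to warnings[] without dropping controller-emitted warnings."""
    merged = dict(body)
    merged["warnings"] = list(body.get("warnings", [])) + [message]
    return merged
```

Appending to an array instead of overwriting a single string is what lets a deprecation notice coexist with warnings a controller has already attached.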
…ecrawl#3491)

* feat(sdk): surface deprecation warnings[] and replacement fields

Mirror the API deprecation contract from firecrawl#3469 in the JS and Python
SDKs. Add optional warnings[] and replacement fields to the response
types for the v1 extract, deep-research, and llmstxt endpoints, plus
v2 extract, so callers can read the structured deprecation notice the
API now returns.

Mark the SDK methods that hit those endpoints as deprecated using each
language's idiomatic doc comment, and add runtime DeprecationWarning
calls on the Python deep-research and llmstxt methods (sync + async)
for parity with the existing extract methods.

* chore(sdk): bump js-sdk to 4.22.2 and python-sdk to 4.25.2
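
On the Python side, the two pieces described above (a runtime DeprecationWarning plus optional warnings[] and replacement fields on the response) might look roughly like this. `legacy_call` and `deprecation_notice` are hypothetical stand-ins, not SDK functions, and the shape of the `replacement` value is an assumption, so it is passed through untouched.

```python
import warnings
from typing import Any, Dict


def legacy_call(response: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for an SDK method that hits a deprecated endpoint."""
    warnings.warn(
        "legacy_call targets a deprecated endpoint",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return response


def deprecation_notice(response: Dict[str, Any]) -> Dict[str, Any]:
    """Read the optional deprecation fields from a response payload."""
    return {
        # Both fields are optional, so missing keys yield an empty list / None.
        "warnings": list(response.get("warnings") or []),
        "replacement": response.get("replacement"),
    }
```

Python hides DeprecationWarning by default outside `__main__` and test runners, so callers who want to see it may need `warnings.simplefilter("default")` or the `-W default` interpreter flag.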
tomsideguide and others added 3 commits May 7, 2026 16:52
7 participants