Changes from all commits (41 commits)
a700505
ci: test on node 22 and 24
B4nan May 17, 2025
6043086
refactor: convert to native ESM
B4nan May 19, 2025
ecd93ad
refactor: remove deprecated crawler options
B4nan May 20, 2025
08abcc6
refactor: make crawling context strict and remove the error fallback
B4nan May 20, 2025
c5d0085
refactor: remove `additionalBlockedStatusCodes` parameter of `Session…
B4nan May 20, 2025
708a6c3
refactor: remove `additionalBlockedStatusCodes` parameter of `Session…
B4nan May 20, 2025
eb8e8e1
chore: skip docker image builds for v4
B4nan May 20, 2025
2bd42e0
chore: use `v4` dist tag
B4nan May 20, 2025
cc409d7
chore: run tests on v4 branch
B4nan May 20, 2025
3ed420b
chore: fix build
B4nan May 20, 2025
6dc95b9
chore: fix v4 publishing
B4nan May 20, 2025
cf674d4
chore: use node 22 in e2e tests and project templates
B4nan May 21, 2025
9ad9343
chore: use node 24 in e2e tests and project templates
B4nan May 21, 2025
93a1faa
chore: improve types to get rid of some `as any`
B4nan Jun 4, 2025
0948c5c
chore: remove some deadcode
B4nan Jun 4, 2025
e6d7579
chore: bump a few more dependencies
B4nan Jun 11, 2025
03460f5
fix CLI
B4nan Jun 11, 2025
2618278
fix CLI 2
B4nan Jun 11, 2025
a80a302
fix: remove old system info implementation
B4nan Jun 11, 2025
6773146
chore: replace `lodash.isequal` with `util.isDeepStrictEqual`
B4nan Jun 17, 2025
2cbee01
refactor!: Introduce the `ContextPipeline` abstraction (#3119)
janbuchar Nov 24, 2025
c65c218
feat: store `ProxyInfo` inside `Session` instances (#3199)
barjin Nov 26, 2025
a409af2
feat!: use native `Request` / `Response` interface (#3163)
barjin Nov 27, 2025
c7899fb
chore: fix build errors from `master` rebase (#3285)
barjin Dec 1, 2025
ab98f68
chore: fix broken types in docs examples (#3287)
barjin Dec 1, 2025
665c690
chore(docs): Add (temporary) 4.0 docs snapshot (#3292)
janbuchar Dec 2, 2025
0b0a23e
chore: Fix HttpCrawler context types (#3291)
janbuchar Dec 2, 2025
e2c6784
fix: `KVS.getPublicUrl()` reads the public URL directly from storage …
barjin Dec 2, 2025
bca7d7a
feat: replace `got`-specific `HttpRequest` with native `Request` inte…
barjin Dec 5, 2025
9fc056d
feat: drop `gotOptions` params from the public interface (#3300)
barjin Dec 16, 2025
d6e2974
fix: patch `GotScrapingHttpClient` header handling and proxies (#3308)
barjin Dec 17, 2025
fbcfd32
feat: custom `httpClient` implementation for `@crawlee/utils` and `Re…
barjin Dec 18, 2025
91a7b4c
fix: log warning on sharing `useState()`, add `id` option to `BasicCr…
barjin Dec 18, 2025
72e82cd
fix: `sendRequest` uses correct `Session` instance and allows for ove…
barjin Dec 18, 2025
a7a6eb7
feat: `fetch` API for custom http clients (#3326)
barjin Jan 20, 2026
8cf173e
chore: drop `got-scraping` as the direct `crawlee` dependency (#3358)
barjin Jan 21, 2026
a2c7f23
feat: Zod-based Configuration class for cleaner SDK extension
B4nan Feb 4, 2026
13abb40
feat: add extendField helper for cleaner SDK extension
B4nan Feb 6, 2026
ca9e444
refactor: make extendField a static method on Configuration class
B4nan Feb 6, 2026
8d6e3d9
chore: mark extendField as @internal
B4nan Feb 6, 2026
d4139b7
chore: bump zod to v4
B4nan Feb 6, 2026
2 changes: 1 addition & 1 deletion .github/workflows/docs.yml
@@ -19,7 +19,7 @@ jobs:
steps:
- uses: actions/checkout@v6

- name: Use Node.js 20
- name: Use Node.js 24
uses: actions/setup-node@v6
with:
node-version: 24
2 changes: 1 addition & 1 deletion .github/workflows/publish-to-npm.yml
@@ -77,7 +77,7 @@ jobs:
- name: Bump canary versions
if: inputs.dist-tag == 'next'
run: |
yarn turbo copy --force -- --canary --preid=beta
yarn turbo copy --force -- --canary=major --preid=beta

- name: Commit changes
if: inputs.dist-tag == 'next'
10 changes: 5 additions & 5 deletions .github/workflows/release.yml
@@ -33,7 +33,7 @@ jobs:
matrix:
# We don't test on Windows as the tests are flaky
os: [ ubuntu-22.04 ]
node-version: [ 18, 20, 22, 24 ]
node-version: [ 22, 24 ]

runs-on: ${{ matrix.os }}

@@ -95,7 +95,7 @@ jobs:
token: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}
fetch-depth: 0

- name: Use Node.js 20
- name: Use Node.js 24
uses: actions/setup-node@v6
with:
node-version: 24
@@ -106,7 +106,7 @@
corepack enable
corepack prepare yarn@stable --activate

- name: Activate cache for Node.js 20
- name: Activate cache for Node.js 24
uses: actions/setup-node@v6
with:
cache: 'yarn'
@@ -189,7 +189,7 @@ jobs:
token: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}
fetch-depth: 0

- name: Use Node.js 20
- name: Use Node.js 24
uses: actions/setup-node@v6
with:
node-version: 24
@@ -203,7 +203,7 @@
corepack enable
corepack prepare yarn@stable --activate

- name: Activate cache for Node.js 20
- name: Activate cache for Node.js 24
uses: actions/setup-node@v6
with:
cache: 'yarn'
54 changes: 27 additions & 27 deletions .github/workflows/test-ci.yml
@@ -2,9 +2,9 @@ name: Check

on:
push:
branches: [ master, renovate/** ]
branches: [ master, v4, renovate/** ]
pull_request:
branches: [ master ]
branches: [ master, v4 ]

env:
YARN_IGNORE_NODE: 1
@@ -23,7 +23,7 @@ jobs:
# tests on windows are extremely unstable
# os: [ ubuntu-22.04, windows-2019 ]
os: [ ubuntu-22.04 ]
node-version: [ 18, 20, 22, 24 ]
node-version: [ 22, 24 ]

steps:
- name: Cancel Workflow Action
@@ -97,7 +97,7 @@ jobs:
- name: Checkout Source code
uses: actions/checkout@v6

- name: Use Node.js 20
- name: Use Node.js 24
uses: actions/setup-node@v6
with:
node-version: 24
@@ -108,7 +108,7 @@
corepack enable
corepack prepare yarn@stable --activate

- name: Activate cache for Node.js 20
- name: Activate cache for Node.js 24
uses: actions/setup-node@v6
with:
cache: 'yarn'
@@ -142,7 +142,7 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v6

- name: Use Node.js 20
- name: Use Node.js 24
uses: actions/setup-node@v6
with:
node-version: 24
@@ -153,7 +153,7 @@
corepack enable
corepack prepare yarn@stable --activate

- name: Activate cache for Node.js 20
- name: Activate cache for Node.js 24
uses: actions/setup-node@v6
with:
cache: 'yarn'
@@ -178,7 +178,7 @@

release_next:
name: Release @next
if: github.event_name == 'push' && contains(github.event.ref, 'master') && (!contains(github.event.head_commit.message, '[skip ci]') && !contains(github.event.head_commit.message, 'docs:'))
if: github.event_name == 'push' && contains(github.event.ref, 'v4') && (!contains(github.event.head_commit.message, '[skip ci]') && !contains(github.event.head_commit.message, 'docs:'))
needs: build_and_test
runs-on: ubuntu-22.04

@@ -240,22 +240,22 @@ jobs:
"dist-tag": "next"
}

- name: Collect versions for Docker images
id: versions
run: |
crawlee=`node -p "require('./packages/crawlee/package.json').version"`
echo "crawlee=$crawlee" | tee -a $GITHUB_OUTPUT

- name: Trigger Docker image builds
uses: peter-evans/repository-dispatch@v4
# Trigger next images only if we have something new pushed
if: steps.changed-packages.outputs.changed_packages != '0'
with:
token: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}
repository: apify/apify-actor-docker
event-type: build-node-images
client-payload: >
{
"crawlee_version": "${{ steps.versions.outputs.crawlee }}",
"release_tag": "beta"
}
# - name: Collect versions for Docker images
# id: versions
# run: |
# crawlee=`node -p "require('./packages/crawlee/package.json').version"`
# echo "crawlee=$crawlee" | tee -a $GITHUB_OUTPUT
#
# - name: Trigger Docker image builds
# uses: peter-evans/repository-dispatch@v4
# # Trigger next images only if we have something new pushed
# if: steps.changed-packages.outputs.changed_packages != '0'
# with:
# token: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}
# repository: apify/apify-actor-docker
# event-type: build-node-images
# client-payload: >
# {
# "crawlee_version": "${{ steps.versions.outputs.crawlee }}",
# "release_tag": "beta"
# }
4 changes: 2 additions & 2 deletions .github/workflows/test-e2e.yml
@@ -29,7 +29,7 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v6

- name: Use Node.js 20
- name: Use Node.js 24
uses: actions/setup-node@v6
with:
node-version: 24
@@ -40,7 +40,7 @@
corepack enable
corepack prepare yarn@stable --activate

- name: Activate cache for Node.js 20
- name: Activate cache for Node.js 24
uses: actions/setup-node@v6
with:
cache: 'yarn'
4 changes: 2 additions & 2 deletions docs/examples/file_download.ts
@@ -2,11 +2,11 @@ import { FileDownload } from 'crawlee';

// Create a FileDownload - a custom crawler instance that will download files from URLs.
const crawler = new FileDownload({
async requestHandler({ body, request, contentType, getKeyValueStore }) {
async requestHandler({ request, response, contentType, getKeyValueStore }) {
const url = new URL(request.url);
const kvs = await getKeyValueStore();

await kvs.setValue(url.pathname.replace(/\//g, '_'), body, { contentType: contentType.type });
await kvs.setValue(url.pathname.replace(/\//g, '_'), response.body, { contentType: contentType.type });
},
});

39 changes: 17 additions & 22 deletions docs/examples/file_download_stream.ts
@@ -23,32 +23,27 @@ function createProgressTracker({ url, log, totalBytes }: { url: URL; log: Log; t

// Create a FileDownload - a custom crawler instance that will download files from URLs.
const crawler = new FileDownload({
async streamHandler({ stream, request, log, getKeyValueStore }) {
async requestHandler({ response, request, log, getKeyValueStore }) {
const url = new URL(request.url);

log.info(`Downloading ${url} to ${url.pathname.replace(/\//g, '_')}...`);

await new Promise<void>((resolve, reject) => {
// With the 'response' event, we have received the headers of the response.
stream.on('response', async (response) => {
const kvs = await getKeyValueStore();
await kvs.setValue(
url.pathname.replace(/\//g, '_'),
pipeline(
stream,
createProgressTracker({ url, log, totalBytes: Number(response.headers['content-length']) }),
(error) => {
if (error) reject(error);
},
),
{ contentType: response.headers['content-type'] },
);

log.info(`Downloaded ${url} to ${url.pathname.replace(/\//g, '_')}.`);

resolve();
});
});
if (!response.body) return;

const kvs = await getKeyValueStore();
await kvs.setValue(
url.pathname.replace(/\//g, '_'),
pipeline(
response.body,
createProgressTracker({ url, log, totalBytes: Number(response.headers.get('content-length')) }),
(error) => {
if (error) log.error(`Failed to download ${url}: ${error.message}`);
},
),
response.headers.get('content-type') ? { contentType: response.headers.get('content-type')! } : {},
);

log.info(`Downloaded ${url} to ${url.pathname.replace(/\//g, '_')}.`);
},
});

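The streaming example above pipes `response.body` through a byte-counting transform built by `createProgressTracker`. A minimal sketch of such a pass-through tracker — the name, signature, and logging strategy here are illustrative assumptions, not Crawlee's exact helper:

```typescript
import { Transform } from 'node:stream';

// Pass-through stream that counts bytes as they flow by, similar in
// spirit to the `createProgressTracker` helper used in the example above.
// The `onProgress` callback is an assumption for illustration.
function progressTracker(totalBytes: number, onProgress?: (pct: number) => void): Transform {
    let downloaded = 0;
    return new Transform({
        transform(chunk: Buffer, _encoding, callback) {
            downloaded += chunk.length;
            if (onProgress && totalBytes > 0) {
                onProgress((downloaded / totalBytes) * 100);
            }
            // Forward the chunk unchanged so the download itself is unaffected.
            callback(null, chunk);
        },
    });
}
```

Note that `node:stream`'s `pipeline` accepts a web `ReadableStream` (such as `response.body`) as its source in modern Node.js, which is what makes the `pipeline(response.body, ...)` call in the diff work.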
13 changes: 9 additions & 4 deletions docs/examples/skip-navigation.ts
@@ -8,10 +8,15 @@ const crawler = new PlaywrightCrawler({
// The request should have the navigation skipped
if (request.skipNavigation) {
// Request the image and get its buffer back
const imageResponse = await sendRequest({ responseType: 'buffer' });

// Save the image in the key-value store
await imageStore.setValue(`${request.userData.key}.png`, imageResponse.body);
const imageResponse = await sendRequest();

// Saves the image in the key-value store.
//
// Note: For large-scale file downloads, consider using FileDownload crawler:
// https://crawlee.dev/js/api/http-crawler/class/FileDownload
await imageStore.setValue(`${request.userData.key}.svg`, await imageResponse.bytes(), {
contentType: 'image/svg+xml',
});

// Prevent executing the rest of the code as we do not need it
return;
95 changes: 0 additions & 95 deletions docs/experiments/systemInfoV2.mdx

This file was deleted.

1 change: 0 additions & 1 deletion docs/guides/configuration.mdx
@@ -94,7 +94,6 @@ Storage directories are purged by default. If set to `false` - local storage dir

#### `CRAWLEE_CONTAINERIZED`

This variable is only effective when the systemInfoV2 experiment is enabled.
Changes how crawlee measures its CPU and Memory usage and limits. If unset, crawlee will determine if it is containerised using common features of containerized environments using the `isContainerized` utility function.
- A file at `/.dockerenv`.
- A file at `/proc/self/cgroup` containing `docker`.
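The two detection heuristics the docs list (`/.dockerenv`, `docker` in `/proc/self/cgroup`) can be sketched as follows — an illustration of the checks only, not Crawlee's actual `isContainerized` implementation, which may consult more signals:

```typescript
import { promises as fs } from 'node:fs';

// Illustrative sketch of the containerization heuristics described above.
async function looksContainerized(): Promise<boolean> {
    // 1. Docker creates a marker file at `/.dockerenv`.
    const hasDockerEnv = await fs.access('/.dockerenv').then(() => true, () => false);
    if (hasDockerEnv) return true;

    // 2. Inside a Docker container, `/proc/self/cgroup` typically mentions `docker`.
    const cgroup = await fs.readFile('/proc/self/cgroup', 'utf8').catch(() => '');
    return cgroup.includes('docker');
}
```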