`ContextPipeline` changes break `skipNavigation` with `CheerioCrawler`

The following snippet works with Crawlee v3 but fails on the current `v4`:

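```typescript
import { CheerioCrawler } from "@crawlee/cheerio";

const crawler = new CheerioCrawler({
    requestHandler: async () => {
        // Intentionally empty: the failure happens before this handler runs.
    },
});

// `skipNavigation: true` tells the crawler not to fetch the URL,
// so no HTTP response (and hence no content type) exists for this request.
await crawler.run([{
    url: 'http://example.com',
    skipNavigation: true,
}]);
```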

```text
INFO  CheerioCrawler: Starting the crawler.
WARN  CheerioCrawler: Reclaiming failed request back to the list or queue. The `contentType` property is not available - `skipNavigation` was used
    at get contentType (file:///home/jindrichbar/Desktop/apify/crawlee/packages/http-crawler/dist/internals/http-crawler.js:207:27) {"id":"8OamqXBCpPHxyH9","url":"http://example.com","retryCount":1}
ERROR CheerioCrawler: An exception occurred during handling of failed request. This places the crawler and its underlying storages into an unknown state and crawling will be terminated. 
  The `request.loadedUrl` property is not available - `skipNavigation` was used
      at Object.get (file:///home/jindrichbar/Desktop/apify/crawlee/packages/http-crawler/dist/internals/http-crawler.js:177:35)
      at Function.entries (<anonymous>)
      at _ObjectValidator.handleIgnoreStrategy (file:///home/jindrichbar/Desktop/apify/crawlee/node_modules/@sapphire/shapeshift/dist/esm/index.mjs:2089:41)
      at _ObjectValidator.handlePassthroughStrategy (file:///home/jindrichbar/Desktop/apify/crawlee/node_modules/@sapphire/shapeshift/dist/esm/index.mjs:2170:25)
      at _ObjectValidator.handleStrategy (file:///home/jindrichbar/Desktop/apify/crawlee/node_modules/@sapphire/shapeshift/dist/esm/index.mjs:1982:47)
      at _ObjectValidator.handle (file:///home/jindrichbar/Desktop/apify/crawlee/node_modules/@sapphire/shapeshift/dist/esm/index.mjs:2081:17)
      at _ObjectValidator.parse (file:///home/jindrichbar/Desktop/apify/crawlee/node_modules/@sapphire/shapeshift/dist/esm/index.mjs:964:90)
      at RequestQueueClient.updateRequest (file:///home/jindrichbar/Desktop/apify/crawlee/packages/memory-storage/dist/resource-clients/request-queue.js:366:22)
      at RequestQueue.reclaimRequest (file:///home/jindrichbar/Desktop/apify/crawlee/packages/core/dist/storages/request_provider.js:386:35)
      at RequestQueue.reclaimRequest (file:///home/jindrichbar/Desktop/apify/crawlee/packages/core/dist/storages/request_queue_v2.js:219:33)
```

The crawler gets a double whammy: first from `CheerioCrawler`'s `parseContent`, which accesses `crawlingContext.contentType`:

https://github.com/apify/crawlee/blob/bca7d7af042eb5383e1464c69fd79166bc6698d7/packages/cheerio-crawler/src/internals/cheerio-crawler.ts#L195

and then from Shapeshift's validation in `updateRequest`, which runs while the first error is being handled (via `reclaimRequest`) and accesses `request.loadedUrl`:

https://github.com/apify/crawlee/blob/bca7d7af042eb5383e1464c69fd79166bc6698d7/packages/memory-storage/src/resource-clients/request-queue.ts#L514
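The stack trace suggests the storage client never reads `loadedUrl` by name: Shapeshift's passthrough/ignore strategy enumerates the object (`Function.entries`), and enumerating a `Proxy` invokes its `get` trap for every own enumerable key. The sketch below reproduces that in plain TypeScript with no Crawlee imports. `FakeRequest`, `guardedRequest`, and `captureError` are hypothetical names, and the guard only imitates the message from the log; it is not Crawlee's actual implementation.

```typescript
// Hypothetical stand-in for a proxied request; this imitates the error from
// the log and is not Crawlee's actual implementation.
interface FakeRequest {
    url: string;
    skipNavigation: boolean;
    loadedUrl: string | undefined;
}

const rawRequest: FakeRequest = {
    url: 'http://example.com',
    skipNavigation: true,
    loadedUrl: undefined, // own enumerable key, so enumeration visits it
};

const guardedRequest = new Proxy(rawRequest, {
    get(target, key, receiver) {
        if (key === 'loadedUrl' && target.skipNavigation) {
            throw new Error('The `request.loadedUrl` property is not available - `skipNavigation` was used');
        }
        return Reflect.get(target, key, receiver);
    },
});

// Runs `fn` and returns the error it throws, if any.
function captureError(fn: () => unknown): Error | undefined {
    try {
        fn();
        return undefined;
    } catch (err) {
        return err as Error;
    }
}

// Reading an unguarded key by name works as usual.
console.log(guardedRequest.url); // http://example.com

// Object.entries() calls the `get` trap for each own enumerable key, so any
// generic validator that enumerates the object trips the guard too.
const enumerationError = captureError(() => Object.entries(guardedRequest));
console.log(enumerationError?.message);
```

This is why the reclaim path fails even though it only needs to persist the request: any code that enumerates, spreads, or serialises the whole object hits the guard, not just code that asks for `loadedUrl` explicitly.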



This is caused by the validating `Proxy` wrappers that `HttpCrawler` now places around `CrawlingContext` and `Request` ([L441](https://github.com/apify/crawlee/blob/bca7d7af042eb5383e1464c69fd79166bc6698d7/packages/http-crawler/src/internals/http-crawler.ts#L441) and [L487](https://github.com/apify/crawlee/blob/bca7d7af042eb5383e1464c69fd79166bc6698d7/packages/http-crawler/src/internals/http-crawler.ts#L487)).
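To illustrate the mechanism, here is a minimal sketch of a context wrapped in a validating `Proxy` that rejects response-dependent properties when `skipNavigation` is set. `FakeContext`, `NAVIGATION_ONLY_KEYS`, `wrapContext`, and `parseStep` are hypothetical names, the key list is an assumption, and the real wrappers at the linked lines may differ in detail. The point is that any pipeline step which reads such a property unconditionally, as `parseContent` does with `contentType`, now throws.

```typescript
// Hypothetical sketch of a validating context Proxy; not Crawlee's implementation.
interface FakeContext {
    request: { url: string; skipNavigation: boolean };
    contentType?: { type: string; encoding: string };
}

// Illustrative list of keys that only exist after an HTTP response (assumption).
const NAVIGATION_ONLY_KEYS = new Set<PropertyKey>(['contentType', 'body']);

function wrapContext(ctx: FakeContext): FakeContext {
    return new Proxy(ctx, {
        get(target, key, receiver) {
            // `target` is the raw object, so this check does not re-enter the trap.
            if (target.request.skipNavigation && NAVIGATION_ONLY_KEYS.has(key)) {
                throw new Error(
                    `The \`${String(key)}\` property is not available - \`skipNavigation\` was used`,
                );
            }
            return Reflect.get(target, key, receiver);
        },
    });
}

// Mimics a parse step that reads contentType without checking skipNavigation first.
function parseStep(ctx: FakeContext): string {
    return ctx.contentType?.type ?? 'unknown';
}

const ctx = wrapContext({ request: { url: 'http://example.com', skipNavigation: true } });
console.log(ctx.request.url); // http://example.com

try {
    parseStep(ctx);
} catch (err) {
    console.log((err as Error).message);
}
```

Optional chaining (`ctx.contentType?.type`) does not help here: the trap throws on the read of `contentType` itself, before `?.` can check for `undefined`.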

`ContextPipeline` changes break `skipNavigation` with `CheerioCrawler` #3304

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

ContextPipeline changes break skipNavigation with CheerioCrawler #3304

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`ContextPipeline` changes break `skipNavigation` with `CheerioCrawler` #3304