feat: Zod-based Configuration class for cleaner SDK extension#3387
Refactors the Configuration class to use Zod for declarative field definitions
with automatic environment variable mapping and type coercion.
Key changes:
- Field definitions are now declarative with the `field()` helper
- Each field defines its Zod schema and env var mapping in one place
- Exported helpers for SDK extension: `field`, `coerceBoolean`, `logLevelSchema`
- Generic Configuration class supports inheritance via type parameters
- Priority order: constructor options > env vars > crawlee.json > defaults
- Proper TypeScript types for input (constructor) vs output (get())
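To make the declarative fields and the priority order concrete, here is a minimal, dependency-free sketch. It is not Crawlee's implementation: `Parser<T>` is a stand-in for a Zod schema (anything with a `parse` method), and `resolveField`, `coerceNumber`, the exact strings `coerceBoolean` accepts, and the 4096 default are all hypothetical.

```typescript
// Hypothetical sketch, not Crawlee's implementation. `Parser<T>` stands in for a
// Zod schema, which exposes a compatible `parse(input: unknown)` method.
interface Parser<T> {
    parse(input: unknown): T;
}

interface FieldDef<T> {
    schema: Parser<T>;
    env?: string; // env var mapped to this field, if any
}

// Hypothetical `field()`: keeps the schema and the env mapping in one place.
function field<T>(schema: Parser<T>, opts: { env?: string } = {}): FieldDef<T> {
    return { schema, env: opts.env };
}

// Env vars are always strings, so booleans need explicit coercion.
// The accepted spellings below are assumptions.
const coerceBoolean: Parser<boolean | undefined> = {
    parse(input) {
        if (input === undefined || typeof input === 'boolean') return input;
        const s = String(input).trim().toLowerCase();
        if (['true', '1', 'yes', 'on'].includes(s)) return true;
        if (['false', '0', 'no', 'off', ''].includes(s)) return false;
        throw new Error(`Cannot coerce "${String(input)}" to boolean`);
    },
};

// Roughly what z.coerce.number().default(fallback) does.
function coerceNumber(fallback: number): Parser<number> {
    return {
        parse(input) {
            if (input === undefined) return fallback;
            const n = Number(input);
            if (Number.isNaN(n)) throw new Error(`Not a number: ${String(input)}`);
            return n;
        },
    };
}

type Source = Record<string, unknown>;

// Priority: constructor options > env vars > crawlee.json > schema default.
function resolveField<T>(key: string, def: FieldDef<T>, options: Source, env: Source, json: Source): T {
    if (options[key] !== undefined) return def.schema.parse(options[key]);
    if (def.env !== undefined && env[def.env] !== undefined) return def.schema.parse(env[def.env]);
    // If crawlee.json lacks the key, the schema receives undefined and applies its default.
    return def.schema.parse(json[key]);
}

// Illustrative field map; the names and the 4096 default are placeholders.
const fields = {
    headless: field(coerceBoolean, { env: 'CRAWLEE_HEADLESS' }),
    memoryMbytes: field(coerceNumber(4096), { env: 'CRAWLEE_MEMORY_MBYTES' }),
};
```

Because each field owns its env mapping and coercion, adding an option means editing one entry instead of keeping a separate env-to-option table in sync.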
These changes enable cleaner extension in the Apify SDK without monkey patching:
```ts
const apifyConfigFields = {
    ...crawleeConfigFields,
    token: field(z.string().optional(), { env: 'APIFY_TOKEN' }),
};

class Configuration extends CrawleeConfiguration<...> {
    static override fields = apifyConfigFields;
}
```
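The `static override fields` line is what lets the subclass swap in its own field map. Below is a minimal sketch of how a base class could pick that up at runtime while keeping `get()` typed, assuming plain parse functions instead of Zod schemas; `FieldSpec`, `FieldMap`, `Output`, and `ApifyConfiguration` are hypothetical names.

```typescript
// Hypothetical sketch of the `static fields` extension pattern. Plain parse
// functions stand in for Zod schemas; names other than `fields` are illustrative.
type FieldSpec<T> = { env: string; parse: (raw: string | undefined) => T };
type FieldMap = Record<string, FieldSpec<unknown>>;
// Map each field definition to the type its parser produces.
type Output<F extends FieldMap> = { [K in keyof F]: F[K] extends FieldSpec<infer T> ? T : never };

const crawleeConfigFields = {
    headless: { env: 'CRAWLEE_HEADLESS', parse: (raw: string | undefined) => raw !== 'false' },
};

class CrawleeConfiguration<F extends FieldMap = typeof crawleeConfigFields> {
    static fields: FieldMap = crawleeConfigFields;
    private readonly values: Record<string, unknown> = {};

    constructor(env: Record<string, string | undefined>) {
        // `this.constructor` is the subclass when one is instantiated,
        // so overriding `static fields` is all a subclass has to do.
        const { fields } = this.constructor as typeof CrawleeConfiguration;
        for (const [key, spec] of Object.entries(fields)) {
            this.values[key] = spec.parse(env[spec.env]);
        }
    }

    get<K extends keyof F & string>(key: K): Output<F>[K] {
        return this.values[key] as Output<F>[K];
    }
}

const apifyConfigFields = {
    ...crawleeConfigFields,
    token: { env: 'APIFY_TOKEN', parse: (raw: string | undefined) => raw },
};

class ApifyConfiguration extends CrawleeConfiguration<typeof apifyConfigFields> {
    static override fields: FieldMap = apifyConfigFields;
}
```

Here the type parameter carries the field map for typing while the static property carries it at runtime; the PR's own type parameters are elided in the snippet it shows, so they may be shaped differently.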
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Refactors the Configuration class to use Zod-based field definitions,
extending Crawlee's new Configuration class cleanly without monkey patching.
Key changes:
- Uses `crawleeConfigFields` spread with Apify-specific overrides and additions
- Each field defines schema and env var aliases in one place
- Supports multiple env var aliases per field (e.g., ACTOR_ID, APIFY_ACTOR_ID)
- Removes all monkey patching of CoreConfiguration
- Adds zod as direct dependency
Example field definition:
```ts
actorId: field(z.string().optional(), {
    env: ['ACTOR_ID', 'APIFY_ACTOR_ID'],
}),
```
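Supporting several aliases per field mostly comes down to normalizing `env` to a list and taking the first variable that is set. A sketch with hypothetical helper names (`toAliasList`, `readEnv`); the rule that earlier aliases win is an assumption, not something the commit message specifies.

```typescript
// Hypothetical helpers, not the SDK's actual code.
// Assumption: when several aliases are set, the one listed first wins.
type EnvSpec = string | readonly string[];

function toAliasList(spec: EnvSpec): readonly string[] {
    return typeof spec === 'string' ? [spec] : spec;
}

function readEnv(env: Record<string, string | undefined>, spec: EnvSpec): string | undefined {
    for (const name of toAliasList(spec)) {
        const value = env[name];
        if (value !== undefined) return value;
    }
    return undefined; // none of the aliases is set
}

const actorIdEnv: EnvSpec = ['ACTOR_ID', 'APIFY_ACTOR_ID'];
```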
Requires: apify/crawlee#3387
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds an `extendField()` helper that extends an existing field with additional
env var mappings while preserving the base field's env vars. This avoids
repetition when extending fields in the SDK.
Example:
```ts
// No need to repeat CRAWLEE_DEFAULT_DATASET_ID
defaultDatasetId: extendField(crawleeConfigFields.defaultDatasetId, {
    env: ['ACTOR_DEFAULT_DATASET_ID', 'APIFY_DEFAULT_DATASET_ID'],
}),
```
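A minimal sketch of what `extendField()` has to do: copy the base field and merge its env var list with the new one. The `FieldDef` shape is hypothetical, and so is the ordering (base names first, duplicates dropped).

```typescript
// Hypothetical field shape and helper, not Crawlee's actual code.
interface FieldDef<T> {
    schema: { parse(input: unknown): T };
    env: readonly string[];
}

// Keeps the base field's schema and env vars, then appends the new aliases.
// Assumption: base names stay first; duplicates are dropped. The base is not mutated.
function extendField<T>(base: FieldDef<T>, extra: { env: string | readonly string[] }): FieldDef<T> {
    const added = typeof extra.env === 'string' ? [extra.env] : extra.env;
    return { ...base, env: Array.from(new Set([...base.env, ...added])) };
}

const defaultDatasetId: FieldDef<string | undefined> = {
    schema: { parse: (v) => (v === undefined ? undefined : String(v)) },
    env: ['CRAWLEE_DEFAULT_DATASET_ID'],
};

const extended = extendField(defaultDatasetId, {
    env: ['ACTOR_DEFAULT_DATASET_ID', 'APIFY_DEFAULT_DATASET_ID'],
});
```

The follow-up commit moves `extendField` onto the Configuration class as a static method; the same merging logic would apply there.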
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Moves `extendField` from a standalone export to a static method on the Configuration class, providing better encapsulation while still being accessible for subclass field definitions.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
 * To reset a value, we can omit the `value` argument or pass `undefined` there.
 */
set(key: keyof ConfigurationOptions, value?: any): void {
set<K extends keyof TInput>(key: K, value?: TInput[K]): void {
I think we could just get rid of the `set` method. Internally, it's not used much, and changing the configuration mid-flight is a heavy-duty footgun.
So how would you set things that are not crawler options? Via crawlee.json?
I'm not sure I understand; the Configuration class doesn't allow "unknown" options, right?
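For reference in this thread, a sketch of the typed `set()` signature and its reset-by-`undefined` behavior. `OverridableConfig` and its Map-based override storage are hypothetical; the real class resolves values from several sources.

```typescript
// Hypothetical sketch of the typed `set()` from the diff, not Crawlee's code.
// Tying `value` to `K` makes `set('headless', 'yes')` a compile error when
// `headless` is declared as a boolean.
class OverridableConfig<TInput extends object> {
    private readonly defaults: TInput;
    private readonly overrides = new Map<keyof TInput, unknown>();

    constructor(defaults: TInput) {
        this.defaults = defaults;
    }

    // Omitting `value` (or passing undefined) removes the override, i.e. resets the key.
    set<K extends keyof TInput>(key: K, value?: TInput[K]): void {
        if (value === undefined) this.overrides.delete(key);
        else this.overrides.set(key, value);
    }

    get<K extends keyof TInput>(key: K): TInput[K] {
        return this.overrides.has(key) ? (this.overrides.get(key) as TInput[K]) : this.defaults[key];
    }
}
```

In this sketch a reset falls back to the stored default; how the real class recomputes a value after a reset (env vars, crawlee.json) is not modeled.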
 */
get<K extends keyof TOutput>(key: K, defaultValue: NonNullable<TOutput[K]>): NonNullable<TOutput[K]>;
get<K extends keyof TOutput>(key: K, defaultValue?: TOutput[K]): TOutput[K];
get<K extends keyof TOutput>(key: K, defaultValue?: TOutput[K]): TOutput[K] {
Is there any way we could expose the config options via direct property access, i.e., `config.maxMemoryMbytes` instead of `config.get("maxMemoryMbytes")`?
There are ways, but I can't say I like them, since this is a rather internal API, right?
- config class returning a proxy from the constructor
- adding getters dynamically
Both require some type-level magic (which is, IMO, fine on its own).
In plain crawlee, it is internal for sure. In Apify SDK, it's accessed by users frequently as a wrapper for the plethora of environment variables that the platform provides. It makes sense to me to make it as close to the POJO experience as possible... so, can I see the type level magic? 😁
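A sketch of the first option (a proxy returned from the constructor) together with the type-level part, assuming options are passed in as a plain object rather than resolved from env vars and crawlee.json. `ConfigurationImpl` and `ConfigurationWithProps` are hypothetical names; the `get()` overloads mirror the ones in the diff.

```typescript
// Hypothetical sketch of the "proxy from constructor" option, not Crawlee's code.
class ConfigurationImpl<T extends object> {
    private readonly values: T;

    constructor(values: T) {
        this.values = values;
        // Returning an object from a constructor replaces `this` for the `new` expression.
        return new Proxy(this, {
            get(target, prop, receiver) {
                // Real members (`get`, `values`, prototype methods) take precedence.
                if (prop in target) return Reflect.get(target, prop, receiver);
                // Any other string key is treated as a config option.
                return typeof prop === 'string' ? target.get(prop as keyof T) : undefined;
            },
        });
    }

    get<K extends keyof T>(key: K, defaultValue: NonNullable<T[K]>): NonNullable<T[K]>;
    get<K extends keyof T>(key: K, defaultValue?: T[K]): T[K];
    get<K extends keyof T>(key: K, defaultValue?: T[K]): T[K] {
        return this.values[key] ?? (defaultValue as T[K]);
    }
}

// Type-level part: re-type the class so instances also expose every option as a property.
// `as unknown as` is needed because the class's declared instance type lacks those properties.
type ConfigurationWithProps<T extends object> = ConfigurationImpl<T> & Readonly<T>;
const Configuration = ConfigurationImpl as unknown as new <T extends object>(values: T) => ConfigurationWithProps<T>;
```

The dynamic-getters option would instead call `Object.defineProperty` once per known field in the constructor, avoiding the Proxy but requiring the field list up front; the type-level cast is the same either way. In both, an option whose name collides with a class member (such as `get`) is shadowed by that member.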
Branch updated from 47a7493 to d4139b7.
: {
    request: LoadedRequest<Context['request']>;
} & Omit<Context, 'request'>;
export type LoadedContext<Context extends RestrictedCrawlingContext> =
This looks like a whitespace-only change. Why? Same thing in the tests...
Summary
Refactors the Configuration class to use Zod for declarative field definitions with automatic environment variable mapping and type coercion.
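- Declarative field definitions with the `field()` helper: a single source of truth for each field's schema and env var mapping
- Exported helpers for SDK extension: `field`, `coerceBoolean`, `logLevelSchema`
- Priority order: constructor options > env vars > crawlee.json > defaults

Motivation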
This aligns with how the Python Crawlee/SDK handles configuration and enables cleaner extension in the Apify SDK without monkey patching:
Test plan
🤖 Generated with Claude Code