
feat: Zod-based Configuration class for cleaner SDK extension #3387

Open
B4nan wants to merge 5 commits into v4 from feat/zod-configuration-v4

B4nan (Member) commented Feb 4, 2026

Summary

Refactors the Configuration class to use Zod for declarative field definitions with automatic environment variable mapping and type coercion.

  • Field definitions are now declarative via the field() helper, a single source of truth
  • Each field defines its Zod schema and env var mapping in one place
  • Exported helpers for SDK extension: field, coerceBoolean, logLevelSchema
  • Generic Configuration class supports inheritance via type parameters
  • Priority order: constructor options > env vars > crawlee.json > defaults
  • Proper TypeScript types for input (constructor) vs output (get())
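To make the field and priority ideas concrete, here is a minimal sketch of a `field()`-style helper and a resolver that applies the priority order listed above. It is not the PR's implementation: `SchemaLike`, `FieldDef`, `resolveField` and `numberWithDefault` are hypothetical names, schemas are typed structurally (a real Zod schema exposes a compatible `parse()` method), and the env and `crawlee.json` sources are passed in as plain objects so the example runs without touching `process.env` or the file system.

```typescript
// Structural stand-in for a Zod schema: anything with a parse() method.
// Assumption: the real PR passes actual Zod schemas here.
interface SchemaLike<T> {
    parse(input: unknown): T;
}

// Hypothetical field definition: schema plus the env var name(s) that feed it.
interface FieldDef<T> {
    schema: SchemaLike<T>;
    env?: string | string[];
}

// Hypothetical field() helper: keeps the schema and env mapping in one place.
function field<T>(schema: SchemaLike<T>, opts: { env?: string | string[] } = {}): FieldDef<T> {
    return { schema, env: opts.env };
}

// Resolves one field: constructor options > env vars > crawlee.json > schema default.
function resolveField<T>(
    def: FieldDef<T>,
    key: string,
    options: Record<string, unknown>,
    env: Record<string, string | undefined>,
    fileConfig: Record<string, unknown>,
): T {
    if (options[key] !== undefined) return def.schema.parse(options[key]);

    const envNames = def.env === undefined ? [] : Array.isArray(def.env) ? def.env : [def.env];
    for (const name of envNames) {
        const raw = env[name];
        // Env values are always strings, so the schema is responsible for coercion.
        if (raw !== undefined) return def.schema.parse(raw);
    }

    if (fileConfig[key] !== undefined) return def.schema.parse(fileConfig[key]);

    // Nothing set anywhere: let the schema supply its default (like Zod's .default()).
    return def.schema.parse(undefined);
}

// Toy coercing number schema, roughly what z.coerce.number().default(fallback) would do.
function numberWithDefault(fallback: number): SchemaLike<number> {
    return {
        parse(input: unknown): number {
            if (input === undefined) return fallback;
            const n = Number(input);
            if (Number.isNaN(n)) throw new Error(`Expected a number, got ${String(input)}`);
            return n;
        },
    };
}

// Illustrative field; the env var name is an example, not taken from the PR.
const memoryMbytes = field(numberWithDefault(4096), { env: 'CRAWLEE_MEMORY_MBYTES' });

const fromEnv = resolveField(memoryMbytes, 'memoryMbytes', {}, { CRAWLEE_MEMORY_MBYTES: '2048' }, { memoryMbytes: 1024 });
console.log(fromEnv); // 2048: env beats crawlee.json
```

Keeping coercion inside the schema means the resolver never needs per-type special cases such as the old `BOOLEAN_VARS` list.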

Motivation

This aligns with how the Python Crawlee/SDK handles configuration and enables cleaner extension in Apify SDK without monkey patching:

```ts
// Old SDK approach (required monkey patching)
CoreConfiguration.ENV_MAP = Configuration.ENV_MAP;
CoreConfiguration.BOOLEAN_VARS = Configuration.BOOLEAN_VARS;
// ...

// New approach - just extend fields and class
const apifyConfigFields = {
    ...crawleeConfigFields,
    token: field(z.string().optional(), { env: 'APIFY_TOKEN' }),
    actorId: field(z.string().optional(), { env: ['ACTOR_ID', 'APIFY_ACTOR_ID'] }),
};

class Configuration extends CrawleeConfiguration<ApifyConfigFields, ...> {
    static override fields = apifyConfigFields;
}
```
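The inheritance pattern in the Motivation snippet relies on the base class reading `static fields` from whichever subclass was actually instantiated. A minimal sketch of that mechanism follows; `CrawleeConfiguration`, `ApifyConfiguration`, `FieldMap` and `OutputOf` here are simplified hypothetical stand-ins rather than the PR's types, schemas are structural (anything with `parse()`), and env vars are passed in explicitly instead of being read from `process.env`.

```typescript
// Structural stand-in for a Zod schema.
interface SchemaLike<T> {
    parse(input: unknown): T;
}
interface FieldDef<T> {
    schema: SchemaLike<T>;
    env?: string | string[];
}
type FieldMap = Record<string, FieldDef<unknown>>;

// Output type of each field, inferred from its schema.
type OutputOf<F extends FieldMap> = { [K in keyof F]: F[K] extends FieldDef<infer T> ? T : never };

// Toy "optional string" schema, similar in spirit to z.string().optional().
const optionalString: SchemaLike<string | undefined> = {
    parse: (input) => (input === undefined ? undefined : String(input)),
};

const crawleeConfigFields = {
    defaultDatasetId: { schema: optionalString, env: 'CRAWLEE_DEFAULT_DATASET_ID' },
};

class CrawleeConfiguration<F extends FieldMap = typeof crawleeConfigFields> {
    // Subclasses override this; the constructor reads it from the runtime class.
    static fields: FieldMap = crawleeConfigFields;

    private readonly values: Record<string, unknown> = {};

    constructor(env: Record<string, string | undefined> = {}) {
        const fields = (this.constructor as typeof CrawleeConfiguration).fields;
        for (const [key, def] of Object.entries(fields)) {
            const names = def.env === undefined ? [] : Array.isArray(def.env) ? def.env : [def.env];
            const raw = names.map((name) => env[name]).find((value) => value !== undefined);
            this.values[key] = def.schema.parse(raw);
        }
    }

    get<K extends keyof F & string>(key: K): OutputOf<F>[K] {
        return this.values[key] as OutputOf<F>[K];
    }
}

// SDK side: spread the base fields, add new ones, override the static property. No patching.
const apifyConfigFields = {
    ...crawleeConfigFields,
    token: { schema: optionalString, env: 'APIFY_TOKEN' },
    actorId: { schema: optionalString, env: ['ACTOR_ID', 'APIFY_ACTOR_ID'] },
};

class ApifyConfiguration extends CrawleeConfiguration<typeof apifyConfigFields> {
    static override fields: FieldMap = apifyConfigFields;
}

const config = new ApifyConfiguration({ APIFY_ACTOR_ID: 'my-actor', CRAWLEE_DEFAULT_DATASET_ID: 'ds-1' });
console.log(config.get('actorId'), config.get('defaultDatasetId')); // my-actor ds-1
```

The type parameter carries the field map for `get()` typing, while the static property carries it at runtime; the subclass has to set both, which is what the `extends CrawleeConfiguration<ApifyConfigFields, ...>` plus `static override fields` pair in the PR description does.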

Test plan

  • Type checking passes
  • Full test suite (needs v4 CI)
  • Integration with Apify SDK

🤖 Generated with Claude Code

Refactors the Configuration class to use Zod for declarative field definitions
with automatic environment variable mapping and type coercion.

Key changes:
- Field definitions are now declarative with the `field()` helper
- Each field defines its Zod schema and env var mapping in one place
- Exported helpers for SDK extension: `field`, `coerceBoolean`, `logLevelSchema`
- Generic Configuration class supports inheritance via type parameters
- Priority order: constructor options > env vars > crawlee.json > defaults
- Proper TypeScript types for input (constructor) vs output (get())

This enables cleaner extension in Apify SDK without monkey patching:
```ts
const apifyConfigFields = {
    ...crawleeConfigFields,
    token: field(z.string().optional(), { env: 'APIFY_TOKEN' }),
};
class Configuration extends CrawleeConfiguration<...> {
    static override fields = apifyConfigFields;
}
```

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
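A helper like `coerceBoolean` matters because env vars always arrive as strings, and a naive `Boolean(process.env.X)` turns the string `'false'` into `true`. The sketch below is a hypothetical stand-in (`coerceBooleanSketch` is not the PR's `coerceBoolean`, and the accepted spellings are an assumption). It also illustrates why the constructor input type and the `get()` output type differ: the input side has to accept strings, while the output side is always a plain `boolean`.

```typescript
// Input side: what constructor options or env vars may supply.
type BooleanInput = boolean | string | undefined;

// Hypothetical coercion. Assumption: "true"/"1" mean true, "false"/"0"/"" mean false,
// case-insensitively; anything else is rejected instead of silently guessed.
function coerceBooleanSketch(input: BooleanInput, fallback = false): boolean {
    if (input === undefined) return fallback;
    if (typeof input === 'boolean') return input;
    const normalized = input.trim().toLowerCase();
    if (normalized === 'true' || normalized === '1') return true;
    if (normalized === 'false' || normalized === '0' || normalized === '') return false;
    throw new Error(`Cannot interpret ${JSON.stringify(input)} as a boolean`);
}

// Input vs output shapes for an illustrative `headless` option.
interface ExampleInput {
    headless?: BooleanInput; // constructor accepts loose values
}
interface ExampleOutput {
    headless: boolean; // get() always returns the coerced value
}

function toOutput(input: ExampleInput): ExampleOutput {
    return { headless: coerceBooleanSketch(input.headless, true) };
}

console.log(Boolean('false'));                     // true  (the pitfall)
console.log(coerceBooleanSketch('false'));         // false
console.log(toOutput({ headless: '0' }).headless); // false
console.log(toOutput({}).headless);                // true  (fallback)
```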
B4nan added a commit to apify/apify-sdk-js that referenced this pull request Feb 4, 2026
Refactors the Configuration class to use Zod-based field definitions,
extending Crawlee's new Configuration class cleanly without monkey patching.

Key changes:
- Uses `crawleeConfigFields` spread with Apify-specific overrides and additions
- Each field defines schema and env var aliases in one place
- Supports multiple env var aliases per field (e.g., ACTOR_ID, APIFY_ACTOR_ID)
- Removes all monkey patching of CoreConfiguration
- Adds zod as direct dependency

Example field definition:
```ts
actorId: field(z.string().optional(), {
    env: ['ACTOR_ID', 'APIFY_ACTOR_ID'],
}),
```

Requires: apify/crawlee#3387

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
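With several aliases per field, the lookup order decides which value wins when more than one is set. A minimal sketch, assuming the first name listed takes precedence (the PR may resolve conflicts differently); `toEnvList` and `readFirstEnv` are hypothetical helpers, and the environment is passed in rather than read from `process.env`.

```typescript
// A field's env option may be a single name or a list of aliases.
type EnvOption = string | string[] | undefined;

function toEnvList(env: EnvOption): string[] {
    if (env === undefined) return [];
    return Array.isArray(env) ? env : [env];
}

// Returns the first alias that is set, plus its name (useful when debugging config).
// Assumption: earlier names in the list take precedence over later ones.
function readFirstEnv(
    env: EnvOption,
    source: Record<string, string | undefined>,
): { name: string; value: string } | undefined {
    for (const name of toEnvList(env)) {
        const value = source[name];
        if (value !== undefined) return { name, value };
    }
    return undefined;
}

const actorIdEnv = ['ACTOR_ID', 'APIFY_ACTOR_ID'];

console.log(readFirstEnv(actorIdEnv, { APIFY_ACTOR_ID: 'legacy' })?.value);              // legacy
console.log(readFirstEnv(actorIdEnv, { ACTOR_ID: 'new', APIFY_ACTOR_ID: 'legacy' })?.name); // ACTOR_ID
```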
B4nan and others added 3 commits February 6, 2026 10:10
Adds `extendField()` helper that extends an existing field with additional
env var mappings while preserving the base field's env vars. This avoids
repetition when extending fields in the SDK.

Example:
```ts
// No need to repeat CRAWLEE_DEFAULT_DATASET_ID
defaultDatasetId: extendField(crawleeConfigFields.defaultDatasetId, {
    env: ['ACTOR_DEFAULT_DATASET_ID', 'APIFY_DEFAULT_DATASET_ID'],
}),
```

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
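A minimal sketch of what an `extendField()`-style helper could look like, assuming a field is a plain object holding a schema and an optional `env` name or list. `extendFieldSketch` is a hypothetical stand-in. It appends the new names after the base ones; that ordering only matters if lookup is first-match, and the PR may order them differently.

```typescript
interface SchemaLike<T> {
    parse(input: unknown): T;
}
interface FieldDef<T> {
    schema: SchemaLike<T>;
    env?: string | string[];
}

function toEnvList(env: string | string[] | undefined): string[] {
    if (env === undefined) return [];
    return Array.isArray(env) ? env : [env];
}

// Returns a new field that keeps the base schema and env names and adds more names.
// The base field is not mutated, so other users of crawleeConfigFields are unaffected.
function extendFieldSketch<T>(base: FieldDef<T>, extra: { env: string | string[] }): FieldDef<T> {
    // Set removes duplicates while keeping insertion order (base names first).
    const env = [...new Set([...toEnvList(base.env), ...toEnvList(extra.env)])];
    return { ...base, env };
}

const optionalString: SchemaLike<string | undefined> = {
    parse: (input) => (input === undefined ? undefined : String(input)),
};

const baseDefaultDatasetId: FieldDef<string | undefined> = {
    schema: optionalString,
    env: 'CRAWLEE_DEFAULT_DATASET_ID',
};

const sdkDefaultDatasetId = extendFieldSketch(baseDefaultDatasetId, {
    env: ['ACTOR_DEFAULT_DATASET_ID', 'APIFY_DEFAULT_DATASET_ID'],
});

console.log(toEnvList(sdkDefaultDatasetId.env).join(', '));
// CRAWLEE_DEFAULT_DATASET_ID, ACTOR_DEFAULT_DATASET_ID, APIFY_DEFAULT_DATASET_ID
```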
Moves extendField from a standalone export to a static method on the
Configuration class, providing better encapsulation while still being
accessible for subclass field definitions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
```diff
 * To reset a value, we can omit the `value` argument or pass `undefined` there.
 */
-set(key: keyof ConfigurationOptions, value?: any): void {
+set<K extends keyof TInput>(key: K, value?: TInput[K]): void {
```
Contributor:

I think we could just get rid of the set method. Internally, it's not used much and changing the configuration mid-flight is a heavy-duty footgun.

Member Author:

So how would you set stuff that isn't a crawler option? crawlee.json?

Contributor:

I'm not sure I understand, the Configuration class doesn't allow "unknown" options, right?
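On the "unknown options" point: with the new `set<K extends keyof TInput>` signature, the compiler rejects both unknown keys and wrongly typed values, which the old `keyof ConfigurationOptions` / `any` signature could not do for values. A small sketch of that behaviour, using a hypothetical `ExampleInput` type and `OverrideStore` class (not the PR's code); the rejected calls are left as comments because they would not compile.

```typescript
// Illustrative input type: the loose shapes a constructor might accept.
interface ExampleInput {
    memoryMbytes?: number | string;
    headless?: boolean | string;
}

class OverrideStore<TInput extends object> {
    private readonly overrides = new Map<keyof TInput, unknown>();

    // Same signature shape as the PR: the key must exist and the value must match its input type.
    // Omitting the value (or passing undefined) resets the override.
    set<K extends keyof TInput>(key: K, value?: TInput[K]): void {
        if (value === undefined) this.overrides.delete(key);
        else this.overrides.set(key, value);
    }

    has(key: keyof TInput): boolean {
        return this.overrides.has(key);
    }
}

const store = new OverrideStore<ExampleInput>();
store.set('memoryMbytes', 2048);
store.set('headless', 'false');
store.set('headless'); // reset
// store.set('maxMemory', 1);       // compile error: not a key of ExampleInput
// store.set('memoryMbytes', true); // compile error: boolean is not number | string

console.log(store.has('memoryMbytes'), store.has('headless')); // true false
```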

```ts
 */
get<K extends keyof TOutput>(key: K, defaultValue: NonNullable<TOutput[K]>): NonNullable<TOutput[K]>;
get<K extends keyof TOutput>(key: K, defaultValue?: TOutput[K]): TOutput[K];
get<K extends keyof TOutput>(key: K, defaultValue?: TOutput[K]): TOutput[K] {
```
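The two `get()` overloads exist so that passing a non-nullable default narrows the return type. A minimal sketch with the same signatures on a hypothetical `ConfigReader` class; it reads from a pre-resolved object and treats `null`/`undefined` as missing via `??`, which may differ from the PR's fallback logic.

```typescript
// Illustrative output type: proxyUrl may legitimately be absent.
interface ExampleOutput {
    memoryMbytes: number;
    proxyUrl?: string;
}

class ConfigReader<TOutput extends object> {
    private readonly values: Partial<TOutput>;

    constructor(values: Partial<TOutput>) {
        this.values = values;
    }

    // Overload 1: a non-nullable default guarantees a non-nullable result.
    get<K extends keyof TOutput>(key: K, defaultValue: NonNullable<TOutput[K]>): NonNullable<TOutput[K]>;
    // Overload 2: otherwise the result is whatever the field's type allows.
    get<K extends keyof TOutput>(key: K, defaultValue?: TOutput[K]): TOutput[K];
    get<K extends keyof TOutput>(key: K, defaultValue?: TOutput[K]): TOutput[K] {
        const value = this.values[key] as TOutput[K] | undefined;
        return (value ?? defaultValue) as TOutput[K];
    }
}

const reader = new ConfigReader<ExampleOutput>({ memoryMbytes: 4096 });
const memory: number = reader.get('memoryMbytes');
const proxy: string = reader.get('proxyUrl', 'http://localhost:8000'); // overload 1: string
const maybeProxy: string | undefined = reader.get('proxyUrl');         // overload 2: string | undefined
console.log(memory, proxy, maybeProxy); // 4096 http://localhost:8000 undefined
```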
Contributor:

Is there any way we could expose the config options via direct property access? I.e., `config.maxMemoryMbytes` instead of `config.get("maxMemoryMbytes")`?

Member Author:

There are ways, but I can't say I like them, since this is a rather internal API, right?

  • config class returning a proxy from constructor
  • adding getters dynamically

Both require some type level magic (which is imo fine on its own).
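For the first option, a rough sketch of a class whose constructor returns a `Proxy`, plus a cast on the constructor type that makes `config.memoryMbytes` type-check. `ConfigImpl`, `ProxiedConfig` and the field names are hypothetical; real code would also have to decide what happens when a field name collides with a method name.

```typescript
class ConfigImpl<TOutput extends object> {
    // TS-private (not #private): a Proxy can still reach it through Reflect.get.
    private readonly values: TOutput;

    constructor(values: TOutput) {
        this.values = values;
        // Returning an object from a constructor makes `new` yield that object instead of `this`.
        return new Proxy(this, {
            get(target, prop, receiver) {
                // Field names are served from the resolved values...
                if (typeof prop === 'string' && Object.prototype.hasOwnProperty.call(values, prop)) {
                    return (values as Record<string, unknown>)[prop];
                }
                // ...everything else (get(), internals) falls through to the instance.
                return Reflect.get(target, prop, receiver);
            },
        });
    }

    get<K extends keyof TOutput>(key: K): TOutput[K] {
        return this.values[key];
    }
}

// The type-level part: re-type the constructor so instances also expose every field.
type ProxiedConfigCtor = new <TOutput extends object>(values: TOutput) => ConfigImpl<TOutput> & Readonly<TOutput>;
const ProxiedConfig = ConfigImpl as ProxiedConfigCtor;

const config = new ProxiedConfig({ memoryMbytes: 4096, headless: true });
console.log(config.memoryMbytes, config.get('headless')); // 4096 true
```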

Contributor:

In plain crawlee, it is internal for sure. In Apify SDK, it's accessed by users frequently as a wrapper for the plethora of environment variables that the platform provides. It makes sense to me to make it as close to the POJO experience as possible... so, can I see the type level magic? 😁
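For the second option, dynamically added getters, a comparable sketch: the constructor defines one read-only accessor per resolved field, and the same constructor re-typing supplies the property types. `GetterConfigImpl` and `GetterConfig` are hypothetical names; the collision check is one possible policy, not something taken from the PR.

```typescript
class GetterConfigImpl<TOutput extends object> {
    private readonly values: TOutput;

    constructor(values: TOutput) {
        this.values = values;
        for (const key of Object.keys(values)) {
            // Refuse names that would shadow existing members such as get() or values.
            if (key in this) throw new Error(`Field name "${key}" collides with a Configuration member`);
            // Accessor without a setter: reads work, assignments throw in strict mode.
            Object.defineProperty(this, key, {
                get: () => (this.values as Record<string, unknown>)[key],
                enumerable: true,
            });
        }
    }

    get<K extends keyof TOutput>(key: K): TOutput[K] {
        return this.values[key];
    }
}

// Same type-level trick as with a Proxy: the constructor type promises the extra properties.
type GetterConfigCtor = new <TOutput extends object>(values: TOutput) => GetterConfigImpl<TOutput> & Readonly<TOutput>;
const GetterConfig = GetterConfigImpl as GetterConfigCtor;

const config = new GetterConfig({ memoryMbytes: 4096, headless: true });
console.log(config.memoryMbytes, config.headless); // 4096 true
```

Compared with a Proxy, plain accessors are visible to `Object.keys()` and debuggers and avoid a trap on every property read, at the cost of defining them per instance.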

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
B4nan force-pushed the feat/zod-configuration-v4 branch from 47a7493 to d4139b7 on February 6, 2026 14:03
```ts
    : {
          request: LoadedRequest<Context['request']>;
      } & Omit<Context, 'request'>;
export type LoadedContext<Context extends RestrictedCrawlingContext> =
```
Contributor:

This looks like a whitespace-only change, why? Same thing in the tests...

