Option to fix preprocessing seed in finetuning #771

Merged
bejaeger merged 58 commits into main from ben/fixed-preprocessing-seed-in-finetuning
Feb 20, 2026

@bejaeger bejaeger commented Feb 2, 2026

Fixing the seed will, e.g., keep column permutations the same across batches. This is expected to improve results in finetuning.

Todo

  • Update consistency tests
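
The effect described above can be illustrated with a minimal NumPy sketch. It is not the code from this PR: `permute_columns`, `column_orders`, and the `base_seed + step` rule for the varying case are hypothetical names and choices made for illustration only. It shows that reusing one seed yields the same column order for every batch, whereas deriving a new seed per batch generally does not.

```python
import numpy as np


def permute_columns(batch: np.ndarray, seed: int) -> np.ndarray:
    """Shuffle the column order of `batch` with a generator seeded by `seed`."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(batch.shape[1])
    return batch[:, order]


def column_orders(n_batches, n_columns, base_seed, fixed):
    """Return the column order applied to each batch (hypothetical helper)."""
    # A single row holding the column indices makes the applied order readable.
    header = np.arange(n_columns).reshape(1, -1)
    orders = []
    for step in range(n_batches):
        # Assumption: the varying case derives its seed as base_seed + step.
        seed = base_seed if fixed else base_seed + step
        permuted = permute_columns(header, seed)[0]
        orders.append(tuple(int(i) for i in permuted))
    return orders


fixed_orders = column_orders(n_batches=4, n_columns=6, base_seed=0, fixed=True)
varying_orders = column_orders(n_batches=4, n_columns=6, base_seed=0, fixed=False)

print("fixed:  ", fixed_orders)
print("varying:", varying_orders)
```

With `fixed=True` every batch sees the same feature order, which is the consistency this PR aims to provide during finetuning.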

@bejaeger bejaeger changed the base branch from ben/introduce-feature-modality-dict to main February 16, 2026 12:48
@bejaeger bejaeger marked this pull request as ready for review February 16, 2026 13:34
@bejaeger bejaeger requested a review from a team as a code owner February 16, 2026 13:34
@bejaeger bejaeger requested review from adrian-prior and Copilot and removed request for a team February 16, 2026 13:34

Copilot AI left a comment

Pull request overview

Adds support for keeping preprocessing randomness fixed during fine-tuning so that stochastic preprocessing choices (e.g., column permutations) remain consistent across batches/epochs, and updates the codebase to pass/derive preprocessing random_state explicitly.

Changes:

  • Introduces use_fixed_preprocessing_seed on FinetunedTabPFNClassifier / FinetunedTabPFNRegressor and wires it into the fine-tuning data pipeline.
  • Refactors preprocessing randomness plumbing (rng → random_state, plus separate data_shuffle_seed vs preprocessing_random_state) across classifier/regressor + preprocessing utilities.
  • Updates finetuning/inference tests and refreshes reference predictions to reflect the new deterministic behavior.
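
The split between a data shuffling seed and a preprocessing random state can be sketched as follows. This is an illustrative assumption about what such a separation looks like, not the implementation in src/tabpfn/finetuning/data_util.py: `make_chunk` is a hypothetical helper, and only the parameter names `data_shuffle_seed` and `preprocessing_random_state` are taken from the summary above.

```python
import numpy as np


def make_chunk(X, data_shuffle_seed, preprocessing_random_state):
    """Hypothetical helper: two independent sources of randomness.

    Rows are shuffled with `data_shuffle_seed`; the column permutation
    (standing in for stochastic preprocessing) uses
    `preprocessing_random_state`.
    """
    row_order = np.random.default_rng(data_shuffle_seed).permutation(X.shape[0])
    col_order = np.random.default_rng(preprocessing_random_state).permutation(X.shape[1])
    return X[row_order][:, col_order], col_order


X = np.arange(40).reshape(10, 4)

chunks = []
col_orders = []
for epoch in range(3):
    # The shuffle seed changes every epoch; the preprocessing state stays put.
    chunk, col_order = make_chunk(
        X, data_shuffle_seed=epoch, preprocessing_random_state=123
    )
    chunks.append(chunk)
    col_orders.append(tuple(int(i) for i in col_order))

print(col_orders)
```

Because the two seeds are decoupled, the row order can change between epochs without disturbing the preprocessing choices.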

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/test_inference.py | Updates TabPFNEnsemblePreprocessor construction to use random_state=. |
| tests/test_finetuning_regressor.py | Updates finetuning dataset chunk helper call to new seed/random_state parameters. |
| tests/test_finetuning_classifier.py | Refactors helper usage for dataset chunk creation; improves patching approach; adds test covering fixed preprocessing seed behavior. |
| tests/reference_predictions/darwin_arm64/regressor_tiny_dataset_v2_fit_preprocessors.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/regressor_tiny_dataset_v2.5_low_memory.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/regressor_tiny_dataset_v2.5_fit_with_cache.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/regressor_tiny_dataset_v2.5_fit_preprocessors.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/regressor_tiny_dataset_several_devices.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_tiny_dataset_v2_fit_preprocessors.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_tiny_dataset_v2.5_low_memory.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_tiny_dataset_v2.5_fit_with_cache.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_tiny_dataset_v2.5_fit_preprocessors.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_tiny_dataset_differentiable_input.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_tiny_dataset_5_estimators.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_iris_dataset_several_devices.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_iris_dataset.json | Updates reference predictions after preprocessing randomness changes. |
| src/tabpfn/regressor.py | Switches preprocessing RNG usage to explicit random_state derived via infer_random_state. |
| src/tabpfn/preprocessing/initialization.py | Adds a new helper module for feature tagging + dtype sanitization + ordinal encoding setup. |
| src/tabpfn/preprocessing/ensemble.py | Renames rng to random_state and standardizes seed derivation via infer_random_state. |
| src/tabpfn/preprocessing/__init__.py | Minor import formatting cleanup. |
| src/tabpfn/finetuning/finetuned_regressor.py | Exposes use_fixed_preprocessing_seed in the regressor fine-tuning wrapper API/docs. |
| src/tabpfn/finetuning/finetuned_classifier.py | Exposes use_fixed_preprocessing_seed in the classifier fine-tuning wrapper API/docs; minor type signature tweak. |
| src/tabpfn/finetuning/finetuned_base.py | Implements fixed-vs-varying preprocessing random state selection in the fine-tuning loop. |
| src/tabpfn/finetuning/data_util.py | Splits data shuffling seed from preprocessing random state in dataset chunk creation. |
| src/tabpfn/classifier.py | Switches preprocessing RNG usage to explicit random_state derived via infer_random_state. |
| src/tabpfn/base.py | Changes model initialization helper to return only byte_size (no RNG), aligning RNG handling elsewhere. |
| examples/finetune_regressor.py | Updates example docstring wording (VRAM statement). |
| examples/finetune_classifier.py | Updates example docstring and estimator counts/random_state usage. |
| changelog/771.added.md | Adds changelog entry for use_fixed_preprocessing_seed. |

@psinger-prior psinger-prior left a comment

LGTM! Please check / address the two comments if needed.

@bejaeger bejaeger enabled auto-merge (squash) February 20, 2026 13:47
@bejaeger bejaeger merged commit c22f073 into main Feb 20, 2026
13 checks passed