Option to fix preprocessing seed in finetuning #771
Conversation
Pull request overview
Adds support for keeping preprocessing randomness fixed during fine-tuning so that stochastic preprocessing choices (e.g., column permutations) remain consistent across batches/epochs, and updates the codebase to pass/derive preprocessing random_state explicitly.
Changes:
- Introduces `use_fixed_preprocessing_seed` on `FinetunedTabPFNClassifier`/`FinetunedTabPFNRegressor` and wires it into the fine-tuning data pipeline.
- Refactors preprocessing randomness plumbing (`rng` → `random_state`, plus separate `data_shuffle_seed` vs `preprocessing_random_state`) across classifier/regressor + preprocessing utilities.
- Updates finetuning/inference tests and refreshes reference predictions to reflect the new deterministic behavior.
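For context, a minimal usage sketch of the new option. Only `use_fixed_preprocessing_seed`, `FinetunedTabPFNClassifier`, and `FinetunedTabPFNRegressor` come from this PR; the import path, constructor signature, and scikit-learn-style `fit`/`score` calls are assumptions for illustration, not the definitive API.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Import path assumed; the option itself is introduced by this PR.
from tabpfn.finetuning import FinetunedTabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Keep stochastic preprocessing choices (e.g. column permutations)
# identical across batches/epochs during fine-tuning.
clf = FinetunedTabPFNClassifier(use_fixed_preprocessing_seed=True)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```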
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| tests/test_inference.py | Updates TabPFNEnsemblePreprocessor construction to use random_state=. |
| tests/test_finetuning_regressor.py | Updates finetuning dataset chunk helper call to new seed/random_state parameters. |
| tests/test_finetuning_classifier.py | Refactors helper usage for dataset chunk creation; improves patching approach; adds test covering fixed preprocessing seed behavior. |
| tests/reference_predictions/darwin_arm64/regressor_tiny_dataset_v2_fit_preprocessors.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/regressor_tiny_dataset_v2.5_low_memory.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/regressor_tiny_dataset_v2.5_fit_with_cache.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/regressor_tiny_dataset_v2.5_fit_preprocessors.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/regressor_tiny_dataset_several_devices.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_tiny_dataset_v2_fit_preprocessors.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_tiny_dataset_v2.5_low_memory.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_tiny_dataset_v2.5_fit_with_cache.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_tiny_dataset_v2.5_fit_preprocessors.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_tiny_dataset_differentiable_input.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_tiny_dataset_5_estimators.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_iris_dataset_several_devices.json | Updates reference predictions after preprocessing randomness changes. |
| tests/reference_predictions/darwin_arm64/classifier_iris_dataset.json | Updates reference predictions after preprocessing randomness changes. |
| src/tabpfn/regressor.py | Switches preprocessing RNG usage to explicit random_state derived via infer_random_state. |
| src/tabpfn/preprocessing/initialization.py | Adds a new helper module for feature tagging + dtype sanitization + ordinal encoding setup. |
| src/tabpfn/preprocessing/ensemble.py | Renames rng to random_state and standardizes seed derivation via infer_random_state. |
| src/tabpfn/preprocessing/__init__.py | Minor import formatting cleanup. |
| src/tabpfn/finetuning/finetuned_regressor.py | Exposes use_fixed_preprocessing_seed in the regressor fine-tuning wrapper API/docs. |
| src/tabpfn/finetuning/finetuned_classifier.py | Exposes use_fixed_preprocessing_seed in the classifier fine-tuning wrapper API/docs; minor type signature tweak. |
| src/tabpfn/finetuning/finetuned_base.py | Implements fixed-vs-varying preprocessing random state selection in the fine-tuning loop (see the sketch after this table). |
| src/tabpfn/finetuning/data_util.py | Splits data shuffling seed from preprocessing random state in dataset chunk creation. |
| src/tabpfn/classifier.py | Switches preprocessing RNG usage to explicit random_state derived via infer_random_state. |
| src/tabpfn/base.py | Changes model initialization helper to return only byte_size (no RNG), aligning RNG handling elsewhere. |
| examples/finetune_regressor.py | Updates example docstring wording (VRAM statement). |
| examples/finetune_classifier.py | Updates example docstring and estimator counts/random_state usage. |
| changelog/771.added.md | Adds changelog entry for use_fixed_preprocessing_seed. |
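The split between `data_shuffle_seed` and `preprocessing_random_state` in `finetuned_base.py`/`data_util.py` can be pictured roughly as follows. This is an illustrative sketch only, not the actual implementation: the function name and the seed-derivation scheme are made up.

```python
import numpy as np


def derive_seeds(
    base_seed: int, epoch: int, use_fixed_preprocessing_seed: bool
) -> tuple[int, int]:
    """Hypothetical seed derivation for one fine-tuning epoch."""
    rng = np.random.default_rng(base_seed + epoch)
    # Data shuffling stays stochastic: a fresh seed every epoch.
    data_shuffle_seed = int(rng.integers(0, 2**31 - 1))
    if use_fixed_preprocessing_seed:
        # Stochastic preprocessing (e.g. column permutations) is held
        # identical across batches/epochs.
        preprocessing_random_state = base_seed
    else:
        # Preprocessing randomness also varies per epoch.
        preprocessing_random_state = int(rng.integers(0, 2**31 - 1))
    return data_shuffle_seed, preprocessing_random_state
```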
psinger-prior left a comment
LGTM! Please check / address the two comments if needed.
Fixing the seed will, e.g., keep column permutations the same across batches. This is expected to improve results in finetuning.
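To see why a fixed seed keeps such permutations stable, here is a tiny standalone numpy check (unrelated to the TabPFN code itself):

```python
import numpy as np

# Re-seeding with the same value yields the same column permutation,
# so a fixed preprocessing seed sees identical permutations in every batch.
perm_batch_1 = np.random.default_rng(42).permutation(8)
perm_batch_2 = np.random.default_rng(42).permutation(8)
assert (perm_batch_1 == perm_batch_2).all()
```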