Conversation
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Code Review
The pull request introduces batched prediction functionality to the TabPFNClassifier and TabPFNRegressor classes, along with a new predict_in_batches utility function. This is a valuable addition for handling large datasets more efficiently and preventing out-of-memory errors. The changes also update the TabPFNOutOfMemoryError message to guide users towards the new batch_size_predict parameter. New tests have been added to ensure the batched predictions match the unbatched results across various output types. Overall, the changes are well-implemented and improve the usability of the library for large-scale inference.
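For orientation, here is a minimal usage sketch of the feature under review. It assumes the public API described in this PR (a fitted TabPFNClassifier plus the new batch_size_predict keyword); the dataset shapes and the chunk size of 512 are purely illustrative.

```python
import numpy as np
from tabpfn import TabPFNClassifier

# Illustrative data; sizes chosen only to mimic a large test set.
X_train, y_train = np.random.rand(1_000, 10), np.random.randint(0, 2, 1_000)
X_test = np.random.rand(50_000, 10)

clf = TabPFNClassifier()
clf.fit(X_train, y_train)

# Without batching, a large X_test may trigger TabPFNOutOfMemoryError;
# batch_size_predict (added in this PR) splits X_test into chunks of 512 rows.
y_pred = clf.predict(X_test, batch_size_predict=512)
proba = clf.predict_proba(X_test, batch_size_predict=512)
```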
```python
        The predicted class labels as a NumPy array.
        """
        if batch_size_predict is not None:
            return predict_in_batches(self.predict, X, batch_size_predict)
```
There was a problem hiding this comment.
Here predict passes self.predict as the predict_fn, so predict_in_batches re-enters predict. This can lead to infinite recursion if not handled carefully; Python's default recursion limit might catch it, but it is safer to pass self._predict_proba and apply the argmax and inverse_transform outside the predict_in_batches call, or to ensure that the predict_fn handed to predict_in_batches never calls predict_in_batches again with the same batch_size_predict parameter.
Suggested change:

```diff
-            return predict_in_batches(self.predict, X, batch_size_predict)
+            return predict_in_batches(
+                lambda chunk: np.argmax(self._predict_proba(chunk), axis=1),
+                X,
+                batch_size_predict,
+                concat_fn=lambda results: (
+                    self.label_encoder_.inverse_transform(np.concatenate(results, axis=0))
+                    if hasattr(self, "label_encoder_") and self.label_encoder_ is not None
+                    else np.concatenate(results, axis=0)
+                ),
+            )
```
```python
        The predicted logits as a NumPy array. Shape (n_samples, n_classes).
        """
        if batch_size_predict is not None:
            return predict_in_batches(self.predict_logits, X, batch_size_predict)
```
As with predict, passing self.predict_logits back into predict_in_batches can lead to infinite recursion. It is better to pass the underlying _raw_predict function and apply the necessary post-processing afterwards.
Suggested change:

```diff
-            return predict_in_batches(self.predict_logits, X, batch_size_predict)
+            return predict_in_batches(
+                lambda chunk: self._raw_predict(chunk, return_logits=True).float().detach().cpu().numpy(),
+                X,
+                batch_size_predict,
+            )
```
```python
        if batch_size_predict is not None:
            return predict_in_batches(
                self.predict_raw_logits,
                X,
                batch_size_predict,
                concat_fn=lambda results: np.concatenate(results, axis=1),
            )
```
The predict_raw_logits method passes itself to predict_in_batches, which can lead to infinite recursion. Instead, _raw_predict should be called directly within the lambda function.
```python
            return predict_in_batches(
                lambda chunk: self._raw_predict(
                    chunk, return_logits=False, return_raw_logits=True
                ).float().detach().cpu().numpy(),
                X,
                batch_size_predict,
                concat_fn=lambda results: np.concatenate(results, axis=1),
            )
```

```python
            Shape (n_samples, n_classes).
        """
        if batch_size_predict is not None:
            return predict_in_batches(self.predict_proba, X, batch_size_predict)
```
```python
        if batch_size_predict is not None:
            return predict_in_batches(
                lambda chunk: self.predict(
                    chunk, output_type=output_type, quantiles=quantiles
                ),
                X,
                batch_size_predict,
                concat_fn=lambda results: _concatenate_regression_results(
                    results, output_type
                ),
            )
```
Here the lambda calls self.predict again from inside predict_in_batches, so the method effectively calls itself. This can lead to infinite recursion if not handled carefully. It is safer to pass a lambda that calls the internal prediction logic directly (i.e., the code after the if batch_size_predict is not None: block) rather than calling self.predict again.
```python
            return predict_in_batches(
                # Assuming a _predict_internal helper encapsulates the core logic.
                lambda chunk: self._predict_internal(
                    chunk, output_type=output_type, quantiles=quantiles
                ),
                X,
                batch_size_predict,
                concat_fn=lambda results: _concatenate_regression_results(
                    results, output_type
                ),
            )
```

```python
            np.concatenate([r[q] for r in results], axis=0)
            for q in range(len(results[0]))
```
In the _concatenate_regression_results function, when output_type == "quantiles" the code accesses r[q], where r is an element of results (typed as RegressionResultType) and q is an integer. This implies r is expected to be a list or array of quantiles, but RegressionResultType can also be np.ndarray, MainOutputDict, or FullOutputDict, so the access could raise a TypeError if r is not indexable by an integer q (e.g., an np.ndarray representing a single output type like 'mean'). The function should explicitly check the type, or ensure that results always contains lists of quantiles when output_type is 'quantiles'.
```python
        return [
            np.concatenate(
                [typing.cast(list[np.ndarray], r)[q] for r in results], axis=0
            )
            for q in range(len(typing.cast(list[np.ndarray], results[0])))
        ]
```

```python
        mean=np.concatenate([r["mean"] for r in results], axis=0),
        median=np.concatenate([r["median"] for r in results], axis=0),
        mode=np.concatenate([r["mode"] for r in results], axis=0),
        quantiles=[
            np.concatenate([r["quantiles"][q] for r in results], axis=0)
            for q in range(len(results[0]["quantiles"]))
        ],
```
In the _concatenate_regression_results function, when output_type is 'main' or 'full', the code accesses r["mean"], r["median"], r["mode"], and r["quantiles"], where r is an element of results (a list[RegressionResultType]). If RegressionResultType is np.ndarray (e.g., if the original predict call returned only 'mean'), then accessing r["mean"] would raise a TypeError. The function should ensure that results contains MainOutputDict or FullOutputDict when processing 'main' or 'full' output types, or handle the np.ndarray case gracefully.
```python
    main = MainOutputDict(
        mean=np.concatenate(
            [typing.cast(MainOutputDict, r)["mean"] for r in results], axis=0
        ),
        median=np.concatenate(
            [typing.cast(MainOutputDict, r)["median"] for r in results], axis=0
        ),
        mode=np.concatenate(
            [typing.cast(MainOutputDict, r)["mode"] for r in results], axis=0
        ),
        quantiles=[
            np.concatenate(
                [typing.cast(MainOutputDict, r)["quantiles"][q] for r in results],
                axis=0,
            )
            # Cast results[0] to MainOutputDict for the quantile count.
            for q in range(len(typing.cast(MainOutputDict, results[0])["quantiles"]))
        ],
    )
```

```python
        return main
    return FullOutputDict(
        **main,
        criterion=results[0]["criterion"],
```
Similar to the previous comment, r["logits"] assumes r is a FullOutputDict. If r is an np.ndarray or MainOutputDict, this will cause an error. Type casting r to FullOutputDict would make this access safe.
```python
        criterion=typing.cast(FullOutputDict, results[0])["criterion"],
        logits=torch.cat(
            [typing.cast(FullOutputDict, r)["logits"] for r in results], dim=0
        ),
```

```python
def predict_in_batches(
    predict_fn: typing.Callable,
    X: XType,
    batch_size: int,
    concat_fn: typing.Callable | None = None,
) -> typing.Any:
```
The predict_in_batches function is a good addition for handling large datasets. However, the type hint for X is XType, which resolves to a generic Any. Since X is indexed directly, it must be an array-like object, so a more precise hint such as np.ndarray or torch.Tensor would improve type safety and readability.
Suggested change:

```diff
 def predict_in_batches(
     predict_fn: typing.Callable,
-    X: XType,
+    X: np.ndarray,  # More specific type hint
     batch_size: int,
     concat_fn: typing.Callable | None = None,
 ) -> typing.Any:
```
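For readers following along, here is a rough sketch of what a helper with this signature presumably does, inferred from how it is called throughout this review. The PR's actual implementation may differ, and defaulting concat_fn to np.concatenate along axis 0 is an assumption.

```python
import typing

import numpy as np


def predict_in_batches(
    predict_fn: typing.Callable,
    X: np.ndarray,
    batch_size: int,
    concat_fn: typing.Callable | None = None,
) -> typing.Any:
    """Run predict_fn on row-chunks of X and stitch the per-chunk results together.

    Sketch only: mirrors the call sites quoted above, not the PR's exact code.
    """
    results = [
        predict_fn(X[start : start + batch_size])
        for start in range(0, len(X), batch_size)
    ]
    if concat_fn is not None:
        return concat_fn(results)
    # Assumed default: row-wise concatenation of per-batch outputs.
    return np.concatenate(results, axis=0)
```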
| f"Solution: Use batch_size_predict to split test data" | ||
| f" into smaller batches:\n\n" | ||
| f" model.{predict_method}(" | ||
| f"X_test, batch_size_predict=100)" |
The example code in the error message uses batch_size_predict=100. While this is a reasonable default, it might be beneficial to mention that the optimal batch_size_predict depends on hardware and the specific dataset, similar to how the previous message suggested batch_size = 1000 # depends on hardware. This would provide more comprehensive guidance to the user.
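One possible wording, sketched only to illustrate the suggestion above; the helper name _oom_hint is hypothetical, and this is not the PR's actual message.

```python
def _oom_hint(predict_method: str) -> str:
    # Hypothetical helper, illustrative wording only: it extends the quoted
    # f-string with the hardware/dataset caveat suggested in the comment above.
    return (
        f"Solution: Use batch_size_predict to split test data"
        f" into smaller batches:\n\n"
        f"    model.{predict_method}(X_test, batch_size_predict=100)\n\n"
        "The optimal batch_size_predict depends on your hardware and dataset;"
        " decrease it further if you still run out of memory."
    )
```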