
Enhance telemetry performance #60

Merged
safaricd merged 3 commits into main from PRI-217 on Feb 13, 2026

Conversation

@safaricd (Collaborator) commented Feb 13, 2026

Change Description

Previous state

  • Telemetry configs were downloaded with cache busting on every request.
  • TTL for caching the telemetry config was set to only 5 minutes.
  • No timeout on downloading telemetry config from the bucket via the CDN.
  • GPU availability was determined by importing and re-initializing torch.
  • Dependency versions were determined by importing them.

Changes

  • Telemetry config download times out after 0.25 seconds and falls back to the defaults, with telemetry enabled by default.
  • The cache for the downloaded config is never busted.
  • NVIDIA GPUs are detected using the low-level nvidia-ml-py (NVML) bindings.
  • torch is only imported when absolutely necessary, as a third step and a fallback (see the sketch below).
  • Assume torch is already available in sys.modules, because tabpfn imports it.
  • Dependency versions are read from package metadata instead of by importing the packages.
  • Fall back to the already-initialized torch from sys.modules, if available.
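
Taken together, the new detection order is roughly the sketch below. The names _get_gpu_type and _get_torch_gpu_type come from the diff; the bodies here are illustrative only, not the merged code, and the CUDA/MPS checks in the torch fallback are assumptions about that path.

```python
import sys
from functools import lru_cache
from typing import Optional


def _get_gpu_type() -> Optional[str]:
    """Detect the GPU type, preferring the lightweight NVML path over torch."""
    # Step 1: NVML via nvidia-ml-py -- fast, no torch import required.
    try:
        import pynvml

        pynvml.nvmlInit()
        try:
            if pynvml.nvmlDeviceGetCount() > 0:
                name = pynvml.nvmlDeviceGetName(pynvml.nvmlDeviceGetHandleByIndex(0))
                return name.decode() if isinstance(name, bytes) else name
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        pass  # NVML not installed or no NVIDIA driver; fall through to torch.

    # Steps 2 and 3: torch from sys.modules first, a fresh import only as a last resort.
    return _get_torch_gpu_type()


@lru_cache(maxsize=1)
def _get_torch_gpu_type() -> Optional[str]:
    """Report the GPU type via torch, importing it only if it is not already loaded."""
    torch = sys.modules.get("torch")
    if torch is None:
        try:
            import torch  # last resort: pay the import cost once
        except ImportError:
            return None
    if torch.cuda.is_available():
        return torch.cuda.get_device_name(0)
    mps = getattr(torch.backends, "mps", None)  # Apple Silicon, if this torch build has it
    if mps is not None and mps.is_available():
        return "mps"
    return None
```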

Results
The following table shows the performance difference with and without telemetry.

Phase   | Diff Mean | % Overhead
fit     | 0.0026s   | 0.95%
predict | -0.0007s  | -0.07%

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. Credits must be used to enable repository-wide code reviews.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @safaricd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances telemetry performance by optimizing how configuration files are downloaded and cached, and by streamlining the detection of GPU types and retrieval of dependency versions. These changes aim to reduce overhead and improve the responsiveness of the application, particularly in scenarios involving GPU detection and package information lookup, without compromising the accuracy of the collected data.

Highlights

  • Telemetry Configuration Download: The telemetry configuration download process has been optimized by removing cache busting, increasing the cache TTL to 60 minutes, and adding a short timeout of 0.25 seconds to prevent blocking the main thread. The default telemetry state is now enabled.
  • GPU Detection Method: GPU detection has been improved for performance by prioritizing the use of the nvidia-ml-py library for NVIDIA GPUs, which is faster than importing and initializing torch. A fallback to torch is still available, but it now checks sys.modules first to avoid re-importing.
  • Dependency Version Retrieval: Dependency versions are now primarily retrieved using importlib.metadata.version from package metadata, which avoids the overhead of importing the entire package. A fallback to importing the module and checking __version__ is provided if metadata is not found (sketched below).
  • New Dependency Added: The nvidia-ml-py library has been added as a new dependency to facilitate faster and more efficient NVIDIA GPU detection.
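
A minimal sketch of that metadata-first lookup, assuming a helper shaped like the _get_package_version mentioned in the changelog below (the dist_name parameter and the exact fallback behavior are illustrative assumptions):

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def _get_package_version(module_name: str, dist_name: Optional[str] = None) -> Optional[str]:
    """Read a dependency's version from installed metadata, importing it only as a fallback."""
    # Preferred path: metadata lookup, which never imports the (possibly heavy) package.
    # Note that the distribution name can differ from the module name,
    # e.g. the module "sklearn" is distributed as "scikit-learn".
    try:
        return version(dist_name or module_name)
    except PackageNotFoundError:
        pass

    # Fallback: import the module and read __version__ if metadata is unavailable.
    try:
        return getattr(import_module(module_name), "__version__", None)
    except ImportError:
        return None
```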


Changelog
  • pyproject.toml
    • Added nvidia-ml-py to the project's dependencies.
  • src/tabpfn_common_utils/telemetry/core/config.py
    • Removed datetime and timezone imports as they are no longer needed for cache busting.
    • Increased the ttl_cache duration for download_config from 5 minutes to 60 minutes.
    • Removed the timestamp parameter from the requests.get call to prevent cache busting.
    • Changed the default enabled state in download_config from False to True.
    • Added a timeout of 0.25 seconds to the requests.get call for telemetry configuration download (see the config sketch after this changelog).
  • src/tabpfn_common_utils/telemetry/core/events.py
    • Imported importlib, importlib.metadata, PackageNotFoundError, and pynvml for improved package and GPU detection.
    • Modified _get_sklearn_version to correctly use scikit-learn as the distribution name for version retrieval.
    • Refactored _get_gpu_type to first attempt GPU detection using pynvml for NVIDIA GPUs, then fall back to a new _get_torch_gpu_type function.
    • Introduced _get_torch_gpu_type to handle PyTorch-based GPU detection, checking sys.modules for an existing torch import before attempting a new import.
    • Updated _get_package_version to primarily use importlib.metadata.version for version retrieval, with a fallback to importing the module if metadata is unavailable.
  • uv.lock
    • Added nvidia-ml-py package details, including its version, source, and wheel information.
    • Included nvidia-ml-py in the dependencies and requires-dist sections of the lock file.
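
Pulling the config.py entries together, the download path behaves roughly like the sketch below. Only the 60-minute TTL, the 0.25-second timeout, the removed cache-busting parameter, and the enabled-by-default fallback come from the changelog; the URL, the default dict, and the use of cachetools' ttl_cache decorator are assumptions for illustration.

```python
import requests
from cachetools.func import ttl_cache

# Hypothetical URL and defaults for illustration; the real values live in config.py.
_CONFIG_URL = "https://example-cdn.invalid/telemetry-config.json"
_DEFAULT_CONFIG = {"enabled": True}  # telemetry now defaults to enabled


@ttl_cache(maxsize=1, ttl=60 * 60)  # cache the downloaded config for 60 minutes
def download_config() -> dict:
    """Fetch the telemetry config without ever blocking the caller for long."""
    try:
        # No cache-busting timestamp parameter, and a hard 0.25s timeout so a slow
        # or unreachable CDN can never stall fit/predict.
        response = requests.get(_CONFIG_URL, timeout=0.25)
        response.raise_for_status()
        return response.json()
    except (requests.RequestException, ValueError):
        # Any network or parsing problem falls back to the defaults instead of raising.
        return dict(_DEFAULT_CONFIG)
```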

@safaricd (Collaborator, Author)

/gemini review

@gemini-code-assist (Contributor) bot left a comment

Code Review

This pull request introduces several well-motivated performance enhancements to the telemetry system. The changes, such as optimizing dependency version retrieval, improving GPU detection with nvidia-ml-py, and refining the configuration download process, are clear and effective. My review includes a few suggestions to improve exception handling by making it more specific, which will enhance robustness and debuggability. I also noted a minor docstring inconsistency. Overall, these are excellent improvements.

@gemini-code-assist (Contributor) bot left a comment

Code Review

This pull request significantly enhances the performance of the telemetry system. The changes are well-thought-out, focusing on reducing overhead by avoiding costly imports and using more efficient methods for data collection. Key improvements include:

  • Caching telemetry configuration for a longer duration and removing cache-busting to improve network performance.
  • Introducing a timeout for fetching the telemetry config to prevent blocking.
  • Using the lightweight nvidia-ml-py library for faster GPU detection, with a fallback to torch.
  • Leveraging importlib.metadata to get package versions without importing the packages themselves.

The code is cleaner and more robust. I have a few suggestions to further improve error handling and documentation consistency.

Quoted diff (the NVML device-name listing):

nvmlDeviceGetName(nvmlDeviceGetHandleByIndex(i)) for i in range(counts)
]

# Because NVML runs very fast, we just return the device name


Still worth caching, since it will run on every event?

@safaricd (Collaborator, Author)

I was thinking the same thing; however, in some rare cases GPUs might be attached to or detached from a VM, so we'd have to cache this information on disk with a TTL. Anyway, given that NVML runs within 20-30 milliseconds, it's not really worth it ATM.



Quoted diff (the cached torch fallback):

@lru_cache(maxsize=1)
def _get_torch_gpu_type() -> Optional[str]:


I wonder whether we could get this information eagerly at import time instead of lazily at event creation. Maybe we could even get the info straight from tabpfn.

@safaricd (Collaborator, Author)

We might - an interesting area to explore in the future.

safaricd merged commit a4d0d02 into main on Feb 13, 2026
9 checks passed
