Integration with Upgini by c3p0upgini · Pull Request #251 · autogluon/autogluon-assistant

c3p0upgini · 2025-11-14T14:02:41Z

Hey there,
We're Upgini, a team developing automated data processing, label-supervised data retrieval, and robust feature selection. We're currently investigating how data processing affects AI agent performance, and we ran several tests with AutoGluon Assistant. We think the library-selection approach is a solid alternative to full code generation, and we also see that on tabular data AutoGluon can surpass most current agentic approaches.

We propose to extend the tooling, in particular to add data processing tools to the agent's capabilities. We acknowledge that the current integration with Upgini is intentionally quite hard-coded. We chose the shortest path to validate the hypothesis that adding automated feature enrichment could improve model quality.

In the extended design we propose, the LLM would select the appropriate preprocessing tool based on the task description, with Upgini being one of the available tools. This would also allow us to provide clear documentation describing the use cases where Upgini is beneficial and when it should be applied.

Another thing we noticed is that AutoGluon can overfit on large datasets. For example, in the New York City Taxi Fare Prediction Kaggle competition, the model performs much better when trained on a 1-million-row sample than on the original 55-million-row dataset. So we added support for sampling via the environment variable MLZERO_SAMPLE_SIZE.

Even at this early stage, we are already seeing measurable improvements: for example, on several Kaggle competitions the integration yields an improvement of about 1.2% even without external data. If the MLZero team is interested in deeper Upgini support, we would be happy to proceed with a more architecturally correct and extensible integration.

Description

Added integration with the Upgini library to automatically enrich tabular datasets during the preprocessing stage.
This gives the model additional external features and selects the most relevant ones before training.
Added optional sampling controlled by the environment variable MLZERO_SAMPLE_SIZE. AutoGluon tends to overfit on very large datasets, so sampling can sometimes lead to better metrics.
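
The sampling option could be sketched roughly as follows. This is an illustration only, not the code from this PR: the variable name MLZERO_SAMPLE_SIZE comes from the description above, while the helper names, the fixed seed, and the rule that an unset or oversized value means "use all rows" are assumptions.

```python
import os
import random


def resolve_sample_size(n_rows, env=None):
    """Return how many rows to keep, based on MLZERO_SAMPLE_SIZE.

    Assumption: an unset or empty variable means "no sampling".
    """
    env = os.environ if env is None else env
    raw = env.get("MLZERO_SAMPLE_SIZE", "").strip()
    if not raw:
        return n_rows
    size = int(raw)  # raises ValueError on non-numeric input
    if size <= 0:
        raise ValueError("MLZERO_SAMPLE_SIZE must be a positive integer")
    # Never ask for more rows than the dataset has.
    return min(size, n_rows)


def sample_row_indices(n_rows, env=None, seed=42):
    """Pick row positions to keep, sorted so the original order is preserved."""
    k = resolve_sample_size(n_rows, env)
    if k == n_rows:
        return list(range(n_rows))
    # Sampling without replacement with a fixed seed keeps runs reproducible.
    return sorted(random.Random(seed).sample(range(n_rows), k))
```

With pandas, the returned positions can be passed to `df.iloc[...]` to obtain the sampled frame before training.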

Major changes:

azure_openai_chat.py – fixed handling of o1 and o3 models that don’t support the temperature parameter (tested with the o3-mini model deployed in Azure).
bash_coder_prompt.py – fixed environment handling so that packages are installed into the same runtime environment (previously, installed packages were not visible when executing generated LLM Python scripts).
python_coder_prompt.py – added LLM instructions for feature enrichment using Upgini before model training.
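
The temperature fix in azure_openai_chat.py can be illustrated with a sketch like the one below. It is not the PR's code: the function name, the prefix list, and the shape of the keyword arguments are assumptions. The only detail taken from the description is that o1 and o3 models do not accept the temperature parameter. Note that an Azure deployment name may differ from the underlying model name, so a real check might need the model name rather than the deployment name.

```python
# Assumption: models that reject `temperature` are recognizable by a name prefix.
NO_TEMPERATURE_PREFIXES = ("o1", "o3")


def build_chat_kwargs(model, temperature=None, **extra):
    """Assemble keyword arguments for a chat-completion call.

    `temperature` is omitted for models that do not support it.
    """
    kwargs = {"model": model, **extra}
    supports_temperature = not model.lower().startswith(NO_TEMPERATURE_PREFIXES)
    if temperature is not None and supports_temperature:
        kwargs["temperature"] = temperature
    return kwargs
```

For example, `build_chat_kwargs("o3-mini", temperature=0.2)` returns a dictionary without a `temperature` key, while the same call with a model name that does not start with `o1` or `o3` keeps it.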

How Has This Been Tested?

Unit tests (pytest tests/)
Integration tests: multiple runs of the MLZero agent on various tabular datasets from MLE Benchmark:

Nomad Semiconductors
Tabular Playground Series
New York City Taxi Fare Prediction

Verified pipeline behavior with and without Upgini enrichment enabled.

Configuration Changes

Added config file for models deployed in Azure: azure.yaml

For the integration to work, you must define the following environment variable before running:

export UPGINI_API_KEY=<your_api_key>

You can obtain your API key after registering at https://profile.upgini.com/.
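
A run could fail fast when the key is missing instead of failing later during enrichment. The helper below is a hypothetical sketch and not part of this PR; only the variable name UPGINI_API_KEY comes from the text above.

```python
import os


def require_env(name, env=None):
    """Return the value of a required environment variable, or raise."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running.")
    return value
```

Calling `require_env("UPGINI_API_KEY")` at startup would surface a missing key before any data is loaded.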

Type of Change

Bug fix
New feature

Related Work / Benchmark Results

For a detailed comparison of model performance with and without the Upgini integration, see the benchmark runs in the MLE Benchmark repository:

🔗 [Results PR]

This PR demonstrates the improvement obtained by enriching datasets with Upgini during preprocessing.

Add integration with Upgini + support Azure provider

c3p0upgini · 2025-11-19T14:27:23Z

@FANGAreNotGnu, could you please review this PR?

FANGAreNotGnu · 2025-12-09T23:31:39Z

Thank you for this work on integrating Upgini with AutoGluon Assistant.

After careful consideration, we've decided not to merge this functionality. Our main concern is the additional commercial dependency: while we use paid LLM APIs for core functionality, adding another paid service for feature enrichment expands external dependencies beyond our intended scope and increases the maintenance burden.

As an alternative, you might consider maintaining the integration as a standalone repository.

We appreciate your contribution.

c3p0upgini added 2 commits November 13, 2025 15:38

Add integration with Upgini + support Azure provider

7ddf65a

Merge pull request #2 from upgini/upgini-integration

45b2c14

c3p0upgini wants to merge 2 commits into autogluon:main from upgini:main
