Changes from all commits (55 commits)
2ba06d1
update accelerate version range (#4596)
cheungdaven Nov 1, 2024
e549dc9
[CI-Bench] Add AMLB changes (#4603)
prateekdesai04 Nov 1, 2024
8b321dd
[CI] Schedule benchmark everday (#4598)
prateekdesai04 Nov 1, 2024
1bdfdf8
[CI Benchmark] Add dependencies for AG Bench 0.4.4 (#4607)
prateekdesai04 Nov 4, 2024
5e9ecc2
Bound nltk version to avoid verbose logging issue (#4604)
tonyhoo Nov 4, 2024
924f783
Fix specifying None in upload_file prefix (#4609)
Innixma Nov 4, 2024
33e1a4a
Add TODOs/FIXMEs (#4611)
Innixma Nov 4, 2024
dfb0472
Upgrade TIMM (#4580)
prateekdesai04 Nov 5, 2024
e207c63
Add AdamW support to NN_TORCH (#4610)
Innixma Nov 5, 2024
2457743
[Docker - v1.2] Update all images (#4614)
prateekdesai04 Nov 5, 2024
eed21dd
Key dependency updates in _setup_utils.py for v1.2 release (#4612)
tonyhoo Nov 6, 2024
d010514
[Doc] Fixed broken links in the tutorial (#4621)
tonyhoo Nov 6, 2024
4123813
[AutoMM] Add coco_root for better support for custom dataset in COCO …
FANGAreNotGnu Nov 6, 2024
5340a2e
Refactor TabularDataset (#4613)
Innixma Nov 7, 2024
98ca74f
[AutoMM] Add COCO Format Saving Support and Update Object Detection I…
FANGAreNotGnu Nov 7, 2024
ed3d4e1
Configurable Number of Checkpoints to Keep per HPO Trial (#4615)
FANGAreNotGnu Nov 8, 2024
0f13d9e
[AutoMM] Skip MMDet Config Files While Checking with bandit (#4630)
FANGAreNotGnu Nov 9, 2024
d66c3dc
[timeseries] Ensure that TimeSeriesDataFrame index is sorted before t…
shchur Nov 9, 2024
05cacd7
[timeseries] Add CovariatesRegressor for forecasting models (#4566)
shchur Nov 9, 2024
ad4d082
[timeseries] Fix inplace data modification by GluonTS models (#4633)
shchur Nov 9, 2024
0013c6c
Add `compute_metric` (#4631)
Innixma Nov 9, 2024
de1bf84
Fix Torch accidentally being imported immediately (#4635)
Innixma Nov 10, 2024
0d1d2ad
[Benchmark] Add PyArrow required by clean scripts (#4626)
prateekdesai04 Nov 10, 2024
999c9c0
[timeseries] Update to GluonTS v0.16.0 (#4628)
shchur Nov 11, 2024
7c874fe
[tabular] Set calibrate_decision_threshold="auto" (#4632)
Innixma Nov 11, 2024
0e4bfe1
[timeseries] add Chronos Bolt (#4625)
canerturkmen Nov 11, 2024
4e98196
Refactor Metrics for Each Problem Type (#4616)
FANGAreNotGnu Nov 12, 2024
f293834
[Tutorial] Fix Torch Version and Colab Installation for Object Detect…
FANGAreNotGnu Nov 12, 2024
0d29d19
[timeseries] Add weighted cumulative error forecasting metric (#4594)
shchur Nov 12, 2024
73bbb2a
[AutoMM] Fix Logloss Bug and Refine Compute Score Logics (#4629)
FANGAreNotGnu Nov 12, 2024
2eee958
[AutoMM] Fix Index Typo in Tutorial (#4642)
FANGAreNotGnu Nov 13, 2024
2d95e28
[timeseries] Add fine-tuning support for Chronos models (#4608)
abdulfatir Nov 13, 2024
41a3f3b
[timeseries] refactor GluonTS default parameter handling, update TiDE…
canerturkmen Nov 14, 2024
f151e6a
[timeseries] Add method to convert a TimeSeriesDataFrame to a regular…
shchur Nov 14, 2024
49f31da
[timeseries] Fix `fused=True` failure in Chronos fine-tuning on CPU (…
abdulfatir Nov 14, 2024
7b170e2
[AutoMM] Fix Proba Metrics for Multiclass (#4643)
FANGAreNotGnu Nov 15, 2024
cc636b9
[timeseries] Estimate the regressor prediction time (#4641)
shchur Nov 15, 2024
d702088
[timeseries] Move covariate scaling logic into a separate class (#4634)
shchur Nov 16, 2024
f57beb2
[timeseries] Fix `torch.Tensor.mean` use with incorrect kwargs (#4647)
abdulfatir Nov 17, 2024
2672fe0
[timeseries] Fix incorrect scale of the quantile forecast for Recursi…
shchur Nov 18, 2024
3ccbe46
[timeseries] prune timeseries unit and smoke tests (#4650)
canerturkmen Nov 18, 2024
5ecdf78
upgrade numpy to 2.0
suzhoum Oct 11, 2024
7936b27
fix setuptools warning
suzhoum Oct 11, 2024
651b8f3
upgrade scipy
suzhoum Oct 11, 2024
c524350
replace np.NINF
suzhoum Oct 11, 2024
39f1ace
fix feature tests
suzhoum Nov 5, 2024
7f74686
upgrade
suzhoum Nov 5, 2024
dd29a8b
cast to float64 to fix float32 issue
suzhoum Nov 5, 2024
21f9744
use numpy native NDArray
suzhoum Nov 6, 2024
8c3e656
cap numpy to 1.x.x for catboost issue
suzhoum Nov 6, 2024
f9e6035
relax catboost req on MacOS
suzhoum Nov 7, 2024
4a4b07a
update NDArray typehint
suzhoum Nov 7, 2024
dd8a8a9
update gluonts range
suzhoum Nov 11, 2024
93342bd
remove setuptool
suzhoum Nov 18, 2024
bfa3805
conditional test numpy 2.x and 1.x
suzhoum Nov 19, 2024
3 changes: 2 additions & 1 deletion .github/workflow_scripts/env_setup.sh
@@ -16,10 +16,11 @@ function setup_build_contrib_env {

function setup_benchmark_env {
pip install -U autogluon.bench
-pip install pyarrow # TODO: Remove once AG-Bench v0.4.4 is released
+pip install pyarrow
git clone https://github.com/autogluon/autogluon-dashboard.git
pip install -e ./autogluon-dashboard
pip install yq
pip install s3fs
}

function setup_hf_model_mirror {
2 changes: 1 addition & 1 deletion .github/workflow_scripts/lint_check.sh
@@ -22,6 +22,6 @@ function lint_check_all {
# lint_check tabular
}

-bandit -r multimodal/src -ll
+bandit -r multimodal/src -ll --exclude "multimodal/src/autogluon/multimodal/configs/pretrain/*"
lint_check_all
ruff check timeseries/
2 changes: 2 additions & 0 deletions .github/workflow_scripts/setup_mmcv.sh
@@ -1,4 +1,6 @@
function setup_mmcv {
+# Install MMEngine from git with the fix for torch 2.5
+python3 -m pip install "git+https://github.com/open-mmlab/mmengine.git@2e0ab7a92220d2f0c725798047773495d589c548"
mim install "mmcv==2.1.0" --timeout 60
python3 -m pip install "mmdet==3.2.0"
}
2 changes: 1 addition & 1 deletion .github/workflows/benchmark_master.yml
@@ -5,7 +5,7 @@ on:
branches:
- master
schedule:
-    - cron: '00 09 * * SUN' # UTC 9:00(2:00 PST Time) every Sunday
+    - cron: '00 02 * * *' # UTC 2:00 AM every day

env:
AG_BRANCH_NAME: master
4 changes: 2 additions & 2 deletions CI/batch/docker/Dockerfile.cpu
@@ -1,10 +1,10 @@
-FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.2.0-cpu-py310-ubuntu20.04-ec2
+FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.4.0-cpu-py311-ubuntu22.04-ec2

RUN apt-get update \
&& apt-get -y upgrade \
&& apt-get install -y --no-install-recommends \
pandoc \
-python3.8-venv \
+python3.11-venv \
graphviz \
graphviz-dev \
&& apt-get autoremove -y \
4 changes: 2 additions & 2 deletions CI/batch/docker/Dockerfile.gpu
@@ -1,12 +1,12 @@
-FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.2.0-gpu-py310-cu121-ubuntu20.04-ec2
+FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.4.0-gpu-py311-cu124-ubuntu22.04-ec2

ARG AWSCLI_VER=1.22.45

RUN apt-get update \
&& apt-get -y upgrade \
&& apt-get install -y --no-install-recommends \
pandoc \
-python3.8-venv \
+python3.11-venv \
graphviz \
graphviz-dev \
&& apt-get autoremove -y \
2 changes: 1 addition & 1 deletion CI/bench/generate_bench_config.sh
@@ -27,7 +27,7 @@ if [ $MODULE == "tabular" ] || [ $MODULE == "timeseries" ]; then
--amlb-benchmark $BENCHMARK \
--amlb-constraint $TIME_LIMIT \
--amlb-user-dir $(dirname "$0")/amlb_user_dir \
-    --git-uri-branch https://github.com/openml/automlbenchmark.git#stable
+    --git-uri-branch https://github.com/Innixma/automlbenchmark.git#autogluon_switch_to_uv
else
FRAMEWORK=AutoGluon_$PRESET
aws s3 cp --recursive s3://autogluon-ci-benchmark/configs/$MODULE/$USER_DIR_S3_PREFIX/latest/ $(dirname "$0")/custom_user_dir/
2 changes: 1 addition & 1 deletion CI/docker/Dockerfile.cpu-inference
@@ -1,4 +1,4 @@
-FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.2.0-cpu-py310-ubuntu20.04-sagemaker
+FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.4.0-cpu-py311-ubuntu22.04-sagemaker

RUN apt-get update \
&& apt-get -y upgrade \
2 changes: 1 addition & 1 deletion CI/docker/Dockerfile.cpu-training
@@ -1,4 +1,4 @@
-FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.2.0-cpu-py310-ubuntu20.04-sagemaker
+FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.4.0-cpu-py311-ubuntu22.04-sagemaker

RUN apt-get update \
&& apt-get -y upgrade \
2 changes: 1 addition & 1 deletion CI/docker/Dockerfile.gpu-inference
@@ -1,4 +1,4 @@
-FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.2.0-gpu-py310-cu118-ubuntu20.04-sagemaker
+FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.4.0-gpu-py311-cu124-ubuntu22.04-sagemaker

RUN apt-get update \
&& apt-get -y upgrade \
2 changes: 1 addition & 1 deletion CI/docker/Dockerfile.gpu-training
@@ -1,4 +1,4 @@
-FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.2.0-gpu-py310-cu121-ubuntu20.04-sagemaker
+FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.4.0-gpu-py311-cu124-ubuntu22.04-sagemaker

RUN apt-get update \
&& apt-get -y upgrade \
2 changes: 0 additions & 2 deletions common/setup.py
@@ -28,14 +28,12 @@
"tqdm", # version range defined in `core/_setup_utils.py`
# s3fs is removed due to doubling install time due to version range resolution
# "s3fs", # version range defined in `core/_setup_utils.py`
"setuptools",
]
if not ag.LITE_MODE
else {
"numpy", # version range defined in `core/_setup_utils.py`
"pandas", # version range defined in `core/_setup_utils.py`
"tqdm", # version range defined in `core/_setup_utils.py`
"setuptools",
}
)

1 change: 1 addition & 0 deletions common/src/autogluon/common/__init__.py
@@ -1,3 +1,4 @@
+from .dataset import TabularDataset
from .features.feature_metadata import FeatureMetadata
from .utils.log_utils import _add_stream_handler
from .utils.log_utils import fix_logging_if_kaggle as __fix_logging_if_kaggle
36 changes: 36 additions & 0 deletions common/src/autogluon/common/dataset.py
@@ -0,0 +1,36 @@
import pandas as pd

from .loaders import load_pd

__all__ = ["TabularDataset"]


class TabularDataset:
"""
A dataset in tabular format (with rows = samples, columns = features/variables).
This class returns a :class:`pd.DataFrame` when initialized and all existing pandas methods can be applied to it.
For full list of methods/attributes, see pandas Dataframe documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

The purpose of this class is to provide an easy-to-use shorthand for loading a pandas DataFrame to use in AutoGluon.

Parameters
----------
data : str, :class:`pd.DataFrame`, :class:`np.ndarray`, Iterable, or dict
If str, path to data file (CSV or Parquet format).
If you already have your data in a :class:`pd.DataFrame`, you can specify it here. In this case, the same DataFrame will be returned with no changes.

Examples
--------
>>> import pandas as pd
>>> from autogluon.common import TabularDataset
>>> train_data = TabularDataset("https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv")
>>> train_data_pd = pd.read_csv("https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv")
>>> assert isinstance(train_data, pd.DataFrame) # True
>>> assert train_data.equals(train_data_pd) # True
>>> assert type(train_data) == type(train_data_pd) # True
"""

def __new__(cls, data, **kwargs) -> pd.DataFrame:
if isinstance(data, str):
data = load_pd.load(data)
return pd.DataFrame(data, **kwargs)
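The `__new__` override above is the standard trick for a class whose constructor returns an instance of a different type. A minimal self-contained sketch of the pattern (the `DataFrameFactory` name is illustrative; the real class additionally routes `str` inputs through `load_pd.load` to support CSV/Parquet paths):

```python
import pandas as pd


class DataFrameFactory:
    """Sketch of the TabularDataset pattern: __new__ builds and returns a
    plain pd.DataFrame, so the "class" is really a loader shorthand and
    instances are never of this type."""

    def __new__(cls, data, **kwargs) -> pd.DataFrame:
        # File-path loading is skipped in this sketch.
        return pd.DataFrame(data, **kwargs)


df = DataFrameFactory({"col1": [1, 2, 3, 4], "col2": ["a", "b", "b", "c"]})
```

Because `__new__` never returns a `DataFrameFactory` instance, `isinstance(df, DataFrameFactory)` is `False`, matching the assertions in the docstring and the new `test_dataset.py` below.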
1 change: 0 additions & 1 deletion common/src/autogluon/common/loaders/load_pd.py
@@ -16,7 +16,6 @@

# TODO: v1.0 consider renaming function so it isn't 'load'. Consider instead 'load_pd', or something more descriptive.
# TODO: Add full docstring
-# TODO: Add full docstring for usage within TabularDataset
def load(
path,
delimiter=None,
2 changes: 1 addition & 1 deletion common/src/autogluon/common/utils/s3_utils.py
@@ -86,7 +86,7 @@ def upload_file(*, file_name: str, bucket: str, prefix: Optional[str] = None):
import boto3

object_name = os.path.basename(file_name)
-    if len(prefix) == 0:
+    if prefix is not None and len(prefix) == 0:
prefix = None
if prefix is not None:
object_name = prefix + "/" + object_name
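The fix guards the `len()` call so that `prefix=None` no longer raises a `TypeError`. A standalone sketch of the corrected logic (the `build_object_name` helper name is illustrative, not AutoGluon's actual API):

```python
import os
from typing import Optional


def build_object_name(file_name: str, prefix: Optional[str] = None) -> str:
    """Sketch of upload_file's corrected prefix handling."""
    object_name = os.path.basename(file_name)
    # The fix: check prefix is not None before len(), so callers may pass
    # None explicitly; an empty string is normalized to None.
    if prefix is not None and len(prefix) == 0:
        prefix = None
    if prefix is not None:
        object_name = prefix + "/" + object_name
    return object_name


print(build_object_name("local/model.pkl", None))         # model.pkl
print(build_object_name("local/model.pkl", ""))           # model.pkl
print(build_object_name("local/model.pkl", "models/v1"))  # models/v1/model.pkl
```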
15 changes: 15 additions & 0 deletions common/tests/unittests/test_dataset.py
@@ -0,0 +1,15 @@
import pandas as pd

from autogluon.common import TabularDataset


def test_tabular_dataset():
data = {"col1": [1, 2, 3, 4], "col2": ["a", "b", "b", "c"]}

df_1 = pd.DataFrame(data)
df_2 = TabularDataset(data)

assert isinstance(df_1, pd.DataFrame)
assert df_1.equals(df_2)
assert type(df_1) == pd.DataFrame
assert type(df_1) == type(df_2)
3 changes: 2 additions & 1 deletion core/src/autogluon/core/__init__.py
@@ -1,7 +1,8 @@
# noinspection PyUnresolvedReferences
+from autogluon.common.dataset import TabularDataset
from autogluon.common.utils.log_utils import _add_stream_handler

from . import constants, metrics
-from .dataset import TabularDataset
from .version import __version__

_add_stream_handler()
15 changes: 7 additions & 8 deletions core/src/autogluon/core/_setup_utils.py
@@ -17,22 +17,21 @@
# Only put packages here that would otherwise appear multiple times across different module's setup.py files.
DEPENDENT_PACKAGES = {
"boto3": ">=1.10,<2", # <2 because unlikely to introduce breaking changes in minor releases. >=1.10 because 1.10 is 3 years old, no need to support older
"numpy": ">=1.21,<1.29", # "<{N+3}" upper cap, where N is the latest released minor version, assuming no warnings using N
"numpy": ">=1.25.0,<2.1.4", # "<{N+3}" upper cap, where N is the latest released minor version, assuming no warnings using N
"pandas": ">=2.0.0,<2.3.0", # "<{N+3}" upper cap
"scikit-learn": ">=1.4.0,<1.5.3", # capping to latest version
"scipy": ">=1.5.4,<1.13", # "<{N+2}" upper cap
"scipy": ">=1.5.4,<1.16", # "<{N+2}" upper cap
"matplotlib": ">=3.7.0,<3.11", # "<{N+2}" upper cap
"psutil": ">=5.7.3,<7.0.0", # Major version cap
"s3fs": ">=2023.1,<2025", # Yearly cap
"networkx": ">=3.0,<4", # Major version cap
"tqdm": ">=4.38,<5", # Major version cap
"Pillow": ">=10.0.1,<12", # Major version cap
"torch": ">=2.2,<2.5", # Major version cap, sync with common/src/autogluon/common/utils/try_import.py
"lightning": ">=2.2,<2.4", # Major version cap
"pytorch_lightning": ">=2.2,<2.4", # Major version cap, capping `lightning` does not cap `pytorch_lightning`!
"async_timeout": ">=4.0,<5", # Major version cap
"transformers[sentencepiece]": ">=4.38.0,<4.41.0",
"accelerate": ">=0.21.0,<0.22.0",
"torch": ">=2.2,<2.6", # Major version cap, sync with common/src/autogluon/common/utils/try_import.py
"lightning": ">=2.2,<2.6", # Major version cap
"async_timeout": ">=4.0,<6", # Major version cap
"transformers[sentencepiece]": ">=4.38.0,<5",
"accelerate": ">=0.32.0,<1.0",
}
if LITE_MODE:
DEPENDENT_PACKAGES = {package: version for package, version in DEPENDENT_PACKAGES.items() if package not in ["psutil", "Pillow", "timm"]}
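`DEPENDENT_PACKAGES` pairs package names with pip-style version specifiers. A hypothetical sketch of how such a mapping could be rendered into requirement strings (the `to_install_requires` helper is illustrative; AutoGluon's actual setup helpers may differ):

```python
# Subset of the updated ranges from _setup_utils.py above.
DEPENDENT_PACKAGES = {
    "numpy": ">=1.25.0,<2.1.4",
    "torch": ">=2.2,<2.6",
    "accelerate": ">=0.32.0,<1.0",
}


def to_install_requires(packages: dict) -> list:
    # pip accepts "name>=lo,<hi": the specifier is appended directly
    # to the package name, comma-separating multiple clauses.
    return [f"{name}{spec}" for name, spec in packages.items()]


requirements = to_install_requires(DEPENDENT_PACKAGES)
print(requirements)
# ['numpy>=1.25.0,<2.1.4', 'torch>=2.2,<2.6', 'accelerate>=0.32.0,<1.0']
```

Centralizing the ranges this way is why the PR's comment says each module's `setup.py` defers to `core/_setup_utils.py` rather than repeating version pins.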
54 changes: 0 additions & 54 deletions core/src/autogluon/core/dataset.py

This file was deleted.

32 changes: 19 additions & 13 deletions core/src/autogluon/core/metrics/__init__.py
@@ -17,6 +17,7 @@
from ..constants import BINARY, MULTICLASS, QUANTILE, REGRESSION, SOFTCLASS
from . import classification_metrics, quantile_metrics
from .classification_metrics import confusion_matrix
+from .score_func import compute_metric


class Scorer(object, metaclass=ABCMeta):
@@ -180,26 +181,31 @@ def __repr__(self) -> str:
@property
@abstractmethod
def needs_pred(self) -> bool:
"""If True, metric requires predictions rather than prediction probabilities"""
raise NotImplementedError

@property
@abstractmethod
def needs_proba(self) -> bool:
"""If True, metric requires prediction probabilities rather than predictions"""
raise NotImplementedError

@property
@abstractmethod
def needs_class(self) -> bool:
"""If True, metric requires class label predictions rather than prediction probabilities"""
raise NotImplementedError

@property
@abstractmethod
def needs_threshold(self) -> bool:
"""If True, metric requires prediction probabilities rather than predictions"""
raise NotImplementedError

@property
@abstractmethod
def needs_quantile(self) -> bool:
"""If True, metric requires quantile predictions rather than predictions or prediction probabilities"""
raise NotImplementedError

score = __call__
@@ -685,32 +691,32 @@ def customized_roc_auc(y_true, y_pred, **kwargs):
_add_scorer_to_metric_dict(metric_dict=BINARY_METRICS, scorer=scorer)


-for name, metric in [("precision", sklearn.metrics.precision_score), ("recall", sklearn.metrics.recall_score), ("f1", sklearn.metrics.f1_score)]:
-    globals()[name] = make_scorer(name, metric, needs_class=True)
-    _add_scorer_to_metric_dict(metric_dict=BINARY_METRICS, scorer=globals()[name])
+for _name, _metric in [("precision", sklearn.metrics.precision_score), ("recall", sklearn.metrics.recall_score), ("f1", sklearn.metrics.f1_score)]:
+    globals()[_name] = make_scorer(_name, _metric, needs_class=True)
+    _add_scorer_to_metric_dict(metric_dict=BINARY_METRICS, scorer=globals()[_name])
for average in ["macro", "micro", "weighted"]:
-        qualified_name = "{0}_{1}".format(name, average)
-        globals()[qualified_name] = make_scorer(qualified_name, partial(metric, pos_label=None, average=average), needs_class=True)
+        qualified_name = "{0}_{1}".format(_name, average)
+        globals()[qualified_name] = make_scorer(qualified_name, partial(_metric, pos_label=None, average=average), needs_class=True)
_add_scorer_to_metric_dict(metric_dict=BINARY_METRICS, scorer=globals()[qualified_name])
_add_scorer_to_metric_dict(metric_dict=MULTICLASS_METRICS, scorer=globals()[qualified_name])


-for name, metric, kwargs in [
+for _name, _metric, _kwargs in [
("roc_auc_ovo", customized_roc_auc, dict(multi_class="ovo")),
("roc_auc_ovr", customized_roc_auc, dict(multi_class="ovr")),
]:
scorer_kwargs = dict(greater_is_better=True, needs_proba=True, needs_threshold=False)
-    globals()[name] = make_scorer(name, partial(metric, average="macro", **kwargs), **scorer_kwargs)
-    macro_name = "{0}_{1}".format(name, "macro")
-    globals()[name].add_alias(macro_name)
-    _add_scorer_to_metric_dict(metric_dict=MULTICLASS_METRICS, scorer=globals()[name])
-    if name == "roc_auc_ovo":
+    globals()[_name] = make_scorer(_name, partial(_metric, average="macro", **_kwargs), **scorer_kwargs)
+    macro_name = "{0}_{1}".format(_name, "macro")
+    globals()[_name].add_alias(macro_name)
+    _add_scorer_to_metric_dict(metric_dict=MULTICLASS_METRICS, scorer=globals()[_name])
+    if _name == "roc_auc_ovo":
averages = ["weighted"]
else:
averages = ["micro", "weighted"]
for average in averages:
-        qualified_name = "{0}_{1}".format(name, average)
-        globals()[qualified_name] = make_scorer(qualified_name, partial(metric, average=average, **kwargs), **scorer_kwargs)
+        qualified_name = "{0}_{1}".format(_name, average)
+        globals()[qualified_name] = make_scorer(qualified_name, partial(_metric, average=average, **_kwargs), **scorer_kwargs)
_add_scorer_to_metric_dict(metric_dict=MULTICLASS_METRICS, scorer=globals()[qualified_name])

