[Core] AttributeError in cross_validation when using Polars DataFrame with horizons parameter #594

@TheFckReal

What happened + What you expected to happen

cross_validation() raises an AttributeError when a Polars DataFrame is passed together with the horizons parameter.

At mlforecast/forecast.py:1109, the code calls .nunique() on an object that is a polars.Series at that point. .nunique() is the pandas method name; Polars exposes the same functionality as .n_unique(). This code path is reached only when horizons is explicitly provided, so the bug does not surface in the default recursive mode.

File ".../mlforecast/forecast.py", line 1109, in cross_validation
    n_series = valid[id_col].nunique()
               ^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Series' object has no attribute 'nunique'. Did you mean: 'n_unique'?
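One possible way to make this call work for both backends is to dispatch on whichever method the Series exposes. The sketch below is only an illustration: count_unique is a hypothetical helper name, not an existing mlforecast function, and it assumes the object is either a pandas or a Polars Series.

```python
def count_unique(series):
    """Return the number of distinct values in a pandas or Polars Series.

    Hypothetical helper for illustration; not part of mlforecast.
    """
    # Polars names the method n_unique(); pandas names it nunique().
    n_unique = getattr(series, "n_unique", None)
    if callable(n_unique):
        return n_unique()
    # Fall back to the pandas spelling.
    return series.nunique()
```

Under that assumption, the failing line could read n_series = count_unique(valid[id_col]). One caveat: pandas' nunique() ignores missing values by default, whereas Polars' n_unique() counts null as a distinct value, so the two only agree when the id column has no nulls.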

Versions / Dependencies

python==3.12
mlforecast==1.0.31
polars==1.38.1

Reproduction script

import numpy as np
import polars as pl
from catboost import CatBoostRegressor
from mlforecast import MLForecast

SEED = 42
N_SERIES = 3
N_HOURS = 24 * 30
HORIZONS = [1, 6, 10, 24, 48]


def make_synthetic_df(n_series: int = N_SERIES, n_hours: int = N_HOURS) -> pl.DataFrame:
    rng = np.random.default_rng(SEED)
    base_ts = pl.datetime_range(
        start=pl.datetime(2025, 1, 1),
        end=pl.datetime(2025, 1, 1) + pl.duration(hours=n_hours - 1),
        interval="1h",
        eager=True,
    )

    frames = []
    for i in range(n_series):
        hours = np.arange(n_hours)
        signal = (
            50
            + 20 * np.sin(2 * np.pi * hours / 24)
            + 10 * np.sin(2 * np.pi * hours / (24 * 7))
            + rng.normal(scale=3, size=n_hours)
        )
        frames.append(
            pl.DataFrame(
                {
                    "unique_id": [f"series_{i}"] * n_hours,
                    "ds": base_ts,
                    "y": np.clip(signal, 0, None),
                }
            )
        )

    return pl.concat(frames).cast({"y": pl.Float64})


df = make_synthetic_df()

model = MLForecast(
    models=CatBoostRegressor(
        iterations=100,
        depth=4,
        verbose=0,
        allow_writing_files=False,
    ),
    freq="1h",
    lags=[1, 2, 3, 24, 48],
    date_features=["hour", "weekday"],
)

# Crashes with: AttributeError: 'Series' object has no attribute 'nunique'
cv_results = model.cross_validation(
    df=df,
    h=max(HORIZONS),
    n_windows=2,
    step_size=24,
    horizons=HORIZONS,  # works if removed
    refit=False,
)

print(cv_results)

Issue Severity

Medium: It is a significant difficulty but I can work around it.
