
[FEAT] Add Horizon for fitted values#586

Merged
nasaul merged 11 commits into Nixtla:main from janrth:feature/multi-step_training_prediction
Apr 1, 2026

Conversation

@janrth
Contributor

@janrth janrth commented Mar 7, 2026

Allows users to set the horizon for fitted values with the parameter h. The default is still h=1, but it can be changed:

fitted_h1 = fcst.forecast_fitted_values(h=1)
fitted_h12 = fcst.forecast_fitted_values(h=12)

Works as expected.

Solves #346
[screenshot: horizon_fitted]

Checklist:

  • This PR has a meaningful title and a clear description.
  • The tests pass.
  • All linting tasks pass.
  • The notebooks are clean.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 914d2cef46

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread mlforecast/forecast.py
Comment thread mlforecast/forecast.py
@codspeed-hq

codspeed-hq bot commented Mar 7, 2026

Merging this PR will not alter performance

✅ 12 untouched benchmarks


Comparing janrth:feature/multi-step_training_prediction (ea34e6c) with main (ed97ad0)

Open in CodSpeed

@janrth
Contributor Author

janrth commented Mar 7, 2026

@codex


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33b2b1c8cf


Comment thread mlforecast/forecast.py Outdated
@nasaul nasaul linked an issue Mar 9, 2026 that may be closed by this pull request
@nasaul nasaul changed the title from "add horizon for fitted values" to "[FEAT] Add Horizon for fitted values" Mar 9, 2026
@nasaul
Contributor

nasaul commented Mar 9, 2026

Thanks for the contribution — this addresses a long-standing gap (issue #346) and the direct model path (max_horizon) is clean and correct. A few things worth addressing before merging:


Major

1. Quadratic complexity in _compute_recursive_fitted_values_on_demand

The loop calls self.predict(new_df=hist[:t]) for each valid origin per series. Each call creates a new TimeSeries, calls _fit() on it (O(t) work), then runs an h-step rollout. Summed across T origins this is O(T²) — severely slow for real datasets. The warning says "can be slow" but understates this significantly.

Suggested fix — vectorized batch rollout (Option 1):

Since global/group transforms are already rejected, lag updates are simple index arithmetic. The idea is to preprocess once and do the h-step rollout in numpy across all origins simultaneously:

1. Preprocess full training data once → feature matrix X of shape (T, n_features)
   Keep actual targets y of shape (T,)

2. For step s = 1..h (the autoregressive rollout):
   a. Batch predict: ŷ_s = model.predict(X)   # O(T), one model call per step total
   b. Update X for next step:
      For each lag k:
        - If k < s:  replace lag_k column with ŷ_{s-k}  (use predicted value)
        - If k >= s: lag_k stays as actual y             (already in X)

3. ŷ_h[t] is the h-step-ahead fitted value for origin t

Complexity: O(h · T · n_lags) — linear in T, and only h model calls total (not T×h). This aligns with how _compute_fitted_values already works (batch over a feature matrix). The update in step 2b can be a small _update_lag_features(X, preds, step) helper — for lags-only it is ~10 lines of numpy index manipulation.
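The lag update described above can be sketched in a few lines of numpy. This is an illustrative sketch only, not mlforecast code: `rollout_fitted_values`, `lag_cols`, and the assumption of consecutive lags 1..n (so that lag_k at the next step equals lag_{k-1} at the current step) are all hypothetical simplifications.

```python
import numpy as np

def rollout_fitted_values(model, X, lag_cols, h):
    """Batch h-step rollout over all T origins at once (illustrative sketch).

    X: (T, n_features) feature matrix built in one preprocessing pass.
    lag_cols: {lag: column index}; assumes consecutive lags 1..n.
    Returns the h-step-ahead prediction for every origin.
    """
    X = X.copy()
    lags = sorted(lag_cols)
    preds = None
    for _ in range(h):
        preds = model.predict(X)            # one batch model call per step
        # shift lag columns, deepest lag first, so values cascade one step
        for k in reversed(lags[1:]):
            X[:, lag_cols[k]] = X[:, lag_cols[k - 1]]
        X[:, lag_cols[lags[0]]] = preds     # lag_1 <- latest predictions
    return preds
```

With this formulation the rollout performs exactly h model evaluations regardless of T, matching the O(h · T · n_lags) bound above.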


2. Fragile self.ts state mutation in the loop

original_ts = self.ts
for uid, group in train_pd.groupby(...):
    for target_idx in ...:
        try:
            preds = self.predict(h=h, new_df=hist, X_df=X_df)
        finally:
            self.ts = original_ts

self.predict(new_df=...) mutates self.ts as a side effect, requiring manual restoration via finally. The finally restores the original reference, but if self.ts was mutated in-place (not replaced) before the exception, the restored object may be in a dirty state. The batch rollout approach in point 1 would eliminate this pattern entirely.
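If the per-origin loop is retained, the restore could at least be made robust to in-place mutation by snapshotting a deep copy instead of the reference. A generic sketch (`preserved_ts` is a hypothetical helper, not part of mlforecast):

```python
import copy
from contextlib import contextmanager

@contextmanager
def preserved_ts(fcst):
    """Snapshot fcst.ts before a mutating predict() call and restore the
    snapshot afterwards, even if ts was mutated in place (not replaced)."""
    snapshot = copy.deepcopy(fcst.ts)
    try:
        yield
    finally:
        fcst.ts = snapshot
```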


3. Inconsistent h column presence

For recursive h=1 there is no h column. For recursive h>1 and direct models there is. The test even asserts this explicitly:

assert "h" not in fitted_h1.columns
assert "h" in fitted_h3.columns

This makes the return type unpredictable for users and complicates downstream code. The h column should always be present (or consistently absent, with the value known from the argument).
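A small normalization helper would make the contract uniform across code paths; a sketch with a hypothetical `with_horizon_column` name:

```python
import pandas as pd

def with_horizon_column(fitted: pd.DataFrame, h: int) -> pd.DataFrame:
    """Ensure the fitted-values frame always carries an explicit `h` column,
    regardless of which code path (recursive h=1, h>1, direct) produced it."""
    if "h" not in fitted.columns:
        fitted = fitted.assign(h=h)
    return fitted
```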


Minor

4. Redundant validation in _compute_recursive_fitted_values_on_demand

The private method checks h <= 1 and global/group transforms at the top, but forecast_fitted_values already validates both before calling it — making those guards unreachable. Either remove them from the private method (since it is internal) or remove the duplicate check from forecast_fitted_values.

5. Shallow copy of _fitted_train_df_

self._fitted_train_df_ = ufp.copy_if_pandas(df, deep=False)

A shallow copy means if the caller mutates column values in-place after fit(), the cached training data is silently affected. Since this cache is used for on-demand computation later, a deep copy (or at minimum a docstring/comment noting the limitation) would be safer.
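The hazard is easy to demonstrate; pandas' `copy(deep=True)` decouples the cache from later caller mutations (the helper name below is hypothetical):

```python
import pandas as pd

def cache_training_df(df: pd.DataFrame) -> pd.DataFrame:
    """Deep-copy the training frame so later in-place mutations by the
    caller cannot silently alter the cached fitted-values input."""
    return df.copy(deep=True)

df = pd.DataFrame({"unique_id": [0, 0], "y": [1.0, 2.0]})
cached = cache_training_df(df)
df.loc[0, "y"] = 99.0   # caller mutates training data after fit()
# the deep-copied cache still holds the original values
```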

6. Weak assertion in test_recursive_forecast_fitted_values_on_demand_h

restored = fitted_h3.merge(df[["unique_id", "ds", "y"]], on=["unique_id", "ds"], suffixes=("_fit", "_orig"))
np.testing.assert_allclose(restored["y_fit"].values, restored["y_orig"].values)

y_fit here is the target column copied from the training data, not the model's prediction — this test only checks that actual target values were joined correctly, not that predictions are reasonable. The test should validate the model output column (e.g., LinearRegression) against something meaningful (e.g., that predictions are finite, or that h=3 predictions differ from h=1 predictions).
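A stronger assertion would target the model's output column directly. A sketch, assuming the frames carry a `LinearRegression` prediction column as in the existing tests:

```python
import numpy as np
import pandas as pd

def check_multistep_fitted(fitted_h1, fitted_h3, model_col="LinearRegression"):
    """Validate model predictions instead of the copied target column."""
    # predictions must at least be finite
    assert np.isfinite(fitted_h3[model_col].to_numpy()).all()
    joined = fitted_h1.merge(
        fitted_h3, on=["unique_id", "ds"], suffixes=("_h1", "_h3")
    )
    # h=3 predictions should generally differ from h=1 predictions
    assert not np.allclose(
        joined[f"{model_col}_h1"], joined[f"{model_col}_h3"]
    )
```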

7. PR checklist is incomplete

"The tests pass", "All linting tasks pass", "The notebooks are clean" are all unchecked. Please confirm these before merging.


Positive notes

  • The direct model path (max_horizon) is correct and clean.
  • Making h keyword-only (*) is good API design for future-proofing.
  • The polars↔pandas bridging in forecast_fitted_values is handled correctly.
  • The new tests cover the right scenarios: recursive, direct, error cases, and positional compatibility.

Contributor

@nasaul nasaul left a comment


PR Review — Round 2

Great progress on the previous round of comments — all 6 items have been addressed. Two blocking issues remain before this can merge.


Bug — Static features crash forecast_fitted_values(h>1) [blocking]

Location: mlforecast/forecast.py:617–636

_compute_recursive_fitted_values_on_demand strips static columns from hist before calling temp_ts._fit(), but then passes static_features=self.ts.static_features to _fit. Inside TimeSeries._fit, it tries to extract the static columns from hist, which no longer has them, causing:

ValueError: Feature names seen at fit time, yet now missing: static_0

This affects any user who passes static features to fit() — including the default static_features=None which auto-detects all non-time/non-target columns as static.

The failing test tests/test_forecast.py::test_recursive_forecast_fitted_values_on_demand_h_with_static_features already covers this path and currently reproduces the crash.

Recommended fix: tell _fit there are no static columns to extract, then copy the already-computed static_features_ from the original TimeSeries:

temp_ts._fit(hist, ..., static_features=[id_col], ...)
temp_ts.static_features_ = self.ts.static_features_

This works correctly because static features are still used during prediction. In TimeSeries.predict, the feature matrix is built by horizontally concatenating static_features_ with the lag/date features (see core.py:981). By copying self.ts.static_features_ after _fit, the correct static values — which are constants per series by definition — are present when temp_ts.predict(models=self.models_, ...) is called, so the models see exactly the same feature matrix they were trained on. The only thing we skip is redundantly re-extracting them from hist, which is what was crashing.


_fitted_train_df_ doubles memory footprint [blocking]

A deep copy of the full training DataFrame is stored in self._fitted_train_df_ indefinitely after fit(fitted=True). For large datasets this permanently doubles the memory footprint of the fitted object.

Required fix: accept the training data as an optional parameter to forecast_fitted_values() so users can opt out of the storage cost entirely:

def forecast_fitted_values(self, h=1, *, train_df=None): ...

When train_df is provided, use it directly without storing a copy on self. When it is None, fall back to the cached _fitted_train_df_ for convenience. This keeps the current zero-friction API for small datasets while giving memory-constrained users a way to avoid the overhead entirely. The fit and forecast_fitted_values docstrings should document this trade-off.
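The fallback logic is small; a sketch of the resolution step (function and parameter names are hypothetical):

```python
def resolve_train_df(cached_df, train_df=None):
    """Prefer the user-supplied training frame; fall back to the cache.

    Passing train_df lets memory-constrained users avoid having a deep
    copy of the training data stored on the fitted object at all.
    """
    if train_df is not None:
        return train_df
    if cached_df is None:
        raise ValueError(
            "No cached training data available; pass train_df or call "
            "fit(..., fitted=True) first."
        )
    return cached_df
```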


Nit — Document why the per-origin loop was chosen over full vectorization

In the previous review I suggested a fully vectorized approach: preprocess all T origins into a single feature matrix of shape (T, n_features) and then do exactly h batched model calls updating lag columns in-place, reducing model evaluations from O(T²) down to O(h) regardless of dataset size. The original implementation was O(T²); the current one improves this to O(T·h) — one temp_ts.predict(horizon=h) per origin, each doing a full h-step autoregressive rollout — but stops short of the fully vectorized O(h) path.

The current approach may be intentional, but this reasoning isn't captured anywhere. Please add a comment above _compute_recursive_fitted_values_on_demand explaining the trade-off so future maintainers understand why the vectorized path was not taken.


Nit — Silent no-op in direct model path

In forecast_fitted_values (forecast.py:874–886), the h argument is silently ignored if "h" not in res.columns. An assert "h" in res.columns would close this silent failure path.


Summary: Two blockers before merge: fix the static features bug (test already failing) and add a train_df parameter to forecast_fitted_values() to avoid the memory doubling. All previous comments have been resolved.

@janrth
Contributor Author

janrth commented Mar 28, 2026

@codex

@janrth
Contributor Author

janrth commented Mar 28, 2026

We kept the current recursive execution model intentionally. MLForecast already vectorizes recursive forecasting across series at each forecast step, but remains iterative across horizon steps. A fully origin-batched fitted-values implementation would be a separate rollout engine that batches across training origins as an additional axis.

That is a materially larger algorithmic change than this PR, because it would need to reproduce the exact current recursive semantics for lag updates, lag transforms, date features, static features, dynamic exogenous alignment, and target transforms. For this PR we prioritized correctness and parity with the existing predict path, and documented that trade-off rather than introducing a second, more specialized engine only for multi-step fitted values.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

on=[model.ts.id_col, model.ts.time_col],

P1: Join AutoML fitted outputs using horizon key

MLForecast.forecast_fitted_values now returns an h column, but AutoMLForecast.forecast_fitted_values still merges per-model frames only on id/time. With multiple models this creates duplicated/suffixed horizon columns (h_x/h_y or similar), so the combined output no longer has a single reliable horizon column and can break downstream consumers that expect one h field per row.
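One possible fix is to include the horizon column in the merge keys whenever it is present, so per-model frames align row-for-row and the combined output keeps a single `h` column. A sketch with hypothetical names:

```python
import pandas as pd

def merge_model_fitted(frames, id_col="unique_id", time_col="ds"):
    """Merge per-model fitted frames on id/time plus `h` when present,
    avoiding suffixed duplicates like h_x/h_y in the combined output."""
    keys = [id_col, time_col]
    if all("h" in f.columns for f in frames):
        keys.append("h")
    out = frames[0]
    for f in frames[1:]:
        out = out.merge(f, on=keys)
    return out
```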


Comment thread mlforecast/forecast.py Outdated
@janrth
Contributor Author

janrth commented Mar 28, 2026

@codex


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2326ef6769


Comment thread mlforecast/forecast.py
Comment thread mlforecast/forecast.py
Contributor

@nasaul nasaul left a comment


Great work Jan, I just added a better docstring.

@nasaul nasaul merged commit cb8b558 into Nixtla:main Apr 1, 2026
21 checks passed


Development

Successfully merging this pull request may close these issues.

Multi-Step Training Predictions

2 participants