
feat(evaluation): Add Pareto-Optimal Evaluation#213

Open
ankitlade12 wants to merge 6 commits into Nixtla:main from ankitlade12:feat/multi-objective-eval

Conversation

@ankitlade12

Description

This pull request introduces multi-objective evaluation capabilities to utilsforecast. It adds a robust ParetoFrontier class directly within evaluation.py, providing a model-agnostic and dataframe-agnostic way to identify the best-performing models across conflicting metrics (e.g., minimizing RMSE while minimizing MAE, or minimizing latency while maximizing accuracy).

Since utilsforecast acts as the foundational evaluation layer for Nixtla's ecosystem, integrating Pareto selection natively enables downstream libraries (like mlforecast and statsforecast) to perform multi-objective model benchmarking out of the box using the standard output of evaluate().

Key Changes

  • Added ParetoFrontier class in evaluation.py: Includes a mathematically validated is_dominated bounding check and exposes a find_non_dominated routine.
  • Built-in 2D Plotting: Implemented plot_pareto_2d() to visually inspect the trade-off frontier. Matplotlib is lazily imported and handled gracefully with an explicit warnings.warn if missing, addressing reviewer feedback to avoid raw print statements.
  • Dataframe Agnosticism (AnyDFType): Ensures pandas and polars dataframes pass cleanly through the mathematical logic, addressing previous maintainer concerns about hard pandas dependencies.
  • Exposed in __init__.py: Integrated evaluate and ParetoFrontier into __all__ for easy top-level access (from utilsforecast import ParetoFrontier).
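The dominance rule behind is_dominated and find_non_dominated can be sketched as follows. This is a hypothetical, simplified stand-in for the actual implementation, operating on a plain cost matrix with every metric minimized:

```python
import numpy as np

def find_non_dominated(costs: np.ndarray) -> np.ndarray:
    """Boolean mask of Pareto-optimal rows (all metrics minimized).

    Row i is dominated if some row j is no worse on every metric
    and strictly better on at least one.
    """
    n = costs.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(costs[j] <= costs[i]) and np.any(costs[j] < costs[i]):
                keep[i] = False
                break
    return keep

costs = np.array([
    [1.0, 3.0],  # model A: best rmse
    [2.0, 2.0],  # model B: best mae
    [3.0, 3.0],  # model C: dominated by both A and B
])
mask = find_non_dominated(costs)  # -> [True, True, False]
```

The real class works on evaluate() output rather than a raw array, but the pairwise comparison is the core of any Pareto-frontier computation.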

Example Usage

from utilsforecast.evaluation import evaluate, ParetoFrontier
from utilsforecast.losses import mae, rmse

# 1. Run standard evaluation
performance_df = evaluate(cv_results, metrics=[mae, rmse], agg_fn="mean")

# 2. Get Pareto Optimal Models
pareto_optimal_df = ParetoFrontier.find_non_dominated(performance_df)

# 3. Visualize Trade-offs
ax = ParetoFrontier.plot_pareto_2d(performance_df, metric_x='rmse', metric_y='mae')

@CLAassistant

CLAassistant commented Mar 2, 2026

CLA assistant check
All committers have signed the CLA.

@ankitlade12 ankitlade12 force-pushed the feat/multi-objective-eval branch from 4328af7 to 00dbac1 Compare March 4, 2026 07:58
Contributor

@nasaul nasaul left a comment


The core Pareto logic looks algorithmically sound, but the PR needs some changes, and tests for ParetoFrontier should be added:

- Add evaluate and ParetoFrontier to __init__.py __all__
- Use narwhals native DataFrame filtering in ParetoFrontier
- Properly extract evaluate() model columns in plot_pareto_2d
- Add tests for ParetoFrontier
@ankitlade12 ankitlade12 requested a review from nasaul March 10, 2026 04:26
@nasaul
Contributor

nasaul commented Apr 5, 2026

Thanks for the updates — the core Pareto dominance algorithm is correct and the narwhals-based approach in find_non_dominated is the right direction. There are a few bugs and design issues that need to be addressed before merging.

Bugs

1. plot_pareto_2d breaks polars (hard pd.DataFrame dependency)

In the "metric" column branch, a plain pd.DataFrame is constructed internally, and then pandas-only methods are called on the result of find_non_dominated:

pareto_sorted = pareto_df.sort_values(metric_x)  # pandas only
ax.scatter(pareto_df[metric_x], ...)              # pandas-only indexing
for _, row in plot_df.iterrows():                 # pandas only

The method signature accepts AnyDFType but breaks silently for polars input. Either convert to pandas explicitly at the start of the plotting method, or use narwhals throughout.

2. plot_pareto_2d mutates a caller-provided DataFrame

plot_df["model"] = plot_df.index.astype(str)

This modifies the passed-in DataFrame in place when it's pandas. Use .copy() or .assign() instead.
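A minimal sketch of the non-mutating alternative, using `assign` so the caller's frame is left untouched:

```python
import pandas as pd

df = pd.DataFrame({"rmse": [1.0, 1.3]}, index=["ModelA", "ModelB"])

# assign() returns a new DataFrame; the caller's df keeps its columns
plot_df = df.assign(model=df.index.astype(str))

# df is unchanged; only plot_df carries the extra "model" column
```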

Design Issues

3. __init__.py unexpectedly exports evaluate at package level

evaluate was not previously exported from utilsforecast.__init__. Adding it here is an unintended API surface change and causes eager import of all of evaluation.py on every import utilsforecast. Only ParetoFrontier (if desired) should be added.

4. Confusing metrics parameter in find_non_dominated

In the "metric" column branch (evaluate() output format), when metrics is passed it is actually used as model names, not metric names:

else:
    models = metrics  # misleading: metrics is used as model names here

A user will naturally pass metric names like ["rmse", "mae"] and get unexpected behavior. Please rename the parameter (e.g., model_subset) or document this clearly, and consider whether the evaluate() format branch even needs this override.
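One way to make the intent explicit: rename the override and validate it against the actual model columns. In this hypothetical sketch (`select_model_columns` and `model_subset` are invented names), the evaluate() format is assumed, where every column other than the id/metric columns is a model:

```python
import pandas as pd

def select_model_columns(df: pd.DataFrame, model_subset=None) -> list:
    # In evaluate() output, each non-id/metric column is a model
    model_cols = [c for c in df.columns if c not in ("unique_id", "metric")]
    if model_subset is not None:
        unknown = [m for m in model_subset if m not in model_cols]
        if unknown:
            # a user passing metric names like ["rmse", "mae"] fails loudly here
            raise ValueError(f"Not model columns: {unknown}")
        model_cols = [c for c in model_cols if c in model_subset]
    return model_cols

df = pd.DataFrame({
    "metric": ["rmse", "mae"],
    "ModelA": [1.0, 0.8],
    "ModelB": [1.2, 0.7],
})
all_models = select_model_columns(df)            # -> ['ModelA', 'ModelB']
subset = select_model_columns(df, ["ModelB"])    # -> ['ModelB']
```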

Minor

  • evaluation.py module-level __all__ still only has ['evaluate'] — add ParetoFrontier.
  • Missing two blank lines before the ParetoFrontier class definition (PEP 8).
  • Test has a stray # wait: debug comment that should be removed.

