Releases: TomCardeLo/boa-forecaster
v2.4.0 — Probabilistic, Regulatory & Deep-Learning Horizons
Track H release: closes the post-v2.3 backlog and incorporates the 2026-04-20 feedback
from the CAR PM2.5 hourly pipeline (tasks/feedback_aire.md).
No breaking API changes — additive and behaviour-tightening only.
Highlights
New model families
- `ProphetSpec` (H1): Meta's Prophet for trend + seasonality + holidays. Behind the `prophet` extra.
- `QuantileMLSpec` (H2): probabilistic forecasts via LightGBM/XGBoost quantile objectives, plus new `metrics_probabilistic.py` (`pinball_loss`, `interval_coverage`).
- `LSTMSpec` (H3): PyTorch LSTM baseline behind the `deep` extra (deliberately not in `[all]`).
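Pinball loss and interval coverage have standard textbook definitions. The following is a minimal plain-Python sketch of those definitions. Argument names, input types and edge-case handling are assumptions and may differ from what `metrics_probabilistic.py` actually does:

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], quantile: float) -> float:
    """Mean pinball (quantile) loss of predictions for one quantile level."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        diff = actual - predicted
        # Under-prediction (diff > 0) costs `quantile` per unit;
        # over-prediction (diff < 0) costs `1 - quantile` per unit.
        total += max(quantile * diff, (quantile - 1.0) * diff)
    return total / len(y_true)


def interval_coverage(
    y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> float:
    """Fraction of actuals that fall inside the closed interval [lower, upper]."""
    if not len(y_true) == len(lower) == len(upper) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(lo <= actual <= hi for actual, lo, hi in zip(y_true, lower, upper))
    return inside / len(y_true)
```

For a calibrated 80% interval (10th to 90th percentile forecasts), `interval_coverage` should come out near 0.8 on held-out data. A value well below that suggests the quantile models are overconfident.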
Regulatory metrics & presets (feedback_aire §2)
- `hit_rate_weighted` + `f1_by_bucket` in core `metrics.py` (H7-core).
- New `presets/air_quality.py`: `ICA_EDGES_PM25_CO2017`, `ICA_EDGES_PM25_USAQI`, `hit_rate_ica`, `hit_rate_ica_weighted` (H7-presets).
- First preset pack. This opens the door for `presets/demand.py`, `presets/energy.py` and `presets/finance.py` in v2.5+.
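Bucket metrics such as `f1_by_bucket` score each concentration band separately. The sketch below makes several assumptions:

- Buckets are half-open intervals split by sorted `edges`, so a value on an edge belongs to the upper bucket.
- F1 is computed one-vs-rest per bucket.
- The edges used in the test are purely illustrative. They are not the `ICA_EDGES_*` values.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Sequence


def to_bucket(value: float, edges: List[float]) -> int:
    # Bucket 0 lies below edges[0]; bucket len(edges) lies at or above edges[-1].
    return bisect_right(edges, value)


def f1_by_bucket(
    y_true: Sequence[float], y_pred: Sequence[float], edges: List[float]
) -> Dict[int, float]:
    """One-vs-rest F1 per bucket; NaN for buckets absent from both series."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for actual, predicted in zip(y_true, y_pred):
        t, p = to_bucket(actual, edges), to_bucket(predicted, edges)
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted bucket received a wrong member
            fn[t] += 1  # true bucket lost a member
    scores = {}
    for b in range(len(edges) + 1):
        denom = 2 * tp[b] + fp[b] + fn[b]
        scores[b] = 2 * tp[b] / denom if denom else float("nan")
    return scores
```

Per-bucket F1 exposes a model that does well in the common "good air" band but misses the rarer high-pollution bands. A single overall hit rate would hide that difference.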
Hourly SARIMA (feedback_aire §1 residual)
- `SARIMASpec.for_frequency(freq)` classmethod with frequency-aware `seasonal_period` defaults; tuneable `[24, 168]` on hourly data (H8).
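A frequency-aware default can be thought of as a lookup table. The mapping below is an assumption, not the table `SARIMASpec.for_frequency` ships with:

- MS → 12, W → 52 and D → 7 follow the `m` values used in the v1.2.0 notes.
- The hourly entries follow the `[24, 168]` candidates mentioned here.
- `default_seasonal_period` is a hypothetical helper.

```python
from typing import Dict

# Hypothetical defaults: conventional seasonal periods per pandas frequency alias.
SEASONAL_PERIOD_BY_FREQ: Dict[str, int] = {
    "MS": 12,  # monthly data -> yearly cycle
    "W": 52,   # weekly data -> yearly cycle
    "D": 7,    # daily data -> weekly cycle
    "h": 24,   # hourly data -> daily cycle
}

# On hourly data the weekly cycle (168 = 24 * 7) is the other obvious candidate.
HOURLY_CANDIDATES = (24, 168)


def default_seasonal_period(freq: str) -> int:
    """Return the assumed default seasonal period for a frequency alias."""
    key = "h" if freq in ("h", "H") else freq  # accept the legacy upper-case "H"
    try:
        return SEASONAL_PERIOD_BY_FREQ[key]
    except KeyError:
        raise ValueError(f"no default seasonal period for freq={freq!r}") from None
```

Making the hourly period tuneable instead of fixed lets the optimiser decide whether the daily or the weekly cycle dominates a given station.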
Ensemble safety & high-volatility WMA (feedback_aire §3 + §4)
- `EnsembleSpec` warns when `inverse_cv_loss` mixes early-stopping and full-fold members (H9a). New `uses_early_stopping` flag on `ModelSpec`.
- `WMA_THRESHOLD_HIGH_VOLATILITY = 3.5` named constant for peaky series (H9b).
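Inverse-CV-loss weighting gives each member a weight proportional to `1 / cv_loss`. The sketch below shows that weighting together with a mixed-membership warning. `Member` and `inverse_cv_loss_weights` are hypothetical stand-ins, not `EnsembleSpec` internals.

The rationale stated in the comment is an assumption: an early-stopped member's CV loss may have been measured on data that also chose its stopping point.

```python
import warnings
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Member:
    """Hypothetical ensemble member with its walk-forward CV loss."""
    name: str
    cv_loss: float
    uses_early_stopping: bool = False


def inverse_cv_loss_weights(members: List[Member]) -> Dict[str, float]:
    """Weight each member by 1 / cv_loss, normalised so weights sum to 1."""
    if any(m.cv_loss <= 0 for m in members):
        raise ValueError("cv_loss must be positive")
    if len({m.uses_early_stopping for m in members}) > 1:
        # Assumed rationale: early-stopped losses can be optimistically biased,
        # so they are not directly comparable to full-fold losses.
        warnings.warn(
            "inverse_cv_loss mixes early-stopping and full-fold members",
            UserWarning,
            stacklevel=2,
        )
    inverse = [1.0 / m.cv_loss for m in members]
    total = sum(inverse)
    return {m.name: w / total for m, w in zip(members, inverse)}
```

With losses 1.0 and 3.0 the weights come out at 0.75 and 0.25. If one of those losses is optimistically biased, the skew goes straight into the ensemble.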
Post-training seasonal bias correction (feedback_aire §5)
- `postprocess.py`: `compute_seasonal_bias` + `apply_seasonal_bias`, mirroring CAR's production `sesgo_mensual_para_ajuste.csv` pattern (H5).
- `optimize_model(..., apply_bias_correction=True)` opt-in kwarg + CLI `--bias-correction` flag.
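"Sesgo mensual para ajuste" is Spanish for "monthly bias for adjustment". Assuming the pattern stores one mean signed error per calendar month, a minimal sketch looks like the code below.

The function names, the sign convention (prediction minus actual) and the "leave unseen months untouched" rule are all assumptions. They are not the signatures of `compute_seasonal_bias` or `apply_seasonal_bias`.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Sequence


def compute_monthly_bias(
    dates: Sequence[date], y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[int, float]:
    """Mean signed error (prediction minus actual) per calendar month."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for d, actual, predicted in zip(dates, y_true, y_pred):
        sums[d.month] += predicted - actual
        counts[d.month] += 1
    return {month: sums[month] / counts[month] for month in sums}


def apply_monthly_bias(
    dates: Sequence[date], y_pred: Sequence[float], bias: Dict[int, float]
) -> List[float]:
    """Subtract the stored bias; months with no history are left unchanged."""
    return [p - bias.get(d.month, 0.0) for d, p in zip(dates, y_pred)]
```

The bias should be estimated on out-of-sample residuals, such as walk-forward validation folds. In-sample residuals would understate it.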
Pydantic finishing touches
- `BoaConfig.from_dict`, `Literal` validators on sub-models, and a `--strict` CLI flag that flips `extra="allow"` → `extra="forbid"` (H4).
Release gate
- 814 tests passed, 91% coverage
- ruff / black / bandit (no high/medium): clean
- `boa-forecaster run --strict` smoke-validates `config.example.yaml`
Contributors
Track H was executed by parallel Sonnet implementer subagents under Opus orchestration, with reviews by the Opus code-reviewer.
Special thanks to Daniel Méndez and the CAR / Cundinamarca PM2.5 hourly pipeline team (Bogotá + Cundinamarca, 34 monitoring stations, 2016–2026) for the 2026-04-20 feedback that drove H5, H7, H8, and H9.
Full diff: v2.3.0...v2.4.0
v2.3.0 — Correctness & Ecosystem
Correctness & ecosystem release bundling Tracks E/F/G of the post-v2.2 plan.
Four silent-correctness bugs fixed, quality-hardening touch-ups on validation/metric/preprocessor, and small ecosystem primitives surfaced by a real-world consumer. No breaking API changes.
Highlights
Fixed — Correctness (Track E)
- `EnsembleSpec.needs_features` is now a `@property` reflecting its members (#17)
- `BaseMLSpec` auto-injects `forecast_horizon` into the default `FeatureConfig.lag_periods` (#17)
- Optuna `MedianPruner(n_startup_trials=5, n_warmup_steps=1)` wired into `optimize_model`: 20–40% faster TPE (#17)
Performance (Track E)
- `build_ensemble` parallelised via `joblib.Parallel` with a new `n_jobs` kwarg: ~75% faster on 4-member ensembles (#17)
Quality hardening (Track F)
- New `hit_rate(y_true, y_pred, edges)` metric for bucket-accuracy reporting (#16)
- New `flag_intermittent(df, group_cols, value_col, threshold=0.7)` preprocessor helper (#16)
- `walk_forward_validation` now accepts `n_folds >= 1` and a `forecast_horizon=` default for `test_size` (#16)
- `combined_metric` delegates to `build_combined_metric`, so `register_metric` affects both paths (#16)
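The `hit_rate(y_true, y_pred, edges)` signature comes from the entry above. The bucket semantics in this sketch are assumptions:

- Buckets are half-open intervals between sorted `edges`, found with `bisect_right`.
- A point counts as a hit when forecast and actual land in the same bucket.

```python
from bisect import bisect_right
from typing import Sequence


def hit_rate(y_true: Sequence[float], y_pred: Sequence[float], edges: Sequence[float]) -> float:
    """Share of points whose forecast falls in the same bucket as the actual."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    hits = sum(
        # bisect_right assigns a value lying exactly on an edge to the upper bucket.
        bisect_right(edges, actual) == bisect_right(edges, predicted)
        for actual, predicted in zip(y_true, y_pred)
    )
    return hits / len(y_true)
```

Unlike sMAPE or RMSE, this metric ignores how far off a forecast is inside a bucket. That matches reporting needs where only the category (for example an air-quality band) matters.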
Ecosystem (Track G)
- `FeatureConfig.for_frequency(freq, **overrides)` classmethod: MS / W / D / h defaults (#15)
- `EnsembleSpec` docstring: `inverse_cv_loss` weighting caveats (#17)
See CHANGELOG.md for the full entry.
v2.2.0 — Tracks A/B/C/D: release hygiene, coverage, perf, extensibility
Additive release on the v2.x line bundling Tracks A / B / C / D of the post-v2.1.0 plan plus the A2 vectorised batch API. No breaking changes — sarima_bayes shim still emits DeprecationWarning and remains importable.
Highlights
Extensibility (Track D, #13)
- Click CLI: `boa-forecaster run | compare | validate` (also `python -m boa_forecaster`). See `docs/cli.md`.
- Pydantic v2 config schema: strongly-typed validation of `config.yaml` at load time.
- `EnsembleSpec`: weighted or stacked ensemble over any registered `ModelSpec`s. See `docs/ensemble.md`.
Performance (Track C, #14)
- Deterministic feature cache — calendar/trend features computed once per series, reused across walk-forward folds. ~30% speedup on 60-month series × 10 folds.
- Parallel walk-forward CV: `walk_forward_validation(..., n_jobs=1)` via `joblib.Parallel(backend="loky")`; the default preserves sequential behaviour.
- `np.isinf` inf-check (`optimizer._validate_series`): short-circuits on the first `inf`, ~10–20× faster than the prior `series.isin([np.inf, -np.inf]).any()`.
- `pytest-benchmark` regression suite (`tests/perf/`): a weekly CI job compares against a committed baseline.
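Walk-forward folds become independent once their split points are fixed, and that independence is what makes fold-level parallelism safe. The sketch below computes expanding-window split points under these assumptions:

- The folds are packed against the end of the series.
- `walk_forward_splits` is hypothetical.
- Its parameter names echo the `validate_by_group` call in the v1.2.0 usage examples, not the real signature of `walk_forward_validation`.

```python
from typing import List, Tuple


def walk_forward_splits(
    n_obs: int, n_folds: int, test_size: int, min_train_size: int
) -> List[Tuple[int, int]]:
    """Expanding-window (train_end, test_end) pairs; the last fold ends at n_obs."""
    if n_folds < 1 or test_size < 1:
        raise ValueError("n_folds and test_size must be >= 1")
    first_test_start = n_obs - n_folds * test_size
    if first_test_start < min_train_size:
        raise ValueError("series too short for the requested folds")
    splits = []
    for k in range(n_folds):
        train_end = first_test_start + k * test_size  # training window only grows
        splits.append((train_end, train_end + test_size))
    return splits
```

Each `(train_end, test_end)` pair means "fit on indices `[0, train_end)`, score on `[train_end, test_end)`". No fold reads another fold's output, so a pool such as `joblib.Parallel` can map the fit-and-score step over this list. With `n_jobs=1` the same loop simply runs sequentially.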
Coverage (Track B, #12)
- `data_loader.py` → 100% (new `test_data_loader_errors.py`)
- `validation.py` → 98% (expanded `test_validation.py`, includes `n_jobs=2`)
- `benchmarks.py` → 95% (new `test_benchmarks_v2.py`)
Release hygiene (Track A, #11)
- Security scan CI step on push/PR.
- `test_optional_deps.py`: asserts XGBoost/LightGBM specs degrade cleanly when extras are missing.
- Internal cleanup of duplicate files inside the `sarima_bayes/` shim (public shim surface preserved).
Performance (A2, #9)
- `weighted_moving_stats_batch`: vectorised multi-series clipping.
Fixed
- mypy errors on Python 3.11 CI.
Public API additions
- `boa_forecaster.EnsembleSpec`
- `boa_forecaster.cli` + `boa-forecaster` console entry point
- `boa_forecaster.config_schema` (Pydantic models)
- `weighted_moving_stats_batch`
- `walk_forward_validation(..., n_jobs=1)`
Full changelog
See CHANGELOG.md and the compare view.
v2.1.0 — Phase A–E improvements on the v2 framework
Feature release on the v2.x line. Ships the full Phase A–E improvement plan (perf, tests, code quality, CI, docs) on top of the v2.0.0 framework foundation. No breaking API changes since v2.0.0 — additions and deprecations only.
Migration note.
`import sarima_bayes` continues to work via a compatibility shim that re-exports the entire `boa_forecaster` API and emits a `DeprecationWarning`. `pred_arima`, `forecast_arima` and `optimize_arima` also keep working but warn; they will be removed in v3.0.
Highlights
Reliability & observability
- `OptimizationResult.is_fallback: bool` distinguishes a genuine optimum from a warm-start returned after a study-level crash. The crash now logs at `WARNING` with `exc_info=True` instead of being silently swallowed. See ADR-002.
- Thread-safe `METRIC_REGISTRY` via `threading.Lock`.
- `SARIMASpec.MAX_NON_SEASONAL_ORDER` / `MAX_SEASONAL_ORDER` named constants replacing the magic `4`/`3` thresholds.
Performance
- `weighted_moving_stats` vectorised: new `weighted_moving_stats_series` helper using `sliding_window_view`. 18–130× faster, mathematically identical output.
- `fill_blanks` vectorised: `MultiIndex.from_product` + `reindex` instead of cross-join + merge. ~1.2–1.5× faster, lower peak memory. (Behaviour change: duplicate `(date, group)` rows are now summed; pipelines running `clean_zeros` first are unaffected.)
- `recursive_forecast` pre-allocated: 5–20× speedup on long horizons.
- `_validate_series` early-exit via `series.isin([np.inf, -np.inf]).any()`.
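The notes name `sliding_window_view` but do not spell out the window itself. The NumPy sketch below (it needs NumPy 1.20+ for `sliding_window_view`) rests on these assumptions:

- The neighbourhood is a symmetric 7-point window using the decaying weights `[0.3, 0.2, 0.1]` documented in `standardization.py`.
- The centre point is excluded from its own statistics.
- The series is edge-padded.
- The σ multiplier is the `threshold` introduced in v1.1.0.

`clip_weighted_outliers` is hypothetical and is not `weighted_moving_stats_series`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Assumed neighbour weights for offsets 1, 2, 3 on each side of the centre.
SIDE_WEIGHTS = np.array([0.3, 0.2, 0.1])


def clip_weighted_outliers(values, threshold: float = 2.5) -> np.ndarray:
    """Clip each point to mean +/- threshold * std of its weighted neighbourhood."""
    x = np.asarray(values, dtype=float)
    k = len(SIDE_WEIGHTS)
    # Symmetric kernel with a zero weight on the centre, normalised to sum to 1.
    w = np.concatenate([SIDE_WEIGHTS[::-1], [0.0], SIDE_WEIGHTS])
    w = w / w.sum()
    padded = np.pad(x, k, mode="edge")
    windows = sliding_window_view(padded, 2 * k + 1)  # shape (len(x), 2k + 1), no copy
    mean = windows @ w
    std = np.sqrt(((windows - mean[:, None]) ** 2) @ w)
    return np.clip(x, mean - threshold * std, mean + threshold * std)
```

All windows are processed in one matrix product rather than a Python loop, which is presumably where a speedup of the reported order comes from.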
Code quality
- `BaseMLSpec` shared base for tree-based ML specs. Removes ~329 lines of duplication across `RandomForestSpec` / `XGBoostSpec` / `LightGBMSpec`. Subclasses override only `_fit_final`, `search_space` and `warm_starts`.
- Type-annotation completeness pass across `models/base.py`, `validation.py`, `features.py` and `data_loader.py`.
Tests
- SARIMA constraint enforcement (`test_sarima_constraints.py`).
- Feature-leakage regression tests (`test_features.py`).
- Benchmark silent-failure tests.
- Full-pipeline integration test (`tests/integration/test_full_pipeline.py`).
- 19 Hypothesis property-based metric tests (`test_metrics_property.py`).
- Optimizer 500-pt stress test (`test_optimizer_stress.py`, `@pytest.mark.slow`, < 30 s budget).
CI & tooling
- `mypy` static type checking on the Python 3.11 matrix entry.
- Weekly slow-test job (Mondays 06:00 UTC, `[dev,ml]` extras, 20-min timeout).
- Coverage threshold `--cov-fail-under=80` on core + ML jobs.
- `hypothesis>=6.0` added to `[dev]` extras.
Documentation
- Architecture Decision Records (`docs/adr/`):
  - ADR-001: `ModelSpec` as `Protocol`, not `ABC`
  - ADR-002: Optimizer soft-failure (`is_fallback`)
  - ADR-003: Combined objective `0.7·sMAPE + 0.3·RMSLE`
- Extension guide (`docs/extending_models.md`): end-to-end walkthrough with a worked Prophet example, a `BaseMLSpec` shortcut for tree models, a test checklist, and a pitfalls table.
- Documented rationale for the decaying weights `[0.3, 0.2, 0.1]` in `standardization.py`.
Deprecations
- `pred_arima`, `forecast_arima`, `optimize_arima`: emit `DeprecationWarning`; removal in v3.0.
- `sarima_bayes` package: emits `DeprecationWarning` on import; re-exports everything from `boa_forecaster`.
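Keeping an old name callable while steering users to its replacement is a common pattern: wrap the new function and emit `DeprecationWarning` on each call. The sketch below shows only that general pattern. `deprecated_alias`, `new_entry_point` and `old_entry_point` are hypothetical, and the real `sarima_bayes` shim may be structured differently.

```python
import functools
import warnings
from typing import Callable


def deprecated_alias(new_func: Callable, old_name: str, removal: str = "v3.0") -> Callable:
    """Return a wrapper that warns on every call, then delegates to new_func."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated and will be removed in {removal}; "
            f"use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this wrapper
        )
        return new_func(*args, **kwargs)

    # functools.wraps copied the new name; restore the old one for introspection.
    wrapper.__name__ = old_name
    wrapper.__qualname__ = old_name
    return wrapper


def new_entry_point(x: int) -> int:  # hypothetical replacement API
    return 2 * x


old_entry_point = deprecated_alias(new_entry_point, "old_entry_point")
```

By default Python shows `DeprecationWarning` only for code triggered directly from `__main__`. Test runners such as pytest do surface them, so deprecations tend to show up in CI before they reach users.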
Full changelog: v2.0.0...v2.1.0
v2.0.0 — Multi-model forecasting framework
What's new
v2.0 turns the library from a SARIMA-only tool into a pluggable multi-model forecasting framework.
New models
- Random Forest (`RandomForestSpec`): scikit-learn, always available
- XGBoost (`XGBoostSpec`): optional extra, `pip install -e ".[xgboost]"`
- LightGBM (`LightGBMSpec`): optional extra, `pip install -e ".[lightgbm]"`
New API
- `optimize_model(series, model_spec, n_trials)`: unified entry point for any model
- `ModelSpec` protocol: add a new model in ~50 lines
- `FeatureEngineer`: lags, rolling stats, calendar and trend features for ML models
- `run_model_comparison()`: multi-model head-to-head comparison
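ADR-001 (listed under v2.1.0) records that `ModelSpec` is a `Protocol`, not an `ABC`. A model therefore qualifies by having the right members; it does not need to inherit from anything.

The sketch below illustrates that structural typing. `search_space` and `warm_starts` are named in the v2.1.0 notes. `ModelSpecLike`, `name`, `fit_predict` and the toy `NaiveLastValueSpec` are hypothetical, and the real protocol's members and signatures likely differ.

```python
from typing import Any, Dict, List, Protocol, Sequence, runtime_checkable


@runtime_checkable
class ModelSpecLike(Protocol):
    """Hypothetical, simplified stand-in for the ModelSpec protocol."""

    name: str

    def search_space(self) -> Dict[str, Any]: ...

    def warm_starts(self) -> List[Dict[str, Any]]: ...

    def fit_predict(
        self, train: Sequence[float], params: Dict[str, Any], horizon: int
    ) -> List[float]: ...


class NaiveLastValueSpec:
    """Toy model that repeats the last observation; note: no base class."""

    name = "naive_last"

    def search_space(self) -> Dict[str, Any]:
        return {}  # nothing to tune

    def warm_starts(self) -> List[Dict[str, Any]]:
        return [{}]

    def fit_predict(
        self, train: Sequence[float], params: Dict[str, Any], horizon: int
    ) -> List[float]:
        return [float(train[-1])] * horizon
```

With `@runtime_checkable`, `isinstance` only checks that the members exist, not their signatures, so a type checker such as mypy remains the real guard.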
Infrastructure
- Primary package renamed to `boa_forecaster`; `sarima_bayes` is a deprecated compatibility shim (fully backward-compatible, emits `DeprecationWarning`)
- CI split into `test-core-only` (Python 3.9/3.10/3.11) and `test-ml-extras` (Python 3.11 + ML libs)
- 368 unit tests + integration tests
Backward compatibility
All v1.x code continues to work:
from sarima_bayes import optimize_arima, forecast_arima  # emits DeprecationWarning
Recommended migration:
from boa_forecaster import optimize_model
from boa_forecaster.models import SARIMASpec
result = optimize_model(series, SARIMASpec(), n_trials=30)
Installation
pip install -e "." # core (SARIMA + Random Forest)
pip install -e ".[ml]" # + XGBoost + LightGBM
pip install -e ".[dev,ml]"  # + dev tools
v1.4.0 — Optional Country/SKU columns
What's new
- Optional `Country` and `SKU` columns: the data loader now accepts input files with or without these columns. When they are absent, the pipeline treats the entire dataset as a single group and skips per-group filtering.
- Removed `merge_representatives`: the preprocessor no longer exposes this helper; grouping logic is handled transparently by the loader.
Breaking changes
None. Existing inputs with Country/SKU columns continue to work unchanged.
Upgrade
pip install --upgrade boa-sarima-forecaster
v1.3.0 — Configurable Metric Composition
What's new
The Bayesian optimiser objective is now fully configurable. Instead of being locked to 0.7 × sMAPE + 0.3 × RMSLE, any weighted combination of built-in metrics can be used — making the library applicable beyond demand forecasting.
New metrics
| Name | Formula | Best suited for |
|---|---|---|
| `mae` | mean(\|y − ŷ\|) | Revenue, price: absolute scale matters |
| `rmse` | √mean((y − ŷ)²) | Penalises large deviations |
| `mape` | 100 × mean(\|y − ŷ\| / (\|y\| + ε)) | Clean series without zeros |
New API
- `METRIC_REGISTRY`: dict mapping metric names to callables
- `build_combined_metric(components)`: factory that builds any weighted objective
- `optimize_arima(..., metric_components=[...])`: pass a custom objective directly
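A weighted objective built from a name-to-callable registry can be sketched in a few lines. The component format (`metric` / `weight` keys) mirrors the YAML configuration in this release. The rest is assumed:

- `build_weighted_metric` is a hypothetical stand-in for `build_combined_metric`.
- `smape` uses the common `2|ŷ − y| / (|y| + |ŷ|)` variant.
- `rmsle` uses `log1p`.

The library's exact definitions may differ from both.

```python
import math
from typing import Callable, Dict, List, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def smape(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Symmetric MAPE in percent; a 0/0 term counts as a perfect forecast."""
    terms = [
        0.0 if a == 0 and f == 0 else 2 * abs(f - a) / (abs(a) + abs(f))
        for a, f in zip(y, yhat)
    ]
    return 100 * sum(terms) / len(terms)


def rmsle(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root mean squared log error; requires values > -1."""
    sq = [(math.log1p(f) - math.log1p(a)) ** 2 for a, f in zip(y, yhat)]
    return math.sqrt(sum(sq) / len(sq))


def mae(y: Sequence[float], yhat: Sequence[float]) -> float:
    return sum(abs(a - f) for a, f in zip(y, yhat)) / len(y)


REGISTRY: Dict[str, Metric] = {"smape": smape, "rmsle": rmsle, "mae": mae}


def build_weighted_metric(components: List[dict]) -> Metric:
    """Resolve names once (unknown names raise KeyError), then sum weighted scores."""
    resolved = [(REGISTRY[c["metric"]], float(c["weight"])) for c in components]

    def combined(y: Sequence[float], yhat: Sequence[float]) -> float:
        return sum(weight * fn(y, yhat) for fn, weight in resolved)

    return combined


objective = build_weighted_metric(
    [{"metric": "smape", "weight": 0.7}, {"metric": "rmsle", "weight": 0.3}]
)
```

The component metrics live on different scales: sMAPE is a percentage, RMSLE is in log units. The weights therefore trade off raw magnitudes, not normalised errors.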
Configuration
metrics:
  components:
    - metric: smape
      weight: 0.7
    - metric: rmsle
      weight: 0.3
Backward compatibility
Default behaviour (0.7 × sMAPE + 0.3 × RMSLE) is unchanged. All existing call sites continue to work without modification.
Changes
- `src/sarima_bayes/metrics.py`: `mae`, `rmse`, `mape`, `METRIC_REGISTRY`, `build_combined_metric`
- `src/sarima_bayes/config.py`: `DEFAULT_METRIC_COMPONENTS`
- `config.example.yaml`: `metrics.components` section
- `src/sarima_bayes/optimizer.py`: `metric_components` kwarg
- `src/sarima_bayes/__init__.py`: new public exports
- `tests/unit/test_metrics.py`: 15 new tests (96 total, 100% metrics coverage)
- `README.md`: new Configurable Metric section
v1.2.0 — Configurable Time-Series Frequency
What's Changed
Added
- Configurable frequency: the pipeline now works with any pandas DateOffset alias, not only monthly `"MS"`. Pass `freq` to set the sampling rate and `m` for the seasonal period:
  - `pred_arima`, `forecast_arima`, `forecast_arima_with_group`: new `freq: str = "MS"` parameter
  - `validate_by_group`: new `freq: str = "MS"` parameter
  - `ets_model`: new `m: int = 12` parameter
  - `auto_arima_nixtla`: new `m: int = 12` and `freq: str = "MS"` parameters
  - `run_benchmark_comparison`: new `m: int = 12` and `freq: str = "MS"` parameters, forwarded to all baselines
- `_freq_to_period_alias` helper in `preprocessor.py`: maps DateOffset aliases (`"MS"`, `"W"`, `"D"`, `"H"`) to the Period aliases required by `pd.Series.dt.to_period()`
- `data.freq` key in `config.example.yaml` with an alias/seasonal_period coupling table
- 27 new tests: `test_preprocessor.py` (19), plus new benchmark and validation coverage (8)
Changed
- `preprocessor.fill_blanks`: date normalisation is now freq-aware; weekly (`"W"`) uses the end-of-period convention to align with `pd.date_range` Sunday anchoring
- `config.example.yaml`: the `model.sarima.seasonal_period` comment shows the recommended `m` per frequency
Backward Compatibility
All new parameters default to freq="MS" / m=12. Zero existing call sites require changes.
Usage Examples
# Weekly data, annual seasonality
fill_blanks(df, freq="W")
pred_arima(df, "Date", "Sales", order=(1,1,1), freq="W")
run_benchmark_comparison(df, ..., freq="W", m=52)
# Daily data, weekly seasonality
ets_model(train, forecast_horizon=7, m=7)
validate_by_group(df, ..., freq="D", n_folds=3, test_size=7, min_train_size=28)
Full Changelog
v1.1.0 — Configurable Outlier Clipping Threshold
What's Changed
Added
- Configurable outlier-clipping threshold: `clip_outliers` and `weighted_moving_stats` now accept a `threshold` parameter (default `2.5`). Previously the σ multiplier was hard-coded; it can now be set per call or globally via `config.yaml` under `standardization.threshold`.
Changed
- `config.example.yaml`: added a `standardization.threshold: 2.5` key so users can tune sensitivity without touching source code.
- `docs/methodology.md`: updated the standardisation section to document the new parameter.
Fixed
- Renamed internal parameter `sigma_threshold` → `threshold` in `clip_outliers` to match the public API expected by the test suite.
- Resolved ruff lint errors and applied black auto-formatting to `config.py`; both were blocking CI.
Full Changelog
v1.0.0 — Initial public release
1.0.0 — 2026-03-17
Added
- SARIMA + Bayesian Optimisation pipeline: end-to-end demand forecasting using Optuna TPE to search ARIMA orders `(p, d, q)` and seasonal orders `(P, D, Q, m)`.
- Walk-forward (expanding-window) cross-validation: prevents look-ahead bias by evaluating each fold on true out-of-sample periods.
- Benchmark comparison: walk-forward results compared against Seasonal Naïve, ETS (Holt-Winters) and AutoARIMA (statsforecast) baselines.
- Weighted moving-average outlier standardisation: clips demand observations to ±1σ of their neighbourhood; both raw and adjusted series are modelled and the better one is selected automatically.
- sMAPE and RMSLE metrics: the combined cost function `0.7 × sMAPE + 0.3 × RMSLE` is used as the Optuna objective; both metrics are available individually via `sarima_bayes.metrics`.
- Demo notebook (`notebooks/demo.ipynb`): end-to-end walkthrough using synthetic data; no real data required.
- pytest test suite (`tests/`): unit and integration tests with coverage reporting.
- GitHub Actions CI (`.github/workflows/ci.yml`): runs linting (ruff, black) and the full test suite on every push and pull request.
- Full type hints and Google-style docstrings: all 19 public functions across `src/sarima_bayes/` annotated with Python 3.10+ `X | Y` union syntax, plus Args, Returns, Raises and Example sections.
- `config.yaml` / `config.example.yaml`: YAML-driven configuration for data paths, optimisation budget, forecast horizon and output location.
- `docs/methodology.md`: detailed technical description of the five-stage pipeline.
- Forecast plot (`docs/img/forecast_example.png`): example output image showing training history, last-24-months actuals, point forecast, and 80%/95% CI bands; generated reproducibly via `scripts/generate_plots.py`.