+
+
+
+
+
+The above figure shows a simple hierarchical structure where we have
+four bottom-level series, two middle-level series, and the top level
+representing the total aggregation. Its hierarchical aggregations or
+coherency constraints are:
+
+$$
+\begin{aligned}
+y_{\mathrm{Total},\tau} &= y_{\beta_{1},\tau}+y_{\beta_{2},\tau}+y_{\beta_{3},\tau}+y_{\beta_{4},\tau} \\
+\mathbf{y}_{[a],\tau} &= \left[y_{\mathrm{Total},\tau},\; y_{\beta_{1},\tau}+y_{\beta_{2},\tau},\; y_{\beta_{3},\tau}+y_{\beta_{4},\tau}\right]^{\intercal} \\
+\mathbf{y}_{[b],\tau} &= \left[y_{\beta_{1},\tau},\; y_{\beta_{2},\tau},\; y_{\beta_{3},\tau},\; y_{\beta_{4},\tau}\right]^{\intercal}
+\end{aligned}
+$$
+
+Luckily, these constraints can be compactly expressed with the following
+matrices:
+
+$$
+\mathbf{S}_{[a,b][b]}
+=
+\begin{bmatrix}
+\mathbf{A}_{[a][b]} \\
+\mathbf{I}_{[b][b]}
+\end{bmatrix}
+=
+\begin{bmatrix}
+1 & 1 & 1 & 1 \\
+1 & 1 & 0 & 0 \\
+0 & 0 & 1 & 1 \\
+1 & 0 & 0 & 0 \\
+0 & 1 & 0 & 0 \\
+0 & 0 & 1 & 0 \\
+0 & 0 & 0 & 1 \\
+\end{bmatrix}
+$$
+
+where $\mathbf{A}_{[a][b]}$ aggregates the bottom-level series to the
+upper levels, and $\mathbf{I}_{[b][b]}$ is an identity matrix. The
+representation of the hierarchical series is then:
+
+$$
+\mathbf{y}_{[a,b],\tau} = \mathbf{S}_{[a,b][b]} \mathbf{y}_{[b],\tau}
+$$
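
As a sanity check, the summing matrix above can be written out directly in NumPy. This is an illustrative sketch (not library code): multiplying $\mathbf{S}_{[a,b][b]}$ by any bottom-level vector produces a coherent hierarchy.

```python
import numpy as np

# Summing matrix S_{[a,b][b]} from Figure 1: total, two middle series, identity
S = np.array([
    [1, 1, 1, 1],   # Total
    [1, 1, 0, 0],   # middle series 1 = beta1 + beta2
    [0, 0, 1, 1],   # middle series 2 = beta3 + beta4
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])

y_bottom = np.array([10., 20., 30., 40.])  # arbitrary bottom-level values
y_hier = S @ y_bottom                      # y_{[a,b]} = S y_{[b]}

assert y_hier[0] == y_bottom.sum()         # total constraint holds
assert y_hier[1] == y_bottom[:2].sum()     # middle-level constraints hold
assert y_hier[2] == y_bottom[2:].sum()
```

Any choice of bottom-level values satisfies the coherency constraints by construction, which is exactly why the $\mathbf{S}$ representation is useful.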
+
+To visualize an example, one can think of the levels of the
+hierarchical time series structure in Figure 2 as different
+geographical aggregations. For example, the top level is the total
+aggregation of series within a country, the middle level its states,
+and the bottom level its regions.
+
+
+
+## 2. Hierarchical Forecast
+
+To achieve **“coherency”**, most statistical solutions to the
+hierarchical forecasting challenge implement a two-stage reconciliation
+process.
+1. First, we obtain a set of base forecasts
+ $\mathbf{\hat{y}}_{[a,b],\tau}$.
+
+2. Then, we reconcile them into coherent forecasts
+ $\mathbf{\tilde{y}}_{[a,b],\tau}$.
+
+Most hierarchical reconciliation methods can be expressed by the
+following transformations:
+
+$$\tilde{\mathbf{y}}_{[a,b],\tau} = \mathbf{S}_{[a,b][b]} \mathbf{P}_{[b][a,b]} \hat{\mathbf{y}}_{[a,b],\tau}$$
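
For instance, the bottom-up method corresponds to choosing $\mathbf{P}_{[b][a,b]} = [\mathbf{0}_{[b][a]} \,|\, \mathbf{I}_{[b][b]}]$, which discards the aggregate base forecasts and re-sums the bottom-level ones. A minimal NumPy sketch of this transformation (illustrative only, not the library implementation):

```python
import numpy as np

# S from the two-level example; P for bottom-up keeps only bottom base forecasts
S = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])
P = np.hstack([np.zeros((4, 3)), np.eye(4)])  # P_{[b][a,b]} = [0 | I]

# Incoherent base forecasts: the aggregate entries disagree with the bottoms
y_hat = np.array([95., 28., 66., 10., 20., 30., 40.])
y_tilde = S @ P @ y_hat  # reconciled forecasts

assert y_tilde[0] == y_tilde[3:].sum()        # total = sum of bottoms
assert y_tilde[1] == y_tilde[3] + y_tilde[4]  # middle levels coherent too
assert y_tilde[2] == y_tilde[5] + y_tilde[6]
```

Other reconciliation methods differ only in how $\mathbf{P}_{[b][a,b]}$ is chosen.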
+
+The HierarchicalForecast library offers a Python collection of
+reconciliation methods, datasets, evaluation and visualization tools for
+the task. Among its available reconciliation methods we have
+[`BottomUp`](https://Nixtla.github.io/hierarchicalforecast/src/methods.html#bottomup),
+[`TopDown`](https://Nixtla.github.io/hierarchicalforecast/src/methods.html#topdown),
+[`MiddleOut`](https://Nixtla.github.io/hierarchicalforecast/src/methods.html#middleout),
+[`MinTrace`](https://Nixtla.github.io/hierarchicalforecast/src/methods.html#mintrace),
+[`ERM`](https://Nixtla.github.io/hierarchicalforecast/src/methods.html#erm).
+Among its probabilistic coherent methods we have
+[`Normality`](https://Nixtla.github.io/hierarchicalforecast/src/probabilistic_methods.html#normality),
+[`Bootstrap`](https://Nixtla.github.io/hierarchicalforecast/src/probabilistic_methods.html#bootstrap),
+[`PERMBU`](https://Nixtla.github.io/hierarchicalforecast/src/probabilistic_methods.html#permbu).
+
+## 3. Minimal Example
+
+
+```python
+!pip install hierarchicalforecast statsforecast datasetsforecast
+```
+
+### Wrangling Data
+
+
+```python
+import numpy as np
+import pandas as pd
+```
+
+We are going to create a synthetic dataset to illustrate a hierarchical
+time series structure like the one in Figure 1.
+
+We will create a two-level structure with four bottom series whose
+aggregations are self-evident.
+
+
+```python
+# Create Figure 1. synthetic bottom data
+ds = pd.date_range(start='2000-01-01', end='2000-08-01', freq='MS')
+y_base = np.arange(1,9)
+r1 = y_base * (10**1)
+r2 = y_base * (10**1)
+r3 = y_base * (10**2)
+r4 = y_base * (10**2)
+
+ys = np.concatenate([r1, r2, r3, r4])
+ds = np.tile(ds, 4)
+unique_ids = ['r1'] * 8 + ['r2'] * 8 + ['r3'] * 8 + ['r4'] * 8
+top_level = 'Australia'
+middle_level = ['State1'] * 16 + ['State2'] * 16
+bottom_level = unique_ids
+
+bottom_df = dict(ds=ds,
+ top_level=top_level,
+ middle_level=middle_level,
+ bottom_level=bottom_level,
+ y=ys)
+bottom_df = pd.DataFrame(bottom_df)
+bottom_df.groupby('bottom_level').head(2)
+```
+
+| | ds | top_level | middle_level | bottom_level | y |
+|-----|------------|-----------|--------------|--------------|-----|
+| 0 | 2000-01-01 | Australia | State1 | r1 | 10 |
+| 1 | 2000-02-01 | Australia | State1 | r1 | 20 |
+| 8 | 2000-01-01 | Australia | State1 | r2 | 10 |
+| 9 | 2000-02-01 | Australia | State1 | r2 | 20 |
+| 16 | 2000-01-01 | Australia | State2 | r3 | 100 |
+| 17 | 2000-02-01 | Australia | State2 | r3 | 200 |
+| 24 | 2000-01-01 | Australia | State2 | r4 | 100 |
+| 25 | 2000-02-01 | Australia | State2 | r4 | 200 |
+
+The previously introduced hierarchical series $\mathbf{y}_{[a,b],\tau}$
+is captured within the `Y_hier_df` dataframe.
+
+The aggregation constraints matrix $\mathbf{S}_{[a,b][b]}$ is captured
+within the `S_df` dataframe.
+
+Finally, `tags` contains, for each hierarchical level, the list of
+series in `Y_hier_df` that compose it; for example, `tags['top_level']`
+contains `Australia`’s aggregated series index.
+
+
+```python
+from hierarchicalforecast.utils import aggregate
+```
+
+
+```python
+# Create hierarchical structure and constraints
+hierarchy_levels = [['top_level'],
+ ['top_level', 'middle_level'],
+ ['top_level', 'middle_level', 'bottom_level']]
+Y_hier_df, S_df, tags = aggregate(df=bottom_df, spec=hierarchy_levels)
+print('S_df.shape', S_df.shape)
+print('Y_hier_df.shape', Y_hier_df.shape)
+print("tags['top_level']", tags['top_level'])
+```
+
+``` text
+S_df.shape (7, 5)
+Y_hier_df.shape (56, 3)
+tags['top_level'] ['Australia']
+```
+
+
+```python
+Y_hier_df.groupby('unique_id').head(2)
+```
+
+| | unique_id | ds | y |
+|-----|---------------------|------------|-----|
+| 0 | Australia | 2000-01-01 | 220 |
+| 1 | Australia | 2000-02-01 | 440 |
+| 8 | Australia/State1 | 2000-01-01 | 20 |
+| 9 | Australia/State1 | 2000-02-01 | 40 |
+| 16 | Australia/State2 | 2000-01-01 | 200 |
+| 17 | Australia/State2 | 2000-02-01 | 400 |
+| 24 | Australia/State1/r1 | 2000-01-01 | 10 |
+| 25 | Australia/State1/r1 | 2000-02-01 | 20 |
+| 32 | Australia/State1/r2 | 2000-01-01 | 10 |
+| 33 | Australia/State1/r2 | 2000-02-01 | 20 |
+| 40 | Australia/State2/r3 | 2000-01-01 | 100 |
+| 41 | Australia/State2/r3 | 2000-02-01 | 200 |
+| 48 | Australia/State2/r4 | 2000-01-01 | 100 |
+| 49 | Australia/State2/r4 | 2000-02-01 | 200 |
+
+
+```python
+S_df
+```
+
+| | unique_id | Australia/State1/r1 | Australia/State1/r2 | Australia/State2/r3 | Australia/State2/r4 |
+|----|----|----|----|----|----|
+| 0 | Australia | 1.0 | 1.0 | 1.0 | 1.0 |
+| 1 | Australia/State1 | 1.0 | 1.0 | 0.0 | 0.0 |
+| 2 | Australia/State2 | 0.0 | 0.0 | 1.0 | 1.0 |
+| 3 | Australia/State1/r1 | 1.0 | 0.0 | 0.0 | 0.0 |
+| 4 | Australia/State1/r2 | 0.0 | 1.0 | 0.0 | 0.0 |
+| 5 | Australia/State2/r3 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 6 | Australia/State2/r4 | 0.0 | 0.0 | 0.0 | 1.0 |
+
+### Base Predictions
+
+Next, we compute the *base forecasts* for each time series using the
+`Naive` model. Observe that `Y_hat_df` contains the forecasts, but they
+are not coherent.
+
+
+```python
+from statsforecast.models import Naive
+from statsforecast.core import StatsForecast
+```
+
+
+```python
+# Split train/test sets
+Y_test_df = Y_hier_df.groupby('unique_id', as_index=False).tail(4)
+Y_train_df = Y_hier_df.drop(Y_test_df.index)
+
+# Compute base Naive predictions
+# Be careful to identify the correct data frequency; this data is monthly, so freq='MS'
+fcst = StatsForecast(models=[Naive()],
+ freq='MS', n_jobs=-1)
+Y_hat_df = fcst.forecast(df=Y_train_df, h=4, fitted=True)
+Y_fitted_df = fcst.forecast_fitted_values()
+```
+
+### Reconciliation
+
+
+```python
+from hierarchicalforecast.methods import BottomUp
+from hierarchicalforecast.core import HierarchicalReconciliation
+```
+
+
+```python
+# You can select a reconciler from our collection
+reconcilers = [BottomUp()] # MinTrace(method='mint_shrink')
+hrec = HierarchicalReconciliation(reconcilers=reconcilers)
+
+Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df,
+ Y_df=Y_fitted_df,
+ S_df=S_df, tags=tags)
+Y_rec_df.groupby('unique_id').head(2)
+```
+
+| | unique_id | ds | Naive | Naive/BottomUp |
+|-----|---------------------|------------|-------|----------------|
+| 0 | Australia | 2000-05-01 | 880.0 | 880.0 |
+| 1 | Australia | 2000-06-01 | 880.0 | 880.0 |
+| 4 | Australia/State1 | 2000-05-01 | 80.0 | 80.0 |
+| 5 | Australia/State1 | 2000-06-01 | 80.0 | 80.0 |
+| 8 | Australia/State2 | 2000-05-01 | 800.0 | 800.0 |
+| 9 | Australia/State2 | 2000-06-01 | 800.0 | 800.0 |
+| 12 | Australia/State1/r1 | 2000-05-01 | 40.0 | 40.0 |
+| 13 | Australia/State1/r1 | 2000-06-01 | 40.0 | 40.0 |
+| 16 | Australia/State1/r2 | 2000-05-01 | 40.0 | 40.0 |
+| 17 | Australia/State1/r2 | 2000-06-01 | 40.0 | 40.0 |
+| 20 | Australia/State2/r3 | 2000-05-01 | 400.0 | 400.0 |
+| 21 | Australia/State2/r3 | 2000-06-01 | 400.0 | 400.0 |
+| 24 | Australia/State2/r4 | 2000-05-01 | 400.0 | 400.0 |
+| 25 | Australia/State2/r4 | 2000-06-01 | 400.0 | 400.0 |
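
The reconciled values in the table above can be reproduced by hand: a `Naive` forecast repeats each series’ last training value, so each bottom series forecasts 40 or 400 at every step of the horizon, and bottom-up aggregation then yields 80, 800, and 880 for the upper levels. A small sketch of that arithmetic:

```python
import numpy as np

# Last training values of the four bottom series (r1, r2, r3, r4)
last_train = np.array([40., 40., 400., 400.])

# Naive base forecast: repeat the last value for every step of horizon h=4
naive_bottom = np.tile(last_train, (4, 1))

# Bottom-up aggregation of the first forecasted step
state1 = naive_bottom[0, :2].sum()
state2 = naive_bottom[0, 2:].sum()
australia = naive_bottom[0].sum()

assert (state1, state2, australia) == (80.0, 800.0, 880.0)
```

In this toy example the `Naive` base forecasts happen to be coherent already, which is why the `Naive` and `Naive/BottomUp` columns agree; with richer models the two generally differ.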
+
+## References
+
+- [Hyndman, R.J., & Athanasopoulos, G. (2021). *Forecasting:
+ Principles and Practice*, 3rd edition, Chapter 11: “Forecasting
+ hierarchical and grouped series”. OTexts: Melbourne, Australia.
+ OTexts.com/fpp3. Accessed on July
+ 2022.](https://otexts.com/fpp3/hierarchical.html)
+
+
+The architecture consists of an encoder-decoder structure with multiple
+layers, each with residual connections and layer normalization. Finally,
+a linear layer maps the decoder’s output to the forecasting window
+dimension. The general intuition is that attention-based mechanisms are
+able to capture the diversity of past events and correctly extrapolate
+potential future distributions.
+
+To make predictions, TimeGPT “reads” the input series much like the way
+humans read a sentence – from left to right. It looks at windows of past
+data, which we can think of as “tokens”, and predicts what comes next.
+This prediction is based on patterns the model identifies in past data
+and extrapolates into the future.
+
+## Explore examples and use cases
+
+Visit our comprehensive documentation to explore a wide range of
+examples and practical use cases for TimeGPT. Whether you’re getting
+started with our [Quickstart
+Guide](https://docs.nixtla.io/docs/getting-started-timegpt_quickstart),
+[setting up your API
+key](https://docs.nixtla.io/docs/getting-started-setting_up_your_api_key),
+or looking for advanced forecasting techniques, our resources are
+designed to guide you through every step of the process.
+
+Learn how to handle [anomaly
+detection](https://docs.nixtla.io/docs/capabilities-anomaly-detection-quickstart),
+[fine-tune
+models](https://docs.nixtla.io/docs/tutorials-fine_tuning_with_a_specific_loss_function)
+with specific loss functions, and scale your computing using frameworks
+like [Spark](https://docs.nixtla.io/docs/tutorials-spark),
+[Dask](https://docs.nixtla.io/docs/tutorials-dask), and
+[Ray](https://docs.nixtla.io/docs/tutorials-ray).
+
+Additionally, our documentation covers specialized topics such as
+handling [exogenous
+variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables),
+validating models through
+[cross-validation](https://docs.nixtla.io/docs/tutorials-cross_validation),
+and forecasting under uncertainty with [quantile
+forecasts](https://docs.nixtla.io/docs/tutorials-quantile_forecasts) and
+[prediction
+intervals](https://docs.nixtla.io/docs/tutorials-prediction_intervals).
+
+For those interested in real-world applications, discover how TimeGPT
+can be used for [forecasting web
+traffic](https://docs.nixtla.io/docs/use-cases-forecasting_web_traffic)
+or [predicting Bitcoin
+prices](https://docs.nixtla.io/docs/use-cases-bitcoin_price_prediction).
+
diff --git a/nixtla/docs/getting-started/polars_quickstart.html.mdx b/nixtla/docs/getting-started/polars_quickstart.html.mdx
new file mode 100644
index 00000000..70a127b7
--- /dev/null
+++ b/nixtla/docs/getting-started/polars_quickstart.html.mdx
@@ -0,0 +1,207 @@
+---
+description: >-
+ TimeGPT is a production ready, generative pretrained transformer for time
+ series. It's capable of accurately predicting various domains such as retail,
+ electricity, finance, and IoT with just a few lines of code 🚀.
+output-file: polars_quickstart.html
+title: TimeGPT Quickstart (Polars)
+---
+
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/getting-started/21_polars_quickstart.ipynb)
+
+## Step 1: Create a TimeGPT account and generate your API key
+
+- Go to [dashboard.nixtla.io](https://dashboard.nixtla.io/)
+- Sign in with Google, GitHub or your email
+- Create your API key by going to ‘API Keys’ in the menu and clicking
+ on ‘Create New API Key’
+- Your new key will appear. Copy the API key using the button on the
+ right.
+
+
+
+## Step 2: Install Nixtla
+
+In your favorite Python development environment:
+
+Install `nixtla` with `pip`:
+
+```shell
+pip install nixtla
+```
+
+## Step 3: Import the Nixtla TimeGPT client
+
+```python
+from nixtla import NixtlaClient
+```
+
+You can instantiate the
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+class providing your authentication API key.
+
+```python
+nixtla_client = NixtlaClient(
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+Check your API key status with the `validate_api_key` method.
+
+```python
+nixtla_client.validate_api_key()
+```
+
+``` text
+True
+```
+
+**This will get you started, but for more secure usage, see [Setting Up
+your API
+Key](https://docs.nixtla.io/docs/getting-started-setting_up_your_api_key).**
+
+## Step 4: Start making forecasts!
+
+Now you can start making forecasts! Let’s import an example using the
+classic `AirPassengers` dataset. This dataset contains the monthly
+number of international airline passengers between 1949 and 1960. First,
+load the dataset and plot it:
+
+```python
+import polars as pl
+```
+
+
+```python
+df = pl.read_csv(
+ 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv',
+ try_parse_dates=True,
+)
+df.head()
+```
+
+| timestamp | value |
+|------------|-------|
+| date | i64 |
+| 1949-01-01 | 112 |
+| 1949-02-01 | 118 |
+| 1949-03-01 | 132 |
+| 1949-04-01 | 129 |
+| 1949-05-01 | 121 |
+
+```python
+nixtla_client.plot(df, time_col='timestamp', target_col='value')
+```
+
+
+
+> 📘 Data Requirements
+>
+> - Make sure the target variable column does not have missing or
+> non-numeric values.
+> - Do not include gaps/jumps in the timestamps (for the given
+>   frequency) between the first and last timestamps. The forecast
+>   function will not impute missing dates.
+> - The time column should be of type
+> [Date](https://docs.pola.rs/api/python/stable/reference/api/polars.datatypes.Date.html)
+> or
+> [Datetime](https://docs.pola.rs/api/python/stable/reference/api/polars.datatypes.Datetime.html).
+>
+> For further details go to [Data
+> Requirements](https://docs.nixtla.io/docs/getting-started-data_requirements).
+
+### Forecast a longer horizon into the future
+
+Next, forecast the next 12 months using the SDK `forecast` method. Set
+the following parameters:
+
+- `df`: A polars DataFrame containing the time series data.
+- `h`: The forecast horizon, i.e., the number of steps ahead to forecast.
+- `freq`: The frequency of the time series as a polars offset alias; see
+  the possible values
+  [here](https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.offset_by.html).
+- `time_col`: The column that identifies the datestamp.
+- `target_col`: The variable to forecast.
+
+```python
+timegpt_fcst_df = nixtla_client.forecast(df=df, h=12, freq='1mo', time_col='timestamp', target_col='value')
+timegpt_fcst_df.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Querying model metadata...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+| timestamp | TimeGPT |
+|------------|------------|
+| date | f64 |
+| 1961-01-01 | 437.837921 |
+| 1961-02-01 | 426.062714 |
+| 1961-03-01 | 463.116547 |
+| 1961-04-01 | 478.244507 |
+| 1961-05-01 | 505.646484 |
+
+```python
+nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
+```
+
+
+
+You can also produce longer forecasts by increasing the horizon
+parameter and selecting the `timegpt-1-long-horizon` model. Use this
+model if you want to predict more than one seasonal period of your data.
+
+For example, let’s forecast the next 36 months:
+
+```python
+timegpt_fcst_df = nixtla_client.forecast(df=df, h=36, time_col='timestamp', target_col='value', freq='1mo', model='timegpt-1-long-horizon')
+timegpt_fcst_df.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Querying model metadata...
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+| timestamp | TimeGPT |
+|------------|------------|
+| date | f64 |
+| 1961-01-01 | 436.843414 |
+| 1961-02-01 | 419.351532 |
+| 1961-03-01 | 458.943146 |
+| 1961-04-01 | 477.876068 |
+| 1961-05-01 | 505.656921 |
+
+```python
+nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
+```
+
+
+
+### Produce a shorter forecast
+
+You can also produce a shorter forecast. For this, we recommend using
+the default model, `timegpt-1`.
+
+```python
+timegpt_fcst_df = nixtla_client.forecast(df=df, h=6, time_col='timestamp', target_col='value', freq='1mo')
+nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+
+
diff --git a/nixtla/docs/getting-started/pricing.html.mdx b/nixtla/docs/getting-started/pricing.html.mdx
new file mode 100644
index 00000000..8a3b00de
--- /dev/null
+++ b/nixtla/docs/getting-started/pricing.html.mdx
@@ -0,0 +1,29 @@
+---
+output-file: pricing.html
+title: Subscription Plans
+---
+
+
+We offer various Enterprise plans tailored to your forecasting needs.
+The number of API calls, number of users, and support levels can be
+customized based on your needs. We also offer an option for a
+self-hosted version and a version hosted on Azure.
+
+Please get in touch with us at `support@nixtla.io` for more information
+regarding pricing options and to discuss your specific requirements. For
+organizations interested in exploring our solution further, you can
+schedule a demo
+[here](https://meetings.hubspot.com/cristian-challu/enterprise-contact-us?uuid=dc037f5a-d93b-4%5B…%5D90b-a611dd9460af&utm_source=github&utm_medium=pricing_page).
+
+**Free trial available**
+
+When you [create your account](https://dashboard.nixtla.io), you’ll
+receive a 30-day free trial, no credit card required. After 30 days,
+access will expire unless you upgrade to a paid plan. Contact us to
+continue leveraging TimeGPT for accurate and easy-to-use forecasting!
+
+**More information on pricing and billing**
+
+For additional information on pricing and billing please see [our
+FAQ](https://docs.nixtla.io/docs/getting-started-faq#pricing-and-billing).
+
diff --git a/nixtla/docs/getting-started/quickstart.html.mdx b/nixtla/docs/getting-started/quickstart.html.mdx
new file mode 100644
index 00000000..09c33c86
--- /dev/null
+++ b/nixtla/docs/getting-started/quickstart.html.mdx
@@ -0,0 +1,215 @@
+---
+description: >-
+ TimeGPT is a production ready, generative pretrained transformer for time
+ series. It's capable of accurately predicting various domains such as retail,
+ electricity, finance, and IoT with just a few lines of code 🚀.
+output-file: quickstart.html
+title: TimeGPT Quickstart
+---
+
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/getting-started/2_quickstart.ipynb)
+
+## Step 1: Create a TimeGPT account and generate your API key
+
+- Go to [dashboard.nixtla.io](https://dashboard.nixtla.io) to activate
+ your free trial and set up an account.
+- Sign in with Google, GitHub or your email
+- Create your API key by going to ‘API Keys’ in the menu and clicking
+ on ‘Create New API Key’
+- Your new key will appear. Copy the API key using the button on the
+ right.
+
+
+
+## Step 2: Install Nixtla
+
+In your favorite Python development environment:
+
+Install `nixtla` with `pip`:
+
+```shell
+pip install nixtla
+```
+
+## Step 3: Import the Nixtla TimeGPT client
+
+```python
+from nixtla import NixtlaClient
+```
+
+You can instantiate the
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+class providing your authentication API key.
+
+```python
+nixtla_client = NixtlaClient(
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+Check your API key status with the `validate_api_key` method.
+
+```python
+nixtla_client.validate_api_key()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Happy Forecasting! :), If you have questions or need support, please email support@nixtla.io
+```
+
+``` text
+True
+```
+
+**This will get you started, but for more secure usage, see [Setting Up
+your API
+Key](https://docs.nixtla.io/docs/getting-started-setting_up_your_api_key).**
+
+## Step 4: Start making forecasts!
+
+Now you can start making forecasts! Let’s import an example using the
+classic `AirPassengers` dataset. This dataset contains the monthly
+number of international airline passengers between 1949 and 1960. First,
+load the dataset and plot it:
+
+```python
+import pandas as pd
+```
+
+
+```python
+df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
+df.head()
+```
+
+| | timestamp | value |
+|-----|------------|-------|
+| 0 | 1949-01-01 | 112 |
+| 1 | 1949-02-01 | 118 |
+| 2 | 1949-03-01 | 132 |
+| 3 | 1949-04-01 | 129 |
+| 4 | 1949-05-01 | 121 |
+
+```python
+nixtla_client.plot(df, time_col='timestamp', target_col='value')
+```
+
+
+
+> 📘 Data Requirements
+>
+> - Make sure the target variable column does not have missing or
+> non-numeric values.
+> - Do not include gaps/jumps in the datestamps (for the given
+>   frequency) between the first and last datestamps. The forecast
+>   function will not impute missing dates.
+> - The format of the datestamp column should be readable by Pandas
+> (see [this
+> link](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html)
+> for more details).
+>
+> For further details go to [Data
+> Requirements](https://docs.nixtla.io/docs/getting-started-data_requirements).
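
A quick way to check these requirements before calling `forecast` is to compare the datestamps against the expected range. This is a sketch on hypothetical three-row data, assuming the column names and the `'MS'` (month-start) frequency used in this guide:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["1949-01-01", "1949-02-01", "1949-03-01"]),
    "value": [112, 118, 132],
})

# Expected month-start index between the first and last datestamps
expected = pd.date_range(df["timestamp"].min(), df["timestamp"].max(), freq="MS")

assert len(expected) == len(df)   # no gaps for the 'MS' frequency
assert df["value"].notna().all()  # no missing target values
```

If either assertion fails, fill the gaps or missing values before forecasting, since the forecast function will not impute them for you.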
+
+> 👍 Save figures made with TimeGPT
+>
+> The `plot` method automatically displays figures when in a notebook
+> environment. To save figures locally, you can do:
+>
+> `fig = nixtla_client.plot(df, time_col='timestamp', target_col='value')`
+>
+> `fig.savefig('plot.png', bbox_inches='tight')`
+
+### Forecast a longer horizon into the future
+
+Next, forecast the next 12 months using the SDK `forecast` method. Set
+the following parameters:
+
+- `df`: A pandas DataFrame containing the time series data.
+- `h`: The forecast horizon, i.e., the number of steps ahead to forecast.
+- `freq`: The frequency of the time series in Pandas format. See
+ [pandas’ available
+ frequencies](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
+ (If you don’t provide any frequency, the SDK will try to infer it)
+- `time_col`: The column that identifies the datestamp.
+- `target_col`: The variable to forecast.
+
+```python
+timegpt_fcst_df = nixtla_client.forecast(df=df, h=12, freq='MS', time_col='timestamp', target_col='value')
+timegpt_fcst_df.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+| | timestamp | TimeGPT |
+|-----|------------|------------|
+| 0 | 1961-01-01 | 437.837921 |
+| 1 | 1961-02-01 | 426.062714 |
+| 2 | 1961-03-01 | 463.116547 |
+| 3 | 1961-04-01 | 478.244507 |
+| 4 | 1961-05-01 | 505.646484 |
+
+```python
+nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
+```
+
+
+
+You can also produce longer forecasts by increasing the horizon
+parameter and selecting the `timegpt-1-long-horizon` model. Use this
+model if you want to predict more than one seasonal period of your data.
+
+For example, let’s forecast the next 36 months:
+
+```python
+timegpt_fcst_df = nixtla_client.forecast(df=df, h=36, time_col='timestamp', target_col='value', freq='MS', model='timegpt-1-long-horizon')
+timegpt_fcst_df.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+| | timestamp | TimeGPT |
+|-----|------------|------------|
+| 0 | 1961-01-01 | 436.843414 |
+| 1 | 1961-02-01 | 419.351532 |
+| 2 | 1961-03-01 | 458.943146 |
+| 3 | 1961-04-01 | 477.876068 |
+| 4 | 1961-05-01 | 505.656921 |
+
+```python
+nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
+```
+
+
+
+### Produce a shorter forecast
+
+You can also produce a shorter forecast. For this, we recommend using
+the default model, `timegpt-1`.
+
+```python
+timegpt_fcst_df = nixtla_client.forecast(df=df, h=6, time_col='timestamp', target_col='value', freq='MS')
+nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+
+
diff --git a/nixtla/docs/getting-started/setting_up_your_api_key.html.mdx b/nixtla/docs/getting-started/setting_up_your_api_key.html.mdx
new file mode 100644
index 00000000..44fbb73f
--- /dev/null
+++ b/nixtla/docs/getting-started/setting_up_your_api_key.html.mdx
@@ -0,0 +1,125 @@
+---
+output-file: setting_up_your_api_key.html
+title: Setting up your API key
+---
+
+
+This tutorial will explain how to set up your API key when using the
+Nixtla SDK. To create an API key, go to your
+[Dashboard](https://dashboard.nixtla.io/).
+
+There are different ways to set up your API key. We provide some
+examples below, along with a schematic overview.
+
+
+
+## 1. Copy and paste your key directly into your Python code
+
+This approach is straightforward and best for quick tests or scripts
+that won’t be shared.
+
+- **Step 1**: Copy the API key found in the `API Keys` of your [Nixtla
+ dashboard](https://dashboard.nixtla.io/).
+- **Step 2**: Paste the key directly into your Python code, by
+ instantiating the
+ [`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+ with your API key:
+
+```python
+from nixtla import NixtlaClient
+nixtla_client = NixtlaClient(api_key ='your API key here')
+```
+
+> **Important**
+>
+> This approach is considered insecure, as your API key will be part of
+> your source code.
+
+## 2. Secure: using an environment variable
+
+- **Step 1**: Store your API key in an environment variable named
+ `NIXTLA_API_KEY`. This can be done (a) temporarily for a session
+ or (b) permanently, depending on your preference.
+- **Step 2**: When you instantiate the
+ [`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+ class, the SDK will automatically look for the `NIXTLA_API_KEY`
+ environment variable and use it to authenticate your requests.
+
+> **Important**
+>
+> The environment variable must be named exactly `NIXTLA_API_KEY`, with
+> all capital letters and no deviations in spelling, for the SDK to
+> recognize it.
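
Before instantiating the client, you can confirm the variable is visible to your Python process. This is an illustrative check (not part of the SDK), using a placeholder key:

```python
import os

# Set the variable for this process only, then confirm it is readable
# under the exact name the SDK looks for
os.environ["NIXTLA_API_KEY"] = "your_api_key"

assert os.environ.get("NIXTLA_API_KEY") == "your_api_key"
```

In practice you would set the variable outside your script, as shown below, rather than hard-coding it in Python.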
+
+### a. Temporary: From the Terminal
+
+This approach is useful if you are working from a terminal and need a
+temporary solution.
+
+#### Linux / Mac
+
+Open a terminal and use the `export` command to set `NIXTLA_API_KEY`.
+
+```bash
+export NIXTLA_API_KEY=your_api_key
+```
+
+#### Windows
+
+For Windows users, open a PowerShell window and set the
+`NIXTLA_API_KEY` environment variable with `$env:`.
+
+```powershell
+$env:NIXTLA_API_KEY = "your_api_key"
+```
+
+### b. Permanent: Using a `.env` file
+
+For a more persistent solution place your API key in a `.env` file
+located in the folder of your Python script. In this file, include the
+following:
+
+```
+NIXTLA_API_KEY=your_api_key
+```
+
+You can now load the environment variable within your Python script. Use
+the `dotenv` package to load the `.env` file and then instantiate the
+`NixtlaClient` class. For example:
+
+```python
+from dotenv import load_dotenv
+load_dotenv()
+
+from nixtla import NixtlaClient
+nixtla_client = NixtlaClient()
+```
+
+This approach is more secure and suitable for applications that will be
+deployed or shared, as it keeps API keys out of the source code.
+
+> **Important**
+>
+> Remember, your API key is like a password - keep it secret, keep it
+> safe!
+
+## 3. Validate your API key
+
+You can always find your API key in the `API Keys` section of your
+dashboard. To check the status of your API key, use the
+`validate_api_key` method of the
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+class. This method will return `True` if the API key is valid and
+`False` otherwise.
+
+```python
+nixtla_client.validate_api_key()
+```
+
+You don’t need to validate your API key every time you use `TimeGPT`.
+This function is provided for your convenience to ensure its validity.
+For full access to `TimeGPT`’s functionalities, in addition to a valid
+API key, you also need sufficient credits in your account. You can check
+your credits in the `Usage` section of your
+[dashboard](https://dashboard.nixtla.io/).
+
diff --git a/nixtla/docs/getting-started/why_timegpt.html.mdx b/nixtla/docs/getting-started/why_timegpt.html.mdx
new file mode 100644
index 00000000..e2f4247e
--- /dev/null
+++ b/nixtla/docs/getting-started/why_timegpt.html.mdx
@@ -0,0 +1,374 @@
+---
+output-file: why_timegpt.html
+title: Why TimeGPT?
+---
+
+
+In this notebook, we compare the performance of TimeGPT against three
+forecasting models: a classical model (ARIMA), a machine learning
+model (LightGBM), and a deep learning model (N-HiTS), using a subset
+of data from the M5 Forecasting competition. We want to highlight three
+top-rated benefits our users love about TimeGPT:
+
+🎯 **Accuracy**: TimeGPT consistently outperforms traditional models by
+capturing complex patterns with precision.
+
+⚡ **Speed**: Generate forecasts faster without needing extensive
+training or tuning for each series.
+
+🚀 **Ease of Use**: Minimal setup and no complex preprocessing make
+TimeGPT accessible and ready to use right out of the box!
+
+Before diving into the notebook, please visit our
+[dashboard](https://dashboard.nixtla.io) to generate your TimeGPT
+`api_key` and give it a try yourself!
+
+# Table of Contents
+
+1. [Data Introduction](#1-data-introduction)
+2. [Model Fitting](#2-model-fitting-timegpt-arima-lightgbm-n-hits)
+ 1. [Fitting TimeGPT](#21-timegpt)
+ 2. [Fitting ARIMA](#22-classical-models-arima)
+    3. [Fitting LightGBM](#23-machine-learning-models-lightgbm)
+    4. [Fitting N-HiTS](#24-n-hits)
+3. [Results and Evaluation](#3-performance-comparison-and-results)
+4. [Conclusion](#4-conclusion)
+
+[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/getting-started/7_why_timegpt.ipynb)
+
+```python
+import os
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+
+from nixtla import NixtlaClient
+from utilsforecast.plotting import plot_series
+from utilsforecast.losses import mae, rmse, smape
+from utilsforecast.evaluation import evaluate
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+## 1. Data introduction
+
+In this notebook, we’re working with an aggregated dataset from the M5
+Forecasting - Accuracy competition. This dataset includes **7 daily time
+series**, each with **1,941 data points**. The last **28 data points**
+of each series are set aside as the test set, allowing us to evaluate
+model performance on unseen data.
+
+```python
+df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/demand_example.csv', parse_dates=['ds'])
+```
+
+
+```python
+df.groupby('unique_id').agg({"ds": ["min", "max", "count"],
+                             "y": ["min", "mean", "median", "max"]})
+```
+
+| unique_id | ds min | ds max | ds count | y min | y mean | y median | y max |
+|---|---|---|---|---|---|---|---|
+| FOODS_1 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 2674.085523 | 2665.0 | 5493.0 |
+| FOODS_2 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 4015.984029 | 3894.0 | 9069.0 |
+| FOODS_3 | 2011-01-29 | 2016-05-22 | 1941 | 10.0 | 16969.089129 | 16548.0 | 28663.0 |
+| HOBBIES_1 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 2936.122617 | 2908.0 | 5009.0 |
+| HOBBIES_2 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 279.053065 | 248.0 | 871.0 |
+| HOUSEHOLD_1 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 6039.594539 | 5984.0 | 11106.0 |
+| HOUSEHOLD_2 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 1566.840289 | 1520.0 | 2926.0 |
+
+```python
+df_train = df.query('ds <= "2016-04-24"')
+df_test = df.query('ds > "2016-04-24"')
+
+print(df_train.shape, df_test.shape)
+```
+
+``` text
+(13391, 3) (196, 3)
+```
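+
+The shapes check out: with 7 series of 1,941 daily points each and the
+last 28 points of each series held out, a quick sanity check on the row
+counts:

```python
# Sanity check on the train/test row counts printed above:
# 7 series, 1,941 points each, last 28 points of each series held out.
n_series, n_points, horizon = 7, 1941, 28

train_rows = n_series * (n_points - horizon)
test_rows = n_series * horizon

assert train_rows == 13391
assert test_rows == 196
```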
+
+## 2. Model Fitting (TimeGPT, ARIMA, LightGBM, N-HiTS)
+
+### 2.1 TimeGPT
+
+TimeGPT offers a powerful, streamlined solution for time series
+forecasting, delivering state-of-the-art results with minimal effort.
+With TimeGPT, there’s no need for data preprocessing or feature
+engineering – simply initiate the Nixtla client and call
+`nixtla_client.forecast` to produce accurate, high-performance forecasts
+tailored to your unique time series.
+
+```python
+# Forecast with TimeGPT
+fcst_timegpt = nixtla_client.forecast(df=df_train,
+                                      target_col='y',
+                                      h=28,                            # Forecast horizon: predict the next 28 time steps
+                                      model='timegpt-1-long-horizon',  # Model for long-horizon forecasting
+                                      finetune_steps=10,               # Number of fine-tuning steps
+                                      level=[90])                      # Generate a 90% prediction interval
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: D
+INFO:nixtla.nixtla_client:Querying model metadata...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+```python
+# Evaluate performance and plot forecast
+fcst_timegpt['ds'] = pd.to_datetime(fcst_timegpt['ds'])
+test_df = pd.merge(df_test, fcst_timegpt, 'left', ['unique_id', 'ds'])
+evaluation_timegpt = evaluate(test_df, metrics=[rmse, smape], models=["TimeGPT"])
+evaluation_timegpt.groupby(['metric'])['TimeGPT'].mean()
+```
+
+``` text
+metric
+rmse 592.607378
+smape 0.049403
+Name: TimeGPT, dtype: float64
+```
+
+### 2.2 Classical Models (ARIMA)
+
+Next, we applied ARIMA, a traditional statistical model, to the same
+forecasting task. Classical models use historical trends and seasonality
+to make predictions by relying on linear assumptions. However, they
+struggled to capture the complex, non-linear patterns within the data,
+leading to lower accuracy compared to other approaches. Additionally,
+ARIMA was slower due to its iterative parameter estimation process,
+which becomes computationally intensive for larger datasets.
+
+> 📘 Why Use TimeGPT over Classical Models?
+>
+> - **Complex Patterns**: TimeGPT captures non-linear trends classical
+> models miss.
+>
+> - **Minimal Preprocessing**: TimeGPT requires little to no data
+> preparation.
+>
+> - **Scalability**: TimeGPT efficiently scales across multiple
+>   series without retraining.
+
+```python
+from statsforecast import StatsForecast
+from statsforecast.models import AutoARIMA
+```
+
+
+```python
+# Initialize the ARIMA model
+sf = StatsForecast(
+ models=[AutoARIMA(season_length=7)],
+ freq='D'
+)
+# Fit and forecast
+fcst_arima = sf.forecast(h=28, df=df_train)
+```
+
+
+```python
+fcst_arima.reset_index(inplace=True)
+test_df = pd.merge(df_test, fcst_arima, 'left', ['unique_id', 'ds'])
+evaluation_arima = evaluate(test_df, metrics=[rmse, smape], models=["AutoARIMA"])
+evaluation_arima.groupby(['metric'])['AutoARIMA'].mean()
+```
+
+``` text
+metric
+rmse 724.957364
+smape 0.055018
+Name: AutoARIMA, dtype: float64
+```
+
+### 2.3 Machine Learning Models (LightGBM)
+
+Third, we used a machine learning model, LightGBM, for the same
+forecasting task, implemented through the automated pipeline provided by
+our mlforecast library. While LightGBM can capture seasonality and
+patterns, achieving the best performance often requires detailed feature
+engineering, careful hyperparameter tuning, and domain knowledge. You
+can try our mlforecast library to simplify this process and get started
+quickly!
+
+> 📘 Why Use TimeGPT over Machine Learning Models?
+>
+> - **Automatic Pattern Recognition**: Captures complex patterns from
+> raw data, bypassing the need for feature engineering.
+>
+> - **Minimal Tuning**: Works well without extensive tuning.
+>
+> - **Scalability**: Forecasts across multiple series without
+> retraining.
+
+```python
+import optuna
+from mlforecast.auto import AutoMLForecast, AutoLightGBM
+
+# Suppress Optuna's logging output
+optuna.logging.set_verbosity(optuna.logging.ERROR)
+```
+
+
+```python
+# Initialize an automated forecasting pipeline using AutoMLForecast.
+mlf = AutoMLForecast(
+ models=[AutoLightGBM()],
+ freq='D',
+ season_length=7,
+ fit_config=lambda trial: {'static_features': ['unique_id']}
+)
+
+# Fit the model to the training dataset.
+mlf.fit(
+ df=df_train.astype({'unique_id': 'category'}),
+ n_windows=1,
+ h=28,
+ num_samples=10,
+)
+fcst_lgbm = mlf.predict(28)
+```
+
+
+```python
+test_df = pd.merge(df_test, fcst_lgbm, 'left', ['unique_id', 'ds'])
+evaluation_lgbm = evaluate(test_df, metrics=[rmse, smape], models=["AutoLightGBM"])
+evaluation_lgbm.groupby(['metric'])['AutoLightGBM'].mean()
+```
+
+``` text
+metric
+rmse 687.773744
+smape 0.051448
+Name: AutoLightGBM, dtype: float64
+```
+
+### 2.4 N-HiTS
+
+Lastly, we used N-HiTS, a state-of-the-art deep learning model designed
+for time series forecasting. The model produced accurate results,
+demonstrating its ability to capture complex, non-linear patterns within
+the data. However, setting up and tuning N-HiTS required significantly
+more time and computational resources compared to TimeGPT.
+
+> 📘 Why Use TimeGPT Over Deep Learning Models?
+>
+> - **Faster Setup**: Quick setup and forecasting, unlike the lengthy
+> configuration and training times of neural networks.
+>
+> - **Less Tuning**: Performs well with minimal tuning and
+> preprocessing, while neural networks often need extensive
+> adjustments.
+>
+> - **Ease of Use**: Simple deployment with high accuracy, making it
+> accessible without deep technical expertise.
+
+```python
+from neuralforecast.core import NeuralForecast
+from neuralforecast.models import NHITS
+```
+
+
+```python
+# Initialize the N-HiTS model.
+models = [NHITS(h=28,
+ input_size=28,
+ max_steps=100)]
+
+# Fit the model using training data
+nf = NeuralForecast(models=models, freq='D')
+nf.fit(df=df_train)
+fcst_nhits = nf.predict()
+```
+
+
+```python
+test_df = pd.merge(df_test,fcst_nhits, 'left', ['unique_id', 'ds'])
+evaluation_nhits = evaluate(test_df, metrics=[rmse, smape], models=["NHITS"])
+evaluation_nhits.groupby(['metric'])['NHITS'].mean()
+```
+
+``` text
+metric
+rmse 605.011948
+smape 0.053446
+Name: NHITS, dtype: float64
+```
+
+## 3. Performance Comparison and Results
+
+The performance of each model is evaluated using RMSE (Root Mean
+Squared Error) and SMAPE (Symmetric Mean Absolute Percentage Error).
+RMSE penalizes large errors more heavily, while SMAPE provides a
+relative perspective by normalizing errors as percentages. Below, we
+present a snapshot of performance across all groups. The results show
+that TimeGPT outperforms the other models on both metrics.
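+To make the two metrics concrete, here is a small NumPy sketch of both
+formulas. Note that SMAPE has several variants; the version shown, with
+a factor of 2 in the numerator, is the common textbook definition and
+may differ by a constant factor from the one `utilsforecast` implements.

```python
import numpy as np

def rmse(y, y_hat):
    # Root Mean Squared Error: penalizes large errors quadratically.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def smape(y, y_hat):
    # Symmetric MAPE (textbook variant): scale-free, bounded in [0, 2].
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(2 * np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat)))

y = [100.0, 200.0, 300.0]
y_hat = [110.0, 190.0, 330.0]
print(rmse(y, y_hat))
print(smape(y, y_hat))
```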
+
+🌟 For a deeper dive into benchmarking, check out our benchmark
+repository. The summarized results are displayed below:
+
+#### Overall Performance Metrics
+
+| **Model** | **RMSE** | **SMAPE** |
+|-------------|-----------|-----------|
+| ARIMA | 724.9 | 5.50% |
+| LightGBM | 687.8 | 5.14% |
+| N-HiTS | 605.0 | 5.34% |
+| **TimeGPT** | **592.6** | **4.94%** |
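+The overall table above can also be assembled programmatically. A
+self-contained sketch using the numbers reported in this notebook (the
+`evals` dictionary and `summary` frame are illustrative names, not
+objects from the sections above):

```python
import pandas as pd

# Per-model mean metrics, taken directly from the results reported above.
evals = {
    'ARIMA':    {'rmse': 724.9, 'smape': 0.0550},
    'LightGBM': {'rmse': 687.8, 'smape': 0.0514},
    'N-HiTS':   {'rmse': 605.0, 'smape': 0.0534},
    'TimeGPT':  {'rmse': 592.6, 'smape': 0.0494},
}

summary = pd.DataFrame(evals).T                       # one row per model
summary['smape'] = (summary['smape'] * 100).round(2)  # SMAPE as a percentage
print(summary.sort_values('rmse'))                    # best (lowest) RMSE first
```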
+
+#### Breakdown for Each Time-series
+
+Below are the metrics for each individual time series group.
+TimeGPT consistently delivers accurate forecasts across all time series
+groups. In many cases, it performs as well as or better than
+data-specific models, showing its versatility and reliability across
+different datasets.
+
+
+
+#### Benchmark Results
+
+For a more comprehensive dive into model accuracy and performance,
+explore our [Time Series Model
+Arena](https://github.com/Nixtla/nixtla/tree/main/experiments/foundation-time-series-arena)!
+TimeGPT continues to lead the pack with exceptional performance across
+benchmarks! 🌟
+
+
+
+## 4. Conclusion
+
+At the end of this notebook, we’ve put together a handy table to show
+you exactly where TimeGPT shines brightest compared to other forecasting
+models. ☀️ Think of it as your quick guide to choosing the best model
+for your unique project needs. We’re confident that TimeGPT will be a
+valuable tool in your forecasting journey. Don’t forget to visit our
+[dashboard](https://dashboard.nixtla.io) to generate your TimeGPT
+`api_key` and get started today! Happy forecasting, and enjoy the
+insights ahead!
+
+| Scenario | TimeGPT | Classical Models (e.g., ARIMA) | Machine Learning Models (e.g., XGB, LGBM) | Deep Learning Models (e.g., N-HITS) |
+|-----------|-------------|----------------|-----------------|----------------|
+| **Seasonal Patterns** | ✅ Performs well with minimal setup | ✅ Handles seasonality with adjustments (e.g., SARIMA) | ✅ Performs well with feature engineering | ✅ Captures seasonal patterns effectively |
+| **Non-Linear Patterns** | ✅ Excels, especially with complex non-linear patterns | ❌ Limited performance | ❌ Struggles without extensive feature engineering | ✅ Performs well with non-linear relationships |
+| **Large Dataset** | ✅ Highly scalable across many series | ❌ Slow and resource-intensive | ✅ Scalable with optimized implementations | ❌ Requires significant resources for large datasets |
+| **Small Dataset** | ✅ Performs well; requires only one data point to start | ✅ Performs well; may struggle with very sparse data | ✅ Performs adequately if enough features are extracted | ❌ May need a minimum data size to learn effectively |
+| **Preprocessing Required** | ✅ Minimal preprocessing needed | ❌ Requires scaling, log-transform, etc., to meet model assumptions | ❌ Requires extensive feature engineering for complex patterns | ❌ Needs data normalization and preprocessing |
+| **Accuracy Requirement** | ✅ Achieves high accuracy with minimal tuning | ❌ May struggle with complex accuracy requirements | ✅ Can achieve good accuracy with tuning | ✅ High accuracy possible but with significant resource use |
+| **Scalability** | ✅ Highly scalable with minimal task-specific configuration | ❌ Not easily scalable | ✅ Moderate scalability, with feature engineering and tuning per task | ❌ Limited scalability due to resource demands |
+| **Computational Resources** | ✅ Highly efficient, operates seamlessly on CPU, no GPU needed | ✅ Light to moderate, scales poorly with large datasets | ❌ Moderate, depends on feature complexity | ❌ High resource consumption, often requires GPU |
+| **Memory Requirement** | ✅ Efficient memory usage for large datasets | ✅ Moderate memory requirements | ❌ High memory usage for larger datasets or many series cases | ❌ High memory consumption for larger datasets and multiple series |
+| **Technical Requirements & Domain Knowledge** | ✅ Low; minimal technical setup and no domain expertise needed | ✅ Low to moderate; needs understanding of stationarity | ❌ Moderate to high; requires feature engineering and tuning | ❌ High; complex architecture and tuning |
+
diff --git a/nixtla/docs/reference/date_features.html.mdx b/nixtla/docs/reference/date_features.html.mdx
new file mode 100644
index 00000000..12ee07ea
--- /dev/null
+++ b/nixtla/docs/reference/date_features.html.mdx
@@ -0,0 +1,77 @@
+---
+output-file: date_features.html
+title: Date Features
+---
+
+
+------------------------------------------------------------------------
+
+source
+
+#### CountryHolidays
+
+> ``` text
+> CountryHolidays (countries:list[str])
+> ```
+
+*Given a list of countries, returns a dataframe with holidays for each
+country.*
+
+```python
+import pandas as pd
+
+from nixtla.date_features import CountryHolidays, SpecialDates
+```
+
+```python
+c_holidays = CountryHolidays(countries=['US', 'MX'])
+periods = 365 * 5
+dates = pd.date_range(end='2023-09-01', periods=periods)
+holidays_df = c_holidays(dates)
+holidays_df.head()
+```
+
+| | US_New Year's Day | US_Memorial Day | US_Independence Day | US_Labor Day | US_Veterans Day | US_Veterans Day (observed) | US_Thanksgiving | US_Christmas Day | US_Martin Luther King Jr. Day | US_Washington's Birthday | ... | US_Juneteenth National Independence Day (observed) | US_Christmas Day (observed) | MX_Año Nuevo | MX_Día de la Constitución | MX_Natalicio de Benito Juárez | MX_Día del Trabajo | MX_Día de la Independencia | MX_Día de la Revolución | MX_Transmisión del Poder Ejecutivo Federal | MX_Navidad |
+|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
+| 2018-09-03 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 2018-09-04 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 2018-09-05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 2018-09-06 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 2018-09-07 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+
+------------------------------------------------------------------------
+
+source
+
+#### SpecialDates
+
+> ``` text
+> SpecialDates (special_dates:dict[str,list[str]])
+> ```
+
+*Given a dictionary of categories and dates, returns a dataframe with
+the special dates.*
+
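+Conceptually, the returned frame is a 0/1 indicator column per
+category. A minimal pandas sketch of the idea (a hypothetical
+`special_dates_frame` helper, not the library implementation):

```python
import pandas as pd

def special_dates_frame(dates, special_dates):
    # One indicator column per category: 1 on the listed dates, 0 elsewhere.
    idx = pd.DatetimeIndex(dates)
    return pd.DataFrame(
        {name: idx.isin(pd.to_datetime(days)).astype(int)
         for name, days in special_dates.items()},
        index=idx,
    )

dates = pd.date_range('2021-01-25', periods=3)
frame = special_dates_frame(dates, {'Important Dates': ['2021-01-26']})
print(frame)
```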
+```python
+special_dates = SpecialDates(
+ special_dates={
+ 'Important Dates': ['2021-02-26', '2020-02-26'],
+ 'Very Important Dates': ['2021-01-26', '2020-01-26', '2019-01-26']
+ }
+)
+periods = 365 * 5
+dates = pd.date_range(end='2023-09-01', periods=periods)
+holidays_df = special_dates(dates)
+holidays_df.head()
+```
+
+| | Important Dates | Very Important Dates |
+|------------|-----------------|----------------------|
+| 2018-09-03 | 0 | 0 |
+| 2018-09-04 | 0 | 0 |
+| 2018-09-05 | 0 | 0 |
+| 2018-09-06 | 0 | 0 |
+| 2018-09-07 | 0 | 0 |
+
diff --git a/nixtla/docs/reference/excel_addin.html.mdx b/nixtla/docs/reference/excel_addin.html.mdx
new file mode 100644
index 00000000..20861145
--- /dev/null
+++ b/nixtla/docs/reference/excel_addin.html.mdx
@@ -0,0 +1,103 @@
+---
+output-file: excel_addin.html
+title: TimeGPT Excel Add-in (Beta)
+---
+
+
+## Installation
+
+Head to the [TimeGPT Excel add-in page in Microsoft
+AppSource](https://appsource.microsoft.com/en-us/product/office/WA200006429?tab=Overview)
+and click on “Get it now”.
+
+## Usage
+
+> 📘 Access token required
+>
+> The TimeGPT Excel Add-in requires an access token. Get your API Key on
+> the [Nixtla Dashboard](http://dashboard.nixtla.io).
+
+## Support
+
+If you have questions or need support, please email `support@nixtla.io`.
+
+## How-to
+
+### Settings
+
+If this is your first time using Excel add-ins, find information on how
+to add Excel add-ins with your version of Excel. In the Office Add-ins
+Store, you’ll search for “TimeGPT”.
+
+Once you have installed the TimeGPT add-in, it opens in a sidebar task
+pane.
+
+- Read through the Welcome screen.
+- Click on the **‘Get Started’** button.
+- The API URL is already set to: https://api.nixtla.io.
+- Copy your API key from the [Nixtla
+  Dashboard](http://dashboard.nixtla.io) and paste it into the box that
+  says **API Key, Bearer**.
+- Click the gray arrow to the right of that box.
+- You’ll reach a screen with options for ‘Forecast’ and ‘Anomaly
+  Detection’.
+
+To access the settings later, click the gear icon in the top left.
+
+### Data Requirements
+
+- Put your dates in one column and your values in another.
+- Ensure your date format is recognized as a valid date by Excel.
+- Ensure your values are recognized as valid numbers by Excel.
+- All data inputs must exist in the same worksheet. The add-in does
+ not support forecasting using multiple worksheets.
+- Do not include headers.
+
+Example:
+
+| dates | values |
+|:--------------|:-------|
+| 12/1/16 0:00 | 72 |
+| 12/1/16 1:00 | 65.8 |
+| 12/1/16 2:00 | 59.99 |
+| 12/1/16 3:00 | 50.69 |
+| 12/1/16 4:00 | 52.58 |
+| 12/1/16 5:00 | 65.05 |
+| 12/1/16 6:00 | 80.4 |
+| 12/1/16 7:00 | 200 |
+| 12/1/16 8:00 | 200.63 |
+| 12/1/16 9:00 | 155.47 |
+| 12/1/16 10:00 | 150.91 |
+
+#### Forecasting
+
+Once you’ve configured your token and formatted your input data, you’re
+ready to forecast!
+
+With the add-in open, configure the forecasting settings by selecting
+the column for each input.
+
+- **Frequency** - The frequency of the data (hourly / daily / weekly /
+ monthly)
+
+- **Horizon** - The forecasting horizon. This represents the number of
+ time steps into the future that the forecast should predict.
+
+- **Dates Range** - The column and range of the timeseries timestamps.
+ Must not include header data, and should be formatted as a range,
+ e.g. A2:A145.
+
+- **Values Range** - The column and range of the timeseries values for
+ each point in time. Must not include header data, and should be
+ formatted as a range, e.g. B2:B145.
+
+When you’re ready, click **Make Prediction** to generate the predicted
+values. The add-in will generate a plot and append the forecasted data
+to the end of the column of your existing data and highlight them in
+green. So, scroll to the end of your data to see the predicted values.
+
+#### Anomaly Detection
+
+The requirements are the same as for the forecasting functionality, so
+if you have already tried that, you are ready to run anomaly detection.
+Go to the main page in the add-in and select “Anomaly Detection”, then
+choose your dates and values cell ranges and click Submit. We’ll run
+the model, mark the anomalous cells in yellow, and add a third column
+of expected values with a green background.
+
diff --git a/nixtla/docs/reference/nixtla_client.html.mdx b/nixtla/docs/reference/nixtla_client.html.mdx
new file mode 100644
index 00000000..b504f18c
--- /dev/null
+++ b/nixtla/docs/reference/nixtla_client.html.mdx
@@ -0,0 +1,378 @@
+---
+output-file: nixtla_client.html
+title: SDK Reference
+---
+
+
+------------------------------------------------------------------------
+
+source
+
+## NixtlaClient
+
+> ``` text
+> NixtlaClient (api_key:Optional[str]=None, base_url:Optional[str]=None,
+> timeout:Optional[int]=60, max_retries:int=6,
+> retry_interval:int=10, max_wait_time:int=360)
+> ```
+
+*Client to interact with the Nixtla API.*
+
+| | **Type** | **Default** | **Details** |
+|------|------------------|-------------------------|-------------------------|
+| api_key | Optional[str] | None | The authorization api_key used to interact with the Nixtla API. |
+
+## How to use
+
+To learn how to use `nixtlar`, please refer to the
+[documentation](https://nixtla.github.io/nixtlar/).
+
+To view directly on CRAN, please use this
+[link](https://cloud.r-project.org/web/packages/nixtlar/index.html).
+
+> 📘 API key required
+>
+> The `nixtlar` package requires an API key. Get yours on the [Nixtla
+> Dashboard](http://dashboard.nixtla.io).
+
+## Support
+
+If you have questions or need support, please email `support@nixtla.io`.
+
diff --git a/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-11-output-1.png b/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-11-output-1.png
new file mode 100644
index 00000000..605136ce
Binary files /dev/null and b/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-11-output-1.png differ
diff --git a/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-13-output-1.png b/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-13-output-1.png
new file mode 100644
index 00000000..f42b16f4
Binary files /dev/null and b/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-13-output-1.png differ
diff --git a/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-23-output-1.png b/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-23-output-1.png
new file mode 100644
index 00000000..1ff1ac09
Binary files /dev/null and b/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-23-output-1.png differ
diff --git a/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-8-output-1.png b/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-8-output-1.png
new file mode 100644
index 00000000..b1112758
Binary files /dev/null and b/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-8-output-1.png differ
diff --git a/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-9-output-1.png b/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-9-output-1.png
new file mode 100644
index 00000000..6d601b5e
Binary files /dev/null and b/nixtla/docs/tutorials/01_exogenous_variables_files/figure-markdown_strict/cell-9-output-1.png differ
diff --git a/nixtla/docs/tutorials/02_holidays_files/figure-markdown_strict/cell-11-output-1.png b/nixtla/docs/tutorials/02_holidays_files/figure-markdown_strict/cell-11-output-1.png
new file mode 100644
index 00000000..364e211c
Binary files /dev/null and b/nixtla/docs/tutorials/02_holidays_files/figure-markdown_strict/cell-11-output-1.png differ
diff --git a/nixtla/docs/tutorials/02_holidays_files/figure-markdown_strict/cell-12-output-1.png b/nixtla/docs/tutorials/02_holidays_files/figure-markdown_strict/cell-12-output-1.png
new file mode 100644
index 00000000..3aa492ca
Binary files /dev/null and b/nixtla/docs/tutorials/02_holidays_files/figure-markdown_strict/cell-12-output-1.png differ
diff --git a/nixtla/docs/tutorials/03_categorical_variables_files/figure-markdown_strict/cell-13-output-1.png b/nixtla/docs/tutorials/03_categorical_variables_files/figure-markdown_strict/cell-13-output-1.png
new file mode 100644
index 00000000..c954ff42
Binary files /dev/null and b/nixtla/docs/tutorials/03_categorical_variables_files/figure-markdown_strict/cell-13-output-1.png differ
diff --git a/nixtla/docs/tutorials/03_categorical_variables_files/figure-markdown_strict/cell-15-output-1.png b/nixtla/docs/tutorials/03_categorical_variables_files/figure-markdown_strict/cell-15-output-1.png
new file mode 100644
index 00000000..fb714504
Binary files /dev/null and b/nixtla/docs/tutorials/03_categorical_variables_files/figure-markdown_strict/cell-15-output-1.png differ
diff --git a/nixtla/docs/tutorials/04_longhorizon_files/figure-markdown_strict/cell-8-output-1.png b/nixtla/docs/tutorials/04_longhorizon_files/figure-markdown_strict/cell-8-output-1.png
new file mode 100644
index 00000000..fbdff06c
Binary files /dev/null and b/nixtla/docs/tutorials/04_longhorizon_files/figure-markdown_strict/cell-8-output-1.png differ
diff --git a/nixtla/docs/tutorials/05_multiple_series_files/figure-markdown_strict/cell-10-output-1.png b/nixtla/docs/tutorials/05_multiple_series_files/figure-markdown_strict/cell-10-output-1.png
new file mode 100644
index 00000000..6bc23c47
Binary files /dev/null and b/nixtla/docs/tutorials/05_multiple_series_files/figure-markdown_strict/cell-10-output-1.png differ
diff --git a/nixtla/docs/tutorials/05_multiple_series_files/figure-markdown_strict/cell-6-output-1.png b/nixtla/docs/tutorials/05_multiple_series_files/figure-markdown_strict/cell-6-output-1.png
new file mode 100644
index 00000000..788ffdb2
Binary files /dev/null and b/nixtla/docs/tutorials/05_multiple_series_files/figure-markdown_strict/cell-6-output-1.png differ
diff --git a/nixtla/docs/tutorials/05_multiple_series_files/figure-markdown_strict/cell-8-output-1.png b/nixtla/docs/tutorials/05_multiple_series_files/figure-markdown_strict/cell-8-output-1.png
new file mode 100644
index 00000000..79ec0018
Binary files /dev/null and b/nixtla/docs/tutorials/05_multiple_series_files/figure-markdown_strict/cell-8-output-1.png differ
diff --git a/nixtla/docs/tutorials/06_finetuning_files/figure-markdown_strict/cell-7-output-1.png b/nixtla/docs/tutorials/06_finetuning_files/figure-markdown_strict/cell-7-output-1.png
new file mode 100644
index 00000000..b3cf360e
Binary files /dev/null and b/nixtla/docs/tutorials/06_finetuning_files/figure-markdown_strict/cell-7-output-1.png differ
diff --git a/nixtla/docs/tutorials/07_loss_function_finetuning_files/figure-markdown_strict/cell-7-output-1.png b/nixtla/docs/tutorials/07_loss_function_finetuning_files/figure-markdown_strict/cell-7-output-1.png
new file mode 100644
index 00000000..58514318
Binary files /dev/null and b/nixtla/docs/tutorials/07_loss_function_finetuning_files/figure-markdown_strict/cell-7-output-1.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-1.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-1.png
new file mode 100644
index 00000000..7147f80b
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-1.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-2.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-2.png
new file mode 100644
index 00000000..8d020bc5
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-2.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-3.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-3.png
new file mode 100644
index 00000000..d6dbcf98
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-3.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-4.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-4.png
new file mode 100644
index 00000000..d3e117d8
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-4.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-5.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-5.png
new file mode 100644
index 00000000..c13e3d32
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-11-output-5.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-13-output-2.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-13-output-2.png
new file mode 100644
index 00000000..d0f59853
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-13-output-2.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-13-output-3.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-13-output-3.png
new file mode 100644
index 00000000..f207c4ca
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-13-output-3.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-14-output-2.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-14-output-2.png
new file mode 100644
index 00000000..83fc885f
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-14-output-2.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-14-output-3.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-14-output-3.png
new file mode 100644
index 00000000..4f44d572
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-14-output-3.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-1.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-1.png
new file mode 100644
index 00000000..e68cc77e
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-1.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-2.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-2.png
new file mode 100644
index 00000000..b9e0e5ad
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-2.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-3.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-3.png
new file mode 100644
index 00000000..33f7b749
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-3.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-4.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-4.png
new file mode 100644
index 00000000..bfee6116
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-4.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-5.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-5.png
new file mode 100644
index 00000000..fc6236a7
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-7-output-5.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-1.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-1.png
new file mode 100644
index 00000000..b2c72d41
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-1.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-2.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-2.png
new file mode 100644
index 00000000..1dbe580e
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-2.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-3.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-3.png
new file mode 100644
index 00000000..373ae3c4
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-3.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-4.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-4.png
new file mode 100644
index 00000000..a171c3d3
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-4.png differ
diff --git a/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-5.png b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-5.png
new file mode 100644
index 00000000..8e373834
Binary files /dev/null and b/nixtla/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-9-output-5.png differ
diff --git a/nixtla/docs/tutorials/09_historical_forecast_files/figure-markdown_strict/cell-6-output-1.png b/nixtla/docs/tutorials/09_historical_forecast_files/figure-markdown_strict/cell-6-output-1.png
new file mode 100644
index 00000000..b6fb5ba7
Binary files /dev/null and b/nixtla/docs/tutorials/09_historical_forecast_files/figure-markdown_strict/cell-6-output-1.png differ
diff --git a/nixtla/docs/tutorials/09_historical_forecast_files/figure-markdown_strict/cell-9-output-1.png b/nixtla/docs/tutorials/09_historical_forecast_files/figure-markdown_strict/cell-9-output-1.png
new file mode 100644
index 00000000..15ff6920
Binary files /dev/null and b/nixtla/docs/tutorials/09_historical_forecast_files/figure-markdown_strict/cell-9-output-1.png differ
diff --git a/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-1.png b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-1.png
new file mode 100644
index 00000000..0ff5930d
Binary files /dev/null and b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-1.png differ
diff --git a/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-2.png b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-2.png
new file mode 100644
index 00000000..c97034a9
Binary files /dev/null and b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-2.png differ
diff --git a/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-3.png b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-3.png
new file mode 100644
index 00000000..016708ed
Binary files /dev/null and b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-3.png differ
diff --git a/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-4.png b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-4.png
new file mode 100644
index 00000000..fcfac1b1
Binary files /dev/null and b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-4.png differ
diff --git a/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-5.png b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-5.png
new file mode 100644
index 00000000..4aae80bc
Binary files /dev/null and b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-11-output-5.png differ
diff --git a/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-7-output-1.png b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-7-output-1.png
new file mode 100644
index 00000000..9feacd2e
Binary files /dev/null and b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-7-output-1.png differ
diff --git a/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-9-output-1.png b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-9-output-1.png
new file mode 100644
index 00000000..23b2b046
Binary files /dev/null and b/nixtla/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts_files/figure-markdown_strict/cell-9-output-1.png differ
diff --git a/nixtla/docs/tutorials/11_uncertainty_quantification_with_prediction_intervals_files/figure-markdown_strict/cell-7-output-1.png b/nixtla/docs/tutorials/11_uncertainty_quantification_with_prediction_intervals_files/figure-markdown_strict/cell-7-output-1.png
new file mode 100644
index 00000000..5ba7a01e
Binary files /dev/null and b/nixtla/docs/tutorials/11_uncertainty_quantification_with_prediction_intervals_files/figure-markdown_strict/cell-7-output-1.png differ
diff --git a/nixtla/docs/tutorials/11_uncertainty_quantification_with_prediction_intervals_files/figure-markdown_strict/cell-9-output-1.png b/nixtla/docs/tutorials/11_uncertainty_quantification_with_prediction_intervals_files/figure-markdown_strict/cell-9-output-1.png
new file mode 100644
index 00000000..0d008df0
Binary files /dev/null and b/nixtla/docs/tutorials/11_uncertainty_quantification_with_prediction_intervals_files/figure-markdown_strict/cell-9-output-1.png differ
diff --git a/nixtla/docs/tutorials/13_bounded_forecasts_files/figure-markdown_strict/cell-11-output-1.png b/nixtla/docs/tutorials/13_bounded_forecasts_files/figure-markdown_strict/cell-11-output-1.png
new file mode 100644
index 00000000..2b10015d
Binary files /dev/null and b/nixtla/docs/tutorials/13_bounded_forecasts_files/figure-markdown_strict/cell-11-output-1.png differ
diff --git a/nixtla/docs/tutorials/13_bounded_forecasts_files/figure-markdown_strict/cell-13-output-1.png b/nixtla/docs/tutorials/13_bounded_forecasts_files/figure-markdown_strict/cell-13-output-1.png
new file mode 100644
index 00000000..1a5fb5ac
Binary files /dev/null and b/nixtla/docs/tutorials/13_bounded_forecasts_files/figure-markdown_strict/cell-13-output-1.png differ
diff --git a/nixtla/docs/tutorials/13_bounded_forecasts_files/figure-markdown_strict/cell-7-output-1.png b/nixtla/docs/tutorials/13_bounded_forecasts_files/figure-markdown_strict/cell-7-output-1.png
new file mode 100644
index 00000000..d9ca609d
Binary files /dev/null and b/nixtla/docs/tutorials/13_bounded_forecasts_files/figure-markdown_strict/cell-7-output-1.png differ
diff --git a/nixtla/docs/tutorials/14_hierarchical_forecasting_files/figure-markdown_strict/cell-12-output-1.png b/nixtla/docs/tutorials/14_hierarchical_forecasting_files/figure-markdown_strict/cell-12-output-1.png
new file mode 100644
index 00000000..76b30f49
Binary files /dev/null and b/nixtla/docs/tutorials/14_hierarchical_forecasting_files/figure-markdown_strict/cell-12-output-1.png differ
diff --git a/nixtla/docs/tutorials/14_hierarchical_forecasting_files/figure-markdown_strict/cell-16-output-1.png b/nixtla/docs/tutorials/14_hierarchical_forecasting_files/figure-markdown_strict/cell-16-output-1.png
new file mode 100644
index 00000000..118caa58
Binary files /dev/null and b/nixtla/docs/tutorials/14_hierarchical_forecasting_files/figure-markdown_strict/cell-16-output-1.png differ
diff --git a/nixtla/docs/tutorials/15_missing_values_files/figure-markdown_strict/cell-10-output-1.png b/nixtla/docs/tutorials/15_missing_values_files/figure-markdown_strict/cell-10-output-1.png
new file mode 100644
index 00000000..ce34bc47
Binary files /dev/null and b/nixtla/docs/tutorials/15_missing_values_files/figure-markdown_strict/cell-10-output-1.png differ
diff --git a/nixtla/docs/tutorials/15_missing_values_files/figure-markdown_strict/cell-11-output-1.png b/nixtla/docs/tutorials/15_missing_values_files/figure-markdown_strict/cell-11-output-1.png
new file mode 100644
index 00000000..9c9fc61f
Binary files /dev/null and b/nixtla/docs/tutorials/15_missing_values_files/figure-markdown_strict/cell-11-output-1.png differ
diff --git a/nixtla/docs/tutorials/15_missing_values_files/figure-markdown_strict/cell-17-output-1.png b/nixtla/docs/tutorials/15_missing_values_files/figure-markdown_strict/cell-17-output-1.png
new file mode 100644
index 00000000..6a65b384
Binary files /dev/null and b/nixtla/docs/tutorials/15_missing_values_files/figure-markdown_strict/cell-17-output-1.png differ
diff --git a/nixtla/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-10-output-1.png b/nixtla/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-10-output-1.png
new file mode 100644
index 00000000..e6a9d0e7
Binary files /dev/null and b/nixtla/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-10-output-1.png differ
diff --git a/nixtla/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-12-output-1.png b/nixtla/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-12-output-1.png
new file mode 100644
index 00000000..95cf55c7
Binary files /dev/null and b/nixtla/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-12-output-1.png differ
diff --git a/nixtla/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-6-output-1.png b/nixtla/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-6-output-1.png
new file mode 100644
index 00000000..bdd27162
Binary files /dev/null and b/nixtla/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-6-output-1.png differ
diff --git a/nixtla/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-8-output-1.png b/nixtla/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-8-output-1.png
new file mode 100644
index 00000000..fb148b15
Binary files /dev/null and b/nixtla/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-8-output-1.png differ
diff --git a/nixtla/docs/tutorials/21_shap_values_files/figure-markdown_strict/cell-10-output-1.png b/nixtla/docs/tutorials/21_shap_values_files/figure-markdown_strict/cell-10-output-1.png
new file mode 100644
index 00000000..5aa97626
Binary files /dev/null and b/nixtla/docs/tutorials/21_shap_values_files/figure-markdown_strict/cell-10-output-1.png differ
diff --git a/nixtla/docs/tutorials/21_shap_values_files/figure-markdown_strict/cell-11-output-1.png b/nixtla/docs/tutorials/21_shap_values_files/figure-markdown_strict/cell-11-output-1.png
new file mode 100644
index 00000000..2d4164c4
Binary files /dev/null and b/nixtla/docs/tutorials/21_shap_values_files/figure-markdown_strict/cell-11-output-1.png differ
diff --git a/nixtla/docs/tutorials/21_shap_values_files/figure-markdown_strict/cell-9-output-1.png b/nixtla/docs/tutorials/21_shap_values_files/figure-markdown_strict/cell-9-output-1.png
new file mode 100644
index 00000000..791eda05
Binary files /dev/null and b/nixtla/docs/tutorials/21_shap_values_files/figure-markdown_strict/cell-9-output-1.png differ
diff --git a/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-11-output-1.png b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-11-output-1.png
new file mode 100644
index 00000000..39dba094
Binary files /dev/null and b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-11-output-1.png differ
diff --git a/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-14-output-1.png b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-14-output-1.png
new file mode 100644
index 00000000..dbfa4e3a
Binary files /dev/null and b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-14-output-1.png differ
diff --git a/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-17-output-1.png b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-17-output-1.png
new file mode 100644
index 00000000..d4728fd4
Binary files /dev/null and b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-17-output-1.png differ
diff --git a/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-20-output-1.png b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-20-output-1.png
new file mode 100644
index 00000000..f08c108d
Binary files /dev/null and b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-20-output-1.png differ
diff --git a/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-25-output-1.png b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-25-output-1.png
new file mode 100644
index 00000000..d305de40
Binary files /dev/null and b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-25-output-1.png differ
diff --git a/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-28-output-1.png b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-28-output-1.png
new file mode 100644
index 00000000..6c4e8dc1
Binary files /dev/null and b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-28-output-1.png differ
diff --git a/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-7-output-1.png b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-7-output-1.png
new file mode 100644
index 00000000..700a442c
Binary files /dev/null and b/nixtla/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-7-output-1.png differ
diff --git a/nixtla/docs/tutorials/23_temporalhierarchical_files/figure-markdown_strict/cell-20-output-1.png b/nixtla/docs/tutorials/23_temporalhierarchical_files/figure-markdown_strict/cell-20-output-1.png
new file mode 100644
index 00000000..853d865c
Binary files /dev/null and b/nixtla/docs/tutorials/23_temporalhierarchical_files/figure-markdown_strict/cell-20-output-1.png differ
diff --git a/nixtla/docs/tutorials/23_temporalhierarchical_files/figure-markdown_strict/cell-21-output-1.png b/nixtla/docs/tutorials/23_temporalhierarchical_files/figure-markdown_strict/cell-21-output-1.png
new file mode 100644
index 00000000..ab0fff11
Binary files /dev/null and b/nixtla/docs/tutorials/23_temporalhierarchical_files/figure-markdown_strict/cell-21-output-1.png differ
diff --git a/nixtla/docs/tutorials/23_temporalhierarchical_files/figure-markdown_strict/cell-7-output-1.png b/nixtla/docs/tutorials/23_temporalhierarchical_files/figure-markdown_strict/cell-7-output-1.png
new file mode 100644
index 00000000..1ce33ae9
Binary files /dev/null and b/nixtla/docs/tutorials/23_temporalhierarchical_files/figure-markdown_strict/cell-7-output-1.png differ
diff --git a/nixtla/docs/tutorials/anomaly_detection.html.mdx b/nixtla/docs/tutorials/anomaly_detection.html.mdx
new file mode 100644
index 00000000..ac003e99
--- /dev/null
+++ b/nixtla/docs/tutorials/anomaly_detection.html.mdx
@@ -0,0 +1,180 @@
+---
+output-file: anomaly_detection.html
+title: Anomaly detection
+---
+
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/20_anomaly_detection.ipynb)
+
+## Import packages
+
+First, we import the required packages for this tutorial and create an
+instance of
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient).
+
+```python
+import pandas as pd
+from nixtla import NixtlaClient
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, set the `base_url` argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your Azure AI endpoint", api_key="your api_key")`
+
+## Load dataset
+
+Now, let’s load the dataset for this tutorial. We use the Peyton Manning
+dataset, which tracks daily visits to Peyton Manning’s Wikipedia page.
+
+```python
+df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/peyton-manning.csv')
+df.head()
+```
+
+| | unique_id | ds | y |
+|-----|-----------|------------|----------|
+| 0 | 0 | 2007-12-10 | 9.590761 |
+| 1 | 0 | 2007-12-11 | 8.519590 |
+| 2 | 0 | 2007-12-12 | 8.183677 |
+| 3 | 0 | 2007-12-13 | 8.072467 |
+| 4 | 0 | 2007-12-14 | 7.893572 |
+
+```python
+nixtla_client.plot(df, max_insample_length=365)
+```
+
+
+
+## Anomaly detection
+
+We now perform anomaly detection. By default, TimeGPT uses a 99%
+confidence interval. If a point falls outside of that interval, it is
+considered an anomaly.
+
+```python
+anomalies_df = nixtla_client.detect_anomalies(df, freq='D')
+anomalies_df.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Querying model metadata...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Calling Anomaly Detector Endpoint...
+```
+
+| | unique_id | ds | y | TimeGPT | TimeGPT-hi-99 | TimeGPT-lo-99 | anomaly |
+|-----|-----------|------------|-----------|----------|---------------|---------------|---------|
+| 0 | 0 | 2008-01-10 | 8.281724 | 8.224187 | 9.503586 | 6.944788 | False |
+| 1 | 0 | 2008-01-11 | 8.292799 | 8.151533 | 9.430932 | 6.872135 | False |
+| 2 | 0 | 2008-01-12 | 8.199189 | 8.127243 | 9.406642 | 6.847845 | False |
+| 3 | 0 | 2008-01-13 | 9.996522 | 8.917259 | 10.196658 | 7.637861 | False |
+| 4 | 0 | 2008-01-14 | 10.127071 | 9.002326 | 10.281725 | 7.722928 | False |
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.detect_anomalies(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+As you can see, values that fall inside the confidence interval are
+labeled `False` and considered normal, while points that fall outside it
+are labeled `True` and flagged as anomalies.
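To work with only the flagged points, you can filter on the `anomaly` column. Below is a minimal sketch using made-up values that mimic the output schema shown above (the real values come from `detect_anomalies`):

```python
import pandas as pd

# Toy frame mimicking the detect_anomalies output schema (hypothetical values)
anomalies_df = pd.DataFrame({
    'ds': pd.to_datetime(['2008-01-10', '2008-01-11', '2008-01-12']),
    'y': [8.28, 12.50, 8.20],
    'TimeGPT-lo-99': [6.94, 6.87, 6.85],
    'TimeGPT-hi-99': [9.50, 9.43, 9.41],
})

# A point is flagged when it falls outside the 99% interval
anomalies_df['anomaly'] = (
    (anomalies_df['y'] < anomalies_df['TimeGPT-lo-99'])
    | (anomalies_df['y'] > anomalies_df['TimeGPT-hi-99'])
)

# Keep only the anomalous rows
only_anomalies = anomalies_df[anomalies_df['anomaly']]
```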
+
+We can also plot the anomalies using
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient).
+
+```python
+nixtla_client.plot(df, anomalies_df)
+```
+
+
+
+## Anomaly detection with exogenous features
+
+Previously, we performed anomaly detection without any exogenous
+features. We can also create features specifically for this scenario to
+give the model additional information for detecting anomalies.
+
+Here, we create date features that can be used by the model.
+
+This is done using the `date_features` argument. Setting it to `True`
+generates all possible features from the given dates and the frequency
+of the data. Alternatively, we can specify a list of the features we
+want. In this case, we only want features at the *month* and *year*
+level.
+
+```python
+anomalies_df_x = nixtla_client.detect_anomalies(
+ df,
+ freq='D',
+ date_features=['month', 'year'],
+ date_features_to_one_hot=True,
+)
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Using the following exogenous features: ['month_1.0', 'month_2.0', 'month_3.0', 'month_4.0', 'month_5.0', 'month_6.0', 'month_7.0', 'month_8.0', 'month_9.0', 'month_10.0', 'month_11.0', 'month_12.0', 'year_2007.0', 'year_2008.0', 'year_2009.0', 'year_2010.0', 'year_2011.0', 'year_2012.0', 'year_2013.0', 'year_2014.0', 'year_2015.0', 'year_2016.0']
+INFO:nixtla.nixtla_client:Calling Anomaly Detector Endpoint...
+```
+
+Then, we can plot the detected anomalies, with the model now using the
+additional information from the exogenous features.
+
+```python
+nixtla_client.plot(df, anomalies_df_x)
+```
+
+
+
+## Modifying the confidence intervals
+
+We can tweak the confidence intervals using the `level` argument, which
+accepts any value between 0 and 100, including decimals.
+
+Reducing the confidence interval results in more anomalies being
+detected, while increasing it reduces the number of anomalies.
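This effect can be illustrated with simulated residuals under a normal approximation (a hedged sketch; it does not reflect how TimeGPT actually constructs its intervals):

```python
import numpy as np

rng = np.random.default_rng(0)
errors = rng.normal(size=1000)  # simulated residuals around a forecast

# Approximate two-sided z-scores for 99% and 70% intervals
z_99, z_70 = 2.576, 1.036

# Count points falling outside each interval
flagged_99 = int((np.abs(errors) > z_99).sum())
flagged_70 = int((np.abs(errors) > z_70).sum())

# The narrower 70% interval flags many more points than the 99% one
```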
+
+Here, for example, we reduce the interval to 70%, and we will notice
+more anomalies being plotted (red dots).
+
+```python
+anomalies_df = nixtla_client.detect_anomalies(
+ df,
+ freq='D',
+ level=70
+)
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Calling Anomaly Detector Endpoint...
+```
+
+```python
+nixtla_client.plot(df, anomalies_df)
+```
+
+
+
diff --git a/nixtla/docs/tutorials/bounded_forecasts.html.mdx b/nixtla/docs/tutorials/bounded_forecasts.html.mdx
new file mode 100644
index 00000000..c4583db9
--- /dev/null
+++ b/nixtla/docs/tutorials/bounded_forecasts.html.mdx
@@ -0,0 +1,225 @@
+---
+output-file: bounded_forecasts.html
+title: Bounded forecasts
+---
+
+
+In forecasting, we often want to make sure the predictions stay within a
+certain range. For example, for predicting the sales of a product, we
+may require all forecasts to be positive. Thus, the forecasts may need
+to be bounded.
+
+With TimeGPT, you can create bounded forecasts by transforming your data
+prior to calling the forecast function.
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/13_bounded_forecasts.ipynb)
+
+## 1. Import packages
+
+First, we install and import the required packages.
+
+```python
+import pandas as pd
+import numpy as np
+
+from nixtla import NixtlaClient
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, set the `base_url` argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your Azure AI endpoint", api_key="your api_key")`
+
+## 2. Load data
+
+We use the [annual egg
+prices](https://github.com/robjhyndman/fpp3package/tree/master/data)
+dataset from [Forecasting, Principles and
+Practices](https://otexts.com/fpp3/). We expect egg prices to be
+strictly positive, so we want to bound our forecasts to be positive.
+
+> **Note**
+>
+> You can install `pyreadr` with `pip`:
+>
+> ```shell
+> pip install pyreadr
+> ```
+
+```python
+import pyreadr
+from pathlib import Path
+
+# Download and store the dataset
+url = 'https://github.com/robjhyndman/fpp3package/raw/master/data/prices.rda'
+dst_path = str(Path.cwd().joinpath('prices.rda'))
+result = pyreadr.read_r(pyreadr.download_file(url, dst_path), dst_path)
+```
+
+
+```python
+# Perform some preprocessing
+df = result['prices'][['year', 'eggs']]
+df = df.dropna().reset_index(drop=True)
+df = df.rename(columns={'year':'ds', 'eggs':'y'})
+df['ds'] = pd.to_datetime(df['ds'], format='%Y')
+df['unique_id'] = 'eggs'
+
+df.tail(10)
+```
+
+| | ds | y | unique_id |
+|-----|------------|--------|-----------|
+| 84 | 1984-01-01 | 100.58 | eggs |
+| 85 | 1985-01-01 | 76.84 | eggs |
+| 86 | 1986-01-01 | 81.10 | eggs |
+| 87 | 1987-01-01 | 69.60 | eggs |
+| 88 | 1988-01-01 | 64.55 | eggs |
+| 89 | 1989-01-01 | 80.36 | eggs |
+| 90 | 1990-01-01 | 79.79 | eggs |
+| 91 | 1991-01-01 | 74.79 | eggs |
+| 92 | 1992-01-01 | 64.86 | eggs |
+| 93 | 1993-01-01 | 62.27 | eggs |
+
+Let’s have a look at how the prices evolved over the 20th century; the
+plot shows that the price is trending down.
+
+```python
+nixtla_client.plot(df)
+```
+
+
+
+## 3. Bounded forecasts with TimeGPT
+
+First, we transform the target data. In this case, we will log-transform
+the data prior to forecasting, such that we can only forecast positive
+prices.
+
+```python
+df_transformed = df.copy()
+df_transformed['y'] = np.log(df_transformed['y'])
+```
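The log-transform bounds forecasts below by zero. As a side note, a series known to lie between two bounds can be handled the same way with a scaled logit; the sketch below uses hypothetical bounds and values, and is not part of this tutorial's dataset:

```python
import numpy as np

# Hypothetical bounds (egg prices here only need a lower bound of zero)
a, b = 0.0, 200.0
y = np.array([100.58, 76.84, 81.10])

# Scaled logit: maps (a, b) to the whole real line for forecasting
y_transformed = np.log((y - a) / (b - y))

# Inverse transform: maps forecasts back into (a, b)
y_recovered = (b - a) / (1 + np.exp(-y_transformed)) + a
```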
+
+We will create forecasts for the next 10 years, and we include the 80%,
+90% and 99.5% prediction intervals of our forecast distribution.
+
+```python
+timegpt_fcst_with_transform = nixtla_client.forecast(df=df_transformed, h=10, freq='Y', level=[80, 90, 99.5])
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Inferred freq: AS-JAN
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+After creating the forecasts, we need to invert the transformation we
+applied earlier. For a log-transformation, this simply means
+exponentiating the forecasts:
+
+```python
+cols_to_transform = [col for col in timegpt_fcst_with_transform if col not in ['unique_id', 'ds']]
+for col in cols_to_transform:
+ timegpt_fcst_with_transform[col] = np.exp(timegpt_fcst_with_transform[col])
+```
+
+Now, we can plot the forecasts together with the prediction intervals
+at the 80%, 90% and 99.5% levels of our forecast distribution.
+
+```python
+nixtla_client.plot(
+ df,
+ timegpt_fcst_with_transform,
+ level=[80, 90, 99.5],
+ max_insample_length=20
+)
+```
+
+
+
+The forecast and the prediction intervals look reasonable.
+
+Let’s compare these forecasts to the situation where we don’t apply a
+transformation. In that case, the model may forecast a negative price.
+
+```python
+timegpt_fcst_without_transform = nixtla_client.forecast(df=df, h=10, freq='Y', level=[80, 90, 99.5])
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Inferred freq: AS-JAN
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+Indeed, we now observe prediction intervals that become negative:
+
+```python
+nixtla_client.plot(
+ df,
+ timegpt_fcst_without_transform,
+ level=[80, 90, 99.5],
+ max_insample_length=20
+)
+```
+
+
+
+For example, in 1995:
+
+```python
+timegpt_fcst_without_transform
+```
+
+| | unique_id | ds | TimeGPT | TimeGPT-lo-99.5 | TimeGPT-lo-90 | TimeGPT-lo-80 | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-hi-99.5 |
+|----|----|----|----|----|----|----|----|----|----|
+| 0 | eggs | 1994-01-01 | 66.859756 | 43.103240 | 46.131448 | 49.319034 | 84.400479 | 87.588065 | 90.616273 |
+| 1 | eggs | 1995-01-01 | 64.993477 | -20.924112 | -4.750041 | 12.275298 | 117.711656 | 134.736995 | 150.911066 |
+| 2 | eggs | 1996-01-01 | 66.695808 | 6.499170 | 8.291150 | 10.177444 | 123.214173 | 125.100467 | 126.892446 |
+| 3 | eggs | 1997-01-01 | 66.103325 | 17.304282 | 24.966939 | 33.032894 | 99.173756 | 107.239711 | 114.902368 |
+| 4 | eggs | 1998-01-01 | 67.906517 | 4.995371 | 12.349648 | 20.090992 | 115.722042 | 123.463386 | 130.817663 |
+| 5 | eggs | 1999-01-01 | 66.147575 | 29.162207 | 31.804460 | 34.585779 | 97.709372 | 100.490691 | 103.132943 |
+| 6 | eggs | 2000-01-01 | 66.062637 | 14.671932 | 19.305822 | 24.183601 | 107.941673 | 112.819453 | 117.453343 |
+| 7 | eggs | 2001-01-01 | 68.045769 | 3.915282 | 13.188964 | 22.950736 | 113.140802 | 122.902573 | 132.176256 |
+| 8 | eggs | 2002-01-01 | 66.718903 | -42.212631 | -30.583703 | -18.342726 | 151.780531 | 164.021508 | 175.650436 |
+| 9 | eggs | 2003-01-01 | 67.344078 | -86.239911 | -44.959745 | -1.506939 | 136.195095 | 179.647901 | 220.928067 |
+
+This demonstrates the value of the log-transformation for obtaining
+bounded forecasts with TimeGPT, which also gives us better calibrated
+prediction intervals.
+
+**References**
+
+- [Hyndman, Rob J., and George Athanasopoulos (2021). “Forecasting:
+ Principles and Practice (3rd Ed)”](https://otexts.com/fpp3/)
+
diff --git a/nixtla/docs/tutorials/categorical_variables.html.mdx b/nixtla/docs/tutorials/categorical_variables.html.mdx
new file mode 100644
index 00000000..5c74f441
--- /dev/null
+++ b/nixtla/docs/tutorials/categorical_variables.html.mdx
@@ -0,0 +1,373 @@
+---
+output-file: categorical_variables.html
+title: Categorical variables
+---
+
+
+Categorical variables are external factors that can influence a
+forecast. These variables take on one of a limited, fixed number of
+possible values, and induce a grouping of your observations.
+
+For example, if you’re forecasting daily product demand for a retailer,
+you could benefit from an event variable that may tell you what kind of
+event takes place on a given day, for example ‘None’, ‘Sporting’, or
+‘Cultural’.
+
+To incorporate categorical variables in TimeGPT, you’ll need to pair
+each point in your time series data with the corresponding external
+data.
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/03_categorical_variables.ipynb)
+
+## 1. Import packages
+
+First, we install and import the required packages and initialize the
+Nixtla client.
+
+```python
+import pandas as pd
+import os
+
+from nixtla import NixtlaClient
+from datasetsforecast.m5 import M5
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, set the `base_url` argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your Azure AI endpoint", api_key="your api_key")`
+
+## 2. Load M5 data
+
+Let’s see an example on predicting sales of products of the [M5
+dataset](https://nixtlaverse.nixtla.io/datasetsforecast/m5.html). The M5
+dataset contains daily product demand (sales) for 10 retail stores in
+the US.
+
+First, we load the data using `datasetsforecast`. This returns:
+
+- `Y_df`, containing the sales (`y` column), for each unique product
+ (`unique_id` column) at every timestamp (`ds` column).
+- `X_df`, containing additional relevant information for each unique
+ product (`unique_id` column) at every timestamp (`ds` column).
+
+```python
+Y_df, X_df, _ = M5.load(directory=os.getcwd())
+Y_df['ds'] = pd.to_datetime(Y_df['ds'])
+X_df['ds'] = pd.to_datetime(X_df['ds'])
+Y_df.head(10)
+```
+
+| | unique_id | ds | y |
+|-----|------------------|------------|-----|
+| 0 | FOODS_1_001_CA_1 | 2011-01-29 | 3.0 |
+| 1 | FOODS_1_001_CA_1 | 2011-01-30 | 0.0 |
+| 2 | FOODS_1_001_CA_1 | 2011-01-31 | 0.0 |
+| 3 | FOODS_1_001_CA_1 | 2011-02-01 | 1.0 |
+| 4 | FOODS_1_001_CA_1 | 2011-02-02 | 4.0 |
+| 5 | FOODS_1_001_CA_1 | 2011-02-03 | 2.0 |
+| 6 | FOODS_1_001_CA_1 | 2011-02-04 | 0.0 |
+| 7 | FOODS_1_001_CA_1 | 2011-02-05 | 2.0 |
+| 8 | FOODS_1_001_CA_1 | 2011-02-06 | 0.0 |
+| 9 | FOODS_1_001_CA_1 | 2011-02-07 | 0.0 |
+
+For this example, we will only keep the additional relevant information
+from the column `event_type_1`. This column is a *categorical variable*
+that indicates whether an important event that might affect the sales of
+the product takes place at a certain date.
+
+```python
+X_df = X_df[['unique_id', 'ds', 'event_type_1']]
+
+X_df.head(10)
+```
+
+| | unique_id | ds | event_type_1 |
+|-----|------------------|------------|--------------|
+| 0 | FOODS_1_001_CA_1 | 2011-01-29 | nan |
+| 1 | FOODS_1_001_CA_1 | 2011-01-30 | nan |
+| 2 | FOODS_1_001_CA_1 | 2011-01-31 | nan |
+| 3 | FOODS_1_001_CA_1 | 2011-02-01 | nan |
+| 4 | FOODS_1_001_CA_1 | 2011-02-02 | nan |
+| 5 | FOODS_1_001_CA_1 | 2011-02-03 | nan |
+| 6 | FOODS_1_001_CA_1 | 2011-02-04 | nan |
+| 7 | FOODS_1_001_CA_1 | 2011-02-05 | nan |
+| 8 | FOODS_1_001_CA_1 | 2011-02-06 | Sporting |
+| 9 | FOODS_1_001_CA_1 | 2011-02-07 | nan |
+
+As you can see, on February 6th 2011, there is a Sporting event.
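+
+To see which event categories occur and how often, `value_counts` is
+handy. Here is a toy example with made-up values; the real
+`X_df['event_type_1']` column can be inspected the same way:
+
+```python
+import pandas as pd
+
+# Made-up event column standing in for X_df['event_type_1'].
+events = pd.Series(["nan", "nan", "Sporting", "Cultural", "nan"])
+counts = events.value_counts()
+print(counts["nan"], counts["Sporting"])  # 3 1
+```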
+
+## 3. Forecasting product demand using categorical variables
+
+We will forecast the demand for a single product only. We choose a
+high-selling food product identified by `FOODS_3_090_CA_3`.
+
+```python
+product = 'FOODS_3_090_CA_3'
+Y_df_product = Y_df.query('unique_id == @product')
+X_df_product = X_df.query('unique_id == @product')
+```
+
+We merge our two dataframes to create the dataset to be used in TimeGPT.
+
+```python
+df = Y_df_product.merge(X_df_product)
+
+df.head(10)
+```
+
+| | unique_id | ds | y | event_type_1 |
+|-----|------------------|------------|-------|--------------|
+| 0 | FOODS_3_090_CA_3 | 2011-01-29 | 108.0 | nan |
+| 1 | FOODS_3_090_CA_3 | 2011-01-30 | 132.0 | nan |
+| 2 | FOODS_3_090_CA_3 | 2011-01-31 | 102.0 | nan |
+| 3 | FOODS_3_090_CA_3 | 2011-02-01 | 120.0 | nan |
+| 4 | FOODS_3_090_CA_3 | 2011-02-02 | 106.0 | nan |
+| 5 | FOODS_3_090_CA_3 | 2011-02-03 | 123.0 | nan |
+| 6 | FOODS_3_090_CA_3 | 2011-02-04 | 279.0 | nan |
+| 7 | FOODS_3_090_CA_3 | 2011-02-05 | 175.0 | nan |
+| 8 | FOODS_3_090_CA_3 | 2011-02-06 | 186.0 | Sporting |
+| 9 | FOODS_3_090_CA_3 | 2011-02-07 | 120.0 | nan |
+
+To use *categorical variables* with TimeGPT, you must encode them
+numerically. We will use *one-hot encoding* in this tutorial.
+
+We can one-hot encode the `event_type_1` column by using pandas built-in
+`get_dummies` functionality. After one-hot encoding the `event_type_1`
+variable, we can add it to the dataframe and remove the original column.
+
+```python
+event_type_1_ohe = pd.get_dummies(df['event_type_1'], dtype=int)
+df = pd.concat([df, event_type_1_ohe], axis=1)
+df = df.drop(columns = 'event_type_1')
+
+df.tail(10)
+```
+
+| | unique_id | ds | y | Cultural | National | Religious | Sporting | nan |
+|------|------------------|------------|-------|----------|----------|-----------|----------|-----|
+| 1959 | FOODS_3_090_CA_3 | 2016-06-10 | 140.0 | 0 | 0 | 0 | 0 | 1 |
+| 1960 | FOODS_3_090_CA_3 | 2016-06-11 | 151.0 | 0 | 0 | 0 | 0 | 1 |
+| 1961 | FOODS_3_090_CA_3 | 2016-06-12 | 87.0 | 0 | 0 | 0 | 0 | 1 |
+| 1962 | FOODS_3_090_CA_3 | 2016-06-13 | 67.0 | 0 | 0 | 0 | 0 | 1 |
+| 1963 | FOODS_3_090_CA_3 | 2016-06-14 | 50.0 | 0 | 0 | 0 | 0 | 1 |
+| 1964 | FOODS_3_090_CA_3 | 2016-06-15 | 58.0 | 0 | 0 | 0 | 0 | 1 |
+| 1965 | FOODS_3_090_CA_3 | 2016-06-16 | 116.0 | 0 | 0 | 0 | 0 | 1 |
+| 1966 | FOODS_3_090_CA_3 | 2016-06-17 | 124.0 | 0 | 0 | 0 | 0 | 1 |
+| 1967 | FOODS_3_090_CA_3 | 2016-06-18 | 167.0 | 0 | 0 | 0 | 0 | 1 |
+| 1968 | FOODS_3_090_CA_3 | 2016-06-19 | 118.0 | 0 | 0 | 0 | 1 | 0 |
+
+As you can see, we have now added 5 columns, each containing a binary
+indicator (`1` or `0`) of whether there is a `Cultural`, `National`,
+`Religious`, `Sporting` or no (`nan`) event on that particular day. For
+example, on June 19th 2016, there is a `Sporting` event.
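+
+As a quick sanity check of the encoding on toy values (not the M5
+data), each distinct category becomes its own 0/1 column:
+
+```python
+import pandas as pd
+
+# Toy check of one-hot encoding with get_dummies: one 0/1 column per
+# distinct value, columns in sorted order.
+toy = pd.Series(["nan", "Sporting", "nan"])
+ohe = pd.get_dummies(toy, dtype=int)
+print(ohe.columns.tolist())      # ['Sporting', 'nan']
+print(ohe["Sporting"].tolist())  # [0, 1, 0]
+```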
+
+Let’s turn to our forecasting task. We will forecast the first 7 days of
+February 2016. This includes 7 February 2016 - the date on which [Super
+Bowl 50](https://en.wikipedia.org/wiki/Super_Bowl_50) was held. Such
+large, national events typically impact retail product sales.
+
+To use the encoded categorical variables in TimeGPT, we have to add them
+as future values. Therefore, we create a future values dataframe that
+contains the `unique_id`, the timestamp `ds`, and the encoded
+categorical variables.
+
+Of course, we drop the target column, as it is normally not available:
+it is the quantity that we seek to forecast!
+
+```python
+future_ex_vars_df = df.drop(columns = ['y'])
+future_ex_vars_df = future_ex_vars_df.query("ds >= '2016-02-01' & ds <= '2016-02-07'")
+
+future_ex_vars_df.head(10)
+```
+
+| | unique_id | ds | Cultural | National | Religious | Sporting | nan |
+|------|------------------|------------|----------|----------|-----------|----------|-----|
+| 1829 | FOODS_3_090_CA_3 | 2016-02-01 | 0 | 0 | 0 | 0 | 1 |
+| 1830 | FOODS_3_090_CA_3 | 2016-02-02 | 0 | 0 | 0 | 0 | 1 |
+| 1831 | FOODS_3_090_CA_3 | 2016-02-03 | 0 | 0 | 0 | 0 | 1 |
+| 1832 | FOODS_3_090_CA_3 | 2016-02-04 | 0 | 0 | 0 | 0 | 1 |
+| 1833 | FOODS_3_090_CA_3 | 2016-02-05 | 0 | 0 | 0 | 0 | 1 |
+| 1834 | FOODS_3_090_CA_3 | 2016-02-06 | 0 | 0 | 0 | 0 | 1 |
+| 1835 | FOODS_3_090_CA_3 | 2016-02-07 | 0 | 0 | 0 | 1 | 0 |
+
+Next, we limit our input dataframe to all but the 7 forecast days:
+
+```python
+df_train = df.query("ds < '2016-02-01'")
+
+df_train.tail(10)
+```
+
+| | unique_id | ds | y | Cultural | National | Religious | Sporting | nan |
+|------|------------------|------------|-------|----------|----------|-----------|----------|-----|
+| 1819 | FOODS_3_090_CA_3 | 2016-01-22 | 94.0 | 0 | 0 | 0 | 0 | 1 |
+| 1820 | FOODS_3_090_CA_3 | 2016-01-23 | 144.0 | 0 | 0 | 0 | 0 | 1 |
+| 1821 | FOODS_3_090_CA_3 | 2016-01-24 | 146.0 | 0 | 0 | 0 | 0 | 1 |
+| 1822 | FOODS_3_090_CA_3 | 2016-01-25 | 87.0 | 0 | 0 | 0 | 0 | 1 |
+| 1823 | FOODS_3_090_CA_3 | 2016-01-26 | 73.0 | 0 | 0 | 0 | 0 | 1 |
+| 1824 | FOODS_3_090_CA_3 | 2016-01-27 | 62.0 | 0 | 0 | 0 | 0 | 1 |
+| 1825 | FOODS_3_090_CA_3 | 2016-01-28 | 64.0 | 0 | 0 | 0 | 0 | 1 |
+| 1826 | FOODS_3_090_CA_3 | 2016-01-29 | 102.0 | 0 | 0 | 0 | 0 | 1 |
+| 1827 | FOODS_3_090_CA_3 | 2016-01-30 | 113.0 | 0 | 0 | 0 | 0 | 1 |
+| 1828 | FOODS_3_090_CA_3 | 2016-01-31 | 98.0 | 0 | 0 | 0 | 0 | 1 |
+
+Let’s call the `forecast` method, first *without* the categorical
+variables.
+
+```python
+timegpt_fcst_without_cat_vars_df = nixtla_client.forecast(df=df_train, h=7, level=[80, 90])
+timegpt_fcst_without_cat_vars_df.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Inferred freq: D
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+| | unique_id | ds | TimeGPT | TimeGPT-lo-90 | TimeGPT-lo-80 | TimeGPT-hi-80 | TimeGPT-hi-90 |
+|----|----|----|----|----|----|----|----|
+| 0 | FOODS_3_090_CA_3 | 2016-02-01 | 73.304092 | 53.449049 | 54.795078 | 91.813107 | 93.159136 |
+| 1 | FOODS_3_090_CA_3 | 2016-02-02 | 66.335518 | 47.510669 | 50.274136 | 82.396899 | 85.160367 |
+| 2 | FOODS_3_090_CA_3 | 2016-02-03 | 65.881630 | 36.218617 | 41.388896 | 90.374364 | 95.544643 |
+| 3 | FOODS_3_090_CA_3 | 2016-02-04 | 72.371864 | -26.683115 | 25.097362 | 119.646367 | 171.426844 |
+| 4 | FOODS_3_090_CA_3 | 2016-02-05 | 95.141045 | -2.084882 | 34.027078 | 156.255011 | 192.366971 |
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+We plot the forecast and the last 28 days before the forecast period:
+
+```python
+nixtla_client.plot(
+ df[['unique_id', 'ds', 'y']].query("ds <= '2016-02-07'"),
+ timegpt_fcst_without_cat_vars_df,
+ max_insample_length=28,
+)
+```
+
+
+
+TimeGPT already provides a reasonable forecast, but it seems to somewhat
+underforecast the peak on the 6th of February 2016 - the day before the
+Super Bowl.
+
+Let’s call the `forecast` method again, now *with* the categorical
+variables.
+
+```python
+timegpt_fcst_with_cat_vars_df = nixtla_client.forecast(df=df_train, X_df=future_ex_vars_df, h=7, level=[80, 90])
+timegpt_fcst_with_cat_vars_df.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Inferred freq: D
+INFO:nixtla.nixtla_client:Using the following exogenous variables: Cultural, National, Religious, Sporting, nan
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+| | unique_id | ds | TimeGPT | TimeGPT-lo-90 | TimeGPT-lo-80 | TimeGPT-hi-80 | TimeGPT-hi-90 |
+|----|----|----|----|----|----|----|----|
+| 0 | FOODS_3_090_CA_3 | 2016-02-01 | 70.661271 | -0.204378 | 14.593348 | 126.729194 | 141.526919 |
+| 1 | FOODS_3_090_CA_3 | 2016-02-02 | 65.566941 | -20.394326 | 11.654239 | 119.479643 | 151.528208 |
+| 2 | FOODS_3_090_CA_3 | 2016-02-03 | 68.510010 | -33.713710 | 6.732952 | 130.287069 | 170.733731 |
+| 3 | FOODS_3_090_CA_3 | 2016-02-04 | 75.417710 | -40.974649 | 4.751767 | 146.083653 | 191.810069 |
+| 4 | FOODS_3_090_CA_3 | 2016-02-05 | 97.340302 | -57.385361 | 18.253812 | 176.426792 | 252.065965 |
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+We plot the forecast and the last 28 days before the forecast period:
+
+```python
+nixtla_client.plot(
+ df[['unique_id', 'ds', 'y']].query("ds <= '2016-02-07'"),
+ timegpt_fcst_with_cat_vars_df,
+ max_insample_length=28,
+)
+```
+
+
+
+We can visually verify that the forecast is closer to the actual
+observed value, which is the result of including the categorical
+variable in our forecast.
+
+Let’s verify this conclusion by computing the [Mean Absolute
+Error](https://en.wikipedia.org/wiki/Mean_absolute_error) on the
+forecasts we created.
+
+```python
+from utilsforecast.losses import mae
+```
+
+
+```python
+# Create target dataframe
+df_target = df[['unique_id', 'ds', 'y']].query("ds >= '2016-02-01' & ds <= '2016-02-07'")
+
+# Rename forecast columns
+timegpt_fcst_without_cat_vars_df = timegpt_fcst_without_cat_vars_df.rename(columns={'TimeGPT': 'TimeGPT-without-cat-vars'})
+timegpt_fcst_with_cat_vars_df = timegpt_fcst_with_cat_vars_df.rename(columns={'TimeGPT': 'TimeGPT-with-cat-vars'})
+
+# Merge forecasts with target dataframe
+df_target = df_target.merge(timegpt_fcst_without_cat_vars_df[['unique_id', 'ds', 'TimeGPT-without-cat-vars']])
+df_target = df_target.merge(timegpt_fcst_with_cat_vars_df[['unique_id', 'ds', 'TimeGPT-with-cat-vars']])
+
+# Compute errors
+mean_absolute_errors = mae(df_target, ['TimeGPT-without-cat-vars', 'TimeGPT-with-cat-vars'])
+```
+
+
+```python
+mean_absolute_errors
+```
+
+| | unique_id | TimeGPT-without-cat-vars | TimeGPT-with-cat-vars |
+|-----|------------------|--------------------------|-----------------------|
+| 0 | FOODS_3_090_CA_3 | 24.285649 | 20.028514 |
+
+Indeed, we find that the error when using TimeGPT with the categorical
+variable is approximately 18% lower than without it, indicating better
+performance when we include the categorical variable.
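+
+For reference, the Mean Absolute Error is simply the average of the
+absolute differences between actuals and forecasts. A toy check with
+made-up numbers (not the tutorial's actual forecasts):
+
+```python
+import pandas as pd
+
+# Made-up actuals and forecasts to illustrate what the mae metric computes.
+toy = pd.DataFrame({
+    "y":     [100.0, 120.0, 90.0],
+    "model": [110.0, 115.0, 95.0],
+})
+mae_by_hand = (toy["y"] - toy["model"]).abs().mean()
+print(mae_by_hand)  # (10 + 5 + 5) / 3 ≈ 6.67
+```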
+
diff --git a/nixtla/docs/tutorials/computing_at_scale.html.mdx b/nixtla/docs/tutorials/computing_at_scale.html.mdx
new file mode 100644
index 00000000..8ca34e87
--- /dev/null
+++ b/nixtla/docs/tutorials/computing_at_scale.html.mdx
@@ -0,0 +1,91 @@
+---
+output-file: computing_at_scale.html
+title: Computing at scale
+---
+
+
+Handling large datasets is a common challenge in time series
+forecasting. For example, when working with retail data, you may have to
+forecast sales for thousands of products across hundreds of stores.
+Similarly, when dealing with electricity consumption data, you may need
+to predict consumption for thousands of households across various
+regions.
+
+Nixtla’s `TimeGPT` enables you to use several distributed computing
+frameworks to manage large datasets efficiently. `TimeGPT` currently
+supports `Spark`, `Dask`, and `Ray` through `Fugue`.
+
+In this notebook, we will explain how to leverage these frameworks using
+`TimeGPT`.
+
+**Outline:**
+
+1. [Getting Started](#1-getting-started)
+
+2. [Forecasting at Scale](#2-forecasting-at-scale)
+
+3. [Important Considerations](#3-important-considerations)
+
+## 1. Getting Started
+
+To use `TimeGPT` with any of the supported distributed computing
+frameworks, you first need an API Key, just as you would when not using
+any distributed computing.
+
+Upon [registration](https://dashboard.nixtla.io/), you will receive an
+email asking you to confirm your signup. After confirming, you will
+receive access to your dashboard. There, under `API Keys`, you will find
+your API Key. Next, you need to integrate your API Key into your
+development workflow with the Nixtla SDK. For guidance on how to do
+this, please refer to the [Setting Up Your Authentication Key
+tutorial](https://docs.nixtla.io/docs/getting-started-setting_up_your_api_key).
+
+## 2. Forecasting at Scale
+
+Using `TimeGPT` with any of the supported distributed computing
+frameworks is straightforward and its usage is almost identical to the
+non-distributed case.
+
+1. Instantiate a
+ [`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+ class.
+2. Load your data as a `pandas` DataFrame.
+3. Initialize the distributed computing framework.
+ - [Spark](https://docs.nixtla.io/docs/tutorials-spark)
+ - [Dask](https://docs.nixtla.io/docs/tutorials-dask)
+ - [Ray](https://docs.nixtla.io/docs/tutorials-ray)
+4. Use any of the
+ [`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+ class methods.
+5. Stop the distributed computing framework, if necessary.
+
+These are the general steps that you will need to follow to use
+`TimeGPT` with any of the supported distributed computing frameworks.
+For a detailed explanation and a complete example, please refer to the
+guide for the specific framework linked above.
+
+> **Important**
+>
+> Parallelization in these frameworks is done along the various time
+> series within your dataset. Therefore, it is essential that your
+> dataset includes multiple time series, each with a unique id.
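+
+As a minimal illustration (hypothetical series names), a dataset like
+the following can be parallelized across its two `unique_id` groups:
+
+```python
+import pandas as pd
+
+# Two independent series; distributed frameworks partition the work
+# along the unique_id groups.
+df = pd.DataFrame({
+    "unique_id": ["store_1"] * 3 + ["store_2"] * 3,
+    "ds": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"] * 2),
+    "y": [10.0, 12.0, 11.0, 20.0, 22.0, 21.0],
+})
+print(df["unique_id"].nunique())  # 2
+```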
+
+## 3. Important Considerations
+
+### When to Use a Distributed Computing Framework
+
+Consider using a distributed computing framework if your dataset:
+
+- Consists of millions of observations over multiple time series.
+- Is too large to fit into the memory of a single machine.
+- Would be too slow to process on a single machine.
+
+### Choosing the Right Framework
+
+When selecting a distributed computing framework, take into account your
+existing infrastructure and the skill set of your team. Although
+`TimeGPT` can be used with any of the supported frameworks with minimal
+code changes, choosing the right one should align with your specific
+needs and resources. This will ensure that you leverage the full
+potential of `TimeGPT` while handling large datasets efficiently.
+
diff --git a/nixtla/docs/tutorials/computing_at_scale_dask_distributed.html.mdx b/nixtla/docs/tutorials/computing_at_scale_dask_distributed.html.mdx
new file mode 100644
index 00000000..8cd610f3
--- /dev/null
+++ b/nixtla/docs/tutorials/computing_at_scale_dask_distributed.html.mdx
@@ -0,0 +1,170 @@
+---
+description: Run TimeGPT distributedly on top of Dask
+output-file: computing_at_scale_dask_distributed.html
+title: Dask
+---
+
+
+[Dask](https://www.dask.org/get-started) is an open source parallel
+computing library for Python. In this guide, we will explain how to use
+`TimeGPT` on top of Dask.
+
+**Outline:**
+
+1. [Installation](#installation)
+
+2. [Load Your Data](#load-your-data)
+
+3. [Import Dask](#import-dask)
+
+4. [Use TimeGPT on Dask](#use-timegpt-on-dask)
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/17_computing_at_scale_dask_distributed.ipynb)
+
+## 1. Installation
+
+Install Dask through [Fugue](https://fugue-tutorials.readthedocs.io/).
+Fugue provides an easy-to-use interface for distributed computing that
+lets users execute Python code on top of several distributed computing
+frameworks, including Dask.
+
+> **Note**
+>
+> You can install `fugue` with `pip`:
+>
+> ```shell
+> pip install fugue[dask]
+> ```
+
+If executing on a distributed `Dask` cluster, ensure that the `nixtla`
+library is installed across all the workers.
+
+## 2. Load Data
+
+You can load your data as a `pandas` DataFrame. In this tutorial, we
+will use a dataset that contains hourly electricity prices from
+different markets.
+
+```python
+import pandas as pd
+```
+
+
+```python
+df = pd.read_csv(
+ 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
+ parse_dates=['ds'],
+)
+df.head()
+```
+
+| | unique_id | ds | y |
+|-----|-----------|---------------------|-------|
+| 0 | BE | 2016-10-22 00:00:00 | 70.00 |
+| 1 | BE | 2016-10-22 01:00:00 | 37.10 |
+| 2 | BE | 2016-10-22 02:00:00 | 37.10 |
+| 3 | BE | 2016-10-22 03:00:00 | 44.75 |
+| 4 | BE | 2016-10-22 04:00:00 | 37.10 |
+
+## 3. Import Dask
+
+Import Dask and convert the `pandas` DataFrame to a Dask DataFrame.
+
+```python
+import dask.dataframe as dd
+```
+
+
+```python
+dask_df = dd.from_pandas(df, npartitions=2)
+dask_df
+```
+
+| | unique_id | ds | y |
+|---------------|-----------|--------|---------|
+| npartitions=2 | | | |
+| 0 | string | string | float64 |
+| 4200 | ... | ... | ... |
+| 8399 | ... | ... | ... |
+
+## 4. Use TimeGPT on Dask
+
+Using `TimeGPT` on top of `Dask` is almost identical to the
+non-distributed case. The only difference is that you need to use a
+`Dask` DataFrame, which we already defined in the previous step.
+
+First, instantiate the
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+class.
+
+```python
+from nixtla import NixtlaClient
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, set the `base_url` argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")`
+
+Then use any method from the
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+class such as
+[`forecast`](https://docs.nixtla.io/docs/reference-sdk_reference#nixtlaclientforecast)
+or
+[`cross_validation`](https://docs.nixtla.io/docs/reference-sdk_reference#nixtlaclientcross_validation).
+
+```python
+fcst_df = nixtla_client.forecast(dask_df, h=12)
+fcst_df.compute().head()
+```
+
+| | unique_id | ds | TimeGPT |
+|-----|-----------|---------------------|-----------|
+| 0 | BE | 2016-12-31 00:00:00 | 45.190453 |
+| 1 | BE | 2016-12-31 01:00:00 | 43.244446 |
+| 2 | BE | 2016-12-31 02:00:00 | 41.958389 |
+| 3 | BE | 2016-12-31 03:00:00 | 39.796486 |
+| 4 | BE | 2016-12-31 04:00:00 | 39.204533 |
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+```python
+cv_df = nixtla_client.cross_validation(dask_df, h=12, n_windows=5, step_size=2)
+cv_df.compute().head()
+```
+
+| | unique_id | ds | cutoff | TimeGPT |
+|-----|-----------|---------------------|---------------------|-----------|
+| 0 | BE | 2016-12-30 04:00:00 | 2016-12-30 03:00:00 | 39.375439 |
+| 1 | BE | 2016-12-30 05:00:00 | 2016-12-30 03:00:00 | 40.039215 |
+| 2 | BE | 2016-12-30 06:00:00 | 2016-12-30 03:00:00 | 43.455849 |
+| 3 | BE | 2016-12-30 07:00:00 | 2016-12-30 03:00:00 | 47.716408 |
+| 4 | BE | 2016-12-30 08:00:00 | 2016-12-30 03:00:00 | 50.31665 |
+
+You can also use exogenous variables with `TimeGPT` on top of `Dask`. To
+do this, please refer to the [Exogenous
+Variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables)
+tutorial. Just keep in mind that you need to use a `Dask` DataFrame
+instead of a pandas DataFrame.
+
diff --git a/nixtla/docs/tutorials/computing_at_scale_ray_distributed.html.mdx b/nixtla/docs/tutorials/computing_at_scale_ray_distributed.html.mdx
new file mode 100644
index 00000000..a8e84886
--- /dev/null
+++ b/nixtla/docs/tutorials/computing_at_scale_ray_distributed.html.mdx
@@ -0,0 +1,213 @@
+---
+description: Run TimeGPT distributedly on top of Ray
+output-file: computing_at_scale_ray_distributed.html
+title: Ray
+---
+
+
+[Ray](https://www.ray.io/) is an open source unified compute framework
+to scale Python workloads. In this guide, we will explain how to use
+`TimeGPT` on top of Ray.
+
+**Outline:**
+
+1. [Installation](#installation)
+
+2. [Load Your Data](#load-your-data)
+
+3. [Initialize Ray](#initialize-ray)
+
+4. [Use TimeGPT on Ray](#use-timegpt-on-ray)
+
+5. [Shutdown Ray](#shutdown-ray)
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/19_computing_at_scale_ray_distributed.ipynb)
+
+## 1. Installation
+
+Install Ray through [Fugue](https://fugue-tutorials.readthedocs.io/).
+Fugue provides an easy-to-use interface for distributed computing that
+lets users execute Python code on top of several distributed computing
+frameworks, including Ray.
+
+> **Note**
+>
+> You can install `fugue` with `pip`:
+>
+> ```shell
+> pip install fugue[ray]
+> ```
+
+If executing on a distributed `Ray` cluster, ensure that the `nixtla`
+library is installed across all the workers.
+
+## 2. Load Data
+
+You can load your data as a `pandas` DataFrame. In this tutorial, we
+will use a dataset that contains hourly electricity prices from
+different markets.
+
+```python
+import pandas as pd
+```
+
+
+```python
+df = pd.read_csv(
+ 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
+ parse_dates=['ds'],
+)
+df.head()
+```
+
+| | unique_id | ds | y |
+|-----|-----------|---------------------|-------|
+| 0 | BE | 2016-10-22 00:00:00 | 70.00 |
+| 1 | BE | 2016-10-22 01:00:00 | 37.10 |
+| 2 | BE | 2016-10-22 02:00:00 | 37.10 |
+| 3 | BE | 2016-10-22 03:00:00 | 44.75 |
+| 4 | BE | 2016-10-22 04:00:00 | 37.10 |
+
+## 3. Initialize Ray
+
+Initialize `Ray` and convert the pandas DataFrame to a `Ray` DataFrame.
+
+```python
+import ray
+from ray.cluster_utils import Cluster
+```
+
+
+```python
+ray_cluster = Cluster(
+ initialize_head=True,
+ head_node_args={"num_cpus": 2}
+)
+ray.init(address=ray_cluster.address, ignore_reinit_error=True)
+```
+
+
+```python
+ray_df = ray.data.from_pandas(df)
+ray_df
+```
+
+``` text
+MaterializedDataset(
+ num_blocks=1,
+ num_rows=6720,
+ schema={unique_id: object, ds: datetime64[ns], y: float64}
+)
+```
+
+## 4. Use TimeGPT on Ray
+
+Using `TimeGPT` on top of `Ray` is almost identical to the
+non-distributed case. The only difference is that you need to use a
+`Ray` DataFrame.
+
+First, instantiate the
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+class.
+
+```python
+from nixtla import NixtlaClient
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, set the `base_url` argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")`
+
+Then use any method from the
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+class such as
+[`forecast`](https://docs.nixtla.io/docs/reference-sdk_reference#nixtlaclientforecast)
+or
+[`cross_validation`](https://docs.nixtla.io/docs/reference-sdk_reference#nixtlaclientcross_validation).
+
+```python
+ray_df
+```
+
+``` text
+MaterializedDataset(
+ num_blocks=1,
+ num_rows=6720,
+ schema={unique_id: object, ds: datetime64[ns], y: float64}
+)
+```
+
+```python
+fcst_df = nixtla_client.forecast(ray_df, h=12)
+```
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+To visualize the result, use the `to_pandas` method to convert the
+output of `Ray` to a `pandas` DataFrame.
+
+```python
+fcst_df.to_pandas().tail()
+```
+
+| | unique_id | ds | TimeGPT |
+|-----|-----------|---------------------|-----------|
+| 55 | NP | 2018-12-24 07:00:00 | 55.387066 |
+| 56 | NP | 2018-12-24 08:00:00 | 56.115517 |
+| 57 | NP | 2018-12-24 09:00:00 | 56.090714 |
+| 58 | NP | 2018-12-24 10:00:00 | 55.813717 |
+| 59 | NP | 2018-12-24 11:00:00 | 55.528519 |
+
+```python
+cv_df = nixtla_client.cross_validation(ray_df, h=12, freq='H', n_windows=5, step_size=2)
+```
+
+
+```python
+cv_df.to_pandas().tail()
+```
+
+| | unique_id | ds | cutoff | TimeGPT |
+|-----|-----------|---------------------|---------------------|-----------|
+| 295 | NP | 2018-12-23 19:00:00 | 2018-12-23 11:00:00 | 53.632019 |
+| 296 | NP | 2018-12-23 20:00:00 | 2018-12-23 11:00:00 | 52.512775 |
+| 297 | NP | 2018-12-23 21:00:00 | 2018-12-23 11:00:00 | 51.894035 |
+| 298 | NP | 2018-12-23 22:00:00 | 2018-12-23 11:00:00 | 51.06572 |
+| 299 | NP | 2018-12-23 23:00:00 | 2018-12-23 11:00:00 | 50.32592 |
+
+You can also use exogenous variables with `TimeGPT` on top of `Ray`. To
+do this, please refer to the [Exogenous
+Variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables)
+tutorial. Just keep in mind that you need to use a `Ray` DataFrame
+instead of a pandas DataFrame.
+
+## 5. Shutdown Ray
+
+When you are done, shut down the `Ray` session.
+
+```python
+ray.shutdown()
+```
+
diff --git a/nixtla/docs/tutorials/computing_at_scale_spark_distributed.html.mdx b/nixtla/docs/tutorials/computing_at_scale_spark_distributed.html.mdx
new file mode 100644
index 00000000..d84dcfbe
--- /dev/null
+++ b/nixtla/docs/tutorials/computing_at_scale_spark_distributed.html.mdx
@@ -0,0 +1,163 @@
+---
+description: Run TimeGPT distributedly on top of Spark
+output-file: computing_at_scale_spark_distributed.html
+title: Spark
+---
+
+
+[Spark](https://spark.apache.org/) is an open-source distributed
+computing framework designed for large-scale data processing. In this
+guide, we will explain how to use `TimeGPT` on top of Spark.
+
+**Outline:**
+
+1. [Installation](#installation)
+
+2. [Load Your Data](#load-your-data)
+
+3. [Initialize Spark](#initialize-spark)
+
+4. [Use TimeGPT on Spark](#use-timegpt-on-spark)
+
+5. [Stop Spark](#stop-spark)
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/16_computing_at_scale_spark_distributed.ipynb)
+
+## 1. Installation
+
+Install Spark through [Fugue](https://fugue-tutorials.readthedocs.io/).
+Fugue provides an easy-to-use interface for distributed computing that
+lets users execute Python code on top of several distributed computing
+frameworks, including Spark.
+
+> **Note**
+>
+> You can install `fugue` with `pip`:
+>
+> ```shell
+> pip install fugue[spark]
+> ```
+
+If executing on a distributed `Spark` cluster, ensure that the `nixtla`
+library is installed across all the workers.
+
+## 2. Load Data
+
+You can load your data as a `pandas` DataFrame. In this tutorial, we
+will use a dataset that contains hourly electricity prices from
+different markets.
+
+```python
+import pandas as pd
+```
+
+
+```python
+df = pd.read_csv(
+ 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
+ parse_dates=['ds'],
+)
+df.head()
+```
+
+| | unique_id | ds | y |
+|-----|-----------|---------------------|-------|
+| 0 | BE | 2016-10-22 00:00:00 | 70.00 |
+| 1 | BE | 2016-10-22 01:00:00 | 37.10 |
+| 2 | BE | 2016-10-22 02:00:00 | 37.10 |
+| 3 | BE | 2016-10-22 03:00:00 | 44.75 |
+| 4 | BE | 2016-10-22 04:00:00 | 37.10 |
+
+## 3. Initialize Spark
+
+Initialize `Spark` and convert the pandas DataFrame to a `Spark`
+DataFrame.
+
+```python
+from pyspark.sql import SparkSession
+```
+
+
+```python
+spark = SparkSession.builder.getOrCreate()
+```
+
+
+```python
+spark_df = spark.createDataFrame(df)
+spark_df.show(5)
+```
+
+## 4. Use TimeGPT on Spark
+
+Using `TimeGPT` on top of `Spark` is almost identical to the
+non-distributed case. The only difference is that you need to use a
+`Spark` DataFrame.
+
+First, instantiate the
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+class.
+
+```python
+from nixtla import NixtlaClient
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, set the `base_url` argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")`
+
+Then use any method from the
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+class such as
+[`forecast`](https://docs.nixtla.io/docs/reference-sdk_reference#nixtlaclientforecast)
+or
+[`cross_validation`](https://docs.nixtla.io/docs/reference-sdk_reference#nixtlaclientcross_validation).
+
+```python
+fcst_df = nixtla_client.forecast(spark_df, h=12)
+fcst_df.show(5)
+```
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+```python
+cv_df = nixtla_client.cross_validation(spark_df, h=12, n_windows=5, step_size=2)
+cv_df.show(5)
+```
+
+You can also use exogenous variables with `TimeGPT` on top of `Spark`.
+To do this, please refer to the [Exogenous
+Variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables)
+tutorial. Just keep in mind that you need to use a `Spark` DataFrame
+instead of a pandas DataFrame.
+
+## 5. Stop Spark
+
+When you are done, stop the `Spark` session.
+
+```python
+spark.stop()
+```
+
diff --git a/nixtla/docs/tutorials/cross_validation.html.mdx b/nixtla/docs/tutorials/cross_validation.html.mdx
new file mode 100644
index 00000000..8341ac3d
--- /dev/null
+++ b/nixtla/docs/tutorials/cross_validation.html.mdx
@@ -0,0 +1,402 @@
+---
+output-file: cross_validation.html
+title: Cross-validation
+---
+
+
+One of the primary challenges in time series forecasting is the inherent
+uncertainty and variability over time, making it crucial to validate the
+accuracy and reliability of the models employed. Cross-validation, a
+robust model validation technique, is particularly adapted for this
+task, as it provides insights into the expected performance of a model
+on unseen data, ensuring the forecasts are reliable and resilient before
+being deployed in real-world scenarios.
+
+`TimeGPT`, understanding the intricate needs of time series forecasting,
+incorporates the `cross_validation` method, designed to streamline the
+validation process for time series models. This functionality enables
+practitioners to rigorously test their forecasting models against
+historical data, assessing their effectiveness while tuning them for
+optimal performance. This tutorial will guide you through the nuanced
+process of conducting cross-validation within the
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+class, ensuring your time series forecasting models are not just
+well-constructed, but also validated for trustworthiness and precision.
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/08_cross_validation.ipynb)
+
+## 1. Import packages
+
+First, we import the required packages and initialize an instance of
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient).
+
+```python
+import pandas as pd
+from nixtla import NixtlaClient
+
+from IPython.display import display
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, remember to also set the `base_url`
+> argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")`
+
+## 2. Load data
+
+Let’s see an example, using the Peyton Manning dataset.
+
+```python
+pm_df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/peyton-manning.csv')
+```
+
+## 3. Cross-validation
+
+The `cross_validation` method of the
+[`NixtlaClient`](https://Nixtla.github.io/nixtla/src/nixtla_client.html#nixtlaclient)
+class performs systematic validation of time series forecasting models.
+It requires a dataframe of time-ordered data and uses a rolling-window
+scheme to evaluate the model’s performance across different time
+periods, ensuring the model’s reliability and stability over time. The
+animation below shows how TimeGPT performs cross-validation.
+
+
+
+Key parameters include `freq`, which denotes the data’s frequency and is
+automatically inferred if not specified. The `id_col`, `time_col`, and
+`target_col` parameters designate the respective columns for each
+series’ identifier, time step, and target values. The method offers
+customization through parameters like `n_windows`, indicating the number
+of separate time windows on which the model is assessed, and
+`step_size`, determining the gap between these windows. If `step_size`
+is unspecified, it defaults to the forecast horizon `h`.
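+
+To make the windowing concrete, here is a small sketch (our own
+illustration, not the library’s internal code) of how the cutoff dates
+follow from `h`, `n_windows`, and `step_size`:
+
+```python
+import pandas as pd
+
+def rolling_cutoffs(last_date, h, n_windows, step_size=None, freq="D"):
+    # Each cutoff marks the last training observation of a window.
+    # The most recent window ends `h` periods before `last_date`;
+    # earlier windows are shifted back by `step_size` (default: h).
+    step_size = h if step_size is None else step_size
+    offset = pd.tseries.frequencies.to_offset(freq)
+    return sorted(last_date - offset * (h + i * step_size) for i in range(n_windows))
+
+# 5 windows of horizon 7 over daily data, shifted by 2 days each
+cutoffs = rolling_cutoffs(pd.Timestamp("2016-01-20"), h=7, n_windows=5, step_size=2)
+```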
+
+The process also allows for model refinement via `finetune_steps`,
+specifying the number of iterations for model fine-tuning on new data.
+Data pre-processing is manageable through `clean_ex_first`, deciding
+whether to cleanse the exogenous signal prior to forecasting.
+Additionally, the method supports enhanced feature engineering from time
+data through the `date_features` parameter, which can automatically
+generate crucial date-related features or accept custom functions for
+bespoke feature creation. The `date_features_to_one_hot` parameter
+further enables the transformation of categorical date features into a
+format suitable for machine learning models.
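+
+As a rough illustration of what one-hot encoded date features look like
+(the exact column names TimeGPT produces, such as `month_1.0`, may
+differ), consider:
+
+```python
+import pandas as pd
+
+dates = pd.DataFrame({"ds": pd.to_datetime(["2016-01-15", "2016-02-15", "2016-03-15"])})
+dates["month"] = dates["ds"].dt.month
+# Each row gets a single 1.0 in the column of its month
+one_hot = pd.get_dummies(dates["month"], prefix="month", dtype=float)
+```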
+
+In execution, `cross_validation` assesses the model’s forecasting
+accuracy in each window, providing a robust view of the model’s
+performance variability over time and potential overfitting. This
+detailed evaluation ensures the forecasts generated are not only
+accurate but also consistent across diverse temporal contexts.
+
+```python
+timegpt_cv_df = nixtla_client.cross_validation(
+ pm_df,
+ h=7,
+ n_windows=5,
+ freq='D',
+)
+timegpt_cv_df.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Querying model metadata...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
+```
+
+| | unique_id | ds | cutoff | y | TimeGPT |
+|-----|-----------|------------|------------|----------|----------|
+| 0 | 0 | 2015-12-17 | 2015-12-16 | 7.591862 | 7.939553 |
+| 1 | 0 | 2015-12-18 | 2015-12-16 | 7.528869 | 7.887512 |
+| 2 | 0 | 2015-12-19 | 2015-12-16 | 7.171657 | 7.766617 |
+| 3 | 0 | 2015-12-20 | 2015-12-16 | 7.891331 | 7.931502 |
+| 4 | 0 | 2015-12-21 | 2015-12-16 | 8.360071 | 8.312632 |
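+
+Since the output contains both the actuals (`y`) and the `TimeGPT`
+predictions, you can score each validation window separately. A minimal
+sketch with made-up numbers shaped like the table above:
+
+```python
+import pandas as pd
+
+# Hypothetical cross-validation output (same columns as above, fewer rows)
+cv = pd.DataFrame({
+    "unique_id": ["0"] * 4,
+    "cutoff": ["2015-12-16", "2015-12-16", "2015-12-23", "2015-12-23"],
+    "y":       [7.59, 7.53, 7.17, 7.89],
+    "TimeGPT": [7.94, 7.89, 7.77, 7.93],
+})
+# Mean absolute error per validation window
+mae_per_window = (cv["y"] - cv["TimeGPT"]).abs().groupby(cv["cutoff"]).mean()
+```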
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.cross_validation(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+```python
+cutoffs = timegpt_cv_df['cutoff'].unique()
+for cutoff in cutoffs:
+ fig = nixtla_client.plot(
+ pm_df.tail(100),
+ timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
+ )
+ display(fig)
+```
+
+
+
+
+
+
+
+
+
+
+
+## 4. Cross-validation with prediction intervals
+
+It is also possible to generate prediction intervals during
+cross-validation. To do so, we simply use the `level` argument.
+
+```python
+timegpt_cv_df = nixtla_client.cross_validation(
+ pm_df,
+ h=7,
+ n_windows=5,
+ freq='D',
+ level=[80, 90],
+)
+timegpt_cv_df.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
+```
+
+| | unique_id | ds | cutoff | y | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
+|----|----|----|----|----|----|----|----|----|----|
+| 0 | 0 | 2015-12-17 | 2015-12-16 | 7.591862 | 7.939553 | 8.201465 | 8.314956 | 7.677642 | 7.564151 |
+| 1 | 0 | 2015-12-18 | 2015-12-16 | 7.528869 | 7.887512 | 8.175414 | 8.207470 | 7.599609 | 7.567553 |
+| 2 | 0 | 2015-12-19 | 2015-12-16 | 7.171657 | 7.766617 | 8.267363 | 8.386674 | 7.265871 | 7.146560 |
+| 3 | 0 | 2015-12-20 | 2015-12-16 | 7.891331 | 7.931502 | 8.205929 | 8.369983 | 7.657075 | 7.493020 |
+| 4 | 0 | 2015-12-21 | 2015-12-16 | 8.360071 | 8.312632 | 9.184893 | 9.625794 | 7.440371 | 6.999469 |
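+
+With the interval columns available, you can estimate the empirical
+coverage of, say, the 80% interval, i.e. the fraction of actuals that
+fall inside it. A small sketch with illustrative numbers:
+
+```python
+import pandas as pd
+
+# Hypothetical slice of the interval columns shown above
+cv = pd.DataFrame({
+    "y":             [7.59, 7.53, 7.17, 7.89, 8.36],
+    "TimeGPT-lo-80": [7.68, 7.60, 7.27, 7.66, 7.44],
+    "TimeGPT-hi-80": [8.20, 8.18, 8.27, 8.21, 9.18],
+})
+covered = (cv["y"] >= cv["TimeGPT-lo-80"]) & (cv["y"] <= cv["TimeGPT-hi-80"])
+coverage = covered.mean()  # fraction of actuals inside the 80% interval
+```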
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.cross_validation(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+```python
+cutoffs = timegpt_cv_df['cutoff'].unique()
+for cutoff in cutoffs:
+ fig = nixtla_client.plot(
+ pm_df.tail(100),
+ timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
+ level=[80, 90],
+ models=['TimeGPT']
+ )
+ display(fig)
+```
+
+
+
+
+
+
+
+
+
+
+
+## 5. Cross-validation with exogenous variables
+
+### Time features
+
+It is possible to include exogenous variables when performing
+cross-validation. Here we use the `date_features` parameter to create
+one-hot encoded month features. These features are then used by the
+model to make predictions during cross-validation.
+
+```python
+timegpt_cv_df = nixtla_client.cross_validation(
+ pm_df,
+ h=7,
+ n_windows=5,
+ freq='D',
+ level=[80, 90],
+ date_features=['month'],
+ date_features_to_one_hot=True,
+)
+timegpt_cv_df.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Using the following exogenous features: ['month_1.0', 'month_2.0', 'month_3.0', 'month_4.0', 'month_5.0', 'month_6.0', 'month_7.0', 'month_8.0', 'month_9.0', 'month_10.0', 'month_11.0', 'month_12.0']
+INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
+```
+
+| | unique_id | ds | cutoff | y | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
+|----|----|----|----|----|----|----|----|----|----|
+| 0 | 0.0 | 2015-12-17 | 2015-12-16 | 7.591862 | 8.426320 | 8.721996 | 8.824101 | 8.130644 | 8.028540 |
+| 1 | 0.0 | 2015-12-18 | 2015-12-16 | 7.528869 | 8.049962 | 8.452083 | 8.658603 | 7.647842 | 7.441321 |
+| 2 | 0.0 | 2015-12-19 | 2015-12-16 | 7.171657 | 7.509098 | 7.984788 | 8.138017 | 7.033409 | 6.880180 |
+| 3 | 0.0 | 2015-12-20 | 2015-12-16 | 7.891331 | 7.739536 | 8.306914 | 8.641355 | 7.172158 | 6.837718 |
+| 4 | 0.0 | 2015-12-21 | 2015-12-16 | 8.360071 | 8.027471 | 8.722828 | 9.152306 | 7.332113 | 6.902636 |
+
+```python
+cutoffs = timegpt_cv_df['cutoff'].unique()
+for cutoff in cutoffs:
+ fig = nixtla_client.plot(
+ pm_df.tail(100),
+ timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
+ level=[80, 90],
+ models=['TimeGPT']
+ )
+ display(fig)
+```
+
+
+
+
+
+
+
+
+
+
+
+### Dynamic features
+
+Additionally, you can pass dynamic exogenous variables to better inform
+`TimeGPT` about the data. Simply add the exogenous regressors as columns
+after the target column.
+
+```python
+Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity.csv')
+X_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/exogenous-vars-electricity.csv')
+df = Y_df.merge(X_df)
+```
+
+Now let’s cross-validate `TimeGPT` using this information:
+
+```python
+timegpt_cv_df_x = nixtla_client.cross_validation(
+ df.groupby('unique_id').tail(100 * 48),
+ h=48,
+ n_windows=2,
+ level=[80, 90]
+)
+cutoffs = timegpt_cv_df_x.query('unique_id == "BE"')['cutoff'].unique()
+for cutoff in cutoffs:
+ fig = nixtla_client.plot(
+ df.query('unique_id == "BE"').tail(24 * 7),
+ timegpt_cv_df_x.query('cutoff == @cutoff & unique_id == "BE"').drop(columns=['cutoff', 'y']),
+ models=['TimeGPT'],
+ level=[80, 90],
+ )
+ display(fig)
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Querying model metadata...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Using the following exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
+INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
+```
+
+
+
+
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.cross_validation(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+## 6. Cross-validation with different TimeGPT instances
+
+You can also run cross-validation with different `TimeGPT` models using
+the `model` argument. Here we compare the base model with the model for
+long-horizon forecasting.
+
+```python
+timegpt_cv_df_x_long_horizon = nixtla_client.cross_validation(
+ df.groupby('unique_id').tail(100 * 48),
+ h=48,
+ n_windows=2,
+ level=[80, 90],
+ model='timegpt-1-long-horizon',
+)
+timegpt_cv_df_x_long_horizon.columns = timegpt_cv_df_x_long_horizon.columns.str.replace('TimeGPT', 'TimeGPT-LongHorizon')
+timegpt_cv_df_x_models = timegpt_cv_df_x_long_horizon.merge(timegpt_cv_df_x)
+cutoffs = timegpt_cv_df_x_models.query('unique_id == "BE"')['cutoff'].unique()
+for cutoff in cutoffs:
+ fig = nixtla_client.plot(
+ df.query('unique_id == "BE"').tail(24 * 7),
+ timegpt_cv_df_x_models.query('cutoff == @cutoff & unique_id == "BE"').drop(columns=['cutoff', 'y']),
+ models=['TimeGPT', 'TimeGPT-LongHorizon'],
+ level=[80, 90],
+ )
+ display(fig)
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Querying model metadata...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Using the following exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
+INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
+```
+
+
+
+
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.cross_validation(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
diff --git a/nixtla/docs/tutorials/exogenous_variables.html.mdx b/nixtla/docs/tutorials/exogenous_variables.html.mdx
new file mode 100644
index 00000000..eba25556
--- /dev/null
+++ b/nixtla/docs/tutorials/exogenous_variables.html.mdx
@@ -0,0 +1,445 @@
+---
+output-file: exogenous_variables.html
+title: Exogenous variables
+---
+
+
+Exogenous variables or external factors are crucial in time series
+forecasting as they provide additional information that might influence
+the prediction. These variables could include holiday markers, marketing
+spending, weather data, or any other external data that correlate with
+the time series data you are forecasting.
+
+For example, if you’re forecasting ice cream sales, temperature data
+could serve as a useful exogenous variable. On hotter days, ice cream
+sales may increase.
+
+To incorporate exogenous variables in TimeGPT, you’ll need to pair each
+point in your time series data with the corresponding external data.
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/01_exogenous_variables.ipynb)
+
+## 1. Import packages
+
+First, we import the required packages and initialize the Nixtla client.
+
+```python
+import pandas as pd
+from nixtla import NixtlaClient
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, remember to also set the `base_url`
+> argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")`
+
+## 2. Load data
+
+Let’s look at an example of predicting day-ahead electricity prices. The
+following dataset contains the hourly electricity price (`y` column) for
+five markets in Europe and the US, identified by the `unique_id` column. The
+columns from `Exogenous1` to `day_6` are exogenous variables that
+TimeGPT will use to predict the prices.
+
+```python
+df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
+df.head()
+```
+
+| | unique_id | ds | y | Exogenous1 | Exogenous2 | day_0 | day_1 | day_2 | day_3 | day_4 | day_5 | day_6 |
+|----|----|----|----|----|----|----|----|----|----|----|----|----|
+| 0 | BE | 2016-10-22 00:00:00 | 70.00 | 57253.0 | 49593.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 1 | BE | 2016-10-22 01:00:00 | 37.10 | 51887.0 | 46073.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 2 | BE | 2016-10-22 02:00:00 | 37.10 | 51896.0 | 44927.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 3 | BE | 2016-10-22 03:00:00 | 44.75 | 48428.0 | 44483.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 4 | BE | 2016-10-22 04:00:00 | 37.10 | 46721.0 | 44338.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+
+## 3a. Forecasting electricity prices using future exogenous variables
+
+To produce forecasts with future exogenous variables, we have to supply
+their future values. Let’s read this dataset. In this case, we want to
+predict 24 steps ahead, so each `unique_id` has 24 future observations.
+
+```python
+future_ex_vars_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv')
+future_ex_vars_df.head()
+```
+
+| | unique_id | ds | Exogenous1 | Exogenous2 | day_0 | day_1 | day_2 | day_3 | day_4 | day_5 | day_6 |
+|----|----|----|----|----|----|----|----|----|----|----|----|
+| 0 | BE | 2016-12-31 00:00:00 | 70318.0 | 64108.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 1 | BE | 2016-12-31 01:00:00 | 67898.0 | 62492.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 2 | BE | 2016-12-31 02:00:00 | 68379.0 | 61571.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 3 | BE | 2016-12-31 03:00:00 | 64972.0 | 60381.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 4 | BE | 2016-12-31 04:00:00 | 62900.0 | 60298.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
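+
+Before calling `forecast`, it can be useful to verify that the future
+frame really has `h` rows per series and no missing values. A small
+helper we wrote for illustration (`check_future_frame` is not part of
+the SDK):
+
+```python
+import pandas as pd
+
+def check_future_frame(X_df, h, id_col="unique_id", time_col="ds"):
+    # Every series must contribute exactly `h` future rows, with no NaNs.
+    counts = X_df.groupby(id_col)[time_col].count()
+    assert (counts == h).all(), f"expected {h} rows per series, got {counts.to_dict()}"
+    assert not X_df.isna().any().any(), "future exogenous frame contains NaNs"
+
+# A tiny demo frame with one series and 24 hourly future rows
+demo_future_df = pd.DataFrame({
+    "unique_id": ["BE"] * 24,
+    "ds": pd.date_range("2016-12-31", periods=24, freq="h"),
+    "Exogenous1": range(24),
+})
+check_future_frame(demo_future_df, h=24)
+```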
+
+Let’s call the `forecast` method, adding this information:
+
+```python
+timegpt_fcst_ex_vars_df = nixtla_client.forecast(df=df, X_df=future_ex_vars_df, h=24, level=[80, 90])
+timegpt_fcst_ex_vars_df.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Querying model metadata...
+INFO:nixtla.nixtla_client:Using future exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+| | unique_id | ds | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
+|----|----|----|----|----|----|----|----|
+| 0 | BE | 2016-12-31 00:00:00 | 51.632830 | 61.598820 | 66.088295 | 41.666843 | 37.177372 |
+| 1 | BE | 2016-12-31 01:00:00 | 45.750877 | 54.611988 | 60.176445 | 36.889767 | 31.325312 |
+| 2 | BE | 2016-12-31 02:00:00 | 39.650543 | 46.256210 | 52.842808 | 33.044876 | 26.458277 |
+| 3 | BE | 2016-12-31 03:00:00 | 34.000072 | 44.015310 | 47.429000 | 23.984835 | 20.571144 |
+| 4 | BE | 2016-12-31 04:00:00 | 33.785370 | 43.140503 | 48.581240 | 24.430239 | 18.989498 |
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+```python
+nixtla_client.plot(
+ df[['unique_id', 'ds', 'y']],
+ timegpt_fcst_ex_vars_df,
+ max_insample_length=365,
+ level=[80, 90],
+)
+```
+
+
+
+We can also show the importance of the features.
+
+```python
+nixtla_client.weights_x.plot.barh(x='features', y='weights')
+```
+
+
+
+This plot shows that `Exogenous1` and `Exogenous2` are the most
+important for this forecasting task, as they have the largest weight.
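+
+If you want this ranking programmatically rather than as a plot, you can
+sort the weights by magnitude. A sketch with hypothetical numbers shaped
+like `nixtla_client.weights_x`:
+
+```python
+import pandas as pd
+
+# Hypothetical feature-weight frame (structure assumed from the plot above)
+weights_x = pd.DataFrame({
+    "features": ["Exogenous1", "Exogenous2", "day_0", "day_6"],
+    "weights":  [0.41, 0.33, -0.02, 0.01],
+})
+# Rank features by absolute weight, largest first
+ranked = weights_x.loc[weights_x["weights"].abs().sort_values(ascending=False).index]
+```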
+
+## 3b. Forecasting electricity prices using historic exogenous variables
+
+In the example above, we loaded the future exogenous variables. Often,
+these are not available because their future values are unknown. We
+can also make forecasts using only historic exogenous variables. This
+can be done by adding the `hist_exog_list` argument with the list of
+columns of `df` to be considered as historical. In that case, we can
+pass all extra columns available in `df` as historic exogenous variables
+using
+`hist_exog_list=['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']`.
+
+> **Important**
+>
+> If you include historic exogenous variables in your model, you are
+> *implicitly* making assumptions about the future of these exogenous
+> variables in your forecast. It is recommended to make these
+> assumptions explicit by making use of future exogenous variables.
+
+Let’s call the `forecast` method, adding `hist_exog_list`:
+
+```python
+timegpt_fcst_hist_ex_vars_df = nixtla_client.forecast(
+ df=df,
+ h=24,
+ level=[80, 90],
+ hist_exog_list=['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6'],
+)
+timegpt_fcst_hist_ex_vars_df.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Using historical exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+| | unique_id | ds | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
+|----|----|----|----|----|----|----|----|
+| 0 | BE | 2016-12-31 00:00:00 | 47.311330 | 57.277317 | 61.766790 | 37.345340 | 32.855870 |
+| 1 | BE | 2016-12-31 01:00:00 | 47.142740 | 56.003850 | 61.568306 | 38.281628 | 32.717170 |
+| 2 | BE | 2016-12-31 02:00:00 | 47.311474 | 53.917137 | 60.503740 | 40.705810 | 34.119210 |
+| 3 | BE | 2016-12-31 03:00:00 | 47.224514 | 57.239750 | 60.653442 | 37.209280 | 33.795586 |
+| 4 | BE | 2016-12-31 04:00:00 | 47.266945 | 56.622078 | 62.062817 | 37.911810 | 32.471073 |
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+```python
+nixtla_client.plot(
+ df[['unique_id', 'ds', 'y']],
+ timegpt_fcst_hist_ex_vars_df,
+ max_insample_length=365,
+ level=[80, 90],
+)
+```
+
+
+
+## 3c. Forecasting electricity prices using future and historic exogenous variables
+
+A third option is to use both historic and future exogenous variables.
+For example, the future values of `Exogenous1` and `Exogenous2` might
+not be available. In this example, we drop these variables
+from our future exogenous dataframe (because we assume we do not know
+the future value of these variables), and add them to `hist_exog_list`
+to be considered as historical exogenous variables.
+
+```python
+hist_cols = ["Exogenous1", "Exogenous2"]
+future_ex_vars_df_limited = future_ex_vars_df.drop(columns=hist_cols)
+timegpt_fcst_ex_vars_df_limited = nixtla_client.forecast(df=df, X_df=future_ex_vars_df_limited, h=24, level=[80, 90], hist_exog_list=hist_cols)
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Using future exogenous features: ['day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
+INFO:nixtla.nixtla_client:Using historical exogenous features: ['Exogenous1', 'Exogenous2']
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+```python
+nixtla_client.plot(
+ df[['unique_id', 'ds', 'y']],
+ timegpt_fcst_ex_vars_df_limited,
+ max_insample_length=365,
+ level=[80, 90],
+)
+```
+
+
+
+Note that TimeGPT informs you which variables are used as historic
+exogenous and which are used as future exogenous.
+
+## 3d. Forecasting future exogenous variables
+
+A fourth option, when the future exogenous variables are not available,
+is to forecast them. Below, we show how to forecast `Exogenous1` and
+`Exogenous2` separately, so that you can generate their future values
+yourself.
+
+```python
+# We read the data and create separate dataframes for the historic exogenous that we want to forecast separately.
+df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
+df_exog1 = df[['unique_id', 'ds', 'Exogenous1']]
+df_exog2 = df[['unique_id', 'ds', 'Exogenous2']]
+```
+
+Next, we can use TimeGPT to forecast `Exogenous1` and `Exogenous2`. In
+this case, we assume these quantities can be separately forecast.
+
+```python
+timegpt_fcst_ex1 = nixtla_client.forecast(df=df_exog1, h=24, target_col='Exogenous1')
+timegpt_fcst_ex2 = nixtla_client.forecast(df=df_exog2, h=24, target_col='Exogenous2')
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+We can now start creating `X_df`, which contains the future exogenous
+variables.
+
+```python
+timegpt_fcst_ex1 = timegpt_fcst_ex1.rename(columns={'TimeGPT':'Exogenous1'})
+timegpt_fcst_ex2 = timegpt_fcst_ex2.rename(columns={'TimeGPT':'Exogenous2'})
+```
+
+
+```python
+X_df = timegpt_fcst_ex1.merge(timegpt_fcst_ex2)
+```
+
+Next, we also need to add the `day_0` to `day_6` future exogenous
+variables. These are easy to construct: they are one-hot encodings of
+the weekday, which we can extract from the `ds` column.
+
+```python
+# We have 7 days, for each day a separate column denoting 1/0
+for i in range(7):
+ X_df[f'day_{i}'] = 1 * (pd.to_datetime(X_df['ds']).dt.weekday == i)
+```
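+
+As a self-contained sanity check (re-creating the loop above on a fresh
+frame), exactly one weekday dummy should be active per row:
+
+```python
+import pandas as pd
+
+demo = pd.DataFrame({"ds": pd.date_range("2016-12-31", periods=48, freq="h")})
+for i in range(7):
+    demo[f"day_{i}"] = 1 * (pd.to_datetime(demo["ds"]).dt.weekday == i)
+
+day_cols = [f"day_{i}" for i in range(7)]
+# Every row has exactly one active weekday dummy
+assert (demo[day_cols].sum(axis=1) == 1).all()
+assert demo.loc[0, "day_5"] == 1  # 2016-12-31 was a Saturday
+```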
+
+Now that we have created `X_df`, let’s inspect it:
+
+```python
+X_df.head(10)
+```
+
+| | unique_id | ds | Exogenous1 | Exogenous2 | day_0 | day_1 | day_2 | day_3 | day_4 | day_5 | day_6 |
+|----|----|----|----|----|----|----|----|----|----|----|----|
+| 0 | BE | 2016-12-31 00:00:00 | 70861.410 | 66282.560 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+| 1 | BE | 2016-12-31 01:00:00 | 67851.830 | 64465.370 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+| 2 | BE | 2016-12-31 02:00:00 | 67246.660 | 63257.117 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+| 3 | BE | 2016-12-31 03:00:00 | 64027.203 | 62059.316 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+| 4 | BE | 2016-12-31 04:00:00 | 61524.086 | 61247.062 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+| 5 | BE | 2016-12-31 05:00:00 | 63054.086 | 62052.312 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+| 6 | BE | 2016-12-31 06:00:00 | 65199.473 | 63457.720 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+| 7 | BE | 2016-12-31 07:00:00 | 68285.770 | 65388.656 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+| 8 | BE | 2016-12-31 08:00:00 | 72038.484 | 67406.836 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+| 9 | BE | 2016-12-31 09:00:00 | 72821.190 | 68057.240 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+
+Let’s compare it to our pre-loaded version:
+
+```python
+future_ex_vars_df.head(10)
+```
+
+| | unique_id | ds | Exogenous1 | Exogenous2 | day_0 | day_1 | day_2 | day_3 | day_4 | day_5 | day_6 |
+|----|----|----|----|----|----|----|----|----|----|----|----|
+| 0 | BE | 2016-12-31 00:00:00 | 70318.0 | 64108.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 1 | BE | 2016-12-31 01:00:00 | 67898.0 | 62492.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 2 | BE | 2016-12-31 02:00:00 | 68379.0 | 61571.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 3 | BE | 2016-12-31 03:00:00 | 64972.0 | 60381.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 4 | BE | 2016-12-31 04:00:00 | 62900.0 | 60298.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 5 | BE | 2016-12-31 05:00:00 | 62364.0 | 60339.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 6 | BE | 2016-12-31 06:00:00 | 64242.0 | 62576.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 7 | BE | 2016-12-31 07:00:00 | 65884.0 | 63732.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 8 | BE | 2016-12-31 08:00:00 | 68217.0 | 66235.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+| 9 | BE | 2016-12-31 09:00:00 | 69921.0 | 66801.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
+
+As you can see, the values for `Exogenous1` and `Exogenous2` are
+slightly different, which makes sense because we’ve made a forecast of
+these values with TimeGPT.
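+
+You can quantify this difference, for example with the mean absolute
+percentage error between the forecasted and actual exogenous values. A
+sketch using the first rows shown above:
+
+```python
+import pandas as pd
+
+# Illustrative values mirroring the first rows of the two frames above
+actual     = pd.Series([70318.0, 67898.0, 68379.0, 64972.0])
+forecasted = pd.Series([70861.410, 67851.830, 67246.660, 64027.203])
+# MAPE, in percent, between forecasted and actual exogenous values
+mape = ((forecasted - actual).abs() / actual.abs()).mean() * 100
+```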
+
+Let’s create a new forecast of our electricity prices with TimeGPT using
+our new `X_df`:
+
+```python
+timegpt_fcst_ex_vars_df_new = nixtla_client.forecast(df=df, X_df=X_df, h=24, level=[80, 90])
+timegpt_fcst_ex_vars_df_new.head()
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Using future exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+| | unique_id | ds | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
+|----|----|----|----|----|----|----|----|
+| 0 | BE | 2016-12-31 00:00:00 | 46.987225 | 56.953213 | 61.442684 | 37.021236 | 32.531765 |
+| 1 | BE | 2016-12-31 01:00:00 | 25.719133 | 34.580242 | 40.144700 | 16.858023 | 11.293568 |
+| 2 | BE | 2016-12-31 02:00:00 | 38.553528 | 45.159195 | 51.745792 | 31.947860 | 25.361261 |
+| 3 | BE | 2016-12-31 03:00:00 | 35.771927 | 45.787163 | 49.200855 | 25.756690 | 22.342999 |
+| 4 | BE | 2016-12-31 04:00:00 | 34.555115 | 43.910248 | 49.350986 | 25.199984 | 19.759243 |
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+Let’s create a combined dataframe with the two forecasts and plot the
+values to compare the forecasts.
+
+```python
+timegpt_fcst_ex_vars_df = timegpt_fcst_ex_vars_df.rename(columns={'TimeGPT':'TimeGPT-provided_exogenous'})
+timegpt_fcst_ex_vars_df_new = timegpt_fcst_ex_vars_df_new.rename(columns={'TimeGPT':'TimeGPT-forecasted_exogenous'})
+
+forecasts = timegpt_fcst_ex_vars_df[['unique_id', 'ds', 'TimeGPT-provided_exogenous']].merge(timegpt_fcst_ex_vars_df_new[['unique_id', 'ds', 'TimeGPT-forecasted_exogenous']])
+```
+
+
+```python
+nixtla_client.plot(
+ df[['unique_id', 'ds', 'y']],
+ forecasts,
+ max_insample_length=365,
+)
+```
+
+
+
+As you can see, we obtain a slightly different forecast if we use our
+forecasted exogenous variables.
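
The size of this difference can also be quantified directly. Below is a minimal sketch with hypothetical values (the column names and numbers are our own, not the actual results above), merging two forecast variants and computing their mean absolute difference:

```python
import pandas as pd

# Hypothetical values for illustration only: two forecast variants for
# the same timestamps, merged on the time column.
provided = pd.DataFrame({'ds': [1, 2], 'provided_exog': [10.0, 12.0]})
forecasted = pd.DataFrame({'ds': [1, 2], 'forecasted_exog': [11.0, 11.5]})

merged = provided.merge(forecasted, on='ds')
mean_abs_diff = (merged['provided_exog'] - merged['forecasted_exog']).abs().mean()
```

A small mean absolute difference indicates that the forecasted exogenous variables lead to predictions close to those obtained with the true future values.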
+
diff --git a/nixtla/docs/tutorials/finetune_depth_finetuning.html.mdx b/nixtla/docs/tutorials/finetune_depth_finetuning.html.mdx
new file mode 100644
index 00000000..40bcde13
--- /dev/null
+++ b/nixtla/docs/tutorials/finetune_depth_finetuning.html.mdx
@@ -0,0 +1,165 @@
+---
+output-file: finetune_depth_finetuning.html
+title: Controlling the level of fine-tuning
+---
+
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/23_finetune_depth_finetuning.ipynb)
+
+## 1. Import packages
+
+First, we import the required packages and initialize the Nixtla client.
+
+```python
+import pandas as pd
+from nixtla import NixtlaClient
+from utilsforecast.losses import mae, mse
+from utilsforecast.evaluation import evaluate
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, remember to also set the `base_url`
+> argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")`
+
+## 2. Load data
+
+```python
+df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
+df.head()
+```
+
+| | timestamp | value |
+|-----|------------|-------|
+| 0 | 1949-01-01 | 112 |
+| 1 | 1949-02-01 | 118 |
+| 2 | 1949-03-01 | 132 |
+| 3 | 1949-04-01 | 129 |
+| 4 | 1949-05-01 | 121 |
+
+Now, we split the data into a training and test set so that we can
+measure the performance of the model as we vary `finetune_depth`.
+
+```python
+train = df[:-24]
+test = df[-24:]
+```
+
+Next, we fine-tune TimeGPT and vary `finetune_depth` to measure the
+impact on performance.
+
+## 3. Fine-tuning with `finetune_depth`
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+As mentioned above, `finetune_depth` controls how many parameters from
+TimeGPT are fine-tuned on your particular dataset. If the value is set
+to 1, only a few parameters are fine-tuned. Setting it to 5 means that
+all parameters of the model will be fine-tuned.
+
+Using a large value for `finetune_depth` can lead to better performance
+for large datasets with complex patterns. However, it can also lead to
+overfitting, in which case the accuracy of the forecasts may degrade, as
+we will see from the small experiment below.
+
+```python
+depths = [1, 2, 3, 4, 5]
+
+test = test.copy()
+
+for depth in depths:
+ preds_df = nixtla_client.forecast(
+ df=train,
+ h=24,
+ finetune_steps=5,
+ finetune_depth=depth,
+ time_col='timestamp',
+ target_col='value')
+
+ preds = preds_df['TimeGPT'].values
+
+ test.loc[:,f'TimeGPT_depth{depth}'] = preds
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: MS
+INFO:nixtla.nixtla_client:Querying model metadata...
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: MS
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: MS
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: MS
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: MS
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+```python
+test['unique_id'] = 0
+
+evaluation = evaluate(test, metrics=[mae, mse], time_col="timestamp", target_col="value")
+evaluation
+```
+
+| | unique_id | metric | TimeGPT_depth1 | TimeGPT_depth2 | TimeGPT_depth3 | TimeGPT_depth4 | TimeGPT_depth5 |
+|----|----|----|----|----|----|----|----|
+| 0 | 0 | mae | 22.675540 | 17.908963 | 21.318518 | 24.745096 | 28.734302 |
+| 1 | 0 | mse | 677.254283 | 461.320852 | 676.202126 | 991.835359 | 1119.722602 |
+
+From the result above, we can see that a `finetune_depth` of 2 achieves
+the best results since it has the lowest MAE and MSE.
+
+Also notice that with a `finetune_depth` of 4 and 5, the performance
+degrades, which is a clear sign of overfitting.
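
Reading the best configuration off such an evaluation frame can also be done programmatically. A small sketch with illustrative numbers (not the actual results above):

```python
import pandas as pd

# Illustrative scores only, mirroring the structure of the evaluation
# table: one row per metric, one column per fine-tuning depth.
evaluation = pd.DataFrame({
    'metric': ['mae', 'mse'],
    'TimeGPT_depth1': [22.7, 677.3],
    'TimeGPT_depth2': [17.9, 461.3],
    'TimeGPT_depth3': [21.3, 676.2],
})

mse_scores = evaluation.set_index('metric').loc['mse']
best_model = mse_scores.idxmin()  # column name with the lowest MSE
```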
+
+Thus, keep in mind that fine-tuning can be a bit of trial and error. You
+might need to adjust the number of `finetune_steps` and the level of
+`finetune_depth` based on your specific needs and the complexity of your
+data. Usually, a higher `finetune_depth` works better for large
+datasets. In this specific tutorial, since we were forecasting a single
+series with a very short dataset, increasing the depth led to
+overfitting.
+
+It’s recommended to monitor the model’s performance during fine-tuning
+and adjust as needed. Be aware that more `finetune_steps` and a larger
+value of `finetune_depth` may lead to longer training times and could
+potentially lead to overfitting if not managed properly.
+
diff --git a/nixtla/docs/tutorials/finetuning.html.mdx b/nixtla/docs/tutorials/finetuning.html.mdx
new file mode 100644
index 00000000..e4e8cfa0
--- /dev/null
+++ b/nixtla/docs/tutorials/finetuning.html.mdx
@@ -0,0 +1,130 @@
+---
+output-file: finetuning.html
+title: Fine-tuning
+---
+
+
+Fine-tuning is a powerful process for utilizing TimeGPT more
+effectively. Foundation models such as TimeGPT are pre-trained on vast
+amounts of data, capturing wide-ranging features and patterns. These
+models can then be specialized for specific contexts or domains. With
+fine-tuning, the model’s parameters are refined to forecast a new task,
+allowing it to tailor its vast pre-existing knowledge towards the
+requirements of the new data. Fine-tuning thus serves as a crucial
+bridge, linking TimeGPT’s broad capabilities to the specific
+requirements of your task.
+
+Concretely, the process of fine-tuning consists of performing a certain
+number of training iterations on your input data minimizing the
+forecasting error. The forecasts will then be produced with the updated
+model. To control the number of iterations, use the `finetune_steps`
+argument of the `forecast` method.
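
Conceptually, each fine-tuning step is one optimization iteration that reduces the forecasting error on your data. TimeGPT's internals are not public, so as a loose analogy only, here is a toy gradient-descent loop on a one-parameter model, where the loop count plays the role of `finetune_steps`:

```python
import numpy as np

# Toy analogy only, not TimeGPT's actual training procedure.
# We fit y_hat = w * x by gradient descent on the squared error.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # true relationship: y = 2 * x

w = 0.0          # stand-in for a pre-trained parameter
lr = 0.05        # learning rate
errors = []
for _ in range(10):  # analogue of finetune_steps=10
    grad = -2 * np.mean((y - w * x) * x)  # d(MSE)/dw
    w -= lr * grad
    errors.append(np.mean((y - w * x) ** 2))

# the error shrinks with each step and w approaches 2
```

Just as in this sketch, more steps reduce the in-sample error further, which helps up to the point where the model starts overfitting.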
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/06_finetuning.ipynb)
+
+## 1. Import packages
+
+First, we import the required packages and initialize the Nixtla client.
+
+```python
+import pandas as pd
+from nixtla import NixtlaClient
+from utilsforecast.losses import mae, mse
+from utilsforecast.evaluation import evaluate
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, remember to also set the `base_url`
+> argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")`
+
+## 2. Load data
+
+```python
+df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
+df.head()
+```
+
+| | timestamp | value |
+|-----|------------|-------|
+| 0 | 1949-01-01 | 112 |
+| 1 | 1949-02-01 | 118 |
+| 2 | 1949-03-01 | 132 |
+| 3 | 1949-04-01 | 129 |
+| 4 | 1949-05-01 | 121 |
+
+## 3. Fine-tuning
+
+Here, `finetune_steps=10` means the model will go through 10 iterations
+of training on your time series data.
+
+```python
+timegpt_fcst_finetune_df = nixtla_client.forecast(
+ df=df, h=12, finetune_steps=10,
+ time_col='timestamp', target_col='value',
+)
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: MS
+INFO:nixtla.nixtla_client:Querying model metadata...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+```python
+nixtla_client.plot(
+ df, timegpt_fcst_finetune_df,
+ time_col='timestamp', target_col='value',
+)
+```
+
+
+
+Keep in mind that fine-tuning can be a bit of trial and error. You might
+need to adjust the number of `finetune_steps` based on your specific
+needs and the complexity of your data. Usually, a larger value of
+`finetune_steps` works better for large datasets.
+
+It’s recommended to monitor the model’s performance during fine-tuning
+and adjust as needed. Be aware that more `finetune_steps` may lead to
+longer training times and could potentially lead to overfitting if not
+managed properly.
+
+Remember, fine-tuning is a powerful feature, but it should be used
+thoughtfully and carefully.
+
+For a detailed guide on using a specific loss function for fine-tuning,
+check out the [Fine-tuning with a specific loss
+function](https://docs.nixtla.io/docs/tutorials-fine_tuning_with_a_specific_loss_function)
+tutorial.
+
+Read also our detailed tutorial on [controlling the level of
+fine-tuning](https://docs.nixtla.io/docs/tutorials-finetune_depth_finetuning)
+using `finetune_depth`.
+
diff --git a/nixtla/docs/tutorials/hierarchical_forecasting.html.mdx b/nixtla/docs/tutorials/hierarchical_forecasting.html.mdx
new file mode 100644
index 00000000..f60560c9
--- /dev/null
+++ b/nixtla/docs/tutorials/hierarchical_forecasting.html.mdx
@@ -0,0 +1,314 @@
+---
+output-file: hierarchical_forecasting.html
+title: Hierarchical forecasting
+---
+
+
+In forecasting, we often need predictions at both lower and higher
+levels of aggregation, such as product demand forecasts alongside
+product category or product department forecasts. These granularities
+can be formalized through the use of a hierarchy. In hierarchical
+forecasting, we create forecasts that are coherent with respect to a
+pre-specified hierarchy of the underlying time series.
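
Coherence means that every aggregated series equals the sum of its children. As a toy sketch (independent of the tutorial's dataset), a summing matrix `S` encodes this constraint for a two-series hierarchy:

```python
import numpy as np

# Toy hierarchy: two bottom series and their total. The summing
# matrix S maps bottom-level values to all levels of the hierarchy.
S = np.array([
    [1, 1],  # total = bottom_1 + bottom_2
    [1, 0],  # bottom_1
    [0, 1],  # bottom_2
])
y_bottom = np.array([30.0, 70.0])
y_all = S @ y_bottom  # [total, bottom_1, bottom_2]

# coherency constraint: the top level equals the sum of its children
assert y_all[0] == y_all[1] + y_all[2]
```

Reconciliation methods adjust a set of base forecasts so that they satisfy exactly this kind of constraint.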
+
+With TimeGPT, we can create forecasts for multiple time series. We can
+subsequently post-process these forecasts using the hierarchical
+reconciliation techniques of
+[HierarchicalForecast](https://nixtlaverse.nixtla.io/hierarchicalforecast/index.html).
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/14_hierarchical_forecasting.ipynb)
+
+## 1. Import packages
+
+First, we import the required packages and initialize the Nixtla client.
+
+```python
+import pandas as pd
+import numpy as np
+
+from nixtla import NixtlaClient
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, set the `base_url` argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")`
+
+## 2. Load data
+
+We use the Australian Tourism dataset from [Forecasting: Principles and
+Practice](https://otexts.com/fpp3/). We are interested in forecasts for
+Australia’s 7 States, 27 Zones and 76 Regions. This constitutes a
+hierarchy, where forecasts for the lower levels (e.g. the regions
+Sydney, Blue Mountains and Hunter) should be coherent with the
+forecasts of the higher levels (e.g. New South Wales).
+
+
+
+
+The dataset only contains the time series at the lowest level, so we
+need to create the time series for all levels of the hierarchy.
+
+```python
+Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
+Y_df = Y_df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
+Y_df.insert(0, 'Country', 'Australia')
+Y_df = Y_df[['Country', 'Region', 'State', 'Purpose', 'ds', 'y']]
+Y_df['ds'] = Y_df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
+Y_df['ds'] = pd.to_datetime(Y_df['ds'])
+
+Y_df.head(10)
+```
+
+| | Country | Region | State | Purpose | ds | y |
+|-----|-----------|----------|-----------------|----------|------------|------------|
+| 0 | Australia | Adelaide | South Australia | Business | 1998-01-01 | 135.077690 |
+| 1 | Australia | Adelaide | South Australia | Business | 1998-04-01 | 109.987316 |
+| 2 | Australia | Adelaide | South Australia | Business | 1998-07-01 | 166.034687 |
+| 3 | Australia | Adelaide | South Australia | Business | 1998-10-01 | 127.160464 |
+| 4 | Australia | Adelaide | South Australia | Business | 1999-01-01 | 137.448533 |
+| 5 | Australia | Adelaide | South Australia | Business | 1999-04-01 | 199.912586 |
+| 6 | Australia | Adelaide | South Australia | Business | 1999-07-01 | 169.355090 |
+| 7 | Australia | Adelaide | South Australia | Business | 1999-10-01 | 134.357937 |
+| 8 | Australia | Adelaide | South Australia | Business | 2000-01-01 | 154.034398 |
+| 9 | Australia | Adelaide | South Australia | Business | 2000-04-01 | 168.776364 |
+
+The dataset can be grouped in the following hierarchical structure.
+
+```python
+spec = [
+ ['Country'],
+ ['Country', 'State'],
+ ['Country', 'Purpose'],
+ ['Country', 'State', 'Region'],
+ ['Country', 'State', 'Purpose'],
+ ['Country', 'State', 'Region', 'Purpose']
+]
+```
+
+Using the `aggregate` function from `HierarchicalForecast` we can get
+the full set of time series.
+
+> **Note**
+>
+> You can install `hierarchicalforecast` with `pip`:
+>
+> ```shell
+> pip install hierarchicalforecast
+> ```
+
+```python
+from hierarchicalforecast.utils import aggregate
+```
+
+
+```python
+Y_df, S_df, tags = aggregate(Y_df, spec)
+
+Y_df.head(10)
+```
+
+| | unique_id | ds | y |
+|-----|-----------|------------|--------------|
+| 0 | Australia | 1998-01-01 | 23182.197269 |
+| 1 | Australia | 1998-04-01 | 20323.380067 |
+| 2 | Australia | 1998-07-01 | 19826.640511 |
+| 3 | Australia | 1998-10-01 | 20830.129891 |
+| 4 | Australia | 1999-01-01 | 22087.353380 |
+| 5 | Australia | 1999-04-01 | 21458.373285 |
+| 6 | Australia | 1999-07-01 | 19914.192508 |
+| 7 | Australia | 1999-10-01 | 20027.925640 |
+| 8 | Australia | 2000-01-01 | 22339.294779 |
+| 9 | Australia | 2000-04-01 | 19941.063482 |
+
+We use the final two years (8 quarters) as the test set.
+
+```python
+Y_test_df = Y_df.groupby('unique_id').tail(8)
+Y_train_df = Y_df.drop(Y_test_df.index)
+```
+
+## 3. Hierarchical forecasting with TimeGPT
+
+First, we create base forecasts for all the time series with TimeGPT.
+Note that we set `add_history=True`, as we will need the in-sample
+fitted values of TimeGPT.
+
+We will predict 2 years (8 quarters), starting from 2016-01-01.
+
+```python
+timegpt_fcst = nixtla_client.forecast(df=Y_train_df, h=8, freq='QS', add_history=True)
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Querying model metadata...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+INFO:nixtla.nixtla_client:Calling Historical Forecast Endpoint...
+```
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+```python
+timegpt_fcst_insample = timegpt_fcst.query("ds < '2016-01-01'")
+timegpt_fcst_outsample = timegpt_fcst.query("ds >= '2016-01-01'")
+```
+
+Let’s plot some of the forecasts, starting from the highest aggregation
+level (`Australia`), to the lowest level
+(`Australia/Queensland/Brisbane/Holiday`). We can see that there is room
+for improvement in the forecasts.
+
+```python
+nixtla_client.plot(
+ Y_df,
+ timegpt_fcst_outsample,
+ max_insample_length=4 * 12,
+ unique_ids=['Australia', 'Australia/Queensland','Australia/Queensland/Brisbane', 'Australia/Queensland/Brisbane/Holiday']
+)
+```
+
+
+
+We can make these forecasts coherent to the specified hierarchy by using
+a `HierarchicalReconciliation` method from `HierarchicalForecast`. We will be
+using the
+[MinTrace](https://nixtlaverse.nixtla.io/hierarchicalforecast/methods.html)
+method.
+
+```python
+from hierarchicalforecast.methods import MinTrace
+from hierarchicalforecast.core import HierarchicalReconciliation
+```
+
+
+```python
+reconcilers = [
+ MinTrace(method='ols'),
+ MinTrace(method='mint_shrink'),
+]
+hrec = HierarchicalReconciliation(reconcilers=reconcilers)
+
+Y_df_with_insample_fcsts = Y_df.copy()
+Y_df_with_insample_fcsts = timegpt_fcst_insample.merge(Y_df_with_insample_fcsts)
+
+Y_rec_df = hrec.reconcile(Y_hat_df=timegpt_fcst_outsample, Y_df=Y_df_with_insample_fcsts, S=S_df, tags=tags)
+```
+
+
+```python
+Y_rec_df
+```
+
+| | unique_id | ds | TimeGPT | TimeGPT/MinTrace_method-ols | TimeGPT/MinTrace_method-mint_shrink |
+|----|----|----|----|----|----|
+| 0 | Australia | 2016-01-01 | 24967.19100 | 25044.408634 | 25394.406211 |
+| 1 | Australia | 2016-04-01 | 24528.88300 | 24503.089810 | 24327.212355 |
+| 2 | Australia | 2016-07-01 | 24221.77500 | 24083.107812 | 23813.826553 |
+| 3 | Australia | 2016-10-01 | 24559.44000 | 24548.038797 | 24174.894203 |
+| 4 | Australia | 2017-01-01 | 25570.33800 | 25669.248281 | 25560.277473 |
+| ... | ... | ... | ... | ... | ... |
+| 3395 | Australia/Western Australia/Experience Perth/V... | 2016-10-01 | 427.81146 | 435.423617 | 434.047102 |
+| 3396 | Australia/Western Australia/Experience Perth/V... | 2017-01-01 | 450.71786 | 453.434056 | 459.954598 |
+| 3397 | Australia/Western Australia/Experience Perth/V... | 2017-04-01 | 452.17923 | 460.197847 | 470.009789 |
+| 3398 | Australia/Western Australia/Experience Perth/V... | 2017-07-01 | 450.68683 | 463.034888 | 482.645932 |
+| 3399 | Australia/Western Australia/Experience Perth/V... | 2017-10-01 | 443.31050 | 451.754435 | 474.403379 |
+
+Again, we plot some of the forecasts. We can see a few, mostly minor
+differences in the forecasts.
+
+```python
+nixtla_client.plot(
+ Y_df,
+ Y_rec_df,
+ max_insample_length=4 * 12,
+ unique_ids=['Australia', 'Australia/Queensland','Australia/Queensland/Brisbane', 'Australia/Queensland/Brisbane/Holiday']
+)
+```
+
+
+
+Let’s numerically compare the reconciled forecasts to the situation
+where we don’t apply a post-processing step. We can use the `evaluate`
+function from `hierarchicalforecast.evaluation` for this.
+
+```python
+from hierarchicalforecast.evaluation import evaluate
+from utilsforecast.losses import rmse
+```
+
+
+```python
+eval_tags = {}
+eval_tags['Total'] = tags['Country']
+eval_tags['Purpose'] = tags['Country/Purpose']
+eval_tags['State'] = tags['Country/State']
+eval_tags['Regions'] = tags['Country/State/Region']
+eval_tags['Bottom'] = tags['Country/State/Region/Purpose']
+
+evaluation = evaluate(
+ df=Y_rec_df.merge(Y_test_df, on=['unique_id', 'ds']),
+ tags=eval_tags,
+ train_df=Y_train_df,
+ metrics=[rmse],
+)
+numeric_cols = evaluation.select_dtypes(np.number).columns
+evaluation[numeric_cols] = evaluation[numeric_cols].map('{:.2f}'.format)
+```
+
+
+```python
+evaluation
+```
+
+| | level | metric | TimeGPT | TimeGPT/MinTrace_method-ols | TimeGPT/MinTrace_method-mint_shrink |
+|----|----|----|----|----|----|
+| 0 | Total | rmse | 1433.07 | 1436.07 | 1627.43 |
+| 1 | Purpose | rmse | 482.09 | 475.64 | 507.50 |
+| 2 | State | rmse | 275.85 | 278.39 | 294.28 |
+| 3 | Regions | rmse | 49.40 | 47.91 | 47.99 |
+| 4 | Bottom | rmse | 19.32 | 19.11 | 18.86 |
+| 5 | Overall | rmse | 38.66 | 38.21 | 39.16 |
+
+We made a small improvement in overall RMSE by reconciling the forecasts
+with `MinTrace(ols)`, and made them slightly worse using
+`MinTrace(mint_shrink)`, indicating that the base forecasts were
+relatively strong already.
+
+However, we now also have coherent forecasts. Not only did we make a
+(small) accuracy improvement, we also obtained coherency to the
+hierarchy as a result of our reconciliation step.
+
+**References**
+
+- [Hyndman, Rob J., and George Athanasopoulos (2021). “Forecasting:
+ Principles and Practice (3rd Ed)”](https://otexts.com/fpp3/)
+
diff --git a/nixtla/docs/tutorials/historical_forecast.html.mdx b/nixtla/docs/tutorials/historical_forecast.html.mdx
new file mode 100644
index 00000000..c753d560
--- /dev/null
+++ b/nixtla/docs/tutorials/historical_forecast.html.mdx
@@ -0,0 +1,129 @@
+---
+output-file: historical_forecast.html
+title: Historical forecast
+---
+
+
+Our time series model offers a powerful feature that allows users to
+retrieve historical forecasts alongside the prospective predictions.
+This functionality is accessible through the `forecast` method by
+setting the `add_history=True` argument.
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/09_historical_forecast.ipynb)
+
+## 1. Import packages
+
+First, we import the required packages and initialize the Nixtla
+client.
+
+```python
+import pandas as pd
+from nixtla import NixtlaClient
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, set the `base_url` argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")`
+
+## 2. Load data
+
+Now you can start to make forecasts! Let’s import an example:
+
+```python
+df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
+df.head()
+```
+
+| | timestamp | value |
+|-----|------------|-------|
+| 0 | 1949-01-01 | 112 |
+| 1 | 1949-02-01 | 118 |
+| 2 | 1949-03-01 | 132 |
+| 3 | 1949-04-01 | 129 |
+| 4 | 1949-05-01 | 121 |
+
+```python
+nixtla_client.plot(df, time_col='timestamp', target_col='value')
+```
+
+
+
+## 3. Historical forecast
+
+Let’s add fitted values. When `add_history` is set to `True`, the
+output DataFrame will include not only the future forecasts determined
+by the `h` argument, but also the historical predictions. Currently,
+the historical forecasts are not affected by `h`, and have a fixed
+horizon depending on
+the frequency of the data. The historical forecasts are produced in a
+rolling window fashion, and concatenated. This means that the model is
+applied sequentially at each time step using only the most recent
+information available up to that point.
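
This rolling-origin idea can be sketched with a toy example. The following is illustrative only (not TimeGPT's actual procedure), using a naive last-value model that, at each step, only sees the data up to the previous observation:

```python
# Illustrative only: rolling-origin predictions with a naive
# last-value model. At each step t, the "model" may only use
# observations y[:t] to predict y[t].
y = [112, 118, 132, 129, 121, 135]
min_history = 2  # observations required before the first prediction

historical_preds = []
for t in range(min_history, len(y)):
    historical_preds.append(y[t - 1])  # forecast y[t] from its past

# note: no predictions exist for the first `min_history` observations
```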
+
+```python
+timegpt_fcst_with_history_df = nixtla_client.forecast(
+ df=df, h=12, time_col='timestamp', target_col='value',
+ add_history=True,
+)
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Inferred freq: MS
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+INFO:nixtla.nixtla_client:Calling Historical Forecast Endpoint...
+```
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+```python
+timegpt_fcst_with_history_df.head()
+```
+
+| | timestamp | TimeGPT |
+|-----|------------|------------|
+| 0 | 1951-01-01 | 135.483673 |
+| 1 | 1951-02-01 | 144.442398 |
+| 2 | 1951-03-01 | 157.191910 |
+| 3 | 1951-04-01 | 148.769363 |
+| 4 | 1951-05-01 | 140.472946 |
+
+Let’s plot the results. This consolidated view of past and future
+predictions can be invaluable for understanding the model’s behavior and
+for evaluating its performance over time.
+
+```python
+nixtla_client.plot(df, timegpt_fcst_with_history_df, time_col='timestamp', target_col='value')
+```
+
+
+
+Please note, however, that the initial values of the series are not
+included in these historical forecasts. This is because `TimeGPT`
+requires a certain number of initial observations to generate reliable
+forecasts. Therefore, while interpreting the output, it’s important to
+be aware that the first few observations serve as the basis for the
+model’s predictions and are not themselves predicted values.
+
diff --git a/nixtla/docs/tutorials/holidays.html.mdx b/nixtla/docs/tutorials/holidays.html.mdx
new file mode 100644
index 00000000..9a134a65
--- /dev/null
+++ b/nixtla/docs/tutorials/holidays.html.mdx
@@ -0,0 +1,220 @@
+---
+output-file: holidays.html
+title: Holidays and special dates
+---
+
+
+Calendar variables and special dates are among the most common types of
+additional variables used in forecasting applications. They provide
+additional context on the current state of the time series, especially
+for window-based models such as TimeGPT-1. These variables often add
+information on each observation’s month, week, day, or hour. For
+example, in high-frequency hourly data, including the current month of
+the year gives the model more context than the limited history
+available in the input window, improving the forecasts.
+
+In this tutorial we will show how to add calendar variables
+automatically to a dataset using the `date_features` argument.
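
As a preview of what such variables look like, here is a hand-rolled sketch of common calendar features built with plain pandas for an hourly index (the feature names are our own; `date_features` creates comparable columns automatically):

```python
import pandas as pd

# Hand-rolled calendar features for an hourly index, for illustration.
idx = pd.date_range('2024-01-01', periods=48, freq='h')
features = pd.DataFrame({
    'month': idx.month,          # 1-12
    'dayofweek': idx.dayofweek,  # 0 = Monday
    'hour': idx.hour,            # 0-23
}, index=idx)
```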
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/02_holidays.ipynb)
+
+## 1. Import packages
+
+First, we import the required packages and initialize the Nixtla client.
+
+```python
+import pandas as pd
+from nixtla import NixtlaClient
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, remember to also set the `base_url`
+> argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")`
+
+## 2. Load data
+
+We will use a Google Trends dataset on chocolate, with monthly data.
+
+```python
+df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/google_trend_chocolate.csv')
+df['month'] = pd.to_datetime(df['month']).dt.to_period('M').dt.to_timestamp('M')
+```
+
+
+```python
+df.head()
+```
+
+| | month | chocolate |
+|-----|------------|-----------|
+| 0 | 2004-01-31 | 35 |
+| 1 | 2004-02-29 | 45 |
+| 2 | 2004-03-31 | 28 |
+| 3 | 2004-04-30 | 30 |
+| 4 | 2004-05-31 | 29 |
+
+## 3. Forecasting with holidays and special dates
+
+Given the predominant usage of calendar variables, we included the
+automatic creation of common calendar variables in the forecast method
+as a pre-processing step. Let’s create a future dataframe that contains
+the upcoming holidays in the United States.
+
+```python
+# Create future dataframe with exogenous features
+
+start_date = '2024-05'
+dates = pd.date_range(start=start_date, periods=14, freq='M')
+
+dates = dates.to_period('M').to_timestamp('M')
+
+future_df = pd.DataFrame(dates, columns=['month'])
+```
+
+
+```python
+from nixtla.date_features import CountryHolidays
+
+us_holidays = CountryHolidays(countries=['US'])
+dates = pd.date_range(start=future_df.iloc[0]['month'], end=future_df.iloc[-1]['month'], freq='D')
+holidays_df = us_holidays(dates)
+monthly_holidays = holidays_df.resample('M').max()
+
+monthly_holidays = monthly_holidays.reset_index(names='month')
+
+future_df = future_df.merge(monthly_holidays)
+
+future_df.head()
+```
+
+| | month | US_New Year's Day | US_Memorial Day | US_Juneteenth National Independence Day | US_Independence Day | US_Labor Day | US_Veterans Day | US_Thanksgiving | US_Christmas Day | US_Martin Luther King Jr. Day | US_Washington's Birthday | US_Columbus Day |
+|----|----|----|----|----|----|----|----|----|----|----|----|----|
+| 0 | 2024-05-31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 1 | 2024-06-30 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 2 | 2024-07-31 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 3 | 2024-08-31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 4 | 2024-09-30 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
+
+We perform the same steps for the input dataframe.
+
+```python
+# Add exogenous features to input dataframe
+
+dates = pd.date_range(start=df.iloc[0]['month'], end=df.iloc[-1]['month'], freq='D')
+holidays_df = us_holidays(dates)
+monthly_holidays = holidays_df.resample('M').max()
+
+monthly_holidays = monthly_holidays.reset_index(names='month')
+
+df = df.merge(monthly_holidays)
+
+df.tail()
+```
+
+| | month | chocolate | US_New Year's Day | US_New Year's Day (observed) | US_Memorial Day | US_Independence Day | US_Independence Day (observed) | US_Labor Day | US_Veterans Day | US_Thanksgiving | US_Christmas Day | US_Christmas Day (observed) | US_Martin Luther King Jr. Day | US_Washington's Birthday | US_Columbus Day | US_Veterans Day (observed) | US_Juneteenth National Independence Day | US_Juneteenth National Independence Day (observed) |
+|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
+| 239 | 2023-12-31 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 240 | 2024-01-31 | 64 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
+| 241 | 2024-02-29 | 66 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
+| 242 | 2024-03-31 | 59 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 243 | 2024-04-30 | 51 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+
+Great! Now, TimeGPT will consider the holidays as exogenous variables
+and the upcoming holidays will help it make predictions.
+
+```python
+fcst_df = nixtla_client.forecast(
+ df=df,
+ h=14,
+ freq='M',
+ time_col='month',
+ target_col='chocolate',
+ X_df=future_df
+)
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Inferred freq: M
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Using the following exogenous variables: US_New Year's Day, US_Memorial Day, US_Juneteenth National Independence Day, US_Independence Day, US_Labor Day, US_Veterans Day, US_Thanksgiving, US_Christmas Day, US_Martin Luther King Jr. Day, US_Washington's Birthday, US_Columbus Day
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+```python
+nixtla_client.plot(
+ df,
+ fcst_df,
+ time_col='month',
+ target_col='chocolate',
+)
+```
+
+
+
+We can then plot the weights of each holiday to see which are more
+important in forecasting the interest in chocolate.
+
+```python
+nixtla_client.weights_x.plot.barh(x='features', y='weights', figsize=(10, 10))
+```
+
+
+
+Here’s a breakdown of how the `date_features` parameter works:
+
+- **`date_features` (bool or list of str or callable)**: This
+ parameter specifies which date attributes to consider.
+ - If set to `True`, the model will automatically add the most
+ common date features related to the frequency of the given
+ dataframe (`df`). For a daily frequency, this could include
+ features like day of the week, month, and year.
+ - If provided a list of strings, it will consider those specific
+ date attributes. For example,
+ `date_features=['weekday', 'month']` will only add the day of
+ the week and month as features.
+ - If provided a callable, it should be a function that takes dates
+ as input and returns the desired feature. This gives flexibility
+ in computing custom date features.
+- **`date_features_to_one_hot` (bool or list of str)**: After
+ determining the date features, one might want to one-hot encode
+ them, especially if they are categorical in nature (like weekdays).
+ One-hot encoding transforms these categorical features into a binary
+ matrix, making them more suitable for many machine learning
+ algorithms.
+ - If `date_features=True`, then by default, all computed date
+ features will be one-hot encoded.
+ - If provided a list of strings, only those specific date features
+ will be one-hot encoded.
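The one-hot step described above can be sketched with pandas; this is an illustration of the idea, not the client's internal implementation:

```python
import pandas as pd

# Sketch: one-hot encode the weekday name, as date_features_to_one_hot would.
dates = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'])  # Mon, Tue, Wed
weekday = pd.Series(dates.day_name(), name='weekday')
one_hot = pd.get_dummies(weekday, prefix='weekday')
print(list(one_hot.columns))  # → ['weekday_Monday', 'weekday_Tuesday', 'weekday_Wednesday']
```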
+
+By leveraging the `date_features` and `date_features_to_one_hot`
+parameters, one can efficiently incorporate the temporal effects of date
+attributes into their forecasting model, potentially enhancing its
+accuracy and interpretability.
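As a minimal sketch of the callable form described above, a custom date feature is just a function from dates to a feature column; the feature name `is_december` here is made up for illustration:

```python
import pandas as pd

# Hypothetical custom date feature: flag timestamps that fall in December.
def is_december(dates: pd.DatetimeIndex) -> pd.DataFrame:
    return pd.DataFrame({'is_december': (dates.month == 12).astype(int)}, index=dates)

dates = pd.to_datetime(['2023-11-30', '2023-12-31', '2024-01-31'])
print(is_december(dates)['is_december'].tolist())  # → [0, 1, 0]
```

Such a function could then, in principle, be passed as `date_features=[is_december]`.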
+
diff --git a/nixtla/docs/tutorials/how_to_improve_forecast_accuracy.html.mdx b/nixtla/docs/tutorials/how_to_improve_forecast_accuracy.html.mdx
new file mode 100644
index 00000000..b524b4e6
--- /dev/null
+++ b/nixtla/docs/tutorials/how_to_improve_forecast_accuracy.html.mdx
@@ -0,0 +1,421 @@
+---
+output-file: how_to_improve_forecast_accuracy.html
+title: Improve Forecast Accuracy with TimeGPT
+---
+
+
+In this notebook, we demonstrate how to use TimeGPT for forecasting and
+explore three common strategies to enhance forecast accuracy. We use the
+hourly electricity price data from Germany as our example dataset.
+Before running the notebook, please initialize a NixtlaClient object with
+your api_key in the code snippet below.
+
+### Result Summary
+
+| Steps | Description | MAE | MAE Improvement (%) | RMSE | RMSE Improvement (%) |
+|------|----------------------|------|----------------|------|-----------------|
+| 0 | Zero-Shot TimeGPT | 18.5 | N/A | 20.0 | N/A |
+| 1 | Add Fine-Tuning Steps | 11.5 | 38% | 12.6 | 37% |
+| 2 | Adjust Fine-Tuning Loss | 9.6 | 48% | 11.0 | 45% |
+| 3 | Fine-tune more parameters | 9.0 | 51% | 11.3 | 44% |
+| 4 | Add Exogenous Variables | 4.6 | 75% | 6.4 | 68% |
+| 5 | Switch to Long-Horizon Model | 6.4 | 65% | 7.7 | 62% |
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/22_how_to_improve_forecast_accuracy.ipynb)
+
+First, we install and import the required packages, initialize the
+Nixtla client and create a function for calculating evaluation metrics.
+
+```python
+import numpy as np
+import pandas as pd
+
+from utilsforecast.evaluation import evaluate
+from utilsforecast.plotting import plot_series
+from utilsforecast.losses import mae, rmse
+from nixtla import NixtlaClient
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+## 1. Load the dataset
+
+In this notebook, we use hourly electricity prices as our example
+dataset, which consists of 5 time series, each with approximately 1700
+data points. For demonstration purposes, we focus on the German
+electricity price series. The time series is split, with the last 48
+steps (2 days) set aside as the test set.
+
+```python
+df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
+df['ds'] = pd.to_datetime(df['ds'])
+df_sub = df.query('unique_id == "DE"')
+```
+
+
+```python
+df_train = df_sub.query('ds < "2017-12-29"')
+df_test = df_sub.query('ds >= "2017-12-29"')
+df_train.shape, df_test.shape
+```
+
+``` text
+((1632, 12), (48, 12))
+```
+
+```python
+plot_series(df_train[['unique_id','ds','y']][-200:], forecasts_df= df_test[['unique_id','ds','y']].rename(columns={'y': 'test'}))
+```
+
+
+
+## 2. Benchmark Forecasting using TimeGPT
+
+We used TimeGPT to generate a zero-shot forecast for the time series. As
+illustrated in the plot, TimeGPT captures the overall trend reasonably
+well, but it falls short in modeling the short-term fluctuations and
+cyclical patterns present in the actual data. During the test period,
+the model achieved a Mean Absolute Error (MAE) of 18.5 and a Root Mean
+Square Error (RMSE) of 20. This forecast serves as a baseline for
+further comparison and optimization.
+
+```python
+fcst_timegpt = nixtla_client.forecast(df = df_train[['unique_id','ds','y']],
+ h=2*24,
+ target_col = 'y',
+ level = [90, 95])
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Querying model metadata...
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon, this may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+```python
+metrics = [mae, rmse]
+```
+
+
+```python
+evaluation = evaluate(
+ fcst_timegpt.merge(df_test, on=['unique_id', 'ds']),
+ metrics=metrics,
+ models=['TimeGPT']
+)
+evaluation
+```
+
+| | unique_id | metric | TimeGPT |
+|-----|-----------|--------|-----------|
+| 0 | DE | mae | 18.519004 |
+| 1 | DE | rmse | 20.037751 |
+
+```python
+plot_series(df_sub.iloc[-150:], forecasts_df= fcst_timegpt, level = [90])
+```
+
+
+
+## 3. Methods to Improve Forecast Accuracy
+
+### 3a. Add Finetune Steps
+
+The first approach to enhance forecast accuracy is to increase the
+number of fine-tuning steps. The fine-tuning process adjusts the weights
+within the TimeGPT model, allowing it to better fit your customized
+data. This adjustment enables TimeGPT to learn the nuances of your time
+series more effectively, leading to more accurate forecasts. With 30
+fine-tuning steps, we observe that the MAE decreases to 11.5 and the
+RMSE drops to 12.6.
+
+```python
+fcst_finetune_df = nixtla_client.forecast(df=df_train[['unique_id', 'ds', 'y']],
+ h=24*2,
+ finetune_steps = 30,
+ level=[90, 95])
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon, this may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+```python
+evaluation = evaluate(
+ fcst_finetune_df.merge(df_test, on=['unique_id', 'ds']),
+ metrics=metrics,
+ models=['TimeGPT']
+)
+evaluation
+```
+
+| | unique_id | metric | TimeGPT |
+|-----|-----------|--------|-----------|
+| 0 | DE | mae | 11.458185 |
+| 1 | DE | rmse | 12.642999 |
+
+```python
+plot_series(df_sub[-200:], forecasts_df= fcst_finetune_df, level = [90])
+```
+
+
+
+### 3b. Finetune with Different Loss Function
+
+The second way to further reduce forecast error is to adjust the loss
+function used during fine-tuning. You can specify your customized loss
+function using the `finetune_loss` parameter. By modifying the loss
+function, we observe that the MAE decreases to 9.6 and the RMSE reduces
+to 11.0.
+
+```python
+fcst_finetune_mae_df = nixtla_client.forecast(df=df_train[['unique_id', 'ds', 'y']],
+ h=24*2,
+ finetune_steps = 30,
+ finetune_loss = 'mae',
+ level=[90, 95])
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon, this may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+```python
+evaluation = evaluate(
+ fcst_finetune_mae_df.merge(df_test, on=['unique_id', 'ds']),
+ metrics=metrics,
+ models=['TimeGPT']
+)
+evaluation
+```
+
+| | unique_id | metric | TimeGPT |
+|-----|-----------|--------|-----------|
+| 0 | DE | mae | 9.640649 |
+| 1 | DE | rmse | 10.956003 |
+
+```python
+plot_series(df_sub[-200:], forecasts_df= fcst_finetune_mae_df, level = [90])
+```
+
+
+
+### 3c. Adjust the number of parameters being fine-tuned
+
+Using the `finetune_depth` parameter, we can control the number of
+parameters that get fine-tuned. By default, `finetune_depth=1`, meaning
+that few parameters are tuned. We can set it to any value from 1 to 5,
+where 5 means that we fine-tune all of the parameters of the model.
+
+```python
+fcst_finetune_depth_df = nixtla_client.forecast(df=df_train[['unique_id', 'ds', 'y']],
+ h=24*2,
+ finetune_steps = 30,
+ finetune_depth=2,
+ finetune_loss = 'mae',
+ level=[90, 95])
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon, this may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+```python
+evaluation = evaluate(
+ fcst_finetune_depth_df.merge(df_test, on=['unique_id', 'ds']),
+ metrics=metrics,
+ models=['TimeGPT']
+)
+evaluation
+```
+
+| | unique_id | metric | TimeGPT |
+|-----|-----------|--------|-----------|
+| 0 | DE | mae | 9.002193 |
+| 1 | DE | rmse | 11.348207 |
+
+```python
+plot_series(df_sub[-200:], forecasts_df= fcst_finetune_depth_df, level = [90])
+```
+
+
+
+### 3d. Forecast with Exogenous Variables
+
+Exogenous variables are external factors or predictors that are not part
+of the target time series but can influence its behavior. Incorporating
+these variables can provide the model with additional context, improving
+its ability to understand complex relationships and patterns in the
+data.
+
+To use exogenous variables in TimeGPT, pair each point in your input
+time series with the corresponding external data. If you have future
+values available for these variables during the forecast period, include
+them using the X_df parameter. Otherwise, you can omit this parameter
+and still see improvements using only historical values. In the example
+below, we incorporate 8 historical exogenous variables along with their
+values during the test period, which reduces the MAE and RMSE to 4.6 and
+6.4, respectively.
+
+```python
+df_train.head()
+```
+
+| | unique_id | ds | y | Exogenous1 | Exogenous2 | day_0 | day_1 | day_2 | day_3 | day_4 | day_5 | day_6 |
+|----|----|----|----|----|----|----|----|----|----|----|----|----|
+| 1680 | DE | 2017-10-22 00:00:00 | 19.10 | 16972.75 | 15778.92975 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
+| 1681 | DE | 2017-10-22 01:00:00 | 19.03 | 16254.50 | 16664.20950 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
+| 1682 | DE | 2017-10-22 02:00:00 | 16.90 | 15940.25 | 17728.74950 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
+| 1683 | DE | 2017-10-22 03:00:00 | 12.98 | 15959.50 | 18578.13850 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
+| 1684 | DE | 2017-10-22 04:00:00 | 9.24 | 16071.50 | 19389.16750 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
+
+```python
+future_ex_vars_df = df_test.drop(columns = ['y'])
+future_ex_vars_df.head()
+```
+
+| | unique_id | ds | Exogenous1 | Exogenous2 | day_0 | day_1 | day_2 | day_3 | day_4 | day_5 | day_6 |
+|----|----|----|----|----|----|----|----|----|----|----|----|
+| 3312 | DE | 2017-12-29 00:00:00 | 17347.00 | 24577.92650 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
+| 3313 | DE | 2017-12-29 01:00:00 | 16587.25 | 24554.31950 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
+| 3314 | DE | 2017-12-29 02:00:00 | 16396.00 | 24651.45475 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
+| 3315 | DE | 2017-12-29 03:00:00 | 16481.25 | 24666.04300 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
+| 3316 | DE | 2017-12-29 04:00:00 | 16827.75 | 24403.33350 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
+
+```python
+fcst_ex_vars_df = nixtla_client.forecast(df=df_train,
+ X_df=future_ex_vars_df,
+ h=24*2,
+ level=[90, 95])
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon, this may lead to less accurate forecasts. Please consider using a smaller horizon.
+INFO:nixtla.nixtla_client:Using future exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+```python
+evaluation = evaluate(
+ fcst_ex_vars_df.merge(df_test, on=['unique_id', 'ds']),
+ metrics=metrics,
+ models=['TimeGPT']
+)
+evaluation
+```
+
+| | unique_id | metric | TimeGPT |
+|-----|-----------|--------|----------|
+| 0 | DE | mae | 4.602594 |
+| 1 | DE | rmse | 6.358831 |
+
+```python
+plot_series(df_sub[-200:], forecasts_df= fcst_ex_vars_df, level = [90])
+```
+
+
+
+### 3e. TimeGPT for Long-Horizon Forecasting
+
+When the forecasting period is too long, the predicted results may not
+be as accurate. TimeGPT performs best with forecast periods that are
+shorter than one complete cycle of the time series. For longer forecast
+periods, switching to the `timegpt-1-long-horizon` model can yield better
+results. You can specify this model via the `model` parameter.
+
+In the electricity price time series used here, one cycle is 24 steps
+(one day). Since we’re forecasting two days (48 steps) into
+the future, using `timegpt-1-long-horizon` significantly improves the
+forecasting accuracy, reducing the MAE to 6.4 and the RMSE to 7.7.
+
+```python
+fcst_long_df = nixtla_client.forecast(df=df_train[['unique_id', 'ds', 'y']],
+ h=24*2,
+ model = 'timegpt-1-long-horizon',
+ level=[90, 95])
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Inferred freq: h
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Querying model metadata...
+INFO:nixtla.nixtla_client:Restricting input...
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+```python
+evaluation = evaluate(
+ fcst_long_df.merge(df_test, on=['unique_id', 'ds']),
+ metrics=metrics,
+ models=['TimeGPT']
+)
+evaluation
+```
+
+| | unique_id | metric | TimeGPT |
+|-----|-----------|--------|----------|
+| 0 | DE | mae | 6.365540 |
+| 1 | DE | rmse | 7.738188 |
+
+```python
+plot_series(df_sub[-200:], forecasts_df= fcst_long_df, level = [90])
+```
+
+
+
+## 4. Conclusion and Next Steps
+
+In this notebook, we demonstrated five effective strategies for
+enhancing forecast accuracy with TimeGPT:
+
+1. **Increasing the number of fine-tuning steps.**
+2. **Adjusting the fine-tuning loss function.**
+3. **Fine-tuning more of the model’s parameters with `finetune_depth`.**
+4. **Incorporating exogenous variables.**
+5. **Switching to the long-horizon model for extended forecasting
+   periods.**
+
+We encourage you to experiment with these hyperparameters to identify
+the optimal settings that best suit your specific needs. Additionally,
+please refer to our documentation for further features, such as **model
+explainability** and more.
+
+In the examples provided, after applying these methods, we observed
+significant improvements in forecast accuracy metrics, as summarized
+below.
+
+### Result Summary
+
+| Steps | Description | MAE | MAE Improvement (%) | RMSE | RMSE Improvement (%) |
+|------|----------------------|------|----------------|------|-----------------|
+| 0 | Zero-Shot TimeGPT | 18.5 | N/A | 20.0 | N/A |
+| 1 | Add Fine-Tuning Steps | 11.5 | 38% | 12.6 | 37% |
+| 2 | Adjust Fine-Tuning Loss | 9.6 | 48% | 11.0 | 45% |
+| 3 | Fine-tune more parameters | 9.0 | 51% | 11.3 | 44% |
+| 4 | Add Exogenous Variables | 4.6 | 75% | 6.4 | 68% |
+| 5 | Switch to Long-Horizon Model | 6.4 | 65% | 7.7 | 62% |
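As a sanity check, the improvement columns in both summary tables follow directly from the zero-shot baseline (MAE 18.5, RMSE 20.0):

```python
# Percent error reduction relative to the zero-shot baseline, rounded to a whole percent.
def improvement(metric: float, baseline: float) -> int:
    return round((1 - metric / baseline) * 100)

print(improvement(4.6, 18.5))   # MAE with exogenous variables → 75
print(improvement(6.4, 20.0))   # RMSE with exogenous variables → 68
print(improvement(11.5, 18.5))  # MAE after adding fine-tuning steps → 38
```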
+
diff --git a/nixtla/docs/tutorials/longhorizon.html.mdx b/nixtla/docs/tutorials/longhorizon.html.mdx
new file mode 100644
index 00000000..4bc41453
--- /dev/null
+++ b/nixtla/docs/tutorials/longhorizon.html.mdx
@@ -0,0 +1,163 @@
+---
+output-file: longhorizon.html
+title: Long-horizon forecasting
+---
+
+
+Long-horizon forecasting refers to predictions far into the future,
+typically exceeding two seasonal periods. However, the exact definition
+of a ‘long horizon’ can vary based on the frequency of the data. For
+example, when dealing with hourly data, a forecast for three days into
+the future is considered long-horizon, as it covers 72 timestamps
+(calculated as 3 days × 24 hours/day). In the context of monthly data, a
+period exceeding two years would typically be classified as long-horizon
+forecasting. Similarly, for daily data, a forecast spanning more than
+two weeks falls into the long-horizon category.
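The rule of thumb above (a horizon beyond two seasonal cycles) can be stated as a one-line check; the helper name is ours, not part of any library:

```python
# Heuristic from the definition above: "long-horizon" means h exceeds two seasonal cycles.
def is_long_horizon(h: int, season_length: int) -> bool:
    return h > 2 * season_length

print(is_long_horizon(72, 24))  # hourly data, 3 days ahead → True
print(is_long_horizon(25, 12))  # monthly data, just over two years → True
print(is_long_horizon(12, 24))  # hourly data, half a day ahead → False
```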
+
+Of course, forecasting over a long horizon comes with its challenges.
+The longer the forecast horizon, the greater the uncertainty in the
+predictions. Unknown factors may also come into play over the long term
+that could not be anticipated at the time of forecasting.
+
+To tackle those challenges, use TimeGPT’s specialized model for
+long-horizon forecasting by specifying `model='timegpt-1-long-horizon'`
+in your setup.
+
+For a detailed step-by-step guide, follow this tutorial on long-horizon
+forecasting.
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/04_longhorizon.ipynb)
+
+## 1. Import packages
+
+First, we install and import the required packages and initialize the
+Nixtla client.
+
+```python
+from nixtla import NixtlaClient
+from datasetsforecast.long_horizon import LongHorizon
+from utilsforecast.losses import mae
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, remember to set also the `base_url`
+> argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your Azure AI endpoint", api_key="your api_key")`
+
+## 2. Load the data
+
+Let’s load the ETTh1 dataset. This is a widely used dataset to evaluate
+models on their long-horizon forecasting capabilities.
+
+The ETTh1 dataset monitors an electricity transformer from a region of a
+province of China including oil temperature and variants of load (such
+as high useful load and high useless load) from July 2016 to July 2018
+at an hourly frequency.
+
+For this tutorial, let’s only consider the oil temperature variation
+over time.
+
+```python
+Y_df, *_ = LongHorizon.load(directory='./', group='ETTh1')
+
+Y_df.head()
+```
+
+``` text
+100%|██████████| 314M/314M [00:14<00:00, 21.3MiB/s]
+INFO:datasetsforecast.utils:Successfully downloaded datasets.zip, 314116557, bytes.
+INFO:datasetsforecast.utils:Decompressing zip file...
+INFO:datasetsforecast.utils:Successfully decompressed longhorizon\datasets\datasets.zip
+```
+
+| | unique_id | ds | y |
+|-----|-----------|---------------------|----------|
+| 0 | OT | 2016-07-01 00:00:00 | 1.460552 |
+| 1 | OT | 2016-07-01 01:00:00 | 1.161527 |
+| 2 | OT | 2016-07-01 02:00:00 | 1.161527 |
+| 3 | OT | 2016-07-01 03:00:00 | 0.862611 |
+| 4 | OT | 2016-07-01 04:00:00 | 0.525227 |
+
+For this small experiment, let’s set the horizon to 96 time steps (4
+days into the future), and we will feed TimeGPT with a sequence of 42
+days.
+
+```python
+test = Y_df[-96:] # 96 = 4 days x 24h/day
+input_seq = Y_df[-1104:-96] # Gets a sequence of 1008 observations (1008 = 42 days * 24h/day)
+```
+
+## 3. Forecasting for long-horizon
+
+Now, we are ready to use TimeGPT for long-horizon forecasting. Here, we
+need to set the `model` parameter to `"timegpt-1-long-horizon"`. This is
+the specialized model in TimeGPT that can handle such tasks.
+
+```python
+fcst_df = nixtla_client.forecast(
+ df=input_seq,
+ h=96,
+ level=[90],
+ finetune_steps=10,
+ finetune_loss='mae',
+ model='timegpt-1-long-horizon',
+ time_col='ds',
+ target_col='y'
+)
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Inferred freq: H
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+
+```python
+nixtla_client.plot(Y_df[-168:], fcst_df, models=['TimeGPT'], level=[90], time_col='ds', target_col='y')
+```
+
+
+
+## Evaluation
+
+Let’s now evaluate the performance of TimeGPT using the mean absolute
+error (MAE).
+
+```python
+test = test.copy()
+
+test.loc[:, 'TimeGPT'] = fcst_df['TimeGPT'].values
+```
+
+
+```python
+evaluation = mae(test, models=['TimeGPT'], id_col='unique_id', target_col='y')
+
+print(evaluation)
+```
+
+``` text
+ unique_id TimeGPT
+0 OT 0.145393
+```
+
+Here, TimeGPT achieves an MAE of 0.146.
+
diff --git a/nixtla/docs/tutorials/loss_function_finetuning.html.mdx b/nixtla/docs/tutorials/loss_function_finetuning.html.mdx
new file mode 100644
index 00000000..81a8b0dd
--- /dev/null
+++ b/nixtla/docs/tutorials/loss_function_finetuning.html.mdx
@@ -0,0 +1,295 @@
+---
+output-file: loss_function_finetuning.html
+title: Fine-tuning with a specific loss function
+---
+
+
+When fine-tuning, the model trains on your dataset to tailor its
+predictions to your particular scenario. As such, it is possible to
+specify the loss function used during fine-tuning.
+
+Specifically, you can choose from:
+
+- `"default"` - a proprietary loss function that is robust to outliers
+- `"mae"` - mean absolute error
+- `"mse"` - mean squared error
+- `"rmse"` - root mean squared error
+- `"mape"` - mean absolute percentage error
+- `"smape"` - symmetric mean absolute percentage error
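For reference, the standard options compute the following quantities. This NumPy sketch uses the textbook definitions, which may differ in small details (e.g. the sMAPE convention) from the API's internal implementation:

```python
import numpy as np

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])
err = y_pred - y_true

mae_val = np.mean(np.abs(err))                    # mean absolute error
mse_val = np.mean(err ** 2)                       # mean squared error
rmse_val = np.sqrt(mse_val)                       # root mean squared error
mape_val = np.mean(np.abs(err) / np.abs(y_true))  # mean absolute percentage error
smape_val = np.mean(2 * np.abs(err) / (np.abs(y_true) + np.abs(y_pred)))  # one common sMAPE form
print(round(mae_val, 3), round(rmse_val, 3))
```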
+
+[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/07_loss_function_finetuning.ipynb)
+
+## 1. Import packages
+
+First, we import the required packages and initialize the Nixtla client.
+
+```python
+import pandas as pd
+from nixtla import NixtlaClient
+from utilsforecast.losses import mae, mse, rmse, mape, smape
+```
+
+
+```python
+nixtla_client = NixtlaClient(
+ # defaults to os.environ.get("NIXTLA_API_KEY")
+ api_key = 'my_api_key_provided_by_nixtla'
+)
+```
+
+> 👍 Use an Azure AI endpoint
+>
+> To use an Azure AI endpoint, remember to set also the `base_url`
+> argument:
+>
+> `nixtla_client = NixtlaClient(base_url="your Azure AI endpoint", api_key="your api_key")`
+
+## 2. Load data
+
+Let’s load the air passengers dataset as our example series.
+
+```python
+df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
+df.insert(loc=0, column='unique_id', value=1)
+
+df.head()
+```
+
+| | unique_id | timestamp | value |
+|-----|-----------|------------|-------|
+| 0 | 1 | 1949-01-01 | 112 |
+| 1 | 1 | 1949-02-01 | 118 |
+| 2 | 1 | 1949-03-01 | 132 |
+| 3 | 1 | 1949-04-01 | 129 |
+| 4 | 1 | 1949-05-01 | 121 |
+
+## 3. Fine-tuning with Mean Absolute Error
+
+Let’s fine-tune the model on a dataset using the Mean Absolute Error
+(MAE).
+
+For that, we simply pass the appropriate string representing the loss
+function to the `finetune_loss` parameter of the `forecast` method.
+
+```python
+timegpt_fcst_finetune_mae_df = nixtla_client.forecast(
+ df=df,
+ h=12,
+ finetune_steps=10,
+ finetune_loss='mae', # Set your desired loss function
+ time_col='timestamp',
+ target_col='value',
+)
+```
+
+``` text
+INFO:nixtla.nixtla_client:Validating inputs...
+INFO:nixtla.nixtla_client:Preprocessing dataframes...
+INFO:nixtla.nixtla_client:Inferred freq: MS
+INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
+```
+
+> 📘 Available models in Azure AI
+>
+> If you are using an Azure AI endpoint, please be sure to set
+> `model="azureai"`:
+>
+> `nixtla_client.forecast(..., model="azureai")`
+>
+> For the public API, we support two models: `timegpt-1` and
+> `timegpt-1-long-horizon`.
+>
+> By default, `timegpt-1` is used. Please see [this
+> tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting)
+> on how and when to use `timegpt-1-long-horizon`.
+
+```python
+nixtla_client.plot(
+ df, timegpt_fcst_finetune_mae_df,
+ time_col='timestamp', target_col='value',
+)
+```
+
+
+
+Now, depending on your data, you will use a specific error metric to
+accurately evaluate your forecasting model’s performance.
+
+Below is a non-exhaustive guide on which metric to use depending on your
+use case.
+
+**Mean absolute error (MAE)**
+
diff --git a/statsforecast/docs/experiments/autoarima_vs_prophet.html.mdx b/statsforecast/docs/experiments/autoarima_vs_prophet.html.mdx
new file mode 100644
index 00000000..ba49fbbf
--- /dev/null
+++ b/statsforecast/docs/experiments/autoarima_vs_prophet.html.mdx
@@ -0,0 +1,655 @@
+---
+output-file: autoarima_vs_prophet.html
+title: AutoARIMA Comparison (Prophet and pmdarima)
+---
+
+
+
+
+source + +### MFLES + +> ``` text +> MFLES (verbose=1, robust=None) +> ``` + +*Initialize self. See help(type(self)) for accurate signature.* + diff --git a/statsforecast/src/mstl.html.mdx b/statsforecast/src/mstl.html.mdx new file mode 100644 index 00000000..095623c8 --- /dev/null +++ b/statsforecast/src/mstl.html.mdx @@ -0,0 +1,29 @@ +--- +output-file: mstl.html +title: MSTL model +--- + + +------------------------------------------------------------------------ + +source + +### mstl + +> ``` text +> mstl (x:numpy.ndarray, period:Union[int,List[int]], +> blambda:Optional[float]=None, iterate:int=2, +> s_window:Optional[numpy.ndarray]=None, stl_kwargs:Dict={}) +> ``` + +| | **Type** | **Default** | **Details** | +|------------|----------|-------------|----------------------| +| x | ndarray | | time series | +| period | Union | | season length | +| blambda | Optional | None | box-cox transform | +| iterate | int | 2 | number of iterations | +| s_window | Optional | None | seasonal window | +| stl_kwargs | Dict | {} | | + diff --git a/statsforecast/src/tbats.html.mdx b/statsforecast/src/tbats.html.mdx new file mode 100644 index 00000000..13b5bea2 --- /dev/null +++ b/statsforecast/src/tbats.html.mdx @@ -0,0 +1,158 @@ +--- +output-file: tbats.html +title: TBATS model +--- + + + +```python +import matplotlib.pyplot as plt +import pandas as pd + +from statsforecast.utils import AirPassengers as ap +``` + +## Load data + +## Functions + +### find_harmonics + +### initial_parameters + +### makeXMatrix + +### findPQ + +### makeTBATSWMatrix + +### makeTBATSGMatrix + +### makeTBATSFMatrix + +### calcTBATSFaster + +### extract_params + +### updateTBATSWMatrix + +### updateTBATSGMatrix + +### updateTBATSFMatrix + +### checkAdmissibility + +### calcLikelihoodTBATS + +## TBATS model + +### tbats_model_generator + +### tbats_model + +------------------------------------------------------------------------ + +source + +### tbats_model + +> ``` text +> tbats_model (y, 
seasonal_periods, k_vector, use_boxcox, bc_lower_bound, +> bc_upper_bound, use_trend, use_damped_trend, +> use_arma_errors) +> ``` + +### tbats_selection + +------------------------------------------------------------------------ + +source + +### tbats_selection + +> ``` text +> tbats_selection (y, seasonal_periods, use_boxcox, bc_lower_bound, +> bc_upper_bound, use_trend, use_damped_trend, +> use_arma_errors) +> ``` + +### tbats_forecast + +------------------------------------------------------------------------ + +source + +### tbats_forecast + +> ``` text +> tbats_forecast (mod, h) +> ``` + +| | **Details** | +|-----|--------------------------------------------| +| mod | | +| h | this function is the same as bats_forecast | + +### Example + + +```python +y = ap +seasonal_periods = np.array([12]) +``` + + +```python +# Default parameters +use_boxcox = None +bc_lower_bound = 0 +bc_upper_bound = 1 +use_trend = None +use_damped_trend = None +use_arma_errors = True +``` + + +```python +mod = tbats_selection(y, seasonal_periods, use_boxcox, bc_lower_bound, bc_upper_bound, use_trend, use_damped_trend, use_arma_errors) +``` + + +```python +# Values in R +print(mod['aic']) # 1397.015 +print(mod['k_vector']) # 5 +print(mod['description']) # use_boxcox = TRUE, use_trend = TRUE, use_damped_trend = FALSE, use_arma_errors = FALSE +``` + + +```python +fitted_trans = mod['fitted'].ravel() +if mod['BoxCox_lambda'] is not None: + fitted_trans = inv_boxcox(fitted_trans, mod['BoxCox_lambda']) +``` + + +```python +h = 24 +fcst = tbats_forecast(mod, h) +forecast = fcst['mean'] +if mod['BoxCox_lambda'] is not None: + forecast = inv_boxcox(forecast, mod['BoxCox_lambda']) +``` + + +```python +fig, ax = plt.subplots(1, 1, figsize = (20,7)) +plt.plot(np.arange(0, len(y)), y, color='black', label='original') +plt.plot(np.arange(0, len(y)), fitted_trans, color='blue', label = "fitted") +plt.plot(np.arange(len(y), len(y)+h), forecast, '.-', color = 'green', label = 'fcst') +plt.legend() +``` 
+ diff --git a/statsforecast/src/theta.html.mdx b/statsforecast/src/theta.html.mdx new file mode 100644 index 00000000..3b6adfb6 --- /dev/null +++ b/statsforecast/src/theta.html.mdx @@ -0,0 +1,48 @@ +--- +output-file: theta.html +title: Theta Model +--- + + +------------------------------------------------------------------------ + +source + +### forecast_theta + +> ``` text +> forecast_theta (obj, h, level=None) +> ``` + + +```python +forecast_theta(res, 12, level=[90, 80]) +``` + +------------------------------------------------------------------------ + +source + +### auto_theta + +> ``` text +> auto_theta (y, m, model=None, initial_smoothed=None, alpha=None, +> theta=None, nmse=3, decomposition_type='multiplicative') +> ``` + +------------------------------------------------------------------------ + +source + +### forward_theta + +> ``` text +> forward_theta (fitted_model, y) +> ``` + diff --git a/statsforecast/src/utils.html.mdx b/statsforecast/src/utils.html.mdx new file mode 100644 index 00000000..11ba2017 --- /dev/null +++ b/statsforecast/src/utils.html.mdx @@ -0,0 +1,89 @@ +--- +description: >- + The `core.StatsForecast` class allows you to efficiently fit multiple + `StatsForecast` models for large sets of time series. It operates with pandas + DataFrame `df` that identifies individual series and datestamps with the + `unique_id` and `ds` columns, and the `y` column denotes the target time + series variable. To assist development, we declare useful datasets that we use + throughout all `StatsForecast`'s unit tests. +output-file: utils.html +title: Utils +--- + + +# 1. Synthetic Panel Data + +------------------------------------------------------------------------ + +source + +### generate_series + +> ``` text +> generate_series (n_series:int, freq:str='D', min_length:int=50, +> max_length:int=500, n_static_features:int=0, +> equal_ends:bool=False, engine:str='pandas', seed:int=0) +> ``` + +\*Generate Synthetic Panel Series. 

Generates `n_series` of frequency `freq` of different lengths in the
interval \[`min_length`, `max_length`\]. If `n_static_features > 0`,
then each series gets static features with random values. If
`equal_ends == True`, then all series end at the same date.\*

| | **Type** | **Default** | **Details** |
|------|------------------|-------------------------|-------------------------|
| n_series | int | | Number of series for synthetic panel. |
| freq | str | D | Frequency of the data, ‘D’ or ‘M’. |
| min_length | int | 50 | Minimum length of synthetic panel’s series. |
| max_length | int | 500 | Maximum length of synthetic panel’s series. |
| n_static_features | int | 0 | Number of static exogenous variables for synthetic panel’s series. |
| equal_ends | bool | False | Whether all series should end at the same date stamp `ds`. |
| engine | str | pandas | Output DataFrame type (‘pandas’ or ‘polars’). |
| seed | int | 0 | Random seed used for generating the data. |
| **Returns** | **Union** | | **Synthetic panel with columns \[`unique_id`, `ds`, `y`\] and exogenous features.** |


```python
synthetic_panel = generate_series(n_series=2)
synthetic_panel.groupby('unique_id', observed=True).head(4)
```

# 2. AirPassengers Data

The classic Box & Jenkins airline data: monthly totals of international
airline passengers, 1949 to 1960.

It has been used as a reference in several forecasting libraries; since
the series shows clear trend and seasonality, it offers a nice
opportunity to quickly showcase a model’s prediction performance.


```python
from statsforecast.utils import AirPassengersDF
```


```python
AirPassengersDF.head(12)
```


```python
# Plot the AirPassengers series.
fig, ax = plt.subplots(1, 1, figsize=(20, 7))
plot_df = AirPassengersDF.set_index('ds')

plot_df[['y']].plot(ax=ax, linewidth=2)
ax.set_title('AirPassengers Forecast', fontsize=22)
ax.set_ylabel('Monthly Passengers', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(prop={'size': 15})
ax.grid()
```

## Model utils

diff --git a/style.css b/style.css
new file mode 100644
index 00000000..03565270
--- /dev/null
+++ b/style.css
@@ -0,0 +1,144 @@
@font-face {
  font-family: "PPNeueMontreal";
  src: url("./fonts/ppneuemontreal-medium.otf") format("opentype");
}

@font-face {
  font-family: "SupplyMono";
  src: url("./fonts/Supply-Regular.otf") format("opentype");
}

:root {
  --primary-light: #fff;
  --primary-dark: #161616;
  --gray: #f0f0f0;
}

html,
body {
  background-color: var(--gray);
  font-family: "PPNeueMontreal", sans-serif;
}

.eyebrow {
  font-family: "SupplyMono", monospace;
  @apply text-red-300;
}

#navbar img {
  height: 20px;
}

.bg-gradient-to-b {
  background-color: transparent;
  background-image: none;
}

a {
  border-radius: 0.125rem !important;
  border: 1px solid transparent;
}

a.font-semibold {
  background: var(--primary-light);
  border: 1px solid var(--primary-dark);
  font-weight: 500;
  color: var(--primary-dark);
}

.rounded-md {
  border-radius: 0.125rem !important;
}

.rounded-xl {
  border-radius: 0.125rem !important;
}

.rounded-2xl,
.rounded-search {
  border-radius: 0.25rem !important;
}

#navbar-transition {
  background: var(--gray);
}

#topbar-cta-button a span {
  background: var(--primary-dark);
  border-radius: 0.125rem;
}

#content-side-layout a {
  border: none;
}

#content-side-layout a.font-medium {
  color: var(--primary);
  font-weight: 600;
}

a.card svg {
  background: var(--primary-dark);
  opacity: 0.8;
}

/* dark mode */
html.dark > body {
  background-color: #161616;
}

html.dark #navbar-transition {
  background: var(--primary-dark);
}

html.dark a.font-semibold {
  background: #000;
  border: 1px solid var(--gray);
  outline-color: var(--gray);
  color: #fff;
}

html.dark a.font-semibold svg {
  background: #fff;
}

html.dark #topbar-cta-button a span {
  background: #fff;
  color: #000;
}

html.dark #topbar-cta-button svg {
  color: #000;
}

html.dark a.card svg {
  background: var(--primary-light);
}

/* Banner styling for theme support */
#banner {
  /* background-color: var(--primary-light); # original */
  background-color: #22c55e;
  border-color: var(--primary-light);
  color: var(--primary-dark);
  border-bottom: 1px solid var(--primary-dark) !important;
}

#banner a:hover {
  opacity: 0.8;
}

#banner p span {
  color: black !important;
  margin: 0;
  font-size: medium;
}

#banner strong {
  color: black !important;
  font-size: medium;
}

#banner svg path {
  color: #000 !important;
}

diff --git a/utilsforecast/.nojekyll b/utilsforecast/.nojekyll
new file mode 100644
index 00000000..e69de29b

diff --git a/utilsforecast/compat.mdx b/utilsforecast/compat.mdx
new file mode 100644
index 00000000..8b137891
--- /dev/null
+++ b/utilsforecast/compat.mdx
@@ -0,0 +1 @@

diff --git a/utilsforecast/dark.png b/utilsforecast/dark.png
new file mode 100644
index 00000000..4142a0bb
Binary files /dev/null and b/utilsforecast/dark.png differ

diff --git a/utilsforecast/data.html.mdx b/utilsforecast/data.html.mdx
new file mode 100644
index 00000000..e77e014e
--- /dev/null
+++ b/utilsforecast/data.html.mdx
@@ -0,0 +1,59 @@
---
description: Utilities for generating time series datasets
output-file: data.html
title: Data
---


------------------------------------------------------------------------

source

### generate_series

> ``` text
> generate_series (n_series:int, freq:str='D', min_length:int=50,
>                  max_length:int=500, n_static_features:int=0,
>                  equal_ends:bool=False, with_trend:bool=False,
>                  static_as_categorical:bool=True, n_models:int=0,
>
level:Optional[List[float]]=None, +> engine:Literal['pandas','polars']='pandas', seed:int=0) +> ``` + +*Generate Synthetic Panel Series.* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| n_series | int | | Number of series for synthetic panel. | +| freq | str | D | Frequency of the data (pandas alias).
Seasonalities are implemented for hourly, daily and monthly. | +| min_length | int | 50 | Minimum length of synthetic panel’s series. | +| max_length | int | 500 | Maximum length of synthetic panel’s series. | +| n_static_features | int | 0 | Number of static exogenous variables for synthetic panel’s series. | +| equal_ends | bool | False | Series should end in the same timestamp. | +| with_trend | bool | False | Series should have a (positive) trend. | +| static_as_categorical | bool | True | Static features should have a categorical data type. | +| n_models | int | 0 | Number of models predictions to simulate. | +| level | Optional | None | Confidence level for intervals to simulate for each model. | +| engine | Literal | pandas | Output Dataframe type. | +| seed | int | 0 | Random seed used for generating the data. | +| **Returns** | **Union** | | **Synthetic panel with columns \[`unique_id`, `ds`, `y`\] and exogenous features.** | + + +```python +synthetic_panel = generate_series(n_series=2) +synthetic_panel.groupby('unique_id', observed=True).head(4) +``` + +| | unique_id | ds | y | +|-----|-----------|------------|----------| +| 0 | 0 | 2000-01-01 | 0.357595 | +| 1 | 0 | 2000-01-02 | 1.301382 | +| 2 | 0 | 2000-01-03 | 2.272442 | +| 3 | 0 | 2000-01-04 | 3.211827 | +| 222 | 1 | 2000-01-01 | 5.399023 | +| 223 | 1 | 2000-01-02 | 6.092818 | +| 224 | 1 | 2000-01-03 | 0.476396 | +| 225 | 1 | 2000-01-04 | 1.343744 | + diff --git a/utilsforecast/evaluation.html.mdx b/utilsforecast/evaluation.html.mdx new file mode 100644 index 00000000..4815bb9a --- /dev/null +++ b/utilsforecast/evaluation.html.mdx @@ -0,0 +1,131 @@ +--- +description: Model performance evaluation +output-file: evaluation.html +title: Evaluation +--- + + +------------------------------------------------------------------------ + +source + +### evaluate + +> ``` text +> evaluate (df:~AnyDFType, metrics:List[Callable], +> models:Optional[List[str]]=None, +> train_df:Optional[~AnyDFType]=None, +> 
level:Optional[List[int]]=None, id_col:str='unique_id', +> time_col:str='ds', target_col:str='y', +> agg_fn:Optional[str]=None) +> ``` + +*Evaluate forecast using different metrics.* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | AnyDFType | | Forecasts to evaluate.
Must have `id_col`, `time_col`, `target_col` and models’ predictions. | +| metrics | List | | Functions with arguments `df`, `models`, `id_col`, `target_col` and optionally `train_df`. | +| models | Optional | None | Names of the models to evaluate.
If `None`, will use every column in the dataframe after removing id, time and target. |
| train_df | Optional | None | Training set. Used to evaluate metrics such as [`mase`](https://Nixtla.github.io/utilsforecast/losses.html#mase). |
| level | Optional | None | Prediction interval levels. Used to compute losses that rely on quantiles. |
| id_col | str | unique_id | Column that identifies each series. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| target_col | str | y | Column that contains the target. |
| agg_fn | Optional | None | Statistic to compute on the scores by id to reduce them to a single number. |
| **Returns** | **AnyDFType** | | **Metrics with one row per (id, metric) combination and one column per model.
If `agg_fn` is not `None`, there is only one row per metric.** | + + +```python +from functools import partial + +import numpy as np +import pandas as pd + +from utilsforecast.losses import * +from utilsforecast.data import generate_series +``` + + +```python +series = generate_series(10, n_models=2, level=[80, 95]) +``` + + +```python +series['unique_id'] = series['unique_id'].astype('int') +``` + + +```python +models = ['model0', 'model1'] +metrics = [ + mae, + mse, + rmse, + mape, + smape, + partial(mase, seasonality=7), + quantile_loss, + mqloss, + coverage, + calibration, + scaled_crps, +] +``` + + +```python +evaluation = evaluate( + series, + metrics=metrics, + models=models, + train_df=series, + level=[80, 95], +) +evaluation +``` + +| | unique_id | metric | model0 | model1 | +|-----|-----------|-------------|----------|----------| +| 0 | 0 | mae | 0.158108 | 0.163246 | +| 1 | 1 | mae | 0.160109 | 0.143805 | +| 2 | 2 | mae | 0.159815 | 0.170510 | +| 3 | 3 | mae | 0.168537 | 0.161595 | +| 4 | 4 | mae | 0.170182 | 0.163329 | +| ... | ... | ... | ... | ... 
| +| 175 | 5 | scaled_crps | 0.034202 | 0.035472 | +| 176 | 6 | scaled_crps | 0.034880 | 0.033610 | +| 177 | 7 | scaled_crps | 0.034337 | 0.034745 | +| 178 | 8 | scaled_crps | 0.033336 | 0.032459 | +| 179 | 9 | scaled_crps | 0.034766 | 0.035243 | + + +```python +summary = evaluation.drop(columns='unique_id').groupby('metric').mean().reset_index() +summary +``` + +| | metric | model0 | model1 | +|-----|----------------------|----------|----------| +| 0 | calibration_q0.025 | 0.000000 | 0.000000 | +| 1 | calibration_q0.1 | 0.000000 | 0.000000 | +| 2 | calibration_q0.9 | 0.833993 | 0.815833 | +| 3 | calibration_q0.975 | 0.853991 | 0.836949 | +| 4 | coverage_level80 | 0.833993 | 0.815833 | +| 5 | coverage_level95 | 0.853991 | 0.836949 | +| 6 | mae | 0.161286 | 0.162281 | +| 7 | mape | 0.048894 | 0.049624 | +| 8 | mase | 0.966846 | 0.975354 | +| 9 | mqloss | 0.056904 | 0.056216 | +| 10 | mse | 0.048653 | 0.049198 | +| 11 | quantile_loss_q0.025 | 0.019990 | 0.019474 | +| 12 | quantile_loss_q0.1 | 0.067315 | 0.065781 | +| 13 | quantile_loss_q0.9 | 0.095510 | 0.093841 | +| 14 | quantile_loss_q0.975 | 0.044803 | 0.045767 | +| 15 | rmse | 0.220357 | 0.221543 | +| 16 | scaled_crps | 0.035003 | 0.034576 | +| 17 | smape | 0.024475 | 0.024902 | + diff --git a/utilsforecast/favicon.svg b/utilsforecast/favicon.svg new file mode 100644 index 00000000..e5f33342 --- /dev/null +++ b/utilsforecast/favicon.svg @@ -0,0 +1,5 @@ + diff --git a/utilsforecast/feature_engineering.html.mdx b/utilsforecast/feature_engineering.html.mdx new file mode 100644 index 00000000..9334ee75 --- /dev/null +++ b/utilsforecast/feature_engineering.html.mdx @@ -0,0 +1,374 @@ +--- +description: Create exogenous regressors for your models +output-file: feature_engineering.html +title: Feature engineering +--- + + +------------------------------------------------------------------------ + +source + +### fourier + +> ``` text +> fourier (df:~DFType, freq:Union[str,int], season_length:int, k:int, +> h:int=0, 
id_col:str='unique_id', time_col:str='ds') +> ``` + +*Compute fourier seasonal terms for training and forecasting* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Dataframe with ids, times and values for the exogenous regressors. | +| freq | Union | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. | +| season_length | int | | Number of observations per unit of time. Ex: 24 Hourly data. | +| k | int | | Maximum order of the fourier terms | +| h | int | 0 | Forecast horizon. | +| id_col | str | unique_id | Column that identifies each serie. | +| time_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. | +| **Returns** | **Tuple** | | **Original DataFrame with the computed features** | + + +```python +import pandas as pd + +from utilsforecast.data import generate_series +``` + + +```python +series = generate_series(5, equal_ends=True) +transformed_df, future_df = fourier(series, freq='D', season_length=7, k=2, h=1) +transformed_df +``` + +| | unique_id | ds | y | sin1_7 | sin2_7 | cos1_7 | cos2_7 | +|------|-----------|------------|----------|-----------|-----------|-----------|-----------| +| 0 | 0 | 2000-10-05 | 0.428973 | -0.974927 | 0.433894 | -0.222526 | -0.900964 | +| 1 | 0 | 2000-10-06 | 1.423626 | -0.781835 | -0.974926 | 0.623486 | -0.222531 | +| 2 | 0 | 2000-10-07 | 2.311782 | -0.000005 | -0.000009 | 1.000000 | 1.000000 | +| 3 | 0 | 2000-10-08 | 3.192191 | 0.781829 | 0.974930 | 0.623493 | -0.222512 | +| 4 | 0 | 2000-10-09 | 4.148767 | 0.974929 | -0.433877 | -0.222517 | -0.900972 | +| ... | ... | ... | ... | ... | ... | ... | ... 
| +| 1096 | 4 | 2001-05-10 | 4.058910 | -0.974927 | 0.433888 | -0.222523 | -0.900967 | +| 1097 | 4 | 2001-05-11 | 5.178157 | -0.781823 | -0.974934 | 0.623500 | -0.222495 | +| 1098 | 4 | 2001-05-12 | 6.133142 | -0.000002 | -0.000003 | 1.000000 | 1.000000 | +| 1099 | 4 | 2001-05-13 | 0.403709 | 0.781840 | 0.974922 | 0.623479 | -0.222548 | +| 1100 | 4 | 2001-05-14 | 1.081779 | 0.974928 | -0.433882 | -0.222520 | -0.900970 | + + +```python +future_df +``` + +| | unique_id | ds | sin1_7 | sin2_7 | cos1_7 | cos2_7 | +|-----|-----------|------------|----------|-----------|-----------|----------| +| 0 | 0 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 | +| 1 | 1 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 | +| 2 | 2 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 | +| 3 | 3 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 | +| 4 | 4 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 | + +------------------------------------------------------------------------ + +source + +### trend + +> ``` text +> trend (df:~DFType, freq:Union[str,int], h:int=0, id_col:str='unique_id', +> time_col:str='ds') +> ``` + +*Add a trend column with consecutive integers for training and +forecasting* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Dataframe with ids, times and values for the exogenous regressors. | +| freq | Union | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. | +| h | int | 0 | Forecast horizon. | +| id_col | str | unique_id | Column that identifies each serie. | +| time_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. 
| +| **Returns** | **Tuple** | | **Original DataFrame with the computed features** | + + +```python +series = generate_series(5, equal_ends=True) +transformed_df, future_df = trend(series, freq='D', h=1) +transformed_df +``` + +| | unique_id | ds | y | trend | +|------|-----------|------------|----------|-------| +| 0 | 0 | 2000-10-05 | 0.428973 | 152.0 | +| 1 | 0 | 2000-10-06 | 1.423626 | 153.0 | +| 2 | 0 | 2000-10-07 | 2.311782 | 154.0 | +| 3 | 0 | 2000-10-08 | 3.192191 | 155.0 | +| 4 | 0 | 2000-10-09 | 4.148767 | 156.0 | +| ... | ... | ... | ... | ... | +| 1096 | 4 | 2001-05-10 | 4.058910 | 369.0 | +| 1097 | 4 | 2001-05-11 | 5.178157 | 370.0 | +| 1098 | 4 | 2001-05-12 | 6.133142 | 371.0 | +| 1099 | 4 | 2001-05-13 | 0.403709 | 372.0 | +| 1100 | 4 | 2001-05-14 | 1.081779 | 373.0 | + + +```python +future_df +``` + +| | unique_id | ds | trend | +|-----|-----------|------------|-------| +| 0 | 0 | 2001-05-15 | 374.0 | +| 1 | 1 | 2001-05-15 | 374.0 | +| 2 | 2 | 2001-05-15 | 374.0 | +| 3 | 3 | 2001-05-15 | 374.0 | +| 4 | 4 | 2001-05-15 | 374.0 | + +------------------------------------------------------------------------ + +source + +### time_features + +> ``` text +> time_features (df:~DFType, freq:Union[str,int], +> features:List[Union[str,Callable]], h:int=0, +> id_col:str='unique_id', time_col:str='ds') +> ``` + +*Compute timestamp-based features for training and forecasting* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Dataframe with ids, times and values for the exogenous regressors. | +| freq | Union | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. | +| features | List | | Features to compute. Can be string aliases of timestamp attributes or functions to apply to the times. | +| h | int | 0 | Forecast horizon. | +| id_col | str | unique_id | Column that identifies each serie. 
| +| time_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. | +| **Returns** | **Tuple** | | **Original DataFrame with the computed features** | + + +```python +transformed_df, future_df = time_features(series, freq='D', features=['month', 'day', 'week'], h=1) +transformed_df +``` + +| | unique_id | ds | y | month | day | week | +|------|-----------|------------|----------|-------|-----|------| +| 0 | 0 | 2000-10-05 | 0.428973 | 10 | 5 | 40 | +| 1 | 0 | 2000-10-06 | 1.423626 | 10 | 6 | 40 | +| 2 | 0 | 2000-10-07 | 2.311782 | 10 | 7 | 40 | +| 3 | 0 | 2000-10-08 | 3.192191 | 10 | 8 | 40 | +| 4 | 0 | 2000-10-09 | 4.148767 | 10 | 9 | 41 | +| ... | ... | ... | ... | ... | ... | ... | +| 1096 | 4 | 2001-05-10 | 4.058910 | 5 | 10 | 19 | +| 1097 | 4 | 2001-05-11 | 5.178157 | 5 | 11 | 19 | +| 1098 | 4 | 2001-05-12 | 6.133142 | 5 | 12 | 19 | +| 1099 | 4 | 2001-05-13 | 0.403709 | 5 | 13 | 19 | +| 1100 | 4 | 2001-05-14 | 1.081779 | 5 | 14 | 20 | + + +```python +future_df +``` + +| | unique_id | ds | month | day | week | +|-----|-----------|------------|-------|-----|------| +| 0 | 0 | 2001-05-15 | 5 | 15 | 20 | +| 1 | 1 | 2001-05-15 | 5 | 15 | 20 | +| 2 | 2 | 2001-05-15 | 5 | 15 | 20 | +| 3 | 3 | 2001-05-15 | 5 | 15 | 20 | +| 4 | 4 | 2001-05-15 | 5 | 15 | 20 | + +------------------------------------------------------------------------ + +source + +### future_exog_to_historic + +> ``` text +> future_exog_to_historic (df:~DFType, freq:Union[str,int], +> features:List[str], h:int=0, +> id_col:str='unique_id', time_col:str='ds') +> ``` + +*Turn future exogenous features into historic by shifting them `h` +steps.* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Dataframe with ids, times and values for the exogenous regressors. | +| freq | Union | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. 
| +| features | List | | Features to be converted into historic. | +| h | int | 0 | Forecast horizon. | +| id_col | str | unique_id | Column that identifies each serie. | +| time_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. | +| **Returns** | **Tuple** | | **Original DataFrame with the computed features** | + + +```python +series_with_prices = series.assign(price=np.random.rand(len(series))).sample(frac=1.0) +series_with_prices +``` + +| | unique_id | ds | y | price | +|-----|-----------|------------|----------|----------| +| 436 | 2 | 2001-03-26 | 2.369113 | 0.774476 | +| 312 | 1 | 2001-05-08 | 4.405212 | 0.557957 | +| 536 | 3 | 2000-11-04 | 4.362074 | 0.745237 | +| 34 | 0 | 2000-11-08 | 6.111161 | 0.809978 | +| 652 | 3 | 2001-02-28 | 1.448291 | 0.685294 | +| ... | ... | ... | ... | ... | +| 609 | 3 | 2001-01-16 | 0.215892 | 0.699703 | +| 873 | 4 | 2000-09-29 | 5.398198 | 0.677651 | +| 268 | 1 | 2001-03-25 | 2.393771 | 0.735438 | +| 171 | 0 | 2001-03-25 | 3.085493 | 0.463871 | +| 931 | 4 | 2000-11-26 | 0.292296 | 0.691377 | + + +```python +transformed_df, future_df = future_exog_to_historic( + df=series_with_prices, + freq='D', + features=['price'], + h=2, +) +transformed_df +``` + +| | unique_id | ds | y | price | +|------|-----------|------------|----------|----------| +| 0 | 2 | 2001-03-26 | 2.369113 | 0.870133 | +| 1 | 1 | 2001-05-08 | 4.405212 | 0.869751 | +| 2 | 3 | 2000-11-04 | 4.362074 | 0.877901 | +| 3 | 0 | 2000-11-08 | 6.111161 | 0.629413 | +| 4 | 3 | 2001-02-28 | 1.448291 | 0.088073 | +| ... | ... | ... | ... | ... 
| +| 1096 | 3 | 2001-01-16 | 0.215892 | 0.472261 | +| 1097 | 4 | 2000-09-29 | 5.398198 | 0.887531 | +| 1098 | 1 | 2001-03-25 | 2.393771 | 0.481712 | +| 1099 | 0 | 2001-03-25 | 3.085493 | 0.433153 | +| 1100 | 4 | 2000-11-26 | 0.292296 | 0.620219 | + + +```python +future_df +``` + +| | unique_id | ds | price | +|-----|-----------|------------|----------| +| 0 | 0 | 2001-05-15 | 0.874328 | +| 1 | 0 | 2001-05-16 | 0.481385 | +| 2 | 1 | 2001-05-15 | 0.009058 | +| 3 | 1 | 2001-05-16 | 0.083749 | +| 4 | 2 | 2001-05-15 | 0.726212 | +| 5 | 2 | 2001-05-16 | 0.052221 | +| 6 | 3 | 2001-05-15 | 0.942335 | +| 7 | 3 | 2001-05-16 | 0.274816 | +| 8 | 4 | 2001-05-15 | 0.267545 | +| 9 | 4 | 2001-05-16 | 0.112129 | + +------------------------------------------------------------------------ + +source + +### pipeline + +> ``` text +> pipeline (df:~DFType, features:List[Callable], freq:Union[str,int], +> h:int=0, id_col:str='unique_id', time_col:str='ds') +> ``` + +*Compute several features for training and forecasting* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Dataframe with ids, times and values for the exogenous regressors. | +| features | List | | List of features to compute. Must take only df, freq, h, id_col and time_col (other arguments must be fixed). | +| freq | Union | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. | +| h | int | 0 | Forecast horizon. | +| id_col | str | unique_id | Column that identifies each serie. | +| time_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. 
| +| **Returns** | **Tuple** | | **Original DataFrame with the computed features** | + + +```python +def is_weekend(times): + if isinstance(times, pd.Index): + dow = times.weekday + 1 # monday=0 in pandas and 1 in polars + else: + dow = times.dt.weekday() + return dow >= 6 + +def even_days_and_months(times): + if isinstance(times, pd.Index): + out = pd.DataFrame( + { + 'even_day': (times.weekday + 1) % 2 == 0, + 'even_month': times.month % 2 == 0, + } + ) + else: + # for polars you can return a list of expressions + out = [ + (times.dt.weekday() % 2 == 0).alias('even_day'), + (times.dt.month() % 2 == 0).alias('even_month'), + ] + return out + +features = [ + trend, + partial(fourier, season_length=7, k=1), + partial(fourier, season_length=28, k=1), + partial(time_features, features=['day', is_weekend, even_days_and_months]), +] +transformed_df, future_df = pipeline( + series, + features=features, + freq='D', + h=1, +) +transformed_df +``` + +| | unique_id | ds | y | trend | sin1_7 | cos1_7 | sin1_28 | cos1_28 | day | is_weekend | even_day | even_month | +|----|----|----|----|----|----|----|----|----|----|----|----|----| +| 0 | 0 | 2000-10-05 | 0.428973 | 152.0 | -0.974927 | -0.222526 | 0.433885 | -9.009683e-01 | 5 | False | True | True | +| 1 | 0 | 2000-10-06 | 1.423626 | 153.0 | -0.781835 | 0.623486 | 0.222522 | -9.749276e-01 | 6 | False | False | True | +| 2 | 0 | 2000-10-07 | 2.311782 | 154.0 | -0.000005 | 1.000000 | 0.000001 | -1.000000e+00 | 7 | True | True | True | +| 3 | 0 | 2000-10-08 | 3.192191 | 155.0 | 0.781829 | 0.623493 | -0.222520 | -9.749281e-01 | 8 | True | False | True | +| 4 | 0 | 2000-10-09 | 4.148767 | 156.0 | 0.974929 | -0.222517 | -0.433883 | -9.009693e-01 | 9 | False | False | True | +| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... 
| +| 1096 | 4 | 2001-05-10 | 4.058910 | 369.0 | -0.974927 | -0.222523 | 0.900969 | 4.338843e-01 | 10 | False | True | False | +| 1097 | 4 | 2001-05-11 | 5.178157 | 370.0 | -0.781823 | 0.623500 | 0.974929 | 2.225177e-01 | 11 | False | False | False | +| 1098 | 4 | 2001-05-12 | 6.133142 | 371.0 | -0.000002 | 1.000000 | 1.000000 | 4.251100e-07 | 12 | True | True | False | +| 1099 | 4 | 2001-05-13 | 0.403709 | 372.0 | 0.781840 | 0.623479 | 0.974927 | -2.225243e-01 | 13 | True | False | False | +| 1100 | 4 | 2001-05-14 | 1.081779 | 373.0 | 0.974928 | -0.222520 | 0.900969 | -4.338835e-01 | 14 | False | False | False | + + +```python +future_df +``` + +| | unique_id | ds | trend | sin1_7 | cos1_7 | sin1_28 | cos1_28 | day | is_weekend | even_day | even_month | +|----|----|----|----|----|----|----|----|----|----|----|----| +| 0 | 0 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 | 15 | False | True | False | +| 1 | 1 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 | 15 | False | True | False | +| 2 | 2 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 | 15 | False | True | False | +| 3 | 3 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 | 15 | False | True | False | +| 4 | 4 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 | 15 | False | True | False | + diff --git a/utilsforecast/grouped_array.mdx b/utilsforecast/grouped_array.mdx new file mode 100644 index 00000000..1f9221b0 --- /dev/null +++ b/utilsforecast/grouped_array.mdx @@ -0,0 +1,193 @@ + +```python +# test _append_one +data = np.arange(5) +indptr = np.array([0, 2, 5]) +new = np.array([7, 8]) +new_data, new_indptr = _append_one(data, indptr, new) +np.testing.assert_equal( + new_data, + np.array([0, 1, 7, 2, 3, 4, 8]) +) +np.testing.assert_equal( + new_indptr, + np.array([0, 3, 7]), +) + +# 2d +data = np.arange(5).reshape(-1, 1) +new_data, new_indptr = _append_one(data, indptr, new) +np.testing.assert_equal( + new_data, + 
np.array([0, 1, 7, 2, 3, 4, 8]).reshape(-1, 1) +) +np.testing.assert_equal( + new_indptr, + np.array([0, 3, 7]), +) +``` + + +```python +# test append several +data = np.arange(5) +indptr = np.array([0, 2, 5]) +new_sizes = np.array([0, 2, 1]) +new_values = np.array([6, 7, 5]) +new_groups = np.array([False, True, False]) +new_data, new_indptr = _append_several(data, indptr, new_sizes, new_values, new_groups) +np.testing.assert_equal( + new_data, + np.array([0, 1, 6, 7, 2, 3, 4, 5]) +) +np.testing.assert_equal( + new_indptr, + np.array([0, 2, 4, 8]), +) + +# 2d +data = np.arange(5).reshape(-1, 1) +indptr = np.array([0, 2, 5]) +new_sizes = np.array([0, 2, 1]) +new_values = np.array([6, 7, 5]).reshape(-1, 1) +new_groups = np.array([False, True, False]) +new_data, new_indptr = _append_several(data, indptr, new_sizes, new_values, new_groups) +np.testing.assert_equal( + new_data, + np.array([0, 1, 6, 7, 2, 3, 4, 5]).reshape(-1, 1) +) +np.testing.assert_equal( + new_indptr, + np.array([0, 2, 4, 8]), +) +``` + +------------------------------------------------------------------------ + +source + +### GroupedArray + +> ``` text +> GroupedArray (data:numpy.ndarray, indptr:numpy.ndarray) +> ``` + +*Initialize self. See help(type(self)) for accurate signature.* + + +```python +from fastcore.test import test_eq, test_fail + +from utilsforecast.data import generate_series +``` + + +```python +# The `GroupedArray` is used internally for storing the series values and performing transformations. 
+data = np.arange(20, dtype=np.float32).reshape(-1, 2) +indptr = np.array([0, 2, 10]) # group 1: [0, 1], group 2: [2..9] +ga = GroupedArray(data, indptr) +test_eq(len(ga), 2) +``` + + +```python +# Iterate through the groups +ga_iter = iter(ga) +np.testing.assert_equal(next(ga_iter), np.arange(4).reshape(-1, 2)) +np.testing.assert_equal(next(ga_iter), np.arange(4, 20).reshape(-1, 2)) +``` + + +```python +# Take the last two observations from each group +last2_data, last2_indptr = ga.take_from_groups(slice(-2, None)) +np.testing.assert_equal( + last2_data, + np.vstack([ + np.arange(4).reshape(-1, 2), + np.arange(16, 20).reshape(-1, 2), + ]), +) +np.testing.assert_equal(last2_indptr, np.array([0, 2, 4])) + +# 1d +ga1d = GroupedArray(np.arange(10), indptr) +last2_data1d, last2_indptr1d = ga1d.take_from_groups(slice(-2, None)) +np.testing.assert_equal( + last2_data1d, + np.array([0, 1, 8, 9]) +) +np.testing.assert_equal(last2_indptr1d, np.array([0, 2, 4])) +``` + + +```python +# Take the second observation from each group +second_data, second_indptr = ga.take_from_groups(1) +np.testing.assert_equal(second_data, np.array([[2, 3], [6, 7]])) +np.testing.assert_equal(second_indptr, np.array([0, 1, 2])) + +# 1d +second_data1d, second_indptr1d = ga1d.take_from_groups(1) +np.testing.assert_equal(second_data1d, np.array([1, 3])) +np.testing.assert_equal(second_indptr1d, np.array([0, 1, 2])) +``` + + +```python +# Take the last four observations from every group. Note that since group 1 only has two elements, only these are returned. 
+last4_data, last4_indptr = ga.take_from_groups(slice(-4, None)) +np.testing.assert_equal( + last4_data, + np.vstack([ + np.arange(4).reshape(-1, 2), + np.arange(12, 20).reshape(-1, 2), + ]), +) +np.testing.assert_equal(last4_indptr, np.array([0, 2, 6])) + +# 1d +last4_data1d, last4_indptr1d = ga1d.take_from_groups(slice(-4, None)) +np.testing.assert_equal( + last4_data1d, + np.array([0, 1, 6, 7, 8, 9]) +) +np.testing.assert_equal(last4_indptr1d, np.array([0, 2, 6])) +``` + + +```python +# Select a specific subset of groups +indptr = np.array([0, 2, 4, 7, 10]) +ga2 = GroupedArray(data, indptr) +subset = GroupedArray(*ga2.take([0, 2])) +np.testing.assert_allclose(subset[0].data, ga2[0].data) +np.testing.assert_allclose(subset[1].data, ga2[2].data) + +# 1d +ga2_1d = GroupedArray(np.arange(10), indptr) +subset1d = GroupedArray(*ga2_1d.take([0, 2])) +np.testing.assert_allclose(subset1d[0].data, ga2_1d[0].data) +np.testing.assert_allclose(subset1d[1].data, ga2_1d[2].data) +``` + + +```python +# try to append new values that don't match the number of groups +test_fail(lambda: ga.append(np.array([1., 2., 3.])), contains='new must have 2 rows') +``` + + +```python +# build from df +series_pd = generate_series(10, static_as_categorical=False, engine='pandas') +ga_pd = GroupedArray.from_sorted_df(series_pd, 'unique_id', 'ds', 'y') +series_pl = generate_series(10, static_as_categorical=False, engine='polars') +ga_pl = GroupedArray.from_sorted_df(series_pl, 'unique_id', 'ds', 'y') +np.testing.assert_allclose(ga_pd.data, ga_pl.data) +np.testing.assert_equal(ga_pd.indptr, ga_pl.indptr) +``` + diff --git a/utilsforecast/imgs/losses/mae_loss.png b/utilsforecast/imgs/losses/mae_loss.png new file mode 100644 index 00000000..c9d3b7fa Binary files /dev/null and b/utilsforecast/imgs/losses/mae_loss.png differ diff --git a/utilsforecast/imgs/losses/mape_loss.png b/utilsforecast/imgs/losses/mape_loss.png new file mode 100644 index 00000000..d0f9a66a Binary files /dev/null and 
b/utilsforecast/imgs/losses/mape_loss.png differ diff --git a/utilsforecast/imgs/losses/mase_loss.png b/utilsforecast/imgs/losses/mase_loss.png new file mode 100644 index 00000000..90db8c90 Binary files /dev/null and b/utilsforecast/imgs/losses/mase_loss.png differ diff --git a/utilsforecast/imgs/losses/mq_loss.png b/utilsforecast/imgs/losses/mq_loss.png new file mode 100644 index 00000000..7e3f6da3 Binary files /dev/null and b/utilsforecast/imgs/losses/mq_loss.png differ diff --git a/utilsforecast/imgs/losses/mse_loss.png b/utilsforecast/imgs/losses/mse_loss.png new file mode 100644 index 00000000..d175d5e0 Binary files /dev/null and b/utilsforecast/imgs/losses/mse_loss.png differ diff --git a/utilsforecast/imgs/losses/q_loss.png b/utilsforecast/imgs/losses/q_loss.png new file mode 100644 index 00000000..942dbc30 Binary files /dev/null and b/utilsforecast/imgs/losses/q_loss.png differ diff --git a/utilsforecast/imgs/losses/rmae_loss.png b/utilsforecast/imgs/losses/rmae_loss.png new file mode 100644 index 00000000..39a05b2e Binary files /dev/null and b/utilsforecast/imgs/losses/rmae_loss.png differ diff --git a/utilsforecast/imgs/losses/rmse_loss.png b/utilsforecast/imgs/losses/rmse_loss.png new file mode 100644 index 00000000..0ceadef0 Binary files /dev/null and b/utilsforecast/imgs/losses/rmse_loss.png differ diff --git a/utilsforecast/imgs/plotting.png b/utilsforecast/imgs/plotting.png new file mode 100644 index 00000000..549fad86 Binary files /dev/null and b/utilsforecast/imgs/plotting.png differ diff --git a/utilsforecast/index.html.mdx b/utilsforecast/index.html.mdx new file mode 100644 index 00000000..09be4f2e --- /dev/null +++ b/utilsforecast/index.html.mdx @@ -0,0 +1,139 @@ +--- +description: Forecasting utilities +output-file: index.html +title: utilsforecast +--- + + +## Install + +### PyPI + + +```sh +pip install utilsforecast +``` + +### Conda + + +```sh +conda install -c conda-forge utilsforecast +``` + +## How to use + +### Generate synthetic data + 
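+As a rough mental model of the output format, a generated series is a
+long-format frame with one row per (series, timestamp) pair. The sketch
+below builds a comparable toy frame with plain pandas; it is only an
+illustration under assumed trend and seasonality settings, not the
+library's actual generator:
+
+```python
+import numpy as np
+import pandas as pd
+
+# toy stand-in for one generated daily series: linear trend plus weekly
+# seasonality and a little noise, in the long format (unique_id, ds, y)
+rng = np.random.default_rng(0)
+n = 60
+t = np.arange(n)
+sketch = pd.DataFrame({
+    'unique_id': 0,
+    'ds': pd.date_range('2000-01-01', periods=n, freq='D'),
+    'y': 0.5 * t + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.3, n),
+})
+```
+
+`generate_series` below returns this same layout, with options for static
+features, extra model columns, and prediction intervals.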
+ +```python +from utilsforecast.data import generate_series +``` + + +```python +series = generate_series(3, with_trend=True, static_as_categorical=False) +series +``` + +| | unique_id | ds | y | +|-----|-----------|------------|------------| +| 0 | 0 | 2000-01-01 | 0.422133 | +| 1 | 0 | 2000-01-02 | 1.501407 | +| 2 | 0 | 2000-01-03 | 2.568495 | +| 3 | 0 | 2000-01-04 | 3.529085 | +| 4 | 0 | 2000-01-05 | 4.481929 | +| ... | ... | ... | ... | +| 481 | 2 | 2000-06-11 | 163.914625 | +| 482 | 2 | 2000-06-12 | 166.018479 | +| 483 | 2 | 2000-06-13 | 160.839176 | +| 484 | 2 | 2000-06-14 | 162.679603 | +| 485 | 2 | 2000-06-15 | 165.089288 | + +### Plotting + + +```python +from utilsforecast.plotting import plot_series +``` + + +```python +fig = plot_series(series, plot_random=False, max_insample_length=50, engine='matplotlib') +fig.savefig('imgs/index.png', bbox_inches='tight') +``` + + + +### Preprocessing + + +```python +from utilsforecast.preprocessing import fill_gaps +``` + + +```python +serie = series[series['unique_id'].eq(0)].tail(10) +# drop some points +with_gaps = serie.sample(frac=0.5, random_state=0).sort_values('ds') +with_gaps +``` + +| | unique_id | ds | y | +|-----|-----------|------------|-----------| +| 213 | 0 | 2000-08-01 | 18.543147 | +| 214 | 0 | 2000-08-02 | 19.941764 | +| 216 | 0 | 2000-08-04 | 21.968733 | +| 220 | 0 | 2000-08-08 | 19.091509 | +| 221 | 0 | 2000-08-09 | 20.220739 | + + +```python +fill_gaps(with_gaps, freq='D') +``` + +| | unique_id | ds | y | +|-----|-----------|------------|-----------| +| 0 | 0 | 2000-08-01 | 18.543147 | +| 1 | 0 | 2000-08-02 | 19.941764 | +| 2 | 0 | 2000-08-03 | NaN | +| 3 | 0 | 2000-08-04 | 21.968733 | +| 4 | 0 | 2000-08-05 | NaN | +| 5 | 0 | 2000-08-06 | NaN | +| 6 | 0 | 2000-08-07 | NaN | +| 7 | 0 | 2000-08-08 | 19.091509 | +| 8 | 0 | 2000-08-09 | 20.220739 | + +### Evaluating + + +```python +from functools import partial + +import numpy as np + +from utilsforecast.evaluation import evaluate +from 
utilsforecast.losses import mape, mase +``` + + +```python +valid = series.groupby('unique_id').tail(7).copy() +train = series.drop(valid.index) +rng = np.random.RandomState(0) +valid['seas_naive'] = train.groupby('unique_id')['y'].tail(7).values +valid['rand_model'] = valid['y'] * rng.rand(valid['y'].shape[0]) +daily_mase = partial(mase, seasonality=7) +evaluate(valid, metrics=[mape, daily_mase], train_df=train) +``` + +| | unique_id | metric | seas_naive | rand_model | +|-----|-----------|--------|------------|------------| +| 0 | 0 | mape | 0.024139 | 0.440173 | +| 1 | 1 | mape | 0.054259 | 0.278123 | +| 2 | 2 | mape | 0.042642 | 0.480316 | +| 3 | 0 | mase | 0.907149 | 16.418014 | +| 4 | 1 | mase | 0.991635 | 6.404254 | +| 5 | 2 | mase | 1.013596 | 11.365040 | + diff --git a/utilsforecast/light.png b/utilsforecast/light.png new file mode 100644 index 00000000..bbb99b54 Binary files /dev/null and b/utilsforecast/light.png differ diff --git a/utilsforecast/losses.html.mdx b/utilsforecast/losses.html.mdx new file mode 100644 index 00000000..7a201c3a --- /dev/null +++ b/utilsforecast/losses.html.mdx @@ -0,0 +1,1232 @@ +--- +description: Loss functions for model evaluation. +output-file: losses.html +title: Losses +--- + + +The most important training signal is the forecast error, which is the +difference between the observed value $y_{\tau}$ and the prediction +$\hat{y}_{\tau}$ at time $\tau$: + +$$ + +e_{\tau} = y_{\tau}-\hat{y}_{\tau} \qquad \qquad \tau \in \{t+1,\dots,t+H \} + +$$ + +The training loss summarizes the forecast errors in different evaluation +metrics. + + +```python +from utilsforecast.data import generate_series +``` + + +```python +from polars.testing import assert_frame_equal as pl_assert_frame_equal +models = ['model0', 'model1'] +series = generate_series(1000, n_models=2, level=[80]) +series_pl = generate_series(1000, n_models=2, level=[80], engine='polars') +``` + +## 1. 
Scale-dependent Errors + +### Mean Absolute Error (MAE) + +$$ + +\mathrm{MAE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} |y_{\tau} - \hat{y}_{\tau}| + +$$ + + + +------------------------------------------------------------------------ + +source + +#### mae + +> ``` text
> mae (df:~DFType, models:List[str], id_col:str='unique_id', +> target_col:str='y')
> ``` + +\*Mean Absolute Error (MAE) + +MAE measures the relative prediction accuracy of a forecasting method by +calculating the absolute deviation between the prediction and the true value at a +given time and averaging these deviations over the length of the series.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actual values and predictions. | +| models | List | | Columns that identify the models predictions. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. 
| +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +def pd_vs_pl(pd_df, pl_df, models): + pd.testing.assert_frame_equal(pd_df[models], + pl_df[models].to_pandas()) +``` + + +```python +pd_vs_pl( + mae(series, models), + mae(series_pl, models), + models=models +) +``` + +### Mean Squared Error + +$$ + +\mathrm{MSE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} (y_{\tau} - \hat{y}_{\tau})^{2} + +$$ + + + +------------------------------------------------------------------------ + +source + +#### mse + +> ``` text
> mse (df:~DFType, models:List[str], id_col:str='unique_id', +> target_col:str='y')
> ``` + +\*Mean Squared Error (MSE) + +MSE measures the relative prediction accuracy of a forecasting method by +calculating the squared deviation between the prediction and the true value +at a given time, and averaging these deviations over the length of the +series.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actual values and predictions. | +| models | List | | Columns that identify the models predictions. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. 
| +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + mse(series, models), + mse(series_pl, models), + models, +) +``` + +### Root Mean Squared Error + +$$ + +\mathrm{RMSE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \sqrt{\frac{1}{H} \sum^{t+H}_{\tau=t+1} (y_{\tau} - \hat{y}_{\tau})^{2}} + +$$ + + + +------------------------------------------------------------------------ + +source + +#### rmse + +> ``` text +> rmse (df:~DFType, models:List[str], id_col:str='unique_id', +> target_col:str='y') +> ``` + +\*Root Mean Squared Error (RMSE) + +RMSE measures the relative prediction accuracy of a forecasting method +by calculating the squared deviation of the prediction and the observed +value at a given time and averages these devations over the length of +the series. Finally the RMSE will be in the same scale as the original +time series so its comparison with other series is possible only if they +share a common scale. RMSE has a direct connection to the L2 norm.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actual values and predictions. | +| models | List | | Columns that identify the models predictions. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. | +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + rmse(series, models), + rmse(series_pl, models), + models, +) +``` + +------------------------------------------------------------------------ + +source + +#### bias + +> ``` text +> bias (df:~DFType, models:List[str], id_col:str='unique_id', +> target_col:str='y') +> ``` + +\*Forecast estimator bias. 
+ +Defined as prediction - actual\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actual values and predictions. | +| models | List | | Columns that identify the models predictions. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. | +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + bias(series, models), + bias(series_pl, models), + models, +) +``` + +------------------------------------------------------------------------ + +source + +#### cfe + +> ``` text +> cfe (df:~DFType, models:List[str], id_col:str='unique_id', +> target_col:str='y') +> ``` + +\*Cumulative Forecast Error (CFE) + +Total signed forecast error per series. Positive values mean under +forecast; negative mean over forecast.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actual values and predictions. | +| models | List | | Columns that identify the models predictions. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. 
| +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + cfe(series, models), + cfe(series_pl, models), + models, +) +``` + + +```python +# case for cfe +df = pd.DataFrame({ + "unique_id": ["X","X","Y","Y"], + "y": [5, 10, 3, 7], + "y_hat": [7, 7, 1, 10] +}) +# errors: +# X: (7 - 5) + (7 - 10) = 2 + (-3) = -1 +# Y: (1 - 3) + (10 - 7) = -2 + 3 = 1 +expected = pd.DataFrame({ + "unique_id": ["X", "Y"], + "y_hat": [-1, 1] + }) + +# pandas +out_pd = cfe(df, ["y_hat"]) +pd.testing.assert_frame_equal( + out_pd, + expected + ) +``` + + +```python +df_pl = pl.from_pandas(df) +out_pl = cfe(df_pl, ["y_hat"]) +pl_assert_frame_equal( + out_pl, + pl.from_pandas(expected) +) +``` + +------------------------------------------------------------------------ + +source + +#### pis + +> ``` text +> pis (df:~DFType, models:List[str], id_col:str='unique_id', +> target_col:str='y') +> ``` + +\*Compute the raw Absolute Periods In Stock (PIS) for one or multiple +models. + +The PIS metric sums the absolute forecast errors per series without any +scaling, yielding a scale-dependent measure of bias.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actual values and predictions. | +| models | List | | Columns that identify the models predictions. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. 
| +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + pis(series, models), + pis(series_pl, models), + models, +) +``` + + +```python +# case for pis +df = pd.DataFrame({ + "unique_id": ["A","A","B","B"], + "y": [10, 15, 5, 7], + "y_hat": [12, 14, 4, 10] +}) +# errors: +# A: |12−10| + |14−15| = 2 + 1 = 3 +# B: |4−5| + |10−7| = 1 + 3 = 4 +expected = pd.DataFrame({ + "unique_id": ["A", "B"], + "y_hat": [3, 4] + }) + +# pandas branch +out_pd = pis(df, ["y_hat"]) +pd.testing.assert_frame_equal( + out_pd, + expected +) +``` + + +```python +df_pl = pl.from_pandas(df) +out_pl = pis(df_pl, ["y_hat"]) +pl_assert_frame_equal( + out_pl, + pl.from_pandas(expected) +) +``` + +------------------------------------------------------------------------ + +source + +#### spis + +> ``` text +> spis (df:~DFType, df_train:~DFType, models:List[str], +> id_col:str='unique_id', target_col:str='y') +> ``` + +\*Compute the scaled Absolute Periods In Stock (sAPIS) for one or +multiple models. + +The sPIS metric scales the sum of absolute forecast errors by the mean +in-sample demand, yielding a scale-independent bias measure that can be +aggregated across series.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actual values and predictions. | +| df_train | DFType | | | +| models | List | | Columns that identify the models predictions. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. 
| +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + spis(series, series, models), + spis(series_pl, series_pl, models), + models, +) +``` + +``` text +/tmp/ipykernel_12558/2968355870.py:19: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning. + .groupby(id_col)[target_col] +``` + + +```python +# case for scaled pis +df_train = pd.DataFrame({ + "unique_id": ["A","A","B","B"], + "y": [1, 3, 2, 6] +}) +# Forecast data +df = pd.DataFrame({ + "unique_id": ["A","A","B","B"], + "y": [3, 3, 2, 8], + "y_hat": [6, 2, 3, 5] +}) +# For A: errors = |3−6|+|3−2| = 3+1 = 4 ÷ mean(1,3)=2 → 2 +# For B: errors = |2−3|+|8−5| = 1+3 = 4 ÷ mean(2,6)=4 → 1 +expected = pd.DataFrame({ + "unique_id": ["A", "B"], + "y_hat": [2.0, 1.0] + }) + +# pandas branch +out_pd = spis( + df = df, + df_train = df_train, + models = ["y_hat"], + id_col = "unique_id", + target_col = "y", +) +pd.testing.assert_frame_equal( + out_pd, + expected +) +``` + + +```python +df_train_pl = pl.from_pandas(df_train) +df_pl = pl.from_pandas(df) +out_pl = spis( + df = df_pl, + df_train = df_train_pl, + models = ["y_hat"], + id_col = "unique_id", + target_col = "y", +) +pl.testing.assert_frame_equal( + out_pl, + pl.from_pandas(expected) +) +``` + +## 2. 
Percentage Errors + +### Mean Absolute Percentage Error + +$$ + +\mathrm{MAPE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{|y_{\tau}-\hat{y}_{\tau}|}{|y_{\tau}|} + +$$ + + + +------------------------------------------------------------------------ + +source + +#### mape + +> ``` text
> mape (df:~DFType, models:List[str], id_col:str='unique_id', +> target_col:str='y')
> ``` + +\*Mean Absolute Percentage Error (MAPE) + +MAPE measures the relative prediction accuracy of a forecasting method +by calculating the percentage deviation between the prediction and the +observed value at a given time and averaging these deviations over the +length of the series. The closer an observed value is to zero, the +higher the penalty the MAPE loss assigns to the corresponding error.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actual values and predictions. | +| models | List | | Columns that identify the models predictions. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. 
| +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + mape(series, models), + mape(series_pl, models), + models, +) +``` + +### Symmetric Mean Absolute Percentage Error + +$$ + +\mathrm{SMAPE}_{2}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{|y_{\tau}-\hat{y}_{\tau}|}{|y_{\tau}|+|\hat{y}_{\tau}|} + +$$ + +------------------------------------------------------------------------ + +source + +#### smape + +> ``` text +> smape (df:~DFType, models:List[str], id_col:str='unique_id', +> target_col:str='y') +> ``` + +\*Symmetric Mean Absolute Percentage Error (SMAPE) + +SMAPE measures the relative prediction accuracy of a forecasting method +by calculating the relative deviation of the prediction and the observed +value scaled by the sum of the absolute values for the prediction and +observed value at a given time, then averages these devations over the +length of the series. This allows the SMAPE to have bounds between 0% +and 100% which is desirable compared to normal MAPE that may be +undetermined when the target is zero.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actual values and predictions. | +| models | List | | Columns that identify the models predictions. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. | +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + smape(series, models), + smape(series_pl, models), + models, +) +``` + +## 3. 
Scale-independent Errors + +### Mean Absolute Scaled Error + +$$ + +\mathrm{MASE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}, \mathbf{\hat{y}}^{season}_{\tau}) = +\frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{|y_{\tau}-\hat{y}_{\tau}|}{\mathrm{MAE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{season}_{\tau})} + +$$ + + + +------------------------------------------------------------------------ + +source + +#### mase + +> ``` text
> mase (df:~DFType, models:List[str], seasonality:int, train_df:~DFType, +> id_col:str='unique_id', target_col:str='y')
> ``` + +\*Mean Absolute Scaled Error (MASE) + +MASE measures the relative prediction accuracy of a forecasting method +by comparing the mean absolute errors of the prediction and the +observed value against the mean absolute errors of the seasonal naive +model. The MASE is one of the components of the Overall Weighted Average (OWA) +used in the M4 Competition.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actuals and predictions. | +| models | List | | Columns that identify the models predictions. | +| seasonality | int | | Main frequency of the time series;
Hourly 24, Daily 7, Weekly 52, Monthly 12, Quarterly 4, Yearly 1. | +| train_df | DFType | | Training dataframe with id and actual values. Must be sorted by time. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. | +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + mase(series, models, 7, series), + mase(series_pl, models, 7, series_pl), + models, +) +``` + +### Relative Mean Absolute Error + +$$ + +\mathrm{RMAE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}, \mathbf{\hat{y}}^{base}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{|y_{\tau}-\hat{y}_{\tau}|}{\mathrm{MAE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{base}_{\tau})} + +$$ + + + +------------------------------------------------------------------------ + +source + +#### rmae + +> ``` text
> rmae (df:~DFType, models:List[str], baseline:str, id_col:str='unique_id', +> target_col:str='y')
> ``` + +\*Relative Mean Absolute Error (RMAE) + +Calculates the RMAE between two sets of forecasts (from two different +forecasting methods). A number smaller than one implies that the +forecast in the numerator is better than the forecast in the +denominator.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, times, actuals and predictions. | +| models | List | | Columns that identify the models predictions. | +| baseline | str | | Column that identifies the baseline model predictions. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. 
| +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + rmae(series, models, models[0]), + rmae(series_pl, models, models[0]), + models, +) +``` + +### Normalized Deviation + +$$ + +\mathrm{ND}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{\sum^{t+H}_{\tau=t+1} |y_{\tau} - \hat{y}_{\tau}|}{\sum^{t+H}_{\tau=t+1} | y_{\tau} |} + +$$ + +------------------------------------------------------------------------ + +source + +#### nd + +> ``` text +> nd (df:~DFType, models:List[str], id_col:str='unique_id', +> target_col:str='y') +> ``` + +\*Normalized Deviation (ND) + +ND measures the relative prediction accuracy of a forecasting method by +calculating the sum of the absolute deviation of the prediction and the +true value at a given time and dividing it by the sum of the absolute +value of the ground truth.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, times, actuals and predictions. | +| models | List | | Columns that identify the models predictions. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. 
| +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + nd(series, models), + nd(series_pl, models), + models, +) +``` + +### Mean Squared Scaled Error + +$$ + +\mathrm{MSSE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}, \mathbf{\hat{y}}^{season}_{\tau}) = +\frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{(y_{\tau}-\hat{y}_{\tau})^2}{\mathrm{MSE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{season}_{\tau})} + +$$ + +------------------------------------------------------------------------ + +source + +#### msse + +> ``` text +> msse (df:~DFType, models:List[str], seasonality:int, train_df:~DFType, +> id_col:str='unique_id', target_col:str='y') +> ``` + +\*Mean Squared Scaled Error (MSSE) + +MSSE measures the relative prediction accuracy of a forecasting method +by comparinng the mean squared errors of the prediction and the observed +value against the mean squared errors of the seasonal naive model.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actuals and predictions. | +| models | List | | Columns that identify the models predictions. | +| seasonality | int | | Main frequency of the time series;
Hourly 24, Daily 7, Weekly 52, Monthly 12, Quarterly 4, Yearly 1. | +| train_df | DFType | | Training dataframe with id and actual values. Must be sorted by time. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. | +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + msse(series, models, 7, series), + msse(series_pl, models, 7, series_pl), + models, +) +``` + +### Root Mean Squared Scaled Error + +$$ + +\mathrm{RMSSE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}, \mathbf{\hat{y}}^{season}_{\tau}) = +\sqrt{\frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{(y_{\tau}-\hat{y}_{\tau})^2}{\mathrm{MSE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{season}_{\tau})}} + +$$ + +------------------------------------------------------------------------ + +source + +#### rmsse + +> ``` text
> rmsse (df:~DFType, models:List[str], seasonality:int, train_df:~DFType, +> id_col:str='unique_id', target_col:str='y')
> ``` + +\*Root Mean Squared Scaled Error (RMSSE) + +RMSSE measures the relative prediction accuracy of a forecasting method +by comparing the mean squared errors of the prediction and the observed +value against the mean squared errors of the seasonal naive model, and +taking the square root of the result.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actuals and predictions. | +| models | List | | Columns that identify the models predictions. | +| seasonality | int | | Main frequency of the time series;
Hourly 24, Daily 7, Weekly 52, Monthly 12, Quarterly 4, Yearly 1. | +| train_df | DFType | | Training dataframe with id and actual values. Must be sorted by time. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. | +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + rmsse(series, models, 7, series), + rmsse(series_pl, models, 7, series_pl), + models, +) +``` + +## 4. Probabilistic Errors + +### Quantile Loss + +$$ + +\mathrm{QL}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q)}_{\tau}) = +\frac{1}{H} \sum^{t+H}_{\tau=t+1} +\Big( (1-q)\,( \hat{y}^{(q)}_{\tau} - y_{\tau} )_{+} ++ q\,( y_{\tau} - \hat{y}^{(q)}_{\tau} )_{+} \Big) + +$$ + + + +------------------------------------------------------------------------ + +source + +#### quantile_loss + +> ``` text +> quantile_loss (df:~DFType, models:Dict[str,str], q:float=0.5, +> id_col:str='unique_id', target_col:str='y') +> ``` + +\*Quantile Loss (QL) + +QL measures the deviation of a quantile forecast. By weighting the +absolute deviation in a non symmetric way, the loss pays more attention +to under or over estimation. +A common value for q is 0.5 for the deviation from the median.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, times, actuals and predictions. | +| models | Dict | | Mapping from model name to the model predictions for the specified quantile. | +| q | float | 0.5 | Quantile for the predictions’ comparison. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. 
| +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + +### Scaled Quantile Loss + +$$ + +\mathrm{SQL}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q)}_{\tau}) = +\frac{1}{H} \sum^{t+H}_{\tau=t+1} +\frac{(1-q)\,( \hat{y}^{(q)}_{\tau} - y_{\tau} )_{+} ++ q\,( y_{\tau} - \hat{y}^{(q)}_{\tau} )_{+}}{\mathrm{MAE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{season}_{\tau})} + +$$ + +------------------------------------------------------------------------ + +source + +### scaled_quantile_loss + +> ``` text +> scaled_quantile_loss (df:~DFType, models:Dict[str,str], seasonality:int, +> train_df:~DFType, q:float=0.5, +> id_col:str='unique_id', target_col:str='y') +> ``` + +\*Scaled Quantile Loss (SQL) + +SQL measures the deviation of a quantile forecast scaled by the mean +absolute errors of the seasonal naive model. By weighting the absolute +deviation in a non symmetric way, the loss pays more attention to under +or over estimation. A common value for q is 0.5 for the deviation from +the median. This was the official measure used in the M5 Uncertainty +competition with seasonality = 1.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, times, actuals and predictions. | +| models | Dict | | Mapping from model name to the model predictions for the specified quantile. | +| seasonality | int | | Main frequency of the time series;
Hourly 24, Daily 7, Weekly 52, Monthly 12, Quarterly 4, Yearly 1. | +| train_df | DFType | | Training dataframe with id and actual values. Must be sorted by time. | +| q | float | 0.5 | Quantile for the predictions’ comparison. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. | +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + +### Multi-Quantile Loss + +$$ + +\mathrm{MQL}(\mathbf{y}_{\tau}, +[\mathbf{\hat{y}}^{(q_{1})}_{\tau}, ... ,\hat{y}^{(q_{n})}_{\tau}]) = +\frac{1}{n} \sum_{q_{i}} \mathrm{QL}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q_{i})}_{\tau}) + +$$ + + + +------------------------------------------------------------------------ + +source + +#### mqloss + +> ``` text +> mqloss (df:~DFType, models:Dict[str,List[str]], quantiles:numpy.ndarray, +> id_col:str='unique_id', target_col:str='y') +> ``` + +\*Multi-Quantile loss (MQL) + +MQL calculates the average multi-quantile Loss for a given set of +quantiles, based on the absolute difference between predicted quantiles +and observed values. + +The limit behavior of MQL allows to measure the accuracy of a full +predictive distribution with the continuous ranked probability score +(CRPS). This can be achieved through a numerical integration technique, +that discretizes the quantiles and treats the CRPS integral with a left +Riemann approximation, averaging over uniformly distanced quantiles.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, times, actuals and predictions. | +| models | Dict | | Mapping from model name to the model predictions for each quantile. | +| quantiles | ndarray | | Quantiles to compare against. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. 
| +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + mqloss(series, mq_models, quantiles=quantiles), + mqloss(series_pl, mq_models, quantiles=quantiles), + models, +) +``` + +### Scaled Multi-Quantile Loss + +$$ + +\mathrm{MQL}(\mathbf{y}_{\tau}, +[\mathbf{\hat{y}}^{(q_{1})}_{\tau}, ... ,\hat{y}^{(q_{n})}_{\tau}]) = +\frac{1}{n} \sum_{q_{i}} \frac{\mathrm{QL}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q_{i})}_{\tau})}{\mathrm{MAE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{season}_{\tau})} + +$$ + +------------------------------------------------------------------------ + +source + +### scaled_mqloss + +> ``` text +> scaled_mqloss (df:~DFType, models:Dict[str,List[str]], +> quantiles:numpy.ndarray, seasonality:int, +> train_df:~DFType, id_col:str='unique_id', +> target_col:str='y') +> ``` + +\*Scaled Multi-Quantile loss (SMQL) + +SMQL calculates the average multi-quantile Loss for a given set of +quantiles, based on the absolute difference between predicted quantiles +and observed values scaled by the mean absolute errors of the seasonal +naive model. The limit behavior of MQL allows to measure the accuracy of +a full predictive distribution with the continuous ranked probability +score (CRPS). This can be achieved through a numerical integration +technique, that discretizes the quantiles and treats the CRPS integral +with a left Riemann approximation, averaging over uniformly distanced +quantiles. This was the official measure used in the M5 Uncertainty +competition with seasonality = 1.\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, times, actuals and predictions. | +| models | Dict | | Mapping from model name to the model predictions for each quantile. | +| quantiles | ndarray | | Quantiles to compare against. | +| seasonality | int | | Main frequency of the time series;
Hourly 24, Daily 7, Weekly 52, Monthly 12, Quarterly 4, Yearly 1. | +| train_df | DFType | | Training dataframe with id and actual values. Must be sorted by time. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. | +| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** | + + +```python +pd_vs_pl( + scaled_mqloss(series, mq_models, quantiles=quantiles, seasonality=1, train_df=series), + scaled_mqloss(series_pl, mq_models, quantiles=quantiles, seasonality=1, train_df=series_pl), + models, +) +``` + +### Coverage + +------------------------------------------------------------------------ + +source + +#### coverage + +> ``` text +> coverage (df:~DFType, models:List[str], level:int, +> id_col:str='unique_id', target_col:str='y') +> ``` + +*Coverage of y with y_hat_lo and y_hat_hi.* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, times, actuals and predictions. | +| models | List | | Columns that identify the models predictions. | +| level | int | | Confidence level used for intervals. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. 
|
+| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** |
+
+
+```python
+pd_vs_pl(
+    coverage(series, models, 80),
+    coverage(series_pl, models, 80),
+    models,
+)
+```
+
+### Calibration
+
+------------------------------------------------------------------------
+
+source
+
+#### calibration
+
+> ``` text
+> calibration (df:~DFType, models:Dict[str,str], id_col:str='unique_id',
+>              target_col:str='y')
+> ```
+
+*Fraction of y that is lower than the model’s predictions.*
+
+| | **Type** | **Default** | **Details** |
+|------|------------------|-------------------------|-------------------------|
+| df | DFType | | Input dataframe with id, times, actuals and predictions. |
+| models | Dict | | Mapping from model name to the model predictions. |
+| id_col | str | unique_id | Column that identifies each serie. |
+| target_col | str | y | Column that contains the target. |
+| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** |
+
+
+```python
+pd_vs_pl(
+    calibration(series, q_models[0.1]),
+    calibration(series_pl, q_models[0.1]),
+    models,
+)
+```
+
+### CRPS
+
+$$
+
+\mathrm{sCRPS}(\hat{F}_{\tau}, \mathbf{y}_{\tau}) = \frac{2}{N} \sum_{i}
+\int^{1}_{0} \frac{\mathrm{QL}(\hat{F}_{i,\tau}, y_{i,\tau})_{q}}{\sum_{i} | y_{i,\tau} |} dq
+
+$$
+
+where $\hat{F}_{\tau}$ is an estimated multivariate distribution, and
+$y_{i,\tau}$ are its realizations.
+
+------------------------------------------------------------------------
+
+source
+
+#### scaled_crps
+
+> ``` text
+> scaled_crps (df:~DFType, models:Dict[str,List[str]],
+>              quantiles:numpy.ndarray, id_col:str='unique_id',
+>              target_col:str='y')
+> ```
+
+\*Scaled Continuous Ranked Probability Score
+
+Calculates a scaled variation of the CRPS, as proposed by Rangapuram et
+al. (2021), to measure the accuracy of predicted quantiles `y_hat`
+compared to the observation `y`. 
This metric averages percentage-weighted
+absolute deviations as defined by the quantile losses.\*
+
+| | **Type** | **Default** | **Details** |
+|------|------------------|-------------------------|-------------------------|
+| df | DFType | | Input dataframe with id, times, actuals and predictions. |
+| models | Dict | | Mapping from model name to the model predictions for each quantile. |
+| quantiles | ndarray | | Quantiles to compare against. |
+| id_col | str | unique_id | Column that identifies each serie. |
+| target_col | str | y | Column that contains the target. |
+| **Returns** | **DFType** | | **dataframe with one row per id and one column per model.** |
+
+
+```python
+pd_vs_pl(
+    scaled_crps(series, mq_models, quantiles),
+    scaled_crps(series_pl, mq_models, quantiles),
+    models,
+)
+```
+
+### Tweedie Deviance
+
+For a set of forecasts $\{\mu_i\}_{i=1}^N$ and observations
+$\{y_i\}_{i=1}^N$, the mean Tweedie deviance with power $p$ is
+
+$$
+
+\mathrm{TD}_{p}(\boldsymbol{\mu}, \mathbf{y})
+= \frac{1}{N} \sum_{i=1}^{N} d_{p}(y_i, \mu_i)
+
+$$
+
+where the unit-scaled deviance for each pair $(y,\mu)$ is
+
+$$
+
+d_{p}(y,\mu)
+=
+2
+\begin{cases}
+\displaystyle
+\frac{y^{2-p}}{(1-p)(2-p)}
+\;-\;
+\frac{y\,\mu^{1-p}}{1-p}
+\;+\;
+\frac{\mu^{2-p}}{2-p},
+& p \notin\{1,2\},\\[1em]
+\displaystyle
+y\,\ln\!\frac{y}{\mu}\;-\;(y-\mu),
+& p = 1\quad(\text{Poisson deviance}),\\[0.5em]
+\displaystyle
+-\Bigl[\ln\!\frac{y}{\mu}\;-\;\frac{y-\mu}{\mu}\Bigr],
+& p = 2\quad(\text{Gamma deviance}).
+\end{cases}
+
+$$
+
+- $y_i$ are the true values, $\mu_i$ the predicted means.
+- $p$ controls the variance relationship
+  $\mathrm{Var}(Y)\propto\mu^{p}$.
+- When $1 < p < 2$ the deviance corresponds to a compound Poisson–Gamma
+  distribution.
+
+------------------------------------------------------------------------
+
+source
+
+#### tweedie_deviance
+
+> ``` text
+> tweedie_deviance (df:~DFType, models:List[str], power:float=1.5,
+>                   id_col:str='unique_id', target_col:str='y')
+> ```
+
+\*Compute the Tweedie deviance loss for one or multiple models, grouped
+by an identifier. 
+ +Each group’s deviance is calculated using the mean_tweedie_deviance +function, which measures the deviation between actual and predicted +values under the Tweedie distribution. + +The `power` parameter defines the specific compound distribution: - 1: +Poisson - (1, 2): Compound Poisson-Gamma - 2: Gamma - \>2: Inverse +Gaussian\* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input dataframe with id, actuals and predictions. | +| models | List | | Columns that identify the models predictions. | +| power | float | 1.5 | Tweedie power parameter. Determines the compound distribution. | +| id_col | str | unique_id | Column that identifies each serie. | +| target_col | str | y | Column that contains the target. | +| **Returns** | **DFType** | | **DataFrame with one row per id and one column per model, containing the mean Tweedie deviance. ** | + + +```python +# Normal test +for power in [0, 1, 1.5, 2, 3]: + # Test Pandas vs Polars + td_pd = tweedie_deviance(series, models, target_col="y", power=power) + td_pl = tweedie_deviance(series_pl, models, target_col="y", power=power) + pd_vs_pl( + td_pd, + td_pl, + models, + ) + # Test for NaNs + assert not td_pd[models].isna().any().any(), f"NaNs found in pd DataFrame for power {power}" + assert not td_pl.select(pl.col(models).is_null().any()).sum_horizontal().item(), f"NaNs found in pl DataFrame for power {power}" + # Test for infinites + is_infinite = td_pd[models].isin([np.inf, -np.inf]).any().any() + assert not is_infinite, f"Infinities found in pd DataFrame for power {power}" + is_infinite_pl = td_pl.select(pl.col(models).is_infinite().any()).sum_horizontal().item() + assert not is_infinite_pl, f"Infinities found in pl DataFrame for power {power}" + +# Test zero handling (skip power >=2 since it requires all y > 0) +series.loc[0, 'y'] = 0.0 # Set a zero value to test the zero handling +series.loc[49, 'y'] = 0.0 # Set 
another zero value to test the zero handling +series_pl[0, 'y'] = 0.0 # Set a zero value to test the zero handling +series_pl[49, 'y'] = 0.0 # Set another zero value to test the zero handling +for power in [0, 1, 1.5]: + # Test Pandas vs Polars + td_pd = tweedie_deviance(series, models, target_col="y", power=power) + td_pl = tweedie_deviance(series_pl, models, target_col="y", power=power) + pd_vs_pl( + td_pd, + td_pl, + models, + ) + # Test for NaNs + assert not td_pd[models].isna().any().any(), f"NaNs found in pd DataFrame for power {power}" + assert not td_pl.select(pl.col(models).is_null().any()).sum_horizontal().item(), f"NaNs found in pl DataFrame for power {power}" + # Test for infinites + is_infinite = td_pd[models].isin([np.inf, -np.inf]).any().any() + assert not is_infinite, f"Infinities found in pd DataFrame for power {power}" + is_infinite_pl = td_pl.select(pl.col(models).is_infinite().any()).sum_horizontal().item() + assert not is_infinite_pl, f"Infinities found in pl DataFrame for power {power}" +``` + diff --git a/utilsforecast/mint.json b/utilsforecast/mint.json new file mode 100644 index 00000000..6ef163e2 --- /dev/null +++ b/utilsforecast/mint.json @@ -0,0 +1,39 @@ +{ + "$schema": "https://mintlify.com/schema.json", + "name": "Nixtla", + "logo": { + "light": "/light.png", + "dark": "/dark.png" + }, + "favicon": "/favicon.svg", + "colors": { + "primary": "#0E0E0E", + "light": "#FAFAFA", + "dark": "#0E0E0E", + "anchors": { + "from": "#2AD0CA", + "to": "#0E00F8" + } + }, + "topbarCtaButton": { + "type": "github", + "url": "https://github.com/Nixtla/utilsforecast" + }, + "navigation": [ + { + "group": "", + "pages": ["index.html"] + }, + { + "group": "API Reference", + "pages": [ + "preprocessing.html", + "feature_engineering.html", + "evaluation.html", + "losses.html", + "plotting.html", + "data.html" + ] + } + ] +} diff --git a/utilsforecast/plotting.html.mdx b/utilsforecast/plotting.html.mdx new file mode 100644 index 00000000..59874d81 --- 
/dev/null +++ b/utilsforecast/plotting.html.mdx @@ -0,0 +1,84 @@ +--- +description: Time series visualizations +output-file: plotting.html +title: Plotting +--- + + +------------------------------------------------------------------------ + +source + +### plot_series + +> ``` text +> plot_series (df:Optional[~DFType]=None, +> forecasts_df:Optional[~DFType]=None, +> ids:Optional[List[str]]=None, plot_random:bool=True, +> max_ids:int=8, models:Optional[List[str]]=None, +> level:Optional[List[float]]=None, +> max_insample_length:Optional[int]=None, +> plot_anomalies:bool=False, engine:str='matplotlib', +> palette:Optional[str]=None, id_col:str='unique_id', +> time_col:str='ds', target_col:str='y', seed:int=0, +> resampler_kwargs:Optional[Dict]=None, ax:Union[matplotlib.ax +> es._axes.Axes,numpy.ndarray,ForwardRef('plotly.graph_objects +> .Figure'),NoneType]=None) +> ``` + +*Plot forecasts and insample values.* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | Optional | None | DataFrame with columns \[`id_col`, `time_col`, `target_col`\]. | +| forecasts_df | Optional | None | DataFrame with columns \[`id_col`, `time_col`\] and models. | +| ids | Optional | None | Time Series to plot.
If None, time series are selected randomly. | +| plot_random | bool | True | Select time series to plot randomly. | +| max_ids | int | 8 | Maximum number of ids to plot. | +| models | Optional | None | Models to plot. | +| level | Optional | None | Prediction intervals to plot. | +| max_insample_length | Optional | None | Maximum number of train/insample observations to be plotted. | +| plot_anomalies | bool | False | Plot anomalies for each prediction interval. | +| engine | str | matplotlib | Library used to plot. ‘plotly’, ‘plotly-resampler’ or ‘matplotlib’. | +| palette | Optional | None | Name of the matplotlib colormap to use for the plots. If None, uses the current style. | +| id_col | str | unique_id | Column that identifies each serie. | +| time_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. | +| target_col | str | y | Column that contains the target. | +| seed | int | 0 | Seed used for the random number generator. Only used if plot_random is True. | +| resampler_kwargs | Optional | None | Keyword arguments to be passed to plotly-resampler constructor.
For further customization (e.g. “show_dash”), call the method,
store the plotting object, and pass the extra arguments to
its `show_dash` method. |
+| ax | Union | None | Object where plots will be added. |
+| **Returns** | **matplotlib or plotly figure** | | **Plot’s figure** |
+
+
+```python
+from utilsforecast.data import generate_series
+```
+
+
+```python
+level = [80, 95]
+series = generate_series(4, freq='D', equal_ends=True, with_trend=True, n_models=2, level=level)
+test_pd = series.groupby('unique_id', observed=True).tail(10).copy()
+train_pd = series.drop(test_pd.index)
+```
+
+
+```python
+plt.style.use('ggplot')
+fig = plot_series(
+    train_pd,
+    forecasts_df=test_pd,
+    ids=[0, 3],
+    plot_random=False,
+    level=level,
+    max_insample_length=50,
+    engine='matplotlib',
+    plot_anomalies=True,
+)
+fig.savefig('imgs/plotting.png', bbox_inches='tight')
+```
+
+
+
diff --git a/utilsforecast/preprocessing.html.mdx b/utilsforecast/preprocessing.html.mdx
new file mode 100644
index 00000000..d00ba12e
--- /dev/null
+++ b/utilsforecast/preprocessing.html.mdx
@@ -0,0 +1,387 @@
+---
+description: Utilities for processing data before training/analysis
+output-file: preprocessing.html
+title: Preprocessing
+---
+
+
+------------------------------------------------------------------------
+
+source
+
+### id_time_grid
+
+> ``` text
+> id_time_grid (df:~DFType, freq:Union[str,int],
+>               start:Union[str,int,datetime.date,datetime.datetime]='per_s
+>               erie', end:Union[str,int,datetime.date,datetime.datetime]='
+>               global', id_col:str='unique_id', time_col:str='ds')
+> ```
+
+*Generate all expected combinations of ids and times.*
+
+| | **Type** | **Default** | **Details** |
+|------|------------------|-------------------------|-------------------------|
+| df | DFType | | Input data |
+| freq | Union | | Series’ frequency |
+| start | Union | per_serie | Initial timestamp for the series.
\* ‘per_serie’ uses each serie’s first timestamp
\* ‘global’ uses the first timestamp seen in the data
\* Can also be a specific timestamp or integer, e.g. ‘2000-01-01’, 2000 or datetime(2000, 1, 1) |
+| end | Union | global | Final timestamp for the series.
\* ‘per_serie’ uses each serie’s last timestamp
\* ‘global’ uses the last timestamp seen in the data
\* Can also be a specific timestamp or integer, e.g. ‘2000-01-01’, 2000 or datetime(2000, 1, 1) | +| id_col | str | unique_id | Column that identifies each serie. | +| time_col | str | ds | Column that identifies each timestamp. | +| **Returns** | **DFType** | | **Dataframe with expected ids and times.** | + +------------------------------------------------------------------------ + +source + +### fill_gaps + +> ``` text +> fill_gaps (df:~DFType, freq:Union[str,int], +> start:Union[str,int,datetime.date,datetime.datetime]='per_seri +> e', +> end:Union[str,int,datetime.date,datetime.datetime]='global', +> id_col:str='unique_id', time_col:str='ds') +> ``` + +*Enforce start and end datetimes for dataframe.* + +| | **Type** | **Default** | **Details** | +|------|------------------|-------------------------|-------------------------| +| df | DFType | | Input data | +| freq | Union | | Series’ frequency | +| start | Union | per_serie | Initial timestamp for the series.
\* ‘per_serie’ uses each serie’s first timestamp
\* ‘global’ uses the first timestamp seen in the data
\* Can also be a specific timestamp or integer, e.g. ‘2000-01-01’, 2000 or datetime(2000, 1, 1) |
+| end | Union | global | Final timestamp for the series.
\* ‘per_serie’ uses each serie’s last timestamp
\* ‘global’ uses the last timestamp seen in the data
\* Can also be a specific timestamp or integer, e.g. ‘2000-01-01’, 2000 or datetime(2000, 1, 1) | +| id_col | str | unique_id | Column that identifies each serie. | +| time_col | str | ds | Column that identifies each timestamp. | +| **Returns** | **DFType** | | **Dataframe with gaps filled.** | + + +```python +df = pd.DataFrame( + { + 'unique_id': [0, 0, 0, 1, 1], + 'ds': pd.to_datetime(['2020', '2021', '2023', '2021', '2022']), + 'y': np.arange(5), + } +) +df +``` + +| | unique_id | ds | y | +|-----|-----------|------------|-----| +| 0 | 0 | 2020-01-01 | 0 | +| 1 | 0 | 2021-01-01 | 1 | +| 2 | 0 | 2023-01-01 | 2 | +| 3 | 1 | 2021-01-01 | 3 | +| 4 | 1 | 2022-01-01 | 4 | + +The default functionality is taking the current starts and only +extending the end date to be the same for all series. + + +```python +fill_gaps( + df, + freq='YS', +) +``` + +| | unique_id | ds | y | +|-----|-----------|------------|-----| +| 0 | 0 | 2020-01-01 | 0.0 | +| 1 | 0 | 2021-01-01 | 1.0 | +| 2 | 0 | 2022-01-01 | NaN | +| 3 | 0 | 2023-01-01 | 2.0 | +| 4 | 1 | 2021-01-01 | 3.0 | +| 5 | 1 | 2022-01-01 | 4.0 | +| 6 | 1 | 2023-01-01 | NaN | + +We can also specify `end='per_serie'` to only fill possible gaps within +each serie. + + +```python +fill_gaps( + df, + freq='YS', + end='per_serie', +) +``` + +| | unique_id | ds | y | +|-----|-----------|------------|-----| +| 0 | 0 | 2020-01-01 | 0.0 | +| 1 | 0 | 2021-01-01 | 1.0 | +| 2 | 0 | 2022-01-01 | NaN | +| 3 | 0 | 2023-01-01 | 2.0 | +| 4 | 1 | 2021-01-01 | 3.0 | +| 5 | 1 | 2022-01-01 | 4.0 | + +We can also specify an end date in the future. 
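One convenient way to pick such a future end is to derive it from the data itself. A minimal pandas-only sketch (re-creating the yearly toy frame from above; the offset arithmetic is plain pandas and does not involve `fill_gaps`):

```python
import pandas as pd

# Yearly toy frame matching the one above: series 0 has a gap in 2022.
df = pd.DataFrame(
    {
        'unique_id': [0, 0, 0, 1, 1],
        'ds': pd.to_datetime(['2020', '2021', '2023', '2021', '2022']),
        'y': range(5),
    }
)
# One year-start past the latest timestamp seen in the data.
future_end = df['ds'].max() + pd.offsets.YearBegin(1)
print(future_end)  # 2024-01-01 00:00:00
```

This produces the same timestamp as the string `'2024'` used as `end` in the next call.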
+ + +```python +fill_gaps( + df, + freq='YS', + end='2024', +) +``` + +| | unique_id | ds | y | +|-----|-----------|------------|-----| +| 0 | 0 | 2020-01-01 | 0.0 | +| 1 | 0 | 2021-01-01 | 1.0 | +| 2 | 0 | 2022-01-01 | NaN | +| 3 | 0 | 2023-01-01 | 2.0 | +| 4 | 0 | 2024-01-01 | NaN | +| 5 | 1 | 2021-01-01 | 3.0 | +| 6 | 1 | 2022-01-01 | 4.0 | +| 7 | 1 | 2023-01-01 | NaN | +| 8 | 1 | 2024-01-01 | NaN | + +We can set all series to start at the same time. + + +```python +fill_gaps( + df, + freq='YS', + start='global' +) +``` + +| | unique_id | ds | y | +|-----|-----------|------------|-----| +| 0 | 0 | 2020-01-01 | 0.0 | +| 1 | 0 | 2021-01-01 | 1.0 | +| 2 | 0 | 2022-01-01 | NaN | +| 3 | 0 | 2023-01-01 | 2.0 | +| 4 | 1 | 2020-01-01 | NaN | +| 5 | 1 | 2021-01-01 | 3.0 | +| 6 | 1 | 2022-01-01 | 4.0 | +| 7 | 1 | 2023-01-01 | NaN | + +We can also set a common start date for all series (which can be earlier +than their current starts). + + +```python +fill_gaps( + df, + freq='YS', + start='2019', +) +``` + +| | unique_id | ds | y | +|-----|-----------|------------|-----| +| 0 | 0 | 2019-01-01 | NaN | +| 1 | 0 | 2020-01-01 | 0.0 | +| 2 | 0 | 2021-01-01 | 1.0 | +| 3 | 0 | 2022-01-01 | NaN | +| 4 | 0 | 2023-01-01 | 2.0 | +| 5 | 1 | 2019-01-01 | NaN | +| 6 | 1 | 2020-01-01 | NaN | +| 7 | 1 | 2021-01-01 | 3.0 | +| 8 | 1 | 2022-01-01 | 4.0 | +| 9 | 1 | 2023-01-01 | NaN | + +In case the times are integers the frequency, start and end must also be +integers. 
+ + +```python +df = pd.DataFrame( + { + 'unique_id': [0, 0, 0, 1, 1], + 'ds': [2020, 2021, 2023, 2021, 2022], + 'y': np.arange(5), + } +) +df +``` + +| | unique_id | ds | y | +|-----|-----------|------|-----| +| 0 | 0 | 2020 | 0 | +| 1 | 0 | 2021 | 1 | +| 2 | 0 | 2023 | 2 | +| 3 | 1 | 2021 | 3 | +| 4 | 1 | 2022 | 4 | + + +```python +fill_gaps( + df, + freq=1, + start=2019, + end=2024, +) +``` + +| | unique_id | ds | y | +|-----|-----------|------|-----| +| 0 | 0 | 2019 | NaN | +| 1 | 0 | 2020 | 0.0 | +| 2 | 0 | 2021 | 1.0 | +| 3 | 0 | 2022 | NaN | +| 4 | 0 | 2023 | 2.0 | +| 5 | 0 | 2024 | NaN | +| 6 | 1 | 2019 | NaN | +| 7 | 1 | 2020 | NaN | +| 8 | 1 | 2021 | 3.0 | +| 9 | 1 | 2022 | 4.0 | +| 10 | 1 | 2023 | NaN | +| 11 | 1 | 2024 | NaN | + +The function also accepts polars dataframes + + +```python +df = pl.DataFrame( + { + 'unique_id': [0, 0, 0, 1, 1], + 'ds': [ + datetime(2020, 1, 1), datetime(2022, 1, 1), datetime(2023, 1, 1), + datetime(2021, 1, 1), datetime(2022, 1, 1)], + 'y': np.arange(5), + } +) +df +``` + +| unique_id | ds | y | +|-----------|---------------------|-----| +| i64 | datetime\[μs\] | i64 | +| 0 | 2020-01-01 00:00:00 | 0 | +| 0 | 2022-01-01 00:00:00 | 1 | +| 0 | 2023-01-01 00:00:00 | 2 | +| 1 | 2021-01-01 00:00:00 | 3 | +| 1 | 2022-01-01 00:00:00 | 4 | + + +```python +polars_ms = fill_gaps( + df.with_columns(pl.col('ds').cast(pl.Datetime(time_unit='ms'))), + freq='1y', + start=datetime(2019, 1, 1), + end=datetime(2024, 1, 1), +) +assert polars_ms.schema['ds'].time_unit == 'ms' +polars_ms +``` + +| unique_id | ds | y | +|-----------|---------------------|------| +| i64 | datetime\[ms\] | i64 | +| 0 | 2019-01-01 00:00:00 | null | +| 0 | 2020-01-01 00:00:00 | 0 | +| 0 | 2021-01-01 00:00:00 | null | +| 0 | 2022-01-01 00:00:00 | 1 | +| 0 | 2023-01-01 00:00:00 | 2 | +| … | … | … | +| 1 | 2020-01-01 00:00:00 | null | +| 1 | 2021-01-01 00:00:00 | 3 | +| 1 | 2022-01-01 00:00:00 | 4 | +| 1 | 2023-01-01 00:00:00 | null | +| 1 | 2024-01-01 00:00:00 | null 
| + + +```python +df = pl.DataFrame( + { + 'unique_id': [0, 0, 0, 1, 1], + 'ds': [ + date(2020, 1, 1), date(2022, 1, 1), date(2023, 1, 1), + date(2021, 1, 1), date(2022, 1, 1)], + 'y': np.arange(5), + } +) +df +``` + +| unique_id | ds | y | +|-----------|------------|-----| +| i64 | date | i64 | +| 0 | 2020-01-01 | 0 | +| 0 | 2022-01-01 | 1 | +| 0 | 2023-01-01 | 2 | +| 1 | 2021-01-01 | 3 | +| 1 | 2022-01-01 | 4 | + + +```python +fill_gaps( + df, + freq='1y', + start=date(2020, 1, 1), + end=date(2024, 1, 1), +) +``` + +| unique_id | ds | y | +|-----------|------------|------| +| i64 | date | i64 | +| 0 | 2020-01-01 | 0 | +| 0 | 2021-01-01 | null | +| 0 | 2022-01-01 | 1 | +| 0 | 2023-01-01 | 2 | +| 0 | 2024-01-01 | null | +| 1 | 2020-01-01 | null | +| 1 | 2021-01-01 | 3 | +| 1 | 2022-01-01 | 4 | +| 1 | 2023-01-01 | null | +| 1 | 2024-01-01 | null | + + +```python +df = pl.DataFrame( + { + 'unique_id': [0, 0, 0, 1, 1], + 'ds': [2020, 2021, 2023, 2021, 2022], + 'y': np.arange(5), + } +) +df +``` + +| unique_id | ds | y | +|-----------|------|-----| +| i64 | i64 | i64 | +| 0 | 2020 | 0 | +| 0 | 2021 | 1 | +| 0 | 2023 | 2 | +| 1 | 2021 | 3 | +| 1 | 2022 | 4 | + + +```python +fill_gaps( + df, + freq=1, + start=2019, + end=2024, +) +``` + +| unique_id | ds | y | +|-----------|------|------| +| i64 | i64 | i64 | +| 0 | 2019 | null | +| 0 | 2020 | 0 | +| 0 | 2021 | 1 | +| 0 | 2022 | null | +| 0 | 2023 | 2 | +| … | … | … | +| 1 | 2020 | null | +| 1 | 2021 | 3 | +| 1 | 2022 | 4 | +| 1 | 2023 | null | +| 1 | 2024 | null | + diff --git a/utilsforecast/processing.mdx b/utilsforecast/processing.mdx new file mode 100644 index 00000000..05a98903 --- /dev/null +++ b/utilsforecast/processing.mdx @@ -0,0 +1,1146 @@ + +```python +import datetime +from datetime import datetime as dt + +from fastcore.test import test_eq, test_fail +from nbdev import show_doc + +from utilsforecast.compat import POLARS_INSTALLED +from utilsforecast.data import generate_series +``` + + +```python +import 
polars.testing +``` + +------------------------------------------------------------------------ + +source + +### to_numpy + +> ``` text +> to_numpy +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.Da +> taFrame]) +> ``` + +------------------------------------------------------------------------ + +source + +### counts_by_id + +> ``` text +> counts_by_id +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.fram +> e.DataFrame], id_col:str) +> ``` + +------------------------------------------------------------------------ + +source + +### maybe_compute_sort_indices + +> ``` text +> maybe_compute_sort_indices +> (df:Union[pandas.core.frame.DataFrame,polars. +> dataframe.frame.DataFrame], id_col:str, +> time_col:str) +> ``` + +*Compute indices that would sort the dataframe* + +| | **Type** | **Details** | +|--------|---------------------------|-------------------------------------| +| df | Union | Input dataframe with id, times and target values. | +| id_col | str | | +| time_col | str | | +| **Returns** | **Optional** | **Array with indices to sort the dataframe or None if it’s already sorted.** | + +------------------------------------------------------------------------ + +source + +### assign_columns + +> ``` text +> assign_columns +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.fr +> ame.DataFrame], names:Union[str,List[str]], values:Union[ +> numpy.ndarray,pandas.core.series.Series,polars.series.ser +> ies.Series,List[float]]) +> ``` + + +```python +engines = ['pandas'] +if POLARS_INSTALLED: + engines.append('polars') +``` + + +```python +for engine in engines: + series = generate_series(2, engine=engine) + x = np.random.rand(series.shape[0]) + series = assign_columns(series, 'x', x) + series = assign_columns(series, ['y', 'z'], np.vstack([x, x]).T) + series = assign_columns(series, 'ones', 1) + series = assign_columns(series, 'zeros', np.zeros(series.shape[0])) + series = assign_columns(series, 'as', 'a') + series = 
assign_columns(series, 'bs', series.shape[0] * ['b']) + np.testing.assert_allclose( + series[['x', 'y', 'z']], + np.vstack([x, x, x]).T + ) + np.testing.assert_equal(series['ones'], np.ones(series.shape[0])) + np.testing.assert_equal(series['as'], np.full(series.shape[0], 'a')) + np.testing.assert_equal(series['bs'], np.full(series.shape[0], 'b')) +``` + +------------------------------------------------------------------------ + +source + +### drop_columns + +> ``` text +> drop_columns +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.fram +> e.DataFrame], columns:Union[str,List[str]]) +> ``` + +------------------------------------------------------------------------ + +source + +### take_rows + +> ``` text +> take_rows (df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.Da +> taFrame,pandas.core.series.Series,polars.series.series.Series, +> numpy.ndarray], idxs:numpy.ndarray) +> ``` + + +```python +for engine in engines: + series = generate_series(2, engine=engine) + subset = take_rows(series, np.array([0, 2])) + assert subset.shape[0] == 2 +``` + +------------------------------------------------------------------------ + +source + +### filter_with_mask + +> ``` text +> filter_with_mask (df:Union[pandas.core.series.Series,polars.series.series +> .Series,pandas.core.frame.DataFrame,polars.dataframe.fr +> ame.DataFrame,pandas.core.indexes.base.Index,numpy.ndar +> ray], mask:Union[numpy.ndarray,pandas.core.series.Serie +> s,polars.series.series.Series]) +> ``` + +------------------------------------------------------------------------ + +source + +### is_nan + +> ``` text +> is_nan (s:Union[pandas.core.series.Series,polars.series.series.Series]) +> ``` + + +```python +np.testing.assert_equal( + is_nan(pd.Series([np.nan, 1.0, None])).to_numpy(), + np.array([True, False, True]), +) +if POLARS_INSTALLED: + np.testing.assert_equal( + is_nan(pl.Series([np.nan, 1.0, None])).to_numpy(), + np.array([True, False, None]), + ) +``` + 
+------------------------------------------------------------------------ + +source + +### is_none + +> ``` text +> is_none (s:Union[pandas.core.series.Series,polars.series.series.Series]) +> ``` + + +```python +np.testing.assert_equal( + is_none(pd.Series([np.nan, 1.0, None])).to_numpy(), + np.array([True, False, True]), +) +if POLARS_INSTALLED: + np.testing.assert_equal( + is_none(pl.Series([np.nan, 1.0, None])).to_numpy(), + np.array([False, False, True]), + ) +``` + +------------------------------------------------------------------------ + +source + +### is_nan_or_none + +> ``` text +> is_nan_or_none +> (s:Union[pandas.core.series.Series,polars.series.series.S +> eries]) +> ``` + + +```python +np.testing.assert_equal( + is_nan_or_none(pd.Series([np.nan, 1.0, None])).to_numpy(), + np.array([True, False, True]), +) +if POLARS_INSTALLED: + np.testing.assert_equal( + is_nan_or_none(pl.Series([np.nan, 1.0, None])).to_numpy(), + np.array([True, False, True]), + ) +``` + +------------------------------------------------------------------------ + +source + +### match_if_categorical + +> ``` text +> match_if_categorical (s1:Union[pandas.core.series.Series,polars.series.se +> ries.Series,pandas.core.indexes.base.Index], s2:Uni +> on[pandas.core.series.Series,polars.series.series.S +> eries]) +> ``` + +------------------------------------------------------------------------ + +source + +### vertical_concat + +> ``` text +> vertical_concat (dfs:List[Union[pandas.core.frame.DataFrame,polars.datafr +> ame.frame.DataFrame,pandas.core.series.Series,polars.ser +> ies.series.Series]], match_categories:bool=True) +> ``` + + +```python +df1 = pd.DataFrame({'x': ['a', 'b', 'c']}, dtype='category') +df2 = pd.DataFrame({'x': ['f', 'b', 'a']}, dtype='category') +pd.testing.assert_series_equal( + vertical_concat([df1,df2])['x'], + pd.Series(['a', 'b', 'c', 'f', 'b', 'a'], name='x', dtype=pd.CategoricalDtype(categories=['a', 'b', 'c', 'f'])) +) +``` + + +```python +df1 = 
pl.DataFrame({'x': ['a', 'b', 'c']}, schema={'x': pl.Categorical}) +df2 = pl.DataFrame({'x': ['f', 'b', 'a']}, schema={'x': pl.Categorical}) +out = vertical_concat([df1,df2])['x'] +assert out.equals(pl.Series('x', ['a', 'b', 'c', 'f', 'b', 'a'])) +assert out.to_physical().equals(pl.Series('x', [0, 1, 2, 3, 1, 0])) +assert out.cat.get_categories().equals( + pl.Series('x', ['a', 'b', 'c', 'f']) +) +``` + + +```python +for engine in engines: + series = generate_series(2, engine=engine) + doubled = vertical_concat([series, series]) + assert doubled.shape[0] == 2 * series.shape[0] +``` + +------------------------------------------------------------------------ + +source + +### horizontal_concat + +> ``` text +> horizontal_concat (dfs:List[Union[pandas.core.frame.DataFrame,polars.data +> frame.frame.DataFrame]]) +> ``` + + +```python +for engine in engines: + series = generate_series(2, engine=engine) + renamer = {c: f'{c}_2' for c in series.columns} + if engine == 'pandas': + series2 = series.rename(columns=renamer) + else: + series2 = series.rename(renamer) + doubled = horizontal_concat([series, series2]) + assert doubled.shape[1] == 2 * series.shape[1] +``` + +------------------------------------------------------------------------ + +source + +### copy_if_pandas + +> ``` text +> copy_if_pandas +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.fr +> ame.DataFrame], deep:bool=False) +> ``` + +------------------------------------------------------------------------ + +source + +### join + +> ``` text +> join (df1:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.DataFr +> ame,pandas.core.series.Series,polars.series.series.Series], df2:Uni +> on[pandas.core.frame.DataFrame,polars.dataframe.frame.DataFrame,pan +> das.core.series.Series,polars.series.series.Series], +> on:Union[str,List[str]], how:str='inner') +> ``` + +------------------------------------------------------------------------ + +source + +### drop_index_if_pandas + +> ``` text +> 
drop_index_if_pandas +> (df:Union[pandas.core.frame.DataFrame,polars.datafr +> ame.frame.DataFrame]) +> ``` + +------------------------------------------------------------------------ + +source + +### rename + +> ``` text +> rename +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.Data +> Frame], mapping:Dict[str,str]) +> ``` + +------------------------------------------------------------------------ + +source + +### sort + +> ``` text +> sort +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.DataFr +> ame], by:Union[str,List[str],NoneType]=None) +> ``` + + +```python +pd.testing.assert_frame_equal( + sort(pd.DataFrame({'x': [3, 1, 2]}), 'x'), + pd.DataFrame({'x': [1, 2, 3]}) +) +pd.testing.assert_frame_equal( + sort(pd.DataFrame({'x': [3, 1, 2]}), ['x']), + pd.DataFrame({'x': [1, 2, 3]}) +) +pd.testing.assert_series_equal( + sort(pd.Series([3, 1, 2])), + pd.Series([1, 2, 3]) +) +pd.testing.assert_index_equal( + sort(pd.Index([3, 1, 2])), + pd.Index([1, 2, 3]) +) +``` + + +```python +pl.testing.assert_frame_equal( + sort(pl.DataFrame({'x': [3, 1, 2]}), 'x'), + pl.DataFrame({'x': [1, 2, 3]}), +) +pl.testing.assert_frame_equal( + sort(pl.DataFrame({'x': [3, 1, 2]}), ['x']), + pl.DataFrame({'x': [1, 2, 3]}), +) +pl.testing.assert_series_equal( + sort(pl.Series('x', [3, 1, 2])), + pl.Series('x', [1, 2, 3]) +) +``` + + +```python +test_eq(_multiply_pl_freq('1d', 4), '4d') +test_eq(_multiply_pl_freq('2d', 4), '8d') +pl.testing.assert_series_equal( + _multiply_pl_freq('1d', pl_Series([1, 2])), + pl_Series(['1d', '2d']), +) +pl.testing.assert_series_equal( + _multiply_pl_freq('4m', pl_Series([2, 4])), + pl_Series(['8m', '16m']), +) +``` + +------------------------------------------------------------------------ + +source + +### offset_times + +> ``` text +> offset_times (times:Union[pandas.core.series.Series,polars.series.series. 
+> Series,pandas.core.indexes.base.Index], +> freq:Union[int,str,pandas._libs.tslibs.offsets.BaseOffset], +> n:Union[int,numpy.ndarray]) +> ``` + + +```python +pd.testing.assert_index_equal( + offset_times(pd.to_datetime(['2020-01-31', '2020-02-29', '2020-03-31']), pd.offsets.MonthEnd(), 1), + pd.Index(pd.to_datetime(['2020-02-29', '2020-03-31', '2020-04-30'])), +) +pd.testing.assert_index_equal( + offset_times(pd.to_datetime(['2020-01-01', '2020-02-01', '2020-03-01']), pd.offsets.MonthBegin(), 1), + pd.Index(pd.to_datetime(['2020-02-01', '2020-03-01', '2020-04-01'])), +) +``` + + +```python +pl.testing.assert_series_equal( + offset_times(pl_Series([dt(2020, 1, 31), dt(2020, 2, 28), dt(2020, 3, 31)]), '1mo', 1), + pl_Series([dt(2020, 2, 29), dt(2020, 3, 28), dt(2020, 4, 30)]), +) +pl.testing.assert_series_equal( + offset_times(pl_Series([dt(2020, 1, 31), dt(2020, 2, 29), dt(2020, 3, 31)]), '1mo', 1), + pl_Series([dt(2020, 2, 29), dt(2020, 3, 31), dt(2020, 4, 30)]), +) +``` + +------------------------------------------------------------------------ + +source + +### offset_dates + +> ``` text +> offset_dates (dates:Union[pandas.core.series.Series,polars.series.series. +> Series,pandas.core.indexes.base.Index], +> freq:Union[int,str,pandas._libs.tslibs.offsets.BaseOffset], +> n:Union[int,pandas.core.series.Series,polars.series.series. +> Series]) +> ``` + +------------------------------------------------------------------------ + +source + +### time_ranges + +> ``` text +> time_ranges (starts:Union[pandas.core.series.Series,polars.series.series. 
+> Series,pandas.core.indexes.base.Index], +> freq:Union[int,str,pandas._libs.tslibs.offsets.BaseOffset], +> periods:int) +> ``` + + +```python +# datetimes +dates = pd.to_datetime(['2000-01-01', '2010-10-10']) +pd.testing.assert_series_equal( + time_ranges(dates, freq='D', periods=3), + pd.Series(pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03', '2010-10-10', '2010-10-11', '2010-10-12'])) +) +pd.testing.assert_series_equal( + time_ranges(dates, freq='2D', periods=3), + pd.Series(pd.to_datetime(['2000-01-01', '2000-01-03', '2000-01-05', '2010-10-10', '2010-10-12', '2010-10-14'])) +) +pd.testing.assert_series_equal( + time_ranges(dates, freq='4D', periods=3), + pd.Series(pd.to_datetime(['2000-01-01', '2000-01-05', '2000-01-09', '2010-10-10', '2010-10-14', '2010-10-18'])) +) +pd.testing.assert_series_equal( + time_ranges(pd.to_datetime(['2000-01-01', '2010-10-01']), freq=2 * pd.offsets.MonthBegin(), periods=2), + pd.Series(pd.to_datetime(['2000-01-01', '2000-03-01', '2010-10-01', '2010-12-01'])) +) +pd.testing.assert_series_equal( + time_ranges(pd.to_datetime(['2000-01-01', '2010-01-01']).tz_localize('US/Eastern'), freq=2 * pd.offsets.YearBegin(), periods=2), + pd.Series(pd.to_datetime(['2000-01-01', '2002-01-01', '2010-01-01', '2012-01-01']).tz_localize('US/Eastern')) +) +pd.testing.assert_series_equal( + time_ranges(pd.to_datetime(['2000-12-31', '2010-12-31']), freq=2 * pd.offsets.YearEnd(), periods=2), + pd.Series(pd.to_datetime(['2000-12-31', '2002-12-31', '2010-12-31', '2012-12-31'])) +) +# ints +dates = pd.Series([1, 10]) +pd.testing.assert_series_equal( + time_ranges(dates, freq=1, periods=3), + pd.Series([1, 2, 3, 10, 11, 12]) +) +pd.testing.assert_series_equal( + time_ranges(dates, freq=2, periods=3), + pd.Series([1, 3, 5, 10, 12, 14]) +) +pd.testing.assert_series_equal( + time_ranges(dates, freq=4, periods=3), + pd.Series([1, 5, 9, 10, 14, 18]) +) +``` + + +```python +# datetimes +dates = pl.Series([dt(2000, 1, 1), dt(2010, 10, 10)]) 
+pl.testing.assert_series_equal( + time_ranges(dates, freq='1d', periods=3), + pl.Series([dt(2000, 1, 1), dt(2000, 1, 2), dt(2000, 1, 3), dt(2010, 10, 10), dt(2010, 10, 11), dt(2010, 10, 12)]) +) +pl.testing.assert_series_equal( + time_ranges(dates, freq='2d', periods=3), + pl.Series([dt(2000, 1, 1), dt(2000, 1, 3), dt(2000, 1, 5), dt(2010, 10, 10), dt(2010, 10, 12), dt(2010, 10, 14)]) +) +pl.testing.assert_series_equal( + time_ranges(dates, freq='4d', periods=3), + pl.Series([dt(2000, 1, 1), dt(2000, 1, 5), dt(2000, 1, 9), dt(2010, 10, 10), dt(2010, 10, 14), dt(2010, 10, 18)]) +) +pl.testing.assert_series_equal( + time_ranges(pl.Series([dt(2010, 2, 28), dt(2000, 1, 31)]), '1mo', 3), + pl.Series([dt(2010, 2, 28), dt(2010, 3, 31), dt(2010, 4, 30), dt(2000, 1, 31), dt(2000, 2, 29), dt(2000, 3, 31)]) +) +# dates +dates = pl.Series([datetime.date(2000, 1, 1), datetime.date(2010, 10, 10)]) +pl.testing.assert_series_equal( + time_ranges(dates, freq='1d', periods=2), + pl.Series([ + datetime.date(2000, 1, 1), datetime.date(2000, 1, 2), + datetime.date(2010, 10, 10), datetime.date(2010, 10, 11), + ]) +) +# ints +dates = pl.Series([1, 10]) +pl.testing.assert_series_equal( + time_ranges(dates, freq=1, periods=3), + pl.Series([1, 2, 3, 10, 11, 12]), +) +pl.testing.assert_series_equal( + time_ranges(dates, freq=2, periods=3), + pl.Series([1, 3, 5, 10, 12, 14]), +) +pl.testing.assert_series_equal( + time_ranges(dates, freq=4, periods=3), + pl.Series([1, 5, 9, 10, 14, 18]), +) +``` + +------------------------------------------------------------------------ + +source + +### repeat + +> ``` text +> repeat (s:Union[pandas.core.series.Series,polars.series.series.Series,pan +> das.core.indexes.base.Index,numpy.ndarray], n:Union[int,numpy.nda +> rray,pandas.core.series.Series,polars.series.series.Series]) +> ``` + + +```python +pd.testing.assert_index_equal( + repeat(pd.CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c']), 2), + pd.CategoricalIndex(['a', 'a', 'b', 'b', 'c', 
'c'], categories=['a', 'b', 'c']) +) +pd.testing.assert_series_equal( + repeat(pd.Series([1, 2]), 2), + pd.Series([1, 1, 2, 2]) +) +pd.testing.assert_series_equal( + repeat(pd.Series([1, 2]), pd.Series([2, 3])), + pd.Series([1, 1, 2, 2, 2]), +) +np.testing.assert_array_equal( + repeat(np.array([np.datetime64('2000-01-01'), np.datetime64('2010-10-10')]), 2), + np.array([ + np.datetime64('2000-01-01'), np.datetime64('2000-01-01'), + np.datetime64('2010-10-10'), np.datetime64('2010-10-10') + ]) +) +np.testing.assert_array_equal( + repeat(np.array([1, 2]), np.array([2, 3])), + np.array([1, 1, 2, 2, 2]), +) +``` + + +```python +s = pl.Series(['a', 'b', 'c'], dtype=pl.Categorical) +pl.testing.assert_series_equal( + repeat(s, 2), + pl.concat([s, s]).sort() +) +pl.testing.assert_series_equal( + repeat(pl.Series([2, 4]), 2), + pl.Series([2, 2, 4, 4]) +) +pl.testing.assert_series_equal( + repeat(pl.Series([1, 2]), np.array([2, 3])), + pl.Series([1, 1, 2, 2, 2]), +) +``` + +------------------------------------------------------------------------ + +source + +### cv_times + +> ``` text +> cv_times (times:numpy.ndarray, uids:Union[pandas.core.series.Series,polar +> s.series.series.Series,pandas.core.indexes.base.Index], +> indptr:numpy.ndarray, h:int, test_size:int, step_size:int, +> id_col:str='unique_id', time_col:str='ds') +> ``` + + +```python +times = np.arange(51, dtype=np.int64) +uids = pd.Series(['id_0']) +indptr = np.array([0, 51]) +h = 3 +test_size = 5 +actual = cv_times( + times=times, + uids=uids, + indptr=indptr, + h=h, + test_size=test_size, + step_size=1, +) +expected = pd.DataFrame({ + 'unique_id': 9 * ['id_0'], + 'ds': np.hstack([ + [46, 47, 48], + [47, 48, 49], + [48, 49, 50] + ], dtype=np.int64), + 'cutoff': np.repeat(np.array([45, 46, 47], dtype=np.int64), h), +}) +pd.testing.assert_frame_equal(actual, expected) + +# step_size=2 +actual = cv_times( + times=times, + uids=uids, + indptr=indptr, + h=h, + test_size=test_size, + step_size=2, +) +expected = 
pd.DataFrame({ + 'unique_id': 6 * ['id_0'], + 'ds': np.hstack([ + [46, 47, 48], + [48, 49, 50] + ], dtype=np.int64), + 'cutoff': np.repeat(np.array([45, 47], dtype=np.int64), h) +}) +pd.testing.assert_frame_equal(actual, expected) +``` + +------------------------------------------------------------------------ + +source + +### group_by + +> ``` text +> group_by (df:Union[pandas.core.series.Series,polars.series.series.Series, +> pandas.core.frame.DataFrame,polars.dataframe.frame.DataFrame], +> by, maintain_order=False) +> ``` + +------------------------------------------------------------------------ + +source + +### group_by_agg + +> ``` text +> group_by_agg +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.fram +> e.DataFrame], by, aggs, maintain_order=False) +> ``` + + +```python +pd.testing.assert_frame_equal( + group_by_agg(pd.DataFrame({'x': [1, 1, 2], 'y': [1, 1, 1]}), 'x', {'y': 'sum'}), + pd.DataFrame({'x': [1, 2], 'y': [2, 1]}) +) +``` + + +```python +pd.testing.assert_frame_equal( + group_by_agg(pl.DataFrame({'x': [1, 1, 2], 'y': [1, 1, 1]}), 'x', {'y': 'sum'}, maintain_order=True).to_pandas(), + pd.DataFrame({'x': [1, 2], 'y': [2, 1]}) +) +``` + +------------------------------------------------------------------------ + +source + +### is_in + +> ``` text +> is_in (s:Union[pandas.core.series.Series,polars.series.series.Series], +> collection) +> ``` + + +```python +np.testing.assert_equal(is_in(pd.Series([1, 2, 3]), [1]), np.array([True, False, False])) +``` + + +```python +np.testing.assert_equal(is_in(pl.Series([1, 2, 3]), [1]), np.array([True, False, False])) +``` + +------------------------------------------------------------------------ + +source + +### between + +> ``` text +> between (s:Union[pandas.core.series.Series,polars.series.series.Series], +> lower:Union[pandas.core.series.Series,polars.series.series.Serie +> s], upper:Union[pandas.core.series.Series,polars.series.series.S +> eries]) +> ``` + + +```python +np.testing.assert_equal( + 
between(pd.Series([1, 2, 3]), pd.Series([0, 1, 4]), pd.Series([4, 1, 2])), + np.array([True, False, False]), +) +``` + + +```python +np.testing.assert_equal( + between(pl.Series([1, 2, 3]), pl.Series([0, 1, 4]), pl.Series([4, 1, 2])), + np.array([True, False, False]), +) +``` + +------------------------------------------------------------------------ + +source + +### fill_null + +> ``` text +> fill_null +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.D +> ataFrame], mapping:Dict[str,Any]) +> ``` + + +```python +pd.testing.assert_frame_equal( + fill_null(pd.DataFrame({'x': [1, np.nan], 'y': [np.nan, 2]}), {'x': 2, 'y': 1}), + pd.DataFrame({'x': [1, 2], 'y': [1, 2]}, dtype='float64') +) +``` + + +```python +pl.testing.assert_frame_equal( + fill_null(pl.DataFrame({'x': [1, None], 'y': [None, 2]}), {'x': 2, 'y': 1}), + pl.DataFrame({'x': [1, 2], 'y': [1, 2]}) +) +``` + +------------------------------------------------------------------------ + +source + +### cast + +> ``` text +> cast (s:Union[pandas.core.series.Series,polars.series.series.Series], +> dtype:type) +> ``` + + +```python +pd.testing.assert_series_equal( + cast(pd.Series([1, 2, 3]), 'int16'), + pd.Series([1, 2, 3], dtype='int16') +) +``` + + +```python +pd.testing.assert_series_equal( + cast(pl.Series('x', [1, 2, 3]), pl.Int16).to_pandas(), + pd.Series([1, 2, 3], name='x', dtype='int16') +) +``` + +------------------------------------------------------------------------ + +source + +### value_cols_to_numpy + +> ``` text +> value_cols_to_numpy +> (df:Union[pandas.core.frame.DataFrame,polars.datafra +> me.frame.DataFrame], id_col:str, time_col:str, +> target_col:Optional[str]) +> ``` + +------------------------------------------------------------------------ + +source + +### make_future_dataframe + +> ``` text +> make_future_dataframe +> (uids:Union[pandas.core.series.Series,polars.serie +> s.series.Series], last_times:Union[pandas.core.ser +> 
ies.Series,polars.series.series.Series,pandas.core +> .indexes.base.Index], freq:Union[int,str,pandas._l +> ibs.tslibs.offsets.BaseOffset], h:int, +> id_col:str='unique_id', time_col:str='ds') +> ``` + + +```python +pd.testing.assert_frame_equal( + make_future_dataframe( + pd.Series([1, 2]), pd.to_datetime(['2000-01-01', '2010-10-10']), freq='D', h=2 + ), + pd.DataFrame({ + 'unique_id': [1, 1, 2, 2], + 'ds': pd.to_datetime(['2000-01-02', '2000-01-03', '2010-10-11', '2010-10-12']) + }) +) +``` + + +```python +pl.testing.assert_frame_equal( + make_future_dataframe( + pl.Series([1, 2]), + pl.Series([dt(2000, 1, 1), dt(2010, 10, 10)]), + freq='1d', + h=2, + id_col='uid', + time_col='dates', + ), + pl.DataFrame({ + 'uid': [1, 1, 2, 2], + 'dates': [dt(2000, 1, 2), dt(2000, 1, 3), dt(2010, 10, 11), dt(2010, 10, 12)] + }) +) +``` + +------------------------------------------------------------------------ + +source + +### anti_join + +> ``` text +> anti_join +> (df1:Union[pandas.core.frame.DataFrame,polars.dataframe.frame. 
+> DataFrame], df2:Union[pandas.core.frame.DataFrame,polars.dataf +> rame.frame.DataFrame], on:Union[str,List[str]]) +> ``` + + +```python +pd.testing.assert_frame_equal( + anti_join(pd.DataFrame({'x': [1, 2]}), pd.DataFrame({'x': [1]}), on='x'), + pd.DataFrame({'x': [2]}) +) +test_eq( + anti_join(pd.DataFrame({'x': [1]}), pd.DataFrame({'x': [1]}), on='x').shape[0], + 0, +) +``` + + +```python +pl.testing.assert_frame_equal( + anti_join(pl_DataFrame({'x': [1, 2]}), pl_DataFrame({'x': [1]}), on='x'), + pl_DataFrame({'x': [2]}) +) +test_eq( + anti_join(pl_DataFrame({'x': [1]}), pl_DataFrame({'x': [1]}), on='x').shape[0], + 0, +) +``` + +------------------------------------------------------------------------ + +source + +### ensure_sorted + +> ``` text +> ensure_sorted +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.fra +> me.DataFrame], id_col:str, time_col:str) +> ``` + +------------------------------------------------------------------------ + +source + +### process_df + +> ``` text +> process_df +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame. +> DataFrame], id_col:str, time_col:str, +> target_col:Optional[str]) +> ``` + +*Extract components from dataframe* + +| | **Type** | **Details** | +|--------|---------------------------|-------------------------------------| +| df | Union | Input dataframe with id, times and target values. 
|
+| id_col | str | Column that identifies each series. |
+| time_col | str | Column that identifies each timestamp. |
+| target_col | Optional | Column that contains the target. |
+| **Returns** | **ProcessedDF** | **Sorted unique ids, last times, data values, index pointers and sort indices.** |
+
+------------------------------------------------------------------------
+
+source
+
+### ProcessedDF
+
+> ``` text
+> ProcessedDF
+>              (uids:Union[pandas.core.series.Series,polars.series.series.S
+>              eries], last_times:numpy.ndarray, data:numpy.ndarray,
+>              indptr:numpy.ndarray, sort_idxs:Optional[numpy.ndarray])
+> ```
+
+------------------------------------------------------------------------
+
+source
+
+### DataFrameProcessor
+
+> ``` text
+> DataFrameProcessor (id_col:str='unique_id', time_col:str='ds',
+>                     target_col:str='y')
+> ```
+
+*Process a long-format dataframe into the arrays of a `ProcessedDF`.*
+
+
+```python
+static_features = ['static_0', 'static_1']
+```
+
+
+```python
+for n_static_features in [0, 2]:
+    series_pd = generate_series(1_000, n_static_features=n_static_features, equal_ends=False, engine='pandas')
+    for i in range(n_static_features):
+        series_pd[f'static_{i}'] = series_pd[f'static_{i}'].map(lambda x: f'x_{x}').astype('category')
+    scrambled_series_pd = series_pd.sample(frac=1.0)
+    dfp = DataFrameProcessor('unique_id', 'ds', 'y')
+    uids, times, data, indptr, _ = dfp.process(scrambled_series_pd)
+    test_eq(times, series_pd.groupby('unique_id', observed=True)['ds'].max().values)
+    test_eq(uids, np.sort(series_pd['unique_id'].unique()))
+    for i in range(n_static_features):
+        series_pd[f'static_{i}'] = series_pd[f'static_{i}'].cat.codes
+    test_eq(data, series_pd[['y'] + static_features[:n_static_features]].to_numpy())
+    test_eq(np.diff(indptr), series_pd.groupby('unique_id', observed=True).size().values)
+```
+
+
+```python
+for n_static_features in [0, 2]:
+    series_pl = generate_series(1_000, n_static_features=n_static_features, equal_ends=False, engine='polars')
+    scrambled_series_pl = series_pl.sample(fraction=1.0, shuffle=True)
+    dfp = DataFrameProcessor('unique_id', 'ds', 'y')
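+    # process returns, in order: the sorted unique ids, the last time of each
+    # series, the value columns as a numpy array, CSR-style index pointers
+    # delimiting each series inside it, and the sort indices used (if any).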
+ uids, times, data, indptr, _ = dfp.process(scrambled_series_pl) + grouped = group_by(series_pl, 'unique_id') + test_eq(times, grouped.agg(pl.col('ds').max()).sort('unique_id')['ds'].to_numpy()) + test_eq(uids, series_pl['unique_id'].unique().sort()) + test_eq(data, series_pl.select(pl.col(c).map_batches(lambda s: s.to_physical()) for c in ['y'] + static_features[:n_static_features]).to_numpy()) + test_eq(np.diff(indptr), grouped.count().sort('unique_id')['count'].to_numpy()) +``` + +------------------------------------------------------------------------ + +source + +### backtest_splits + +> ``` text +> backtest_splits +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.f +> rame.DataFrame], n_windows:int, h:int, id_col:str, +> time_col:str, freq:Union[int,str,pandas._libs.tslibs.off +> sets.BaseOffset], step_size:Optional[int]=None, +> input_size:Optional[int]=None, +> allow_partial_horizons:bool=False) +> ``` + +------------------------------------------------------------------------ + +source + +### add_insample_levels + +> ``` text +> add_insample_levels +> (df:Union[pandas.core.frame.DataFrame,polars.datafra +> me.frame.DataFrame], models:List[str], +> level:List[Union[int,float]], +> id_col:str='unique_id', target_col:str='y') +> ``` + + +```python +series = generate_series(100, n_models=2) +models = ['model0', 'model1'] +levels = [80, 95] +with_levels = add_insample_levels(series, models, levels) +for model in models: + for lvl in levels: + assert with_levels[f'{model}-lo-{lvl}'].lt(with_levels[f'{model}-hi-{lvl}']).all() +``` + + +```python +series_pl = generate_series(100, n_models=2, engine='polars') +with_levels_pl = add_insample_levels(series_pl, ['model0', 'model1'], [80, 95]) +pd.testing.assert_frame_equal( + with_levels.drop(columns='unique_id'), + with_levels_pl.to_pandas().drop(columns='unique_id') +) +``` + diff --git a/utilsforecast/validation.html.mdx b/utilsforecast/validation.html.mdx new file mode 100644 index 00000000..be619116 --- 
/dev/null +++ b/utilsforecast/validation.html.mdx @@ -0,0 +1,150 @@ +--- +description: Utilities to validate input data +output-file: validation.html +title: Validation +--- + + + +```python +import datetime + +from fastcore.test import test_eq, test_fail +``` + + +```python +import polars.testing +``` + +------------------------------------------------------------------------ + +source + +### ensure_shallow_copy + +> ``` text +> ensure_shallow_copy (df:pandas.core.frame.DataFrame) +> ``` + +------------------------------------------------------------------------ + +source + +### ensure_time_dtype + +> ``` text +> ensure_time_dtype (df:~DFType, time_col:str='ds') +> ``` + +*Make sure that `time_col` contains timestamps or integers. If it +contains strings, try to cast them as timestamps.* + + +```python +pd.testing.assert_frame_equal( + ensure_time_dtype(pd.DataFrame({'ds': ['2000-01-01']})), + pd.DataFrame({'ds': pd.to_datetime(['2000-01-01'])}) +) +df = pd.DataFrame({'ds': [1, 2]}) +assert df is ensure_time_dtype(df) +test_fail( + lambda: ensure_time_dtype(pd.DataFrame({'ds': ['2000-14-14']})), + contains='Please make sure that it contains valid timestamps', +) +``` + + +```python +pl.testing.assert_frame_equal( + ensure_time_dtype(pl.DataFrame({'ds': ['2000-01-01']})), + pl.DataFrame().with_columns(ds=pl.datetime(2000, 1, 1)) +) +df = pl.DataFrame({'ds': [1, 2]}) +assert df is ensure_time_dtype(df) +test_fail( + lambda: ensure_time_dtype(pl.DataFrame({'ds': ['hello']})), + contains='Please make sure that it contains valid timestamps', +) +``` + +------------------------------------------------------------------------ + +source + +### validate_format + +> ``` text +> validate_format +> (df:Union[pandas.core.frame.DataFrame,polars.dataframe.f +> rame.DataFrame], id_col:str='unique_id', +> time_col:str='ds', target_col:Optional[str]='y') +> ``` + +*Ensure DataFrame has expected format.* + +| | **Type** | **Default** | **Details** | 
+|-------------|----------|-------------|--------------------------------------------|
+| df | Union |  | DataFrame with time series in long format. |
+| id_col | str | unique_id | Column that identifies each series. |
+| time_col | str | ds | Column that identifies each timestamp. |
+| target_col | Optional | y | Column that contains the target. |
+| **Returns** | **None** |  |  |
+
+
+```python
+import datetime
+
+from utilsforecast.compat import POLARS_INSTALLED, pl
+from utilsforecast.data import generate_series
+```
+
+
+```python
+test_fail(lambda: validate_format(1), contains="got")
+constructors = [pd.DataFrame]
+if POLARS_INSTALLED:
+    constructors.append(pl.DataFrame)
+for constructor in constructors:
+    df = constructor({'unique_id': [1]})
+    test_fail(lambda: validate_format(df), contains="missing: ['ds', 'y']")
+    df = constructor({'unique_id': [1], 'time': ['x'], 'y': [1]})
+    test_fail(lambda: validate_format(df, time_col='time'), contains="('time') should have either timestamps or integers")
+    for time in [1, datetime.datetime(2000, 1, 1)]:
+        df = constructor({'unique_id': [1], 'ds': [time], 'sales': ['x']})
+        test_fail(lambda: validate_format(df, target_col='sales'), contains="('sales') should have a numeric data type")
+```
+
+------------------------------------------------------------------------
+
+source
+
+### validate_freq
+
+> ``` text
+> validate_freq
+>                (times:Union[pandas.core.series.Series,polars.series.serie
+>                s.Series], freq:Union[str,int])
+> ```
+
+
+```python
+test_fail(lambda: validate_freq(pd.Series([1, 2]), 'D'), contains='provide a valid integer')
+test_fail(lambda: validate_freq(pd.to_datetime(['2000-01-01']).to_series(), 1), contains='provide a valid pandas or polars offset')
+```
+
+
+```python
+test_fail(lambda: validate_freq(pl.Series([1, 2]), '1d'), contains='provide a valid integer')
+test_fail(lambda: validate_freq(pl.Series([datetime.datetime(2000, 1, 1)]), 1), contains='provide a valid pandas or polars offset')
+test_fail(lambda: validate_freq(pl.Series([datetime.datetime(2000, 1, 1)]), 'D'), contains='valid polars offset') +``` +
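The cells above only exercise the failure paths. For intuition, the validation boils down to three ordered checks: the required columns are present, the time column holds timestamps or integers, and the target column is numeric. The following pandas-only sketch illustrates those checks; it is not the library's implementation, and `check_long_format` is a hypothetical name:

```python
import pandas as pd


def check_long_format(df, id_col='unique_id', time_col='ds', target_col='y'):
    """Rough sketch of the checks performed on a long-format dataframe."""
    # 1. all required columns must be present
    missing = [c for c in (id_col, time_col, target_col) if c not in df.columns]
    if missing:
        raise ValueError(f'missing: {missing}')
    # 2. the time column must hold timestamps or integers
    time_dtype = df[time_col].dtype
    if not (
        pd.api.types.is_datetime64_any_dtype(time_dtype)
        or pd.api.types.is_integer_dtype(time_dtype)
    ):
        raise ValueError(f"('{time_col}') should have either timestamps or integers")
    # 3. the target column must be numeric
    if not pd.api.types.is_numeric_dtype(df[target_col]):
        raise ValueError(f"('{target_col}') should have a numeric data type")


# a well-formed frame passes silently
check_long_format(pd.DataFrame({'unique_id': [1, 1], 'ds': [1, 2], 'y': [0.5, 0.7]}))
```

The checks are ordered so that the most basic structural problem (a missing column) is reported before any dtype inspection is attempted.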