The goal of the cesnet-tszoo project is to provide time series datasets together with useful tools for preprocessing and reproducibility, such as:
- API for downloading, configuring and loading various datasets (e.g. CESNET-TimeSeries24, CESNET-AGG23...), each with various sources and aggregations.
- Examples of configuration options:
    - Data can be split into train/val/test sets. The split can be done by time series or by time periods.
    - Transforming data with built-in or custom transformers.
    - Handling missing values with built-in or custom fillers.
    - Applying custom handlers.
    - Changing the order in which preprocessing steps are applied/fitted.
- Creation and import of benchmarks, for easy reproducibility of experiments.
- Creation and import of annotations. Annotations can be created for a specific time series, a specific time, or a specific time within a specific time series.
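The difference between splitting by time periods and splitting by time series can be sketched on toy data. This is a generic, standard-library illustration only: the function names `split_by_time` and `split_by_series` and the toy `data` dictionary are hypothetical and are not part of the cesnet-tszoo API.

```python
# Toy data: 4 time series (keyed by id), each with 10 time steps.
# Generic illustration only; none of these names come from cesnet-tszoo.
data = {ts_id: [ts_id * 100 + t for t in range(10)] for ts_id in range(4)}


def split_by_time(data, train, val, test):
    """Every series appears in every set; each set covers a different time range."""
    return tuple(
        {ts_id: [values[t] for t in period] for ts_id, values in data.items()}
        for period in (train, val, test)
    )


def split_by_series(data, train_ids, val_ids, test_ids):
    """Every set covers the full time range; each set holds different series."""
    return tuple(
        {ts_id: list(data[ts_id]) for ts_id in ids}
        for ids in (train_ids, val_ids, test_ids)
    )


time_train, time_val, time_test = split_by_time(
    data, range(0, 6), range(6, 8), range(8, 10)
)
series_train, series_val, series_test = split_by_series(data, [0, 1], [2], [3])

print(sorted(time_train), len(time_train[0]))      # all series, 6 time steps each
print(sorted(series_train), len(series_train[0]))  # two series, 10 time steps each
```

With a time split, a model is evaluated on later values of series it has already seen. With a series split, it is evaluated on series it has never seen.
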
| Name | CESNET-TimeSeries24 | CESNET-AGG23 | Abilene | GÉANT | SDN | Telecom Italia | Network Operator KPIs |
|---|---|---|---|---|---|---|---|
| Published in | 2025 | 2023 | 2005 | 2005 | 2021 | 2015 | 2023 |
| Collection period | 9.10.2023 - 14.7.2024 | 25.2.2023 - 3.5.2023 | 2004 | 2005 | — | 2013–2014 | — |
| Collection duration | 40 weeks | 10 weeks | 6 months | 16 weeks | 4 days | 2 months | Multiple weeks |
| Aggregation window | 1 day, 1 hour, 10 min | 1 min | 5 min, 10 min, 1 hour, 1 day | 15 min, 1 hour, 1 day | 1 min, 10 min, 1 hour, 1 day | 10 min, 1 hour, 1 day | 5 min, 10 min, 1 hour, 1 day |
| Sources | CESNET3: Institutions, Institution subnets, IP addresses | CESNET2 | Abilene network | GÉANT network | Simulated SDN environment | Milan city cells (SMS, call, internet) | Network operator |
| Subsets | — | — | Matrix, Node2Node, Node | Matrix, Node2Node, Node | Matrix, Node2Node, Node | — | Downstream, Internet, Sessions, VPN |
| Cite | https://doi.org/10.1038/s41597-025-04603-x | https://doi.org/10.23919/CNSM59352.2023.10327823 | https://doi.org/10.1145/885651.781053 | https://dl.acm.org/doi/10.1145/1111322.1111341 | https://doi.org/10.1109/ICC42927.2021.9500331 | https://doi.org/10.1038/sdata.2015.55 | https://doi.org/10.5281/zenodo.8147768 |
| Source URL | https://zenodo.org/records/13382427 | https://zenodo.org/records/8053021 | https://www.cs.utexas.edu/~yzhang/research/AbileneTM | https://totem.info.ucl.ac.be/dataset.html | https://github.com/duchuyle108/SDN-TMprediction | https://dataverse.harvard.edu | https://doi.org/10.5281/zenodo.8147768 |
Install the package from pip with:
pip install cesnet-tszoo
or for an editable install with:
pip install -e git+https://github.com/CESNET/cesnet-tszoo#egg=cesnet-tszoo
If you use CESNET TS-Zoo, please cite our paper:
@misc{kures2025,
    title={CESNET TS-Zoo: A Library for Reproducible Analysis of Network Traffic Time Series},
    author={Milan Kureš and Josef Koumar and Karel Hynek},
    booktitle={2025 21st International Conference on Network and Service Management (CNSM)},
    year={2025}
}
For detailed examples, refer to the tutorial notebooks.
Using TimeBasedCesnetDataset dataset
from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import TimeBasedConfig
dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.TIME_BASED)
config = TimeBasedConfig(
    ts_ids=50,  # number of randomly selected time series from dataset
    train_time_period=range(0, 100),
    val_time_period=range(100, 150),
    test_time_period=range(150, 250),
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)
train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()
Time-based datasets are configured with TimeBasedConfig.
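Before passing the three time periods to a config, it can help to confirm that they do not share any time index. The following is a minimal standalone sketch of such a check; `check_time_periods` is a hypothetical helper, not a cesnet-tszoo function, and whether the library performs its own validation is not covered here.

```python
def check_time_periods(train, val, test):
    """Raise ValueError if any two periods share a time index.

    Returns the total number of time indices across all three periods.
    Hypothetical helper for illustration; not part of cesnet-tszoo.
    """
    periods = {"train": train, "val": val, "test": test}
    names = list(periods)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = set(periods[first]) & set(periods[second])
            if shared:
                raise ValueError(
                    f"{first} and {second} overlap at {sorted(shared)[:5]}"
                )
    return sum(len(period) for period in periods.values())


# Same ranges as in the TimeBasedConfig example.
total = check_time_periods(range(0, 100), range(100, 150), range(150, 250))
print(total)
```
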
Using DisjointTimeBasedCesnetDataset dataset
from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import DisjointTimeBasedConfig
dataset = CESNET_TimeSeries24.get_dataset("/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.DISJOINT_TIME_BASED)
config = DisjointTimeBasedConfig(
    train_ts=50,  # number of randomly selected time series from dataset that are not in val_ts and test_ts
    val_ts=20,  # number of randomly selected time series from dataset that are not in train_ts and test_ts
    test_ts=10,  # number of randomly selected time series from dataset that are not in train_ts and val_ts
    train_time_period=range(0, 100),
    val_time_period=range(100, 150),
    test_time_period=range(150, 250),
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)
train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()
Disjoint-time-based datasets are configured with DisjointTimeBasedConfig.
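The comments on `train_ts`, `val_ts` and `test_ts` describe three randomly selected groups of time series that do not overlap. One way to picture this is the sketch below, which draws disjoint id groups from a pool. The helper `sample_disjoint_ids`, the seed, and the pool of 200 ids are assumptions made for illustration; the library's actual selection procedure may differ.

```python
import random


def sample_disjoint_ids(all_ids, train_n, val_n, test_n, seed=0):
    """Draw three non-overlapping random groups of ids from all_ids.

    Hypothetical helper for illustration; not part of cesnet-tszoo.
    """
    needed = train_n + val_n + test_n
    pool = list(all_ids)
    if needed > len(pool):
        raise ValueError(f"need {needed} ids but only {len(pool)} are available")
    # random.sample draws without replacement, so slices cannot share ids.
    picked = random.Random(seed).sample(pool, needed)
    return (
        picked[:train_n],
        picked[train_n:train_n + val_n],
        picked[train_n + val_n:],
    )


# Hypothetical pool of 200 time series ids.
train_ids, val_ids, test_ids = sample_disjoint_ids(range(200), 50, 20, 10)
print(len(train_ids), len(val_ids), len(test_ids))
```
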
Using SeriesBasedCesnetDataset dataset
from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import SeriesBasedConfig
dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTIONS, aggregation=AgreggationType.AGG_1_DAY, dataset_type=DatasetType.SERIES_BASED)
config = SeriesBasedConfig(
    time_period=range(0, 250),
    train_ts=50,  # number of randomly selected time series from dataset that are not in val_ts and test_ts
    val_ts=20,  # number of randomly selected time series from dataset that are not in train_ts and test_ts
    test_ts=10,  # number of randomly selected time series from dataset that are not in train_ts and val_ts
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)
train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()
Series-based datasets are configured with SeriesBasedConfig.
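To compare the three configurations shown so far, the sketch below counts how many (time series, time step) pairs each split would cover. It assumes every selected series has exactly one row per time index in the chosen period, which may not match the returned dataframes exactly. `samples_per_split` is a hypothetical helper, not a library function.

```python
def samples_per_split(splits):
    """Return the number of (series, time step) pairs for each split.

    Assumes one row per series per time index. Hypothetical helper.
    """
    return {
        name: n_series * len(period)
        for name, (n_series, period) in splits.items()
    }


# Time-based: the same 50 series in every split, different time ranges.
time_based = samples_per_split({
    "train": (50, range(0, 100)),
    "val": (50, range(100, 150)),
    "test": (50, range(150, 250)),
})

# Disjoint-time-based: different series and different time ranges.
disjoint_time_based = samples_per_split({
    "train": (50, range(0, 100)),
    "val": (20, range(100, 150)),
    "test": (10, range(150, 250)),
})

# Series-based: different series, the same full time range.
series_based = samples_per_split({
    "train": (50, range(0, 250)),
    "val": (20, range(0, 250)),
    "test": (10, range(0, 250)),
})

for name, counts in [
    ("time_based", time_based),
    ("disjoint_time_based", disjoint_time_based),
    ("series_based", series_based),
]:
    print(name, counts)
```
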
Using load_benchmark
from cesnet_tszoo.benchmarks import load_benchmark
benchmark = load_benchmark(identifier="2e92831cb502", data_root="/some_directory/")
dataset = benchmark.get_initialized_dataset()
train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()
The loaded dataset can be any of the dataset types above.