CESNET/cesnet-tszoo: CESNET TS-Zoo is a toolkit for working with large time series network traffic datasets.

The goal of the cesnet-tszoo project is to provide time series datasets together with useful tools for preprocessing and reproducibility, such as:

- An API for downloading, configuring, and loading various datasets (e.g. CESNET-TimeSeries24, CESNET-AGG23), each with various sources and aggregations. Examples of configuration options:
  - Data can be split into train/val/test sets. The split can be done by time series or by time periods.
  - Data can be transformed with built-in or custom transformers.
  - Missing values can be handled with built-in or custom fillers.
  - Custom handlers can be applied.
  - The order in which preprocessing steps are applied/fitted can be changed.
- Creation and import of benchmarks, for easy reproducibility of experiments.
- Creation and import of annotations. Annotations can be created for specific time series, for specific times, or for a specific time within a specific time series.
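
To make the preprocessing options more concrete, here is a toy sketch in plain Python. It does not use the cesnet-tszoo API: `fill_forward` and `MinMaxScaler` are hypothetical stand-ins for a filler and a transformer, and the ordering shown (fill missing values first, then fit the transformer on the training part only) is just one possible choice.

```python
from typing import List, Optional


def fill_forward(values: List[Optional[float]], default: float = 0.0) -> List[float]:
    """Hypothetical filler: replace each missing value (None) with the last seen value."""
    filled, last = [], default
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


class MinMaxScaler:
    """Hypothetical transformer: scales values using the min/max seen during fit."""

    def fit(self, values: List[float]) -> "MinMaxScaler":
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values: List[float]) -> List[float]:
        span = (self.hi - self.lo) or 1.0  # avoid division by zero on constant data
        return [(v - self.lo) / span for v in values]


series = [10.0, None, 30.0, 20.0, None, 50.0]
filled = fill_forward(series)               # step 1: fill missing values
train, test = filled[:4], filled[4:]        # step 2: split along the time axis
scaler = MinMaxScaler().fit(train)          # step 3: fit on the training part only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)        # test values may fall outside [0, 1]
```

Fitting on the training part only keeps information from the test period out of the transformer, which is why the order of these steps matters.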

Datasets

| Name | CESNET-TimeSeries24 | CESNET-AGG23 | Abilene | GÉANT | SDN | Telecom Italia | Network Operator KPIs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Published in | 2025 | 2023 | 2005 | 2005 | 2021 | 2015 | 2023 |
| Collection period | 9.10.2023 - 14.7.2024 | 25.2.2023 - 3.5.2023 | 2004 | 2005 | — | 2013–2014 | — |
| Collection duration | 40 weeks | 10 weeks | 6 months | 16 weeks | 4 days | 2 months | Multiple weeks |
| Aggregation window | 1 day, 1 hour, 10 min | 1 min | 5 min, 10 min, 1 hour, 1 day | 15 min, 1 hour, 1 day | 1 min, 10 min, 1 hour, 1 day | 10 min, 1 hour, 1 day | 5 min, 10 min, 1 hour, 1 day |
| Sources | CESNET3: Institutions, Institution subnets, IP addresses | CESNET2 | Abilene network | GÉANT network | Simulated SDN environment | Milan city cells (SMS, call, internet) | Network operator |
| Subsets | — | — | Matrix, Node2Node, Node | Matrix, Node2Node, Node | Matrix, Node2Node, Node | — | Downstream, Internet, Sessions, VPN |
| Cite | https://doi.org/10.1038/s41597-025-04603-x | https://doi.org/10.23919/CNSM59352.2023.10327823 | https://doi.org/10.1145/885651.781053 | https://dl.acm.org/doi/10.1145/1111322.1111341 | https://doi.org/10.1109/ICC42927.2021.9500331 | https://doi.org/10.1038/sdata.2015.55 | https://doi.org/10.5281/zenodo.8147768 |
| Source URL | https://zenodo.org/records/13382427 | https://zenodo.org/records/8053021 | https://www.cs.utexas.edu/~yzhang/research/AbileneTM | https://totem.info.ucl.ac.be/dataset.html | https://github.com/duchuyle108/SDN-TMprediction | https://dataverse.harvard.edu | https://doi.org/10.5281/zenodo.8147768 |
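
The collection period and aggregation window together determine roughly how many time steps a series can have. As a back-of-the-envelope sketch for CESNET-TimeSeries24, assuming both end dates are inclusive and that there are no gaps in the data (neither is stated in the table):

```python
from datetime import date, timedelta

# Collection period of CESNET-TimeSeries24 from the table (both days assumed inclusive).
start, end = date(2023, 10, 9), date(2024, 7, 14)
duration = (end - start) + timedelta(days=1)

# Aggregation windows listed for CESNET-TimeSeries24.
windows = {
    "1 day": timedelta(days=1),
    "1 hour": timedelta(hours=1),
    "10 min": timedelta(minutes=10),
}

# Number of whole aggregation windows that fit into the collection period.
steps = {name: duration // width for name, width in windows.items()}
```

Under these assumptions the period spans 280 days, which matches the 40 weeks in the table, and gives an upper bound on the time indices available at each aggregation.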

Installation

Install the package from pip with:

pip install cesnet-tszoo

or, for an editable install:

pip install -e git+https://github.com/CESNET/cesnet-tszoo#egg=cesnet-tszoo

Citation

If you use CESNET TS-Zoo, please cite our paper:

@misc{kures2025,
    title={CESNET TS-Zoo: A Library for Reproducible Analysis of Network Traffic Time Series}, 
    author={Milan Kureš and Josef Koumar and Karel Hynek},
    booktitle={2025 21st International Conference on Network and Service Management (CNSM)},
    year={2025}
}

Examples

For detailed examples, refer to the Tutorial notebooks.

Initialize dataset to create train, validation, and test dataframes

Using `TimeBasedCesnetDataset` dataset

from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import TimeBasedConfig

dataset = CESNET_TimeSeries24.get_dataset(
    data_root="/some_directory/",
    source_type=SourceType.INSTITUTIONS,
    aggregation=AgreggationType.AGG_1_DAY,
    dataset_type=DatasetType.TIME_BASED)
config = TimeBasedConfig(
    ts_ids=50, # number of randomly selected time series from dataset
    train_time_period=range(0, 100), 
    val_time_period=range(100, 150), 
    test_time_period=range(150, 250), 
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Time-based datasets are configured with TimeBasedConfig.
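
As a rough mental model (not the library's implementation), a time-based split keeps the same time series in every set and cuts along the time axis. The sketch below uses a hypothetical `split_by_time` helper on plain Python lists, with the same index ranges as the config above.

```python
from typing import Dict, List


def split_by_time(
    series: Dict[int, List[float]], periods: Dict[str, range]
) -> Dict[str, Dict[int, List[float]]]:
    """Cut every series along the time axis; all sets share the same series IDs."""
    return {
        name: {ts_id: [values[t] for t in period] for ts_id, values in series.items()}
        for name, period in periods.items()
    }


# Two toy series with 250 time steps each (value == time index, for easy checking).
toy = {ts_id: [float(t) for t in range(250)] for ts_id in (1, 2)}
periods = {"train": range(0, 100), "val": range(100, 150), "test": range(150, 250)}
splits = split_by_time(toy, periods)
```

Each of `splits["train"]`, `splits["val"]`, and `splits["test"]` contains both toy series, restricted to its own time range.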

Using `DisjointTimeBasedCesnetDataset` dataset

from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import DisjointTimeBasedConfig

dataset = CESNET_TimeSeries24.get_dataset(
    "/some_directory/",
    source_type=SourceType.INSTITUTIONS,
    aggregation=AgreggationType.AGG_1_DAY,
    dataset_type=DatasetType.DISJOINT_TIME_BASED)
config = DisjointTimeBasedConfig(
    train_ts=50, # number of randomly selected time series from dataset that are not in val_ts and test_ts
    val_ts=20, # number of randomly selected time series from dataset that are not in train_ts and test_ts
    test_ts=10, # number of randomly selected time series from dataset that are not in train_ts and val_ts
    train_time_period=range(0, 100), 
    val_time_period=range(100, 150), 
    test_time_period=range(150, 250), 
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Disjoint-time-based datasets are configured with DisjointTimeBasedConfig.
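
The comments on `train_ts`, `val_ts`, and `test_ts` describe randomly chosen, mutually exclusive groups of time series. One way to picture this is the sketch below, which uses a hypothetical `sample_disjoint_ids` helper and made-up series IDs; it is not the library's actual selection logic.

```python
import random
from typing import Dict, List, Sequence


def sample_disjoint_ids(
    all_ids: Sequence[int], sizes: Dict[str, int], seed: int = 0
) -> Dict[str, List[int]]:
    """Draw non-overlapping random groups of series IDs, one group per set."""
    total = sum(sizes.values())
    if total > len(all_ids):
        raise ValueError("not enough time series for the requested set sizes")
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    picked = rng.sample(list(all_ids), total)  # sampling without replacement
    groups, start = {}, 0
    for name, size in sizes.items():
        groups[name] = picked[start:start + size]
        start += size
    return groups


# 200 made-up series IDs, split into groups of the sizes used in the config above.
groups = sample_disjoint_ids(range(200), {"train": 50, "val": 20, "test": 10})
```

Because all IDs come from a single draw without replacement, no time series can end up in more than one set.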

Using `SeriesBasedCesnetDataset` dataset

from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import SeriesBasedConfig

dataset = CESNET_TimeSeries24.get_dataset(
    data_root="/some_directory/",
    source_type=SourceType.INSTITUTIONS,
    aggregation=AgreggationType.AGG_1_DAY,
    dataset_type=DatasetType.SERIES_BASED)
config = SeriesBasedConfig(
    time_period=range(0, 250), 
    train_ts=50, # number of randomly selected time series from dataset that are not in val_ts and test_ts
    val_ts=20, # number of randomly selected time series from dataset that are not in train_ts and test_ts
    test_ts=10, # number of randomly selected time series from dataset that are not in train_ts and val_ts
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Series-based datasets are configured with SeriesBasedConfig.

Using `load_benchmark`

from cesnet_tszoo.benchmarks import load_benchmark

benchmark = load_benchmark(identifier="2e92831cb502", data_root="/some_directory/")
dataset = benchmark.get_initialized_dataset()

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

The loaded dataset can be any of the dataset types shown above.
