CESNET/cesnet-tszoo


The goal of the cesnet-tszoo project is to provide time series datasets together with tools for preprocessing and reproducibility, such as:

  • An API for downloading, configuring, and loading various datasets (e.g. CESNET-TimeSeries24, CESNET-AGG23...), each with various sources and aggregations.
  • Examples of configuration options:
    • Splitting data into train/val/test sets, either by time series or by time period.
    • Transforming data with built-in or custom transformers.
    • Handling missing values with built-in or custom fillers.
    • Applying custom handlers.
    • Changing the order in which preprocessing steps are applied and fitted.
  • Creation and import of benchmarks for easy reproducibility of experiments.
  • Creation and import of annotations. Annotations can be attached to a specific time series, a specific time, or a specific time within a specific time series.
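To make the preprocessing options above concrete, here is a minimal, library-independent sketch of two of them: filling missing values and then scaling with statistics fitted on the training part only. The helper names (`forward_fill`, `fit_scaler`, `scale`) are hypothetical and are not part of the cesnet-tszoo API; the built-in fillers and transformers may behave differently.

```python
import math
from statistics import fmean, pstdev


def forward_fill(values):
    """Replace each NaN with the most recent non-NaN value (leading NaNs stay NaN)."""
    filled, last = [], math.nan
    for value in values:
        if not math.isnan(value):
            last = value
        filled.append(last)
    return filled


def fit_scaler(train_values):
    """Compute mean and population std on training data only."""
    mean = fmean(train_values)
    std = pstdev(train_values) or 1.0  # avoid division by zero for constant series
    return mean, std


def scale(values, mean, std):
    """Apply standardization with previously fitted statistics."""
    return [(value - mean) / std for value in values]


# Toy series with two missing points (illustrative data, not from any dataset).
series = [1.0, math.nan, 3.0, 5.0, math.nan, 7.0]

filled = forward_fill(series)          # step 1: fill missing values
train, test = filled[:4], filled[4:]   # step 2: split by time
mean, std = fit_scaler(train)          # step 3: fit on train only
train_scaled = scale(train, mean, std)
test_scaled = scale(test, mean, std)   # test reuses the train statistics
```

Fitting the scaler on the training part only keeps information from the validation and test periods out of the preprocessing, which is one reason the order of fitting and applying steps matters.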

Datasets

| Name | CESNET-TimeSeries24 | CESNET-AGG23 | Abilene | GÉANT | SDN | Telecom Italia | Network Operator KPIs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Published in | 2025 | 2023 | 2005 | 2005 | 2021 | 2015 | 2023 |
| Collection period | 9.10.2023 - 14.7.2024 | 25.2.2023 - 3.5.2023 | 2004 | 2005 | - | 2013–2014 | - |
| Collection duration | 40 weeks | 10 weeks | 6 months | 16 weeks | 4 days | 2 months | Multiple weeks |
| Aggregation window | 1 day, 1 hour, 10 min | 1 min | 5 min, 10 min, 1 hour, 1 day | 15 min, 1 hour, 1 day | 1 min, 10 min, 1 hour, 1 day | 10 min, 1 hour, 1 day | 5 min, 10 min, 1 hour, 1 day |
| Sources | CESNET3: Institutions, Institution subnets, IP addresses | CESNET2 | Abilene network | GÉANT network | Simulated SDN environment | Milan city cells (SMS, call, internet) | Network operator |
| Subsets | - | - | Matrix, Node2Node, Node | Matrix, Node2Node, Node | Matrix, Node2Node, Node | - | Downstream, Internet, Sessions, VPN |
| Cite | https://doi.org/10.1038/s41597-025-04603-x | https://doi.org/10.23919/CNSM59352.2023.10327823 | https://doi.org/10.1145/885651.781053 | https://dl.acm.org/doi/10.1145/1111322.1111341 | https://doi.org/10.1109/ICC42927.2021.9500331 | https://doi.org/10.1038/sdata.2015.55 | https://doi.org/10.5281/zenodo.8147768 |
| Source URL | https://zenodo.org/records/13382427 | https://zenodo.org/records/8053021 | https://www.cs.utexas.edu/~yzhang/research/AbileneTM | https://totem.info.ucl.ac.be/dataset.html | https://github.com/duchuyle108/SDN-TMprediction | https://dataverse.harvard.edu | https://doi.org/10.5281/zenodo.8147768 |

Installation

Install the package from pip with:

pip install cesnet-tszoo

or, for an editable install:

pip install -e git+https://github.com/CESNET/cesnet-tszoo#egg=cesnet-tszoo
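After installing, a quick way to confirm that the package is visible to the current interpreter is to query its installed version. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, and the distribution name `cesnet-tszoo` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


found = installed_version("cesnet-tszoo")
if found is None:
    print("cesnet-tszoo is not installed in this environment")
else:
    print(f"cesnet-tszoo {found} is installed")
```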

Citation

If you use CESNET TS-Zoo, please cite our paper:

@misc{kures2025,
    title={CESNET TS-Zoo: A Library for Reproducible Analysis of Network Traffic Time Series}, 
    author={Milan Kureš and Josef Koumar and Karel Hynek},
    booktitle={2025 21st International Conference on Network and Service Management (CNSM)},
    year={2025}
}

Examples

For detailed examples, refer to the tutorial notebooks.

Initialize dataset to create train, validation, and test dataframes

from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import TimeBasedConfig

dataset = CESNET_TimeSeries24.get_dataset(
    data_root="/some_directory/",
    source_type=SourceType.INSTITUTIONS,
    aggregation=AgreggationType.AGG_1_DAY,
    dataset_type=DatasetType.TIME_BASED)
config = TimeBasedConfig(
    ts_ids=50, # number of randomly selected time series from dataset
    train_time_period=range(0, 100), 
    val_time_period=range(100, 150), 
    test_time_period=range(150, 250), 
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Time-based datasets are configured with TimeBasedConfig.
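As a rough mental model of a time-based split, every selected time series appears in all three sets, and only the time indices differ. The following library-independent sketch illustrates that idea on synthetic data; `split_by_time` is a hypothetical helper, not part of cesnet-tszoo, and the real implementation may differ.

```python
def split_by_time(series_by_id, train_period, val_period, test_period):
    """Slice every series into train/val/test parts by time index."""

    def take(period):
        return {
            ts_id: [values[t] for t in period]
            for ts_id, values in series_by_id.items()
        }

    return take(train_period), take(val_period), take(test_period)


# Three synthetic series with 250 time steps each (illustrative data only).
series_by_id = {
    ts_id: [ts_id * 1000 + t for t in range(250)] for ts_id in range(3)
}

train, val, test = split_by_time(
    series_by_id,
    train_period=range(0, 100),
    val_period=range(100, 150),
    test_period=range(150, 250),
)

# The same series ids are present in every split; only the time ranges differ.
print(sorted(train) == sorted(val) == sorted(test))
print(len(train[0]), len(val[0]), len(test[0]))
```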

from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import DisjointTimeBasedConfig

dataset = CESNET_TimeSeries24.get_dataset(
    data_root="/some_directory/",
    source_type=SourceType.INSTITUTIONS,
    aggregation=AgreggationType.AGG_1_DAY,
    dataset_type=DatasetType.DISJOINT_TIME_BASED)
config = DisjointTimeBasedConfig(
    train_ts=50, # number of randomly selected time series from dataset that are not in val_ts and test_ts
    val_ts=20, # number of randomly selected time series from dataset that are not in train_ts and test_ts
    test_ts=10, # number of randomly selected time series from dataset that are not in train_ts and val_ts
    train_time_period=range(0, 100), 
    val_time_period=range(100, 150), 
    test_time_period=range(150, 250), 
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Disjoint-time-based datasets are configured with DisjointTimeBasedConfig.
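Conceptually, a disjoint time-based split separates the sets along both axes: each set gets its own time series and its own time period. The sketch below shows this on synthetic data using only the standard library. `disjoint_time_split` is a hypothetical helper and the pool of 100 series is an assumption made for illustration; neither reflects the library's internals.

```python
import random


def disjoint_time_split(series_by_id, sizes, periods, seed=0):
    """Assign non-overlapping series ids to each split and slice its own time period."""
    rng = random.Random(seed)
    # Sampling without replacement guarantees the id sets do not overlap.
    ids = rng.sample(sorted(series_by_id), sum(sizes.values()))
    splits, start = {}, 0
    for name, size in sizes.items():
        chosen = ids[start:start + size]
        start += size
        splits[name] = {
            ts_id: [series_by_id[ts_id][t] for t in periods[name]]
            for ts_id in chosen
        }
    return splits


# 100 synthetic series, each holding its own time index as the value.
series_by_id = {ts_id: list(range(250)) for ts_id in range(100)}

splits = disjoint_time_split(
    series_by_id,
    sizes={"train": 50, "val": 20, "test": 10},
    periods={
        "train": range(0, 100),
        "val": range(100, 150),
        "test": range(150, 250),
    },
)
```

Each entry of `splits` maps series ids to values from that split's period only, so no series and no time step is shared between train, validation, and test.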

from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.utils.enums import SourceType, AgreggationType, DatasetType
from cesnet_tszoo.configs import SeriesBasedConfig

dataset = CESNET_TimeSeries24.get_dataset(
    data_root="/some_directory/",
    source_type=SourceType.INSTITUTIONS,
    aggregation=AgreggationType.AGG_1_DAY,
    dataset_type=DatasetType.SERIES_BASED)
config = SeriesBasedConfig(
    time_period=range(0, 250), 
    train_ts=50, # number of randomly selected time series from dataset that are not in val_ts and test_ts
    val_ts=20, # number of randomly selected time series from dataset that are not in train_ts and test_ts
    test_ts=10, # number of randomly selected time series from dataset that are not in train_ts and val_ts
    features_to_take=["n_flows", "n_packets"])
dataset.set_dataset_config_and_initialize(config)

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

Series-based datasets are configured with SeriesBasedConfig.
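In a series-based split, all sets share the same time period and differ only in which time series they contain. A minimal sketch of that selection step, assuming a hypothetical pool of 200 series ids and a hypothetical helper `split_by_series` (not part of cesnet-tszoo), could look like this:

```python
import random


def split_by_series(ts_ids, n_train, n_val, n_test, seed=0):
    """Randomly pick non-overlapping groups of series ids for train/val/test."""
    rng = random.Random(seed)
    # One sample without replacement, then cut into three consecutive chunks.
    chosen = rng.sample(list(ts_ids), n_train + n_val + n_test)
    train_ids = chosen[:n_train]
    val_ids = chosen[n_train:n_train + n_val]
    test_ids = chosen[n_train + n_val:]
    return train_ids, val_ids, test_ids


train_ids, val_ids, test_ids = split_by_series(
    range(200), n_train=50, n_val=20, n_test=10
)

# No series id is shared between any two sets.
print(set(train_ids).isdisjoint(val_ids))
print(set(train_ids).isdisjoint(test_ids))
print(set(val_ids).isdisjoint(test_ids))
```

Using a fixed seed makes the selection repeatable, which is the same concern that benchmarks address for full experiment configurations.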

from cesnet_tszoo.benchmarks import load_benchmark

benchmark = load_benchmark(identifier="2e92831cb502", data_root="/some_directory/")
dataset = benchmark.get_initialized_dataset()

train_dataframe = dataset.get_train_df()
val_dataframe = dataset.get_val_df()
test_dataframe = dataset.get_test_df()

The loaded dataset can be any of the dataset types shown above.

About

CESNET TS-Zoo is a toolkit for working with large network traffic time series datasets.
