A Python library for autoencoder-based anomaly detection that relies on self-supervised training with dynamic updates of per-sample confidence.
This library is designed to:
- train a model that produces an anomaly score;
- estimate per-sample confidence from that score;
- analyze score distributions to periodically recalibrate confidence intervals corresponding to normal, abnormal and unknown samples;
- apply different losses depending on the confidence region; these losses can use the confidence values and intervals to reweight samples;
- track experiments, metrics, and artifacts with MLflow.
NB: examples are provided in the examples folder. They implement the RADON and GRAnD anomaly detection models [1, 2].
[1] N. Najari, S. Berlemont, G. Lefebvre, S. Duffner, and C. Garcia, "Robust Variational Autoencoders and Normalizing Flows for Unsupervised Network Anomaly Detection," in Advanced Information Networking and Applications, L. Barolli, F. Hussain, and T. Enokido, Eds., Lecture Notes in Networks and Systems, vol. 450. Cham: Springer International Publishing, 2022, pp. 281-292. doi: 10.1007/978-3-030-99587-4_24.
[2] N. Najari, S. Berlemont, G. Lefebvre, S. Duffner, and C. Garcia, "RADON: Robust Autoencoder for Unsupervised Anomaly Detection," in 2021 14th International Conference on Security of Information and Networks (SIN), Dec. 2021, pp. 1-8. doi: 10.1109/SIN54109.2021.9699174.
```
pip install -e .
```
or, to run the examples,
```
pip install -e ".[dev]"
```
The codebase is split into focused modules:
- confidence_estimators: confidence estimation logic from model scores.
- distribution_analyzers: score distribution analysis and interval extraction.
- datamodules: Lightning data wrapping with confidence-aware datasets.
- datasets: dataset types and confidence I/O helpers.
- modules: self-supervision training module and callback.
- models: PyTorch model definitions (e.g., autoencoder).
- loggers: MLflow utility functions for artifacts and metrics logging.
All PlantUML source files are in:
docs/diagrams/*.plantuml
Main elements of confidence_estimators:
- SupportsConfidenceEstimation (Protocol)
- BaseConfidenceEstimator (abstract)
- ConfidenceIntervalsConfiguration
- Interval (extends pandas.Interval)
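To illustrate how a protocol-based confidence estimator might look, here is a minimal sketch. It does not reproduce the library's classes: the protocol name `SupportsConfidenceEstimationSketch`, the `estimate` method, the `LinearConfidenceEstimator` class and its two threshold fields are all hypothetical, and the convention that confidence means "confidence of being normal" is an assumption.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class SupportsConfidenceEstimationSketch(Protocol):
    """Hypothetical stand-in for the library's protocol; the method name is assumed."""

    def estimate(self, scores: Sequence[float]) -> list[float]: ...


@dataclass
class LinearConfidenceEstimator:
    """Maps an anomaly score to a confidence of being normal, in [0, 1].

    Scores at or below `normal_upper` get 1.0, scores at or above
    `abnormal_lower` get 0.0, and scores in between are interpolated linearly.
    """

    normal_upper: float
    abnormal_lower: float

    def estimate(self, scores: Sequence[float]) -> list[float]:
        span = self.abnormal_lower - self.normal_upper
        confidence = []
        for s in scores:
            if s <= self.normal_upper:
                confidence.append(1.0)
            elif s >= self.abnormal_lower:
                confidence.append(0.0)
            else:
                # Only reached when normal_upper < s < abnormal_lower, so span > 0.
                confidence.append((self.abnormal_lower - s) / span)
        return confidence


estimator = LinearConfidenceEstimator(normal_upper=1.0, abnormal_lower=3.0)
confidence = estimator.estimate([0.5, 2.0, 4.0])
```

Because the protocol is structural, any object exposing a compatible `estimate` method satisfies it without inheriting from a base class, which is what makes dependency injection of estimators straightforward.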
Main elements of datamodules:
- SelfSupervisionDataModule, wrapping a Lightning datamodule
- Integration with DatasetWithConfidence
Main elements of datasets:
- DataFrameWithLabels
- DatasetWithLabels / DatasetWithInputDim (Protocols)
- DatasetWithConfidence
- Utility functions:
  - init_confidence_from_csv
  - save_confidence_to_csv
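The confidence I/O helpers suggest a CSV round trip of per-sample confidence values. The sketch below shows one way such a round trip could work using only the standard library. The function names `save_confidence_sketch` and `load_confidence_sketch`, the `index`/`confidence` column layout, and the default confidence of 0.5 are assumptions made for illustration; the actual signatures of init_confidence_from_csv and save_confidence_to_csv may differ.

```python
import csv
import tempfile
from pathlib import Path


def save_confidence_sketch(path: Path, confidence: list[float]) -> None:
    """Write one row per sample: its index and its confidence (assumed layout)."""
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["index", "confidence"])
        for i, c in enumerate(confidence):
            writer.writerow([i, c])


def load_confidence_sketch(path: Path, n_samples: int, default: float = 0.5) -> list[float]:
    """Start from a default confidence and overwrite it with stored values, if any."""
    confidence = [default] * n_samples
    if not path.exists():
        return confidence
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            confidence[int(row["index"])] = float(row["confidence"])
    return confidence


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "confidence_epoch_0.csv"
    initial = load_confidence_sketch(csv_path, n_samples=3)   # no file yet: defaults
    save_confidence_sketch(csv_path, [0.9, 0.1, 0.5])
    restored = load_confidence_sketch(csv_path, n_samples=3)  # values read back
```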
Main elements of distribution_analyzers:
- SupportsDistributionAnalysis (Protocol)
- Concrete analyzer implementations (e.g., thresholding strategies)
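As an example of a thresholding strategy, the following sketch derives three score intervals (normal, unknown, abnormal) from two percentiles of the score distribution. It is not one of the library's analyzers: the function names `quantile_intervals` and `region_of`, the 80th/95th percentile defaults, and the plain-tuple interval representation are all assumptions chosen for illustration.

```python
import math
import statistics


def quantile_intervals(scores, normal_pct=80, abnormal_pct=95):
    """Split the score axis into three regions using two percentiles of the scores.

    The percentile choices are illustrative assumptions, not library defaults.
    """
    # quantiles(n=100) returns 99 cut points; cut point p-1 is the p-th percentile.
    cuts = statistics.quantiles(scores, n=100, method="inclusive")
    low, high = cuts[normal_pct - 1], cuts[abnormal_pct - 1]
    return {
        "normal": (-math.inf, low),
        "unknown": (low, high),
        "abnormal": (high, math.inf),
    }


def region_of(score, intervals):
    """Return the name of the half-open interval (left, right] containing the score."""
    for name, (left, right) in intervals.items():
        if left < score <= right:
            return name
    raise ValueError(f"score {score!r} is not covered by any interval")


scores = [float(i) for i in range(11)]  # toy anomaly scores 0.0 .. 10.0
intervals = quantile_intervals(scores)
labels = [region_of(s, intervals) for s in (2.0, 9.0, 10.0)]
```

Percentile-based cuts adapt to the current score distribution, so the intervals shift as the model improves between recalibrations.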
Main elements of models:
- Autoencoder (torch.nn.Module)
Main elements of modules:
- SupportsSelfSupervision (Protocol)
- SelfSupervisionModule (abstract Lightning module)
- SelfSupervisionCallback
- Dependency injection of:
  - SupportsConfidenceEstimation
  - SupportsDistributionAnalysis
- The model computes per-sample scores (score / _prediction_score).
- Distribution analysis derives confidence intervals.
- Confidence estimator maps scores to confidence values.
- Training dataset is refreshed with updated confidence.
- Loss computation uses confidence-aware behavior (normal/abnormal/uncertain).
- Confidence and intervals are recalibrated every every_n_epochs epochs.
- Metrics and artifacts are logged.
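Two parts of the workflow above can be sketched in a few lines: the recalibration schedule and a confidence-aware loss. This is only one possible reading, not the RADON or GRAnD loss. The helper names, the 0.25/0.75 confidence cut-offs, the choice to ignore uncertain samples, the negative weight for abnormal samples, and the convention that recalibration happens at the end of every every_n_epochs-th epoch are all assumptions.

```python
def should_recalibrate(epoch: int, every_n_epochs: int) -> bool:
    """True at the end of every `every_n_epochs`-th epoch (epochs are 0-indexed)."""
    return (epoch + 1) % every_n_epochs == 0


def sample_weight(confidence: float, low: float = 0.25, high: float = 0.75) -> float:
    """Per-sample loss weight, where confidence is the confidence of being normal.

    Assumed behavior: confident-normal samples are weighted by their confidence,
    confident-abnormal samples get a negative weight (their error is pushed up),
    and uncertain samples are ignored.
    """
    if confidence >= high:
        return confidence
    if confidence <= low:
        return -(1.0 - confidence)
    return 0.0


def confidence_weighted_loss(errors, confidence):
    """Mean of per-sample reconstruction errors, reweighted by confidence region."""
    weights = [sample_weight(c) for c in confidence]
    return sum(w * e for w, e in zip(weights, errors)) / len(errors)


# With every_n_epochs=2, recalibration happens after epochs 1, 3 and 5.
recalibration_epochs = [e for e in range(6) if should_recalibrate(e, every_n_epochs=2)]

# One normal, one uncertain and one abnormal sample.
loss = confidence_weighted_loss(errors=[1.0, 2.0, 4.0], confidence=[1.0, 0.5, 0.0])
```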
ssad/loggers/mlflow_logger.py provides helper functions to log:
- confidence CSV snapshots (confidence_epoch_*.csv)
- confidence interval JSON files (confidence_intervals_epoch_*.json)
- distribution analysis figures (confidence_analysis_epoch_*.svg)
- system metrics (CPU / RAM / GPU)
- test metrics by threshold (test_metrics_threshold=*.json)
- Python 3.10+
- PyTorch
- Lightning
- NumPy / pandas / scikit-learn / matplotlib
- MLflow
- psutil
- Prepare a dataset compatible with DatasetWithLabels and DatasetWithInputDim.
- Build your base LightningDataModule.
- Wrap it with SelfSupervisionDataModule.
- Instantiate:
  - a model (nn.Module)
  - a confidence estimator (SupportsConfidenceEstimation)
  - a distribution analyzer (SupportsDistributionAnalysis)
  - a concrete SelfSupervisionModule
- Train/evaluate with Lightning Trainer.