A Distributed Denial of Service (DDoS) attack is a malicious attempt to disrupt normal traffic by overwhelming a target system, such as a server or network, with a flood of traffic. These attacks exploit vulnerabilities in network protocols, application services, or infrastructure, leading to service disruption and financial losses.
The DDoS Detection and Mitigation Dataset supports the development of machine learning models for detecting and mitigating DDoS attacks. It contains both benign network traffic and several types of DDoS attack traffic, collected using Mininet and an SDN Controller.
- Simulated attack and normal traffic data
- Packet-level features extracted for analysis
- Suitable for training machine learning models to classify network traffic
The dataset was collected using a Software-Defined Networking (SDN) environment:
- Mininet was used to create a virtual network topology.
- An SDN Controller managed traffic flow.
- Custom attack scripts simulated various DDoS attacks.
- Traffic monitoring tools (such as Wireshark) captured network flow data.
- Features were extracted from the captures to prepare the dataset for machine learning.
- Load the dataset from a CSV file.
- Drop non-numeric and unnecessary columns.
- Handle missing and infinite values by replacing infinite values with NaN and dropping every row that contains NaN.
- Encode the remaining non-numeric columns and select important features based on their correlation.
- Normalize numerical features using MinMaxScaler.
- Shuffle the dataset for better generalization.
- Split the dataset into training (80%) and testing (20%) sets.
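The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the column names (`src_ip`, `protocol`, `pkt_count`, `byte_rate`, `label`), the `preprocess` helper, and the small in-memory table standing in for the CSV file are all assumptions, and the correlation-based feature selection is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def preprocess(df, label_col="label", drop_cols=(), test_size=0.2, seed=42):
    """Clean, encode, scale, shuffle, and split a traffic DataFrame (hypothetical helper)."""
    # Drop columns the caller considers unnecessary.
    df = df.drop(columns=list(drop_cols), errors="ignore")

    # Treat +/-inf as missing, then drop every row with a missing value.
    df = df.replace([np.inf, -np.inf], np.nan).dropna().copy()

    # Encode any remaining non-numeric feature column as integer codes.
    for col in df.columns:
        if col != label_col and not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].astype("category").cat.codes

    X = df.drop(columns=[label_col]).astype("float64")
    y = df[label_col].to_numpy()

    # Scale every feature to the [0, 1] range.
    X_scaled = MinMaxScaler().fit_transform(X)

    # Shuffle and split into 80% training and 20% testing data.
    return train_test_split(
        X_scaled, y, test_size=test_size, shuffle=True, random_state=seed
    )


# Tiny stand-in for the CSV file; in practice this would come from pd.read_csv(...).
demo = pd.DataFrame({
    "src_ip": [f"10.0.0.{i}" for i in range(12)],
    "protocol": ["TCP", "UDP", "ICMP"] * 4,
    "pkt_count": [10.0, 500.0, 20.0, np.nan, 15.0, 900.0,
                  12.0, 700.0, 30.0, 650.0, 25.0, 800.0],
    "byte_rate": [1.5, 80.0, 2.0, 3.0, np.inf, 95.0,
                  1.0, 70.0, 2.5, 60.0, 1.8, 90.0],
    "label": [0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1],
})

X_train, X_test, y_train, y_test = preprocess(demo, drop_cols=["src_ip"])
print(X_train.shape, X_test.shape)
```

The sketch follows the listed order and scales before splitting. Fitting the scaler on the training split only is a common alternative that keeps test-set statistics out of training.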
To set up the environment and install dependencies, use the following:
- Python
- TensorFlow
- Keras
- NumPy
- Pandas
- Scikit-learn
- Matplotlib
- Seaborn
Run the following command to install the required libraries:
pip install tensorflow keras numpy pandas scikit-learn matplotlib seaborn
Deep learning models are implemented using TensorFlow/Keras, which includes:
- LSTM (Long Short-Term Memory)
- CNN (Convolutional Neural Network)
- ANN (Artificial Neural Network)
- Loss function: Binary Cross-Entropy
- Metrics: Accuracy, Precision, Recall
- Training for 5 epochs with a batch size of 128
- ANN was tested on the original test set, adversarial test set, and combined dataset
- A base model was trained and then used to generate adversarial samples with the Fast Gradient Sign Method (FGSM) at ε=0.1. The adversarial dataset was saved as adversarial_dataset.csv
- Original and adversarial datasets were combined into combined_dataset.csv
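To illustrate the FGSM step, the sketch below perturbs each sample by ε=0.1 in the direction of the sign of the loss gradient with respect to the input. To stay dependency-light it substitutes a small NumPy logistic-regression classifier for the Keras base model, so it shows the technique rather than the project's code. The synthetic data and the `fgsm`, `bce`, and `sigmoid` helpers are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(y, p):
    """Mean binary cross-entropy, with clipping to avoid log(0)."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def fgsm(X, y, w, b, eps=0.1):
    """FGSM for logistic regression: x_adv = clip(x + eps * sign(dL/dx), 0, 1)."""
    p = sigmoid(X @ w + b)
    # For binary cross-entropy on a logistic model, dL/dx = (p - y) * w.
    grad_x = (p - y)[:, None] * w[None, :]
    X_adv = X + eps * np.sign(grad_x)
    # Features were min-max scaled, so keep perturbed values inside [0, 1].
    return np.clip(X_adv, 0.0, 1.0)


# Synthetic stand-in for scaled traffic features (assumption, not real data).
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)

# Fit the stand-in classifier with plain gradient descent.
w, b = np.zeros(4), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

X_adv = fgsm(X, y, w, b, eps=0.1)
clean_loss = bce(y, sigmoid(X @ w + b))
adv_loss = bce(y, sigmoid(X_adv @ w + b))
print(f"clean loss={clean_loss:.4f}  adversarial loss={adv_loss:.4f}")
```

With a Keras model, the same gradient would typically be obtained through `tf.GradientTape` by watching the input tensor. The perturbation rule itself is unchanged.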
- Accuracy
- Precision
- Recall
- F1-Score
- Confusion Matrix Visualization
- Confusion Matrices (Original Test Set, Adversarial Dataset, Combined Dataset)
- ROC curves for the original, adversarial, and combined datasets
- Bar charts comparing performance metrics across datasets
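The evaluation metrics listed above all derive from the four cells of a binary confusion matrix. The following is a minimal NumPy sketch of that relationship, assuming label 1 means attack and label 0 means benign. The `binary_metrics` helper and the example labels are hypothetical and are not results from the dataset.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and the derived binary metrics."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
        "confusion_matrix": np.array([[tn, fp], [fn, tp]]),
    }


# Made-up labels purely to exercise the function.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Calling the same function on predictions for the original, adversarial, and combined test sets would give the per-dataset values needed for a comparison bar chart.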


