Data Science

DDoS Attack Detection

Building machine learning models for early detection of DDoS attacks using time series classification

Machine Learning, Classification, Predictive Modeling, Time Series, Dimensionality Reduction, Feature Engineering, Python, Scikit-learn, Matplotlib

Problem Statement

A Russian entrepreneur wanted to detect DDoS attacks using machine learning. A labeled dataset of ping time series from hundreds of IPs was available, and the task was to develop an approach for detecting malicious IPs.

Challenges

  • Unlike typical supervised classification, where each record gets a single label, this task required a label for every time instant of every time series.
  • Class imbalance had to be addressed, since DDoS attack time instants were far fewer than non-DDoS time instants.

Solution

Summary

I designed a time series classification approach to achieve good detection accuracy. The approach converted the time series data into a featurized dataset of statistical measures and applied supervised ML classifiers to the featurized data. This way, new predictions could be made each second for each IP, improving the reliability of the attack detection system.
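To make the featurization idea concrete, here is a minimal sketch of turning one IP's ping series into one feature row per second. The trailing-window design, the window size, the function name `featurize_series` and the sample numbers are all assumptions made for illustration; the project's actual feature code is not shown in this case study.

```python
import numpy as np
import pandas as pd


def featurize_series(pings: pd.Series, window: int = 10) -> pd.DataFrame:
    """Turn one IP's per-second ping series into one feature row per second.

    Hypothetical helper: each row summarizes only the trailing `window`
    seconds, so it can be computed as soon as that second is observed.
    """
    roll = pings.rolling(window=window, min_periods=1)
    # Relative change versus the previous second. The first second has no
    # predecessor, and a previous value of 0 would give inf; both become 0.
    pct_change = (pings.diff() / pings.shift()).replace([np.inf, -np.inf], np.nan)
    return pd.DataFrame({
        "min": roll.min(),
        "max": roll.max(),
        "median": roll.median(),
        "q25": roll.quantile(0.25),
        "q75": roll.quantile(0.75),
        "pct_change": pct_change.fillna(0.0),
    })


# Made-up example: quiet traffic followed by a burst.
pings = pd.Series([2, 3, 2, 3, 2, 40, 55, 60, 58, 61], dtype=float)
feats = featurize_series(pings, window=5)
print(feats.tail(3))
```

Because each row depends only on past values, a classifier trained on such rows can score an IP again every second as new pings arrive.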

Approach

  • Considering only DDoS IPs and using patterns during attack instants to perform classification
  • Creating featurized dataset by computing time series features like min, max, median, quantiles, % change before, during and after attack, etc.
  • Splitting the dataset into training and detection while addressing class imbalance through random oversampling
  • Reducing the dimensionality of the features using PCA to reduce overfitting
  • Training a classifier on the lower-dimensional data; a support vector classifier worked best
  • Using the classifier to make predictions and then augmenting the predictions with a rule-based approach to improve detection accuracy
  • Evaluating performance using precision, recall and accuracy
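The modeling steps above can be sketched roughly as follows with scikit-learn. This is an illustration under stated assumptions, not the project's code: the data is synthetic (`make_classification`), the scaling step, the number of PCA components and the RBF kernel are choices made here, and the rule-based post-processing is omitted because its rules are not described.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.utils import resample

# Synthetic stand-in for the featurized dataset: roughly 10% attack rows.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)

# Split first so that oversampled copies never leak into the held-out set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Random oversampling: draw minority rows with replacement until the
# minority class matches the majority class in size.
majority = X_train[y_train == 0]
minority = X_train[y_train == 1]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=0)
X_bal = np.vstack([majority, minority_up])
y_bal = np.concatenate([np.zeros(len(majority), dtype=int),
                        np.ones(len(minority_up), dtype=int)])

# Scale (an assumption added here), project with PCA, then fit an SVC.
model = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="rbf"))
model.fit(X_bal, y_bal)
y_pred = model.predict(X_test)

precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

Oversampling only the training portion keeps the evaluation set at its original class ratio, which matters because accuracy alone can look high on imbalanced data even when recall on attack instants is poor.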

Results

With this approach, detection accuracy increased with the length of the time series: the longer a malicious IP attacked, the greater the chance that it was detected.

Figure: Model Recall by Time Series Length

Figure: Model Accuracy by Time Series Length
