DDoS attack detection using Machine Learning

Gurpreet Singh
4 min read · Jul 6, 2020

In this article, we are going to analyse the Apache logs generated by a WordPress website and apply machine learning to detect which IP addresses are performing a DDoS attack on the server, so that we can block them.

Deploying a test WordPress website to obtain logs

In this project, I used AWS EC2 to host the WordPress website so that it is accessible from anywhere and collects genuine logs. To deploy WordPress on EC2, I used Terraform and Docker. You can find all the Terraform and docker-compose files inside WordpressDeployingFiles.

Just configure your AWS credentials in the AWS CLI and run the Terraform code.

terraform apply -auto-approve

This deploys the WordPress website.

Test Website

Performing a DDoS test to take the website down

I used the scripts in this GitHub repo to attack the website and take it down.

CloudWatch Dashboard

Down Website

Test Website Down

Feeding the logs into the ELK stack to convert them to CSV for further analysis

Logstash Configuration file for Apache Logs

Commands to Run ELK Stack

# Creating Network
docker network create elk

# Run Elasticsearch (replace "tag" below with a real image version, ideally matching the Logstash version)
docker run -d \
--name elasticsearch \
--net elk \
-p 9200:9200 \
-e "discovery.type=single-node" elasticsearch:tag

# Run logstash

docker run -it --rm \
--name=logstash \
-v ~/wordpress_data/logstash_config:/conf \
--net elk \
-p 5000:5000 \
-e LS_JAVA_OPTS="-Xms512m -Xmx512m" \
-e "http.host=0.0.0.0" -e "transport.host=127.0.0.1" \
logstash:7.7.1 \
-f /conf/logstash.conf

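# Run Kibana (pin a version tag matching Elasticsearch; a bare image name resolves to "latest", which may not be published)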

docker run -d \
--name kibana \
--net elk \
-p 5601:5601 kibana

Kibana screen to generate CSV file for Data Analysis
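If you don't want to spin up the whole ELK stack for a small log file, the same log-to-CSV conversion can be approximated in plain Python. This is a swapped-in technique, not the ELK pipeline used in this project: a minimal sketch that assumes the logs are in Apache's combined log format. The field names (`clientip`, `verb`, `response` and so on) and the helper names are my own choices for illustration.

```python
import csv
import io
import re

# Assumption: Apache "combined" log format. Field names are illustrative.
FIELDS = ["clientip", "ident", "auth", "timestamp", "verb",
          "request", "httpversion", "response", "bytes", "referrer", "agent"]

LOG_PATTERN = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<httpversion>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def logs_to_csv(lines, out):
    """Write every parsable log line to `out` as CSV; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    written = 0
    for line in lines:
        row = parse_line(line)
        if row is not None:  # silently skip malformed lines
            writer.writerow(row)
            written += 1
    return written


sample = [
    '203.0.113.7 - - [06/Jul/2020:10:15:32 +0000] '
    '"GET /wp-login.php HTTP/1.1" 200 1234 "-" "Mozilla/5.0"',
    "this line is not a log entry",
]
buffer = io.StringIO()
rows_written = logs_to_csv(sample, buffer)
print(buffer.getvalue())
```

In practice you would pass an open access-log file as `lines` and an open output file as `out`. Malformed request lines are skipped here, so compare the row count with the line count before trusting the result.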

Apache logs Clustering and Pattern Mining

Importing Dataset and displaying info about dataset
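A minimal sketch of this step, assuming the Kibana export is an ordinary CSV file. The inline sample data and its column names are stand-ins I made up so the snippet runs on its own; with the real export you would pass the path of your CSV file instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV exported from Kibana.
SAMPLE_CSV = io.StringIO(
    "@timestamp,_id,clientip,response,bytes\n"
    "2020-07-06T10:15:32Z,a1,203.0.113.7,200,1234\n"
    "2020-07-06T10:15:33Z,a2,198.51.100.4,404,512\n"
)


def load_logs(source):
    """Read the exported log CSV (a path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


df = load_logs(SAMPLE_CSV)
df.info()          # column names, dtypes and non-null counts
print(df.head())   # first few rows as a sanity check
```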

Data Preprocessing

  • The @timestamp.1 and _id columns don't contribute to the analysis, so removing them should improve the quality of the clusters
df.drop(["@timestamp.1", "_id"], axis=1, inplace=True)
  • Some rows have 127.0.0.1 (localhost) as the client IP, which would distort the clusters, so I filtered them out
df = df[df.clientip != "127.0.0.1"]
  • Preprocess the GeoIP country code by keeping only the countries
    that occur most frequently
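The country-code step in the last bullet can be sketched roughly as follows. The column name `country_code`, the cut-off `top_n` and the choice to collapse the remaining countries into an "Other" bucket (rather than dropping those rows) are all assumptions for illustration.

```python
import pandas as pd


def keep_top_countries(df, column="country_code", top_n=5, other_label="Other"):
    """Keep the `top_n` most frequent values in `column`; relabel the rest."""
    top = df[column].value_counts().nlargest(top_n).index
    out = df.copy()
    # where() keeps values for which the condition holds and replaces the others
    out[column] = out[column].where(out[column].isin(top), other_label)
    return out


demo = pd.DataFrame({"country_code": ["IN", "IN", "IN", "US", "US", "DE", "FR"]})
result = keep_top_countries(demo, top_n=2)
print(result["country_code"].value_counts())
```

Collapsing rare countries keeps the number of dummy columns small in the next step, which matters because every extra column adds a dimension to the clustering.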

Creating Dummy columns and Scaling Data

I used pandas get_dummies to create the dummy columns and scikit-learn's MinMaxScaler to scale the data.

import pandas as pd
from sklearn import preprocessing

# edf is assumed to be the DataFrame after dummy encoding (all columns numeric)
x = edf.values  # returns a NumPy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)  # scale every column to [0, 1]
df_norm = pd.DataFrame(x_scaled, columns=edf.columns)
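The snippet above starts from `edf` without showing how it is built. Here is a self-contained sketch of both steps together, assuming `edf` is simply the output of `pd.get_dummies`. The toy columns (`verb`, `response`, `bytes`) are placeholders, not the real dataset.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy data standing in for the cleaned log DataFrame (column names are assumptions).
raw = pd.DataFrame({
    "verb": ["GET", "POST", "GET"],
    "response": [200, 404, 200],
    "bytes": [1234, 512, 2048],
})

# One-hot encode the categorical column; dtype=float keeps every column numeric.
edf = pd.get_dummies(raw, columns=["verb"], dtype=float)

# Rescale each column independently to the [0, 1] range.
scaler = MinMaxScaler()
df_norm = pd.DataFrame(scaler.fit_transform(edf), columns=edf.columns, index=edf.index)
print(df_norm)
```

Scaling matters here because distance-based clustering would otherwise be dominated by large-valued columns such as `bytes`.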

Creating the clustering model using sklearn
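As a rough sketch of what this step can look like, here is a K-Means model from scikit-learn fitted on scaled features. The algorithm choice, `n_clusters=2` and the synthetic data are assumptions made so the example runs on its own; on the real dataset you would fit on the scaled DataFrame and tune the number of clusters.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled features: 30 "normal" rows near 0
# and 10 "heavy hitter" rows near 1 (purely illustrative).
rng = np.random.default_rng(0)
normal = rng.uniform(0.0, 0.2, size=(30, 3))
heavy = rng.uniform(0.8, 1.0, size=(10, 3))
features = pd.DataFrame(np.vstack([normal, heavy]), columns=["f1", "f2", "f3"])

# n_clusters is an assumption; random_state makes the run repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = model.fit_predict(features)

clustered = features.assign(cluster=labels)
print(clustered["cluster"].value_counts())
```

Cluster numbers are arbitrary labels, so which cluster ends up holding the attacker can change between runs unless `random_state` is fixed.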

Result: Testing the predictions

According to the predictions, one cluster contains only my public IP, the one I used to perform the DDoS attack on the website.
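One simple way to check a result like this is to list the distinct client IPs that land in each cluster. The helper below is a hypothetical sketch. It assumes you kept the `clientip` values for the same rows, in the same order, as the features you clustered.

```python
import pandas as pd


def ips_per_cluster(clientips, labels):
    """Map each cluster label to the sorted list of distinct client IPs in it."""
    frame = pd.DataFrame({"clientip": list(clientips), "cluster": list(labels)})
    return {
        int(cluster): sorted(group["clientip"].unique())
        for cluster, group in frame.groupby("cluster")
    }


# Illustrative values only (documentation IP ranges, made-up labels).
ips = ["203.0.113.7", "203.0.113.7", "198.51.100.4", "192.0.2.9"]
labels = [0, 0, 1, 1]
summary = ips_per_cluster(ips, labels)
print(summary)
```

If a cluster contains a single IP that you know generated the attack traffic, that is a good sign the features separate attackers from regular visitors.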

Implementation

In the end, we can run this model from a Jenkins job to check the logs regularly and block the IP addresses that fall into the attacker's cluster (cluster 0 in this run). This protects the website from DDoS attacks and saves the owner from the large losses that downtime can cause.
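How the blocking happens depends on your setup. As one possible sketch, a scheduled job could turn the flagged IPs into firewall rules. The example below only builds `iptables` command strings and does not execute them. Using `iptables` at all, the helper name and the allowlist are assumptions on my part.

```python
import ipaddress


def build_block_commands(ips, allowlist=()):
    """Return iptables DROP commands for valid, non-loopback IPv4 addresses."""
    commands = []
    for ip in sorted(set(ips)):
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # ignore anything that is not a valid IP address
        if addr.version != 4 or addr.is_loopback or ip in allowlist:
            continue  # never block localhost or explicitly trusted addresses
        commands.append(f"iptables -A INPUT -s {addr} -j DROP")
    return commands


flagged = ["203.0.113.7", "127.0.0.1", "not-an-ip", "203.0.113.7"]
commands = build_block_commands(flagged)
for command in commands:
    print(command)
```

Review the generated rules before applying them, since a false positive from the model would lock out a legitimate visitor.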

If you have gotten this far into the blog give yourself a pat on the back because guess what? You’re awesome. The whole working repository is available on GitHub.

“We challenge each other, and leave as friends”. Hit me up on LinkedIn for any collaborations on the topic or edits of this article.
