DDoS attack detection using Machine Learning
In this article, we analyse Apache logs generated by a WordPress website and apply machine learning to detect which IP addresses are performing a DDoS attack on the server, so that we can block them.
Deploying a test WordPress website to obtain logs
In this project, I used AWS EC2 to deploy the WordPress website so that it is accessible from anywhere and produces genuine logs. To deploy WordPress on AWS EC2, I used Terraform and Docker. You can find all the Terraform and docker-compose files inside WordpressDeployingFiles.
Just add your AWS credentials to the AWS CLI and execute the Terraform code.
terraform apply -auto-approve
This deploys the WordPress website.
Performing a DDoS test on the website to take it down
I used the scripts in this GitHub repo to attack the website and take it down.
CloudWatch dashboard
Down Website
Feeding the logs into the ELK stack to convert them into CSV format for further analysis
Logstash configuration file for Apache logs
Commands to run the ELK stack
# Creating Network
docker network create elk
# Run elasticsearch
docker run -d \
--name elasticsearch \
--net elk \
-p 9200:9200 \
-e "discovery.type=single-node" elasticsearch:tag
# Run logstash
docker run -it --rm \
--name=logstash \
-v ~/wordpress_data/logstash_config:/conf \
--net elk \
-p 5000:5000 \
-e LS_JAVA_OPTS="-Xms512m -Xmx512m" \
-e "http.host=0.0.0.0" -e "transport.host=127.0.0.1" \
logstash:7.7.1 \
-f /conf/logstash.conf
# Run Kibana
docker run -d \
--name kibana \
--net elk \
-p 5601:5601 kibana
Kibana screen to generate CSV file for Data Analysis
Apache logs Clustering and Pattern Mining
Importing Dataset and displaying info about dataset
Data Preprocessing
- The @timestamp.1 and _id columns do not contribute to the clustering, so I removed them to improve the quality of the clusters
df.drop(["@timestamp.1", "_id"], axis=1, inplace=True)
- Some rows have 127.0.0.1 as the client IP, which would affect the accuracy, so I filtered them out
df = df[df.clientip != "127.0.0.1"]
- Preprocessing GeoIP (country code) by keeping only the countries with the highest frequency
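The country-code step above can be sketched roughly as follows. This is a minimal illustration on toy data, not the exact code used in the project: the column name `country_code`, the cutoff `TOP_N`, and the `"OTHER"` bucket are all assumptions made for the example.

```python
import pandas as pd

# Toy data; "country_code" is a hypothetical name for the GeoIP country column.
df = pd.DataFrame({
    "clientip": ["1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4", "5.5.5.5", "6.6.6.6"],
    "country_code": ["IN", "IN", "US", "IN", "US", "DE"],
})

TOP_N = 2  # assumed cutoff: how many of the most frequent countries to keep

# Country codes ordered by how often they appear, truncated to the top N
top_countries = df["country_code"].value_counts().nlargest(TOP_N).index

# Keep the frequent codes as they are and collapse the rest into one bucket
df["country_code"] = df["country_code"].where(
    df["country_code"].isin(top_countries), "OTHER"
)

print(df["country_code"].value_counts())
```

Collapsing rare countries into a single bucket keeps the number of dummy columns small when the column is one-hot encoded later.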
Creating Dummy columns and Scaling Data
I used pandas get_dummies to obtain the dummy columns and sklearn's MinMaxScaler for min-max scaling.
import pandas as pd
from sklearn import preprocessing
x = edf.values  # returns a NumPy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)  # scales every column to the range [0, 1]
df_norm = pd.DataFrame(x_scaled, columns=edf.columns)
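The dummy-column step that produces `edf` is not shown above, so here is a small end-to-end sketch of how it could fit together with the scaling. The toy columns (`verb`, `response`, `bytes`) are assumptions for illustration and may not match the real CSV exported from Kibana.

```python
import pandas as pd
from sklearn import preprocessing

# Toy frame; the column names are hypothetical stand-ins for the log fields.
df = pd.DataFrame({
    "verb": ["GET", "POST", "GET", "GET"],
    "response": [200, 404, 200, 500],
    "bytes": [512, 0, 2048, 128],
})

# One-hot encode the categorical column; numeric columns pass through unchanged
edf = pd.get_dummies(df, columns=["verb"], dtype=float)

# Scale every column to the range [0, 1] so no feature dominates the distances
x_scaled = preprocessing.MinMaxScaler().fit_transform(edf.values)
df_norm = pd.DataFrame(x_scaled, columns=edf.columns)

print(df_norm)
```

Scaling after encoding puts the dummy columns and the numeric columns on the same range, which matters for distance-based clustering.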
Creating the clustering model using sklearn
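The article does not name the clustering algorithm, so the following is only a sketch that assumes k-means with two clusters. The synthetic `df_norm`, the feature names, and the choice of `n_clusters=2` are all assumptions for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled feature frame: two well-separated groups
rng = np.random.default_rng(0)
normal = rng.uniform(0.0, 0.2, size=(20, 3))   # "ordinary" traffic
attack = rng.uniform(0.8, 1.0, size=(20, 3))   # "attack-like" traffic
df_norm = pd.DataFrame(np.vstack([normal, attack]), columns=["f1", "f2", "f3"])

# Assumed model: k-means with 2 clusters; random_state makes the run repeatable
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(df_norm)

# Attach the cluster id to each row for later inspection
df_norm["cluster"] = labels
print(df_norm["cluster"].value_counts())
```

The cluster ids themselves are arbitrary, so which id ends up holding the attack traffic has to be checked by inspecting the rows, not assumed.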
Result: Testing the predictions
According to the predictions, one cluster contains only my public IP, which I used to perform the DDoS attack on the website.
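One way to check a result like this is to list the distinct client IPs that fall in each cluster. The sketch below uses made-up rows and documentation-range IP addresses; the frame name `results` and its contents are assumptions, not data from the project.

```python
import pandas as pd

# Hypothetical rows after the cluster label has been joined back to the client IP
results = pd.DataFrame({
    "clientip": ["203.0.113.7"] * 3 + ["198.51.100.4", "192.0.2.10"],
    "cluster": [0, 0, 0, 1, 1],
})

# Distinct client IPs seen in each cluster
ips_per_cluster = results.groupby("cluster")["clientip"].unique()

# Clusters made up of a single IP are worth a closer look
single_ip_clusters = [c for c, ips in ips_per_cluster.items() if len(ips) == 1]

print(ips_per_cluster)
print(single_ip_clusters)
```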
Implementation
Finally, we can run this model from Jenkins to perform regular checks and block the IP addresses that fall in cluster 0. This protects the website from DDoS attacks and the owner from large losses due to downtime.
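As a rough sketch of the blocking step, a scheduled job could turn the flagged IPs into firewall commands. The helper `build_block_commands` and its `allowlist` parameter are hypothetical names introduced here, and the example assumes IPv4 addresses and an iptables-based firewall. It only builds the command strings and does not execute them.

```python
import ipaddress


def build_block_commands(flagged_ips, allowlist=()):
    """Return iptables commands that would drop traffic from the flagged IPs."""
    allowed = {ipaddress.ip_address(ip) for ip in allowlist}
    commands = []
    for raw in sorted(set(flagged_ips)):      # de-duplicate, stable order
        ip = ipaddress.ip_address(raw)        # raises ValueError on malformed input
        if ip in allowed or ip.is_loopback:   # never block trusted or local addresses
            continue
        commands.append(f"iptables -A INPUT -s {ip} -j DROP")
    return commands


# Example input: IPs that landed in the suspicious cluster (made-up values)
flagged = ["203.0.113.7", "203.0.113.7", "127.0.0.1"]
commands = build_block_commands(flagged)
for command in commands:
    print(command)
```

Validating each address before it reaches a shell command, and keeping an allowlist, reduces the risk of a misclassified cluster locking out legitimate users or the administrator.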