Apache Log Parser Using Python
The aim of this tutorial is to create an Apache log parser, which is helpful for identifying offending IP addresses during a DDoS attack on your website. This is what we are going to do:
- Read the Apache log file (access.log)
- Count the number of requests to your website from each IP address
If you look at the content of access.log, this is how it looks:
192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
Let’s break down these fields
192.168.0.1 --> IP address
[23/Apr/2017:05:54:36 -0400] --> Date, time and timezone
GET / HTTP/1.1 --> HTTP GET request to read the page
403 --> Server response code
3985 --> Number of bytes transferred
"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" --> Finally, the user agent string, which describes the user's hardware, OS and browser
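To make the field breakdown concrete, here is a minimal sketch that pulls the fields out of a single line with one regular expression. It assumes the log uses Apache's "combined" format, as in the sample line above; the names LOG_PATTERN and parse_line and the group names are illustrative choices, not part of the original script.

```python
import re

# Assumed layout: Apache "combined" log format. Group names are illustrative.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = ('192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985 '
          '"-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
          '(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"')

fields = parse_line(sample)
print(fields["ip"], fields["status"], fields["size"])
```

Lines that do not follow this layout make parse_line return None, so a caller can skip them instead of crashing.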
But the piece we are interested in is the IP address.
So, as mentioned above, our first step is:
- Read the Apache log file
def apache_log_reader(logfile):
    # Assign the opened file to the variable f, which is a reference to the file object
    with open(logfile) as f:
        log = f.read()
        print(log)

# Create the entry point of our code
if __name__ == '__main__':
    apache_log_reader("access_log")
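Note that f.read() loads the whole file into memory, and during an attack the access log can grow very large. As a hedged alternative, the sketch below reads the file one line at a time. The function name iter_log_lines is hypothetical, and the temporary file with two made-up entries only stands in for a real access_log so that the example runs on its own.

```python
import os
import tempfile


def iter_log_lines(logfile):
    """Yield log lines one at a time, without the trailing newline."""
    with open(logfile) as f:
        for line in f:
            yield line.rstrip("\n")


# Demo: a small temporary file stands in for access_log (made-up entries).
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write('192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985\n')
    tmp.write('192.168.0.2 - - [23/Apr/2017:05:54:37 -0400] "GET / HTTP/1.1" 200 512\n')
    demo_path = tmp.name

lines = list(iter_log_lines(demo_path))
os.remove(demo_path)
print(len(lines))
```

Because iter_log_lines is a generator, a caller can process each line as it arrives instead of holding the full log in a single string.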
Now let’s go to the second part:
- Count the requests from each IP address
import re
from collections import Counter

def apache_log_reader(logfile):
    # Pattern for an IPv4 address: four groups of 1 to 3 digits separated by dots
    myregex = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
    with open(logfile) as f:
        log = f.read()
        my_iplist = re.findall(myregex, log)
        ipcount = Counter(my_iplist)
        for k, v in ipcount.items():
            print(f"IP Address => {k} Count => {v}")

# Create the entry point of our code
if __name__ == '__main__':
    apache_log_reader("access_log")
Let’s break this code down:
- As we don’t need the entire entry, we need to search for a pattern, which we can do with the help of a regular expression
- We import the re module and then write the regular expression pattern, where
\d: Any numeric digit [0-9]
{1,3}: Between one and three repetitions of the preceding item
\.: A literal dot
For more info please refer to
r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
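- r stands for raw string, so backslashes in the pattern are passed to the re module unchanged
- Then we use Counter from the collections module to count how many times each IP address appears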
Output
IP Address => 192.168.0.2 Count => 1
IP Address => 192.168.0.3 Count => 1
IP Address => 192.168.0.4 Count => 3
IP Address => 192.168.0.5 Count => 1
IP Address => 192.168.0.6 Count => 1
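Two refinements are worth considering here. The pattern above matches anything that looks like an IP address anywhere in the log, including inside a referer or user agent string, and the output is not sorted by count. The sketch below is one possible variation, not the original script: it anchors the pattern to the start of each line with re.MULTILINE and uses Counter.most_common to list the busiest clients first. The names CLIENT_IP and top_clients and the sample entries are made up for illustration.

```python
import re
from collections import Counter

# Match an IPv4-looking address only at the start of a line (the client field).
CLIENT_IP = re.compile(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', re.MULTILINE)


def top_clients(log_text, n=3):
    """Return the n most frequent client IPs as (ip, count) pairs."""
    return Counter(CLIENT_IP.findall(log_text)).most_common(n)


# Made-up sample entries; the first one carries an IP address in its referer.
sample_log = "\n".join([
    '192.168.0.4 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 200 3985 "http://10.0.0.9/" "-"',
    '192.168.0.2 - - [23/Apr/2017:05:54:37 -0400] "GET / HTTP/1.1" 200 3985 "-" "-"',
    '192.168.0.4 - - [23/Apr/2017:05:54:38 -0400] "GET / HTTP/1.1" 200 3985 "-" "-"',
    '192.168.0.4 - - [23/Apr/2017:05:54:39 -0400] "GET / HTTP/1.1" 200 3985 "-" "-"',
])

for ip, count in top_clients(sample_log, 2):
    print(f"IP Address => {ip} Count => {count}")
```

Here the address in the referer (10.0.0.9) is not counted, because it does not appear at the start of a line.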
Full source code is available here https://github.com/lakhera2016/python/blob/master/apache-log-parser
This is not the only way to write this code; there are better ways to write it, so stay tuned ;-)
Now, a better way to write the Apache log parser:
#!/usr/bin/env python
"""
USAGE:
logparsing_apache.py apache_log_file
This script takes an Apache log file as an argument and then generates a report with the hostname,
bytes transferred and status
"""
import sys

def apache_output(line):
    # Split on whitespace: field 0 is the client host, 8 the status code, 9 the bytes sent
    split_line = line.split()
    return {'remote_host': split_line[0],
            'apache_status': split_line[8],
            'data_transfer': split_line[9],
            }

def final_report(logfile):
    for line in logfile:
        if len(line.split()) < 10:
            continue  # skip blank or malformed lines instead of raising IndexError
        line_dict = apache_output(line)
        print(line_dict)

if __name__ == "__main__":
    if not len(sys.argv) > 1:
        print(__doc__)
        sys.exit(1)
    infile_name = sys.argv[1]
    try:
        infile = open(infile_name, 'r')
    except IOError:
        print("You must specify a valid file to parse")
        print(__doc__)
        sys.exit(1)
    # final_report prints each entry itself and returns nothing, so there is no result to print
    final_report(infile)
    infile.close()
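The script above prints one dictionary per request. If you would rather see one summary per host, the following sketch shows one way to aggregate the same three fields. The function name summarize, the dictionary keys and the sample lines are assumptions made for illustration; they are not part of the original script.

```python
from collections import Counter, defaultdict


def summarize(lines):
    """Aggregate request count, bytes sent and status codes per remote host."""
    report = defaultdict(lambda: {"requests": 0, "bytes": 0, "statuses": Counter()})
    for line in lines:
        parts = line.split()
        if len(parts) < 10:
            continue  # skip blank or malformed lines
        host, status, size = parts[0], parts[8], parts[9]
        entry = report[host]
        entry["requests"] += 1
        entry["statuses"][status] += 1
        # Apache writes "-" in the size field when no bytes were sent
        if size.isdigit():
            entry["bytes"] += int(size)
    return report


# Made-up sample lines for illustration.
sample_lines = [
    '192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985',
    '192.168.0.1 - - [23/Apr/2017:05:54:37 -0400] "GET /x HTTP/1.1" 304 -',
    '192.168.0.2 - - [23/Apr/2017:05:54:38 -0400] "GET / HTTP/1.1" 200 512',
]

report = summarize(sample_lines)
for host, entry in report.items():
    print(host, entry["requests"], entry["bytes"], dict(entry["statuses"]))
```

Since an open file object yields lines when iterated, summarize could be called with infile in place of sample_lines.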
The source code is available at https://github.com/lakhera2016/python/blob/master/logparsing_apache.py
If you are facing any issues, this is the link to the Python Slack channel: https://devops-myworld.slack.com
Please send me your details
* First name
* Last name
* Email address
to devops.everyday.challenge@gmail.com, so that I can add you to this Slack channel
HAPPY CODING !!!