Apache Log Parser Using Python

devops everyday challenge
Published in devops-challenge
Apr 30, 2017

The aim of this tutorial is to create an Apache log parser, which is really helpful in determining offending IP addresses during a DDoS attack on your website. This is what we are going to do:

  • Read the Apache log file (access.log)
  • Count the number of requests to your website from each IP address

If you look at the content of access.log, this is how it looks:

192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"

Let’s break down these fields

192.168.0.1 --> IP address
[23/Apr/2017:05:54:36 -0400] --> Date, time and timezone
GET / HTTP/1.1 --> HTTP GET request to read the page
403 --> Server response code
3985 --> Number of bytes transferred
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36 --> Finally, data about the user's hardware, OS and browser
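The field breakdown above can be sketched in code. This is a minimal example, assuming the log uses the same layout as the sample line; LOG_PATTERN and parse_line are hypothetical names chosen for illustration, and a real server may be configured with a different log format.

```python
import re

# Hypothetical pattern for the layout of the sample line above
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '                      # client address, two unused fields
    r'\[(?P<time>[^\]]+)\] '                     # date, time and timezone
    r'"(?P<request>[^"]*)" '                     # request line
    r'(?P<status>\d{3}) (?P<size>\S+) '          # response code, bytes transferred
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'   # referer, browser details
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = ('192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985 '
          '"-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
          '(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"')
fields = parse_line(sample)
print(fields['ip'], fields['status'], fields['size'])
```

Lines that do not fit the pattern come back as None, so a caller can skip them instead of crashing.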

But the piece we are interested in is the IP address.

So, as mentioned above, our first step is:

  • Read the Apache log file
def apache_log_reader(logfile):
    # Bind the opened file to the variable f, which is a reference to the file object
    with open(logfile) as f:
        log = f.read()
        print(log)


# Create entry point of our code
if __name__ == '__main__':
    apache_log_reader("access_log")
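During an attack the access log can grow large, and f.read() loads all of it into memory at once. One possible alternative is to iterate over the file line by line. The sketch below assumes this approach; read_log_lines is a hypothetical helper name, and the temporary file only exists so that the example runs without a real log.

```python
import os
import tempfile


def read_log_lines(logfile):
    """Yield log lines one at a time, without the trailing newline."""
    with open(logfile) as f:
        for line in f:
            yield line.rstrip("\n")


# Demo on a throwaway file so the sketch runs without a real access log
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write('192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985\n')
    tmp.write('192.168.0.2 - - [23/Apr/2017:05:54:37 -0400] "GET / HTTP/1.1" 200 512\n')
    demo_path = tmp.name

lines = list(read_log_lines(demo_path))
os.remove(demo_path)
print(len(lines))
```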

Now let’s go to the second part

  • Count IP addresses
import re
from collections import Counter

def apache_log_reader(logfile):
    myregex = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

    with open(logfile) as f:
        log = f.read()
        my_iplist = re.findall(myregex, log)
        ipcount = Counter(my_iplist)
        for k, v in ipcount.items():
            print("IP Address " + "=> " + str(k) + " " + "Count " + "=> " + str(v))

# Create entry point of our code
if __name__ == '__main__':
    apache_log_reader("access_log")

Let’s break down this code

  • As we don’t need the entire entry, we need to do some pattern searching, which we can do with the help of a regular expression
  • We imported the re module and then wrote the regular expression matching pattern, where
   \d: Any numeric digit [0-9]
   {1,3}: One to three repetitions of the preceding item
   \.: A literal dot (an unescaped . would match any character)

For more info please refer to

r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
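  • r stands for raw string, so the backslashes reach the re module unchanged
  • Then we use Counter from the collections module to count how many times each IP address appears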
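One thing to keep in mind is that the pattern matches anywhere in the text, so a dotted version number in a browser string could be counted as an IP address. A possible refinement, sketched below under the assumption that the client address is always the first field of each line, is to anchor the pattern with ^ and re.MULTILINE. The names LEADING_IP and count_client_ips and the ExampleAgent string are made up for illustration.

```python
import re
from collections import Counter

# ^ combined with re.MULTILINE only matches an address at the start of a line
LEADING_IP = re.compile(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', re.MULTILINE)


def count_client_ips(log_text):
    """Count only the addresses that open a log line."""
    return Counter(LEADING_IP.findall(log_text))


# Made-up log text; the agent string contains an IP-like version number
sample_log = (
    '192.168.0.4 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 200 512 "-" "ExampleAgent/1.2.3.4"\n'
    '192.168.0.4 - - [23/Apr/2017:05:54:37 -0400] "GET / HTTP/1.1" 200 512 "-" "ExampleAgent/1.2.3.4"\n'
    '192.168.0.5 - - [23/Apr/2017:05:54:38 -0400] "GET / HTTP/1.1" 403 3985 "-" "-"\n'
)
counts = count_client_ips(sample_log)
print(counts)
```

With the unanchored pattern, 1.2.3.4 would show up in the counts as well.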

Output

IP Address => 192.168.0.2 Count => 1
IP Address => 192.168.0.3 Count => 1
IP Address => 192.168.0.4 Count => 3
IP Address => 192.168.0.5 Count => 1
IP Address => 192.168.0.6 Count => 1

Full source code is available here https://github.com/lakhera2016/python/blob/master/apache-log-parser
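When hunting for an offending address, it helps to see the busiest IPs first. Counter has a most_common method for this. The sketch below reuses the counts from the sample output above; top_talkers is a hypothetical helper name.

```python
from collections import Counter

# Counts taken from the sample output above
ipcount = Counter({
    "192.168.0.2": 1,
    "192.168.0.3": 1,
    "192.168.0.4": 3,
    "192.168.0.5": 1,
    "192.168.0.6": 1,
})


def top_talkers(counter, limit=3):
    """Return the `limit` busiest addresses as (ip, count) pairs."""
    return counter.most_common(limit)


for ip, count in top_talkers(ipcount, limit=1):
    print("IP Address => " + ip + " Count => " + str(count))
# prints: IP Address => 192.168.0.4 Count => 3
```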

This is not the only way to write this code; there are much better ways to write the same piece of code, so stay tuned ;-)

Now, a better way to write the Apache log parser:

#!/usr/bin/env python

"""
USAGE:

logparsing_apache.py apache_log_file

This script takes an Apache log file as an argument and then generates a report with hostname,
bytes transferred and status

"""

import sys


def apache_output(line):
    # After splitting on whitespace, field 0 is the remote host,
    # field 8 the status code and field 9 the bytes transferred
    split_line = line.split()
    return {'remote_host': split_line[0],
            'apache_status': split_line[8],
            'data_transfer': split_line[9],
            }


def final_report(logfile):
    # Print one dictionary per log line
    for line in logfile:
        line_dict = apache_output(line)
        print(line_dict)


if __name__ == "__main__":
    if not len(sys.argv) > 1:
        print(__doc__)
        sys.exit(1)
    infile_name = sys.argv[1]
    try:
        infile = open(infile_name, 'r')
    except IOError:
        print("You must specify a valid file to parse")
        print(__doc__)
        sys.exit(1)
    # final_report prints each entry itself and returns nothing
    final_report(infile)
    infile.close()

The source code is available here https://github.com/lakhera2016/python/blob/master/logparsing_apache.py
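The dictionaries returned by apache_output can also be summed up into a per-host report. The following is a rough sketch rather than part of the original script: bytes_per_host is a hypothetical name, and the sample lines are made up. It assumes that lines with fewer than ten fields can be skipped and that "-" in the size field means no bytes were sent.

```python
from collections import defaultdict


def apache_output(line):
    split_line = line.split()
    return {'remote_host': split_line[0],
            'apache_status': split_line[8],
            'data_transfer': split_line[9]}


def bytes_per_host(lines):
    """Sum the bytes transferred to each remote host."""
    totals = defaultdict(int)
    for line in lines:
        if len(line.split()) < 10:
            continue  # skip blank or truncated lines
        entry = apache_output(line)
        size = entry['data_transfer']
        # Treat "-" (nothing sent) as zero bytes
        totals[entry['remote_host']] += int(size) if size.isdigit() else 0
    return dict(totals)


# Made-up sample lines in the same layout as the log shown earlier
sample_lines = [
    '192.168.0.1 - - [23/Apr/2017:05:54:36 -0400] "GET / HTTP/1.1" 403 3985',
    '192.168.0.1 - - [23/Apr/2017:05:54:37 -0400] "GET / HTTP/1.1" 200 15',
    '192.168.0.2 - - [23/Apr/2017:05:54:38 -0400] "GET / HTTP/1.1" 304 -',
    '',
]
report = bytes_per_host(sample_lines)
print(report)
```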

If you are facing any issues, this is the link to the Python Slack channel: https://devops-myworld.slack.com

Please send me your details
* First name
* Last name
* Email address
to devops.everyday.challenge@gmail.com, so that I can add you to this Slack channel

HAPPY CODING !!!
