Apache2 Log Parser

Michael Meade
6 min read · Sep 27, 2021


The origin story

I have an Apache2 web server set up on one of my VPSs. Its purpose is to act as a kind of “web” honeypot: I wanted to see how many times someone visited or tried to scan the server. I created fake files and directories so that certain web paths would return a 200 status code. I wrote this class so that I could easily parse the Apache2 logs and display the results, instead of having to remember a bunch of commands or run little scripts. I thought this would keep the results more consistent and let me save the results of each scan, so I can look at a couple of months of data or even a whole year's worth of logs.

The Code

require 'date'
require 'terminal-table'
require 'json'

# Reads an Apache2 access log and counts how often each field value appears.
class Template
  def initialize(file)
    @file = file
    @read = File.readlines(@file)
  end

  # Returns a hash of item => number of occurrences.
  def count_total(array)
    h = {}
    array.each do |ii|
      if h.has_key?(ii)
        h[ii] += 1
      else
        h[ii] = 1
      end
    end
    return h
  end

  # Counts requests per time of day (HH:MM:SS).
  def get_time
    time = []
    @read.each do |i|
      time << i.split(" ")[3].split(":")[1..3].join(":")
    end
    return count_total(time)
  end

  # Counts requests per date (the text between "[" and the first ":").
  def get_date
    date = []
    @read.each do |i|
      date << i.split("[")[1].split(":")[0]
    end
    return count_total(date)
  end

  # Counts requests per full timestamp (the text between "[" and "]").
  def get_date2
    # todo: split up the dates & times
    date = []
    @read.each do |i|
      stamp = i.split("[")[1]
      # Skip lines that have no "[" instead of calling split on nil.
      date << stamp.split("]")[0] unless stamp.nil?
    end
    return count_total(date)
  end

  # Counts responses per HTTP status code.
  def get_status
    status = []
    @read.each do |i|
      rest = i.split('"')[2]
      status << rest.split(" ")[0] unless rest.nil?
    end
    return count_total(status)
  end

  # Counts requests per client IP (the text before the "- -" fields).
  def get_ip
    ips = []
    @read.each do |i|
      ips << i.split("- -")[0].strip
    end
    return count_total(ips)
  end

  # Counts requests per requested path.
  def get_path
    urls = []
    @read.each do |i|
      request = i.split('"')[1]
      next if request.nil?
      path = request.split(" ")[1]
      urls << path unless path.nil?
    end
    return count_total(urls)
  end

  # Counts requests per user agent, ignoring empty ("-") agents.
  def get_ua
    ua = []
    @read.each do |i|
      agent = i.split('"')[5]
      ua << agent unless agent.nil? || agent == "-"
    end
    return count_total(ua)
  end

  # Counts requests per HTTP method (GET, POST, ...).
  def get_method
    meth = []
    @read.each do |i|
      request = i.split('"')[1]
      meth << request.split(" ")[0] unless request.nil?
    end
    return count_total(meth)
  end
end
# Saves counts to a JSON file, adding them to any counts already stored there.
class SaveFile
  def initialize(json = nil, file_name = "out.json")
    @json = json
    @file_name = file_name
  end

  def check_exists
    return File.exist?(@file_name)
  end

  def write_json
    if !check_exists
      File.open(@file_name, 'a') { |f| f.write(JSON.generate(@json)) }
    else
      read = File.read(@file_name)
      j = JSON.parse(read)
      # Sum the counts for keys that appear in both hashes.
      j = @json.merge!(j) { |k, m, n| m + n }
      File.open(@file_name, 'w') { |f| f.write(j.to_json) }
    end
  end

  def read_json
    JSON.parse(File.read(@file_name))
  end
end

# Prints a hash of counts as a terminal table.
class Print
  def initialize(json, title = nil, h1 = nil, h2 = nil, width: 40)
    @json = json
    @title = title
    @h1 = h1
    @h2 = h2
    @width = width
  end

  # Prints only the ten most frequent entries.
  def top_ten_pt
    render(@json.sort_by { |k, v| -v }.first(10))
  end

  # Prints every entry, most frequent first.
  def print_table
    render(@json.sort_by { |k, v| -v })
  end

  private

  # Builds and prints the table, falling back to the default title and headings.
  def render(rows)
    table = Terminal::Table.new
    table.title = @title.nil? ? "IP attempts" : @title
    if !@h1.nil? && !@h2.nil?
      table.headings = [@h1, @h2]
    else
      table.headings = ['IP', 'attempts']
    end
    table.rows = rows
    table.style = {:width => @width, :border => :unicode_round, :alignment => :center }
    puts table
  end
end
json = Template.new("access.log").get_date
p json

top_ten_pt method

json = Template.new("access.log.1").get_ip
Print.new(json, width: 100).top_ten_pt

The code snippet above shows the top_ten_pt method in action. It prints only the top ten results instead of every item, which is useful when the full table would be too large.
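The selection step behind top_ten_pt can be tried on its own, without terminal-table. The sketch below is only an illustration: the counts hash is made-up data shaped like the result of get_ip, and top_rows is a hypothetical helper name that is not part of the project.

```ruby
# Made-up counts, shaped like the hash returned by Template#get_ip.
counts = {
  "203.0.113.5"  => 42,
  "198.51.100.7" => 7,
  "192.0.2.10"   => 19
}

# Hypothetical helper: sort by count, highest first, and keep at most `limit` rows.
def top_rows(counts, limit = 10)
  counts.sort_by { |_key, count| -count }.first(limit)
end

top_rows(counts, 2).each { |ip, count| puts "#{ip}  #{count}" }
```

Negating the count inside sort_by is what puts the busiest entries first; first(limit) simply returns fewer rows when the hash is smaller than the limit.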

Shows a table of top ten IPs.

get_date method

json = Template.new("access.log").get_date
Print.new(json, "Date", "The Date", "#", width: 40).top_ten_pt

The code snippet above prints a table of dates and the number of requests the server received on each one. This data can be useful because it lets the reader find patterns in the traffic. Website owners could even use it to see which day their site was the most popular.
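To make the date extraction concrete, here is a small sketch that applies the same split as get_date to a few made-up log lines and then groups the dates by weekday. The sample lines assume Apache's combined log format, and the weekday grouping is an addition for illustration, not something the class does. Array#tally needs Ruby 2.7 or newer.

```ruby
require 'date'

# Made-up sample lines, assuming Apache's combined log format.
lines = [
  '203.0.113.5 - - [27/Sep/2021:10:15:32 +0000] "GET /admin HTTP/1.1" 200 512 "-" "curl/7.68.0"',
  '198.51.100.7 - - [27/Sep/2021:11:02:10 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
  '203.0.113.5 - - [28/Sep/2021:09:00:01 +0000] "POST /login HTTP/1.1" 404 196 "-" "curl/7.68.0"'
]

# Same split as get_date: the text after "[" and before the first ":".
dates = lines.map { |line| line.split("[")[1].split(":")[0] }

# Count requests per date, then per weekday name.
per_date = dates.tally
per_weekday = dates.map { |d| Date.strptime(d, "%d/%b/%Y").strftime("%A") }.tally

p per_date
p per_weekday
```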

Shows a table with time and number of attempts made to the web server.

The get_ip method

json = Template.new("access.log").get_ip
Print.new(json, "Ips Attemp", "IP", "Attempt #").print_table

The code snippet above uses the Template class to read the access.log file and scrape the IPs from the log. The get_ip method returns a hash with the IPs as the keys; each value is the number of times that IP was seen in the log.
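As a rough illustration of the counting step, the sketch below builds the same kind of hash from a few made-up lines. It takes the first whitespace-separated field as the IP, which is an alternative to splitting on "- -" and also works when the log records a user name. The client_ip helper and the sample lines are assumptions for this example only.

```ruby
# Made-up sample lines; the second one includes a user name instead of "-".
lines = [
  '203.0.113.5 - - [27/Sep/2021:10:15:32 +0000] "GET /admin HTTP/1.1" 200 512',
  '198.51.100.7 - frank [27/Sep/2021:11:02:10 +0000] "GET / HTTP/1.1" 200 1024',
  '203.0.113.5 - - [28/Sep/2021:09:00:01 +0000] "POST /login HTTP/1.1" 404 196'
]

# Hypothetical helper: the client IP is the first whitespace-separated field.
def client_ip(line)
  line.split(" ", 2).first
end

# Hash.new(0) gives unseen keys a count of 0, so no has_key? check is needed.
counts = Hash.new(0)
lines.each { |line| counts[client_ip(line)] += 1 }

p counts
```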

get_path method

json = Template.new("access.log").get_path
Print.new(json, "paths", "web path", "Count", width: 140).print_table

The code snippet above shows which URLs were visited and how many times each one was visited. While experimenting with this, I found that the code raised an error because the table was not wide enough to fit all the data. So I added another instance variable to the class, which sets the width of the table. The width can now be changed easily without editing the code each time.
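Scanners often send requests that do not look like a normal request line, so the path extraction is worth sketching with a guard. The request_path helper and the sample lines below are assumptions for illustration; the split logic mirrors what get_path does.

```ruby
# Made-up sample lines: a normal request, an empty request ("-"), and a blank line.
lines = [
  '203.0.113.5 - - [27/Sep/2021:10:15:32 +0000] "GET /admin HTTP/1.1" 200 512 "-" "curl/7.68.0"',
  '192.0.2.10 - - [27/Sep/2021:10:16:00 +0000] "-" 408 0 "-" "-"',
  ''
]

# Hypothetical helper: returns the requested path, or nil when there is none.
def request_path(line)
  request = line.split('"')[1]   # e.g. "GET /admin HTTP/1.1", or nil without quotes
  return nil if request.nil?
  request.split(" ")[1]          # nil when the request is just "-"
end

# Drop the lines without a path before counting.
paths = lines.map { |line| request_path(line) }.compact

p paths
```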

Table showing the web paths and the amount of times they were visited.

Changing the headers and title using the Print class

json = Template.new("access.log").get_method
Print.new(json, "HTTP requests", "Method","Count").print_table

The code above shows another use for the code. This time, instead of showing which IPs made the most requests, it uses the get_method method to build a hash of how many times each HTTP method, such as GET or POST, appears in the logs.

The second line of the snippet passes that hash to the Print class to display the results in a table. Not surprisingly, GET was the most commonly requested HTTP method.
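The class pulls each field out with chained split calls. A single regular expression with named groups is another way to get at the method, sketched here as an alternative rather than as what the project does. The LOG_LINE pattern is an assumption that only covers well-formed lines in Apache's combined or common log format, and the sample lines are made up.

```ruby
# Assumed pattern for well-formed combined/common log lines (not from the project).
LOG_LINE = /\A(?<ip>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+)[^"]*" (?<status>\d{3}) /

# Made-up sample lines; the last one has an empty request and will not match.
lines = [
  '203.0.113.5 - - [27/Sep/2021:10:15:32 +0000] "GET /admin HTTP/1.1" 200 512 "-" "curl/7.68.0"',
  '198.51.100.7 - - [27/Sep/2021:11:02:10 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
  '203.0.113.5 - - [28/Sep/2021:09:00:01 +0000] "POST /login HTTP/1.1" 404 196 "-" "curl/7.68.0"',
  '192.0.2.10 - - [28/Sep/2021:09:05:00 +0000] "-" 408 0 "-" "-"'
]

# Keep only the lines that match, then count the captured method names.
matches = lines.map { |line| LOG_LINE.match(line) }.compact
method_counts = Hash.new(0)
matches.each { |m| method_counts[m[:method]] += 1 }

p method_counts
```

Lines that do not match are skipped silently here, so a real script would probably want to count or log them as well.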

Table showing the number of GET vs. POST requests found in the Apache2 logs.

Installing gems

gem install terminal-table
gem install gruff

The SaveBar class

require 'gruff'
require 'json'

# Draws a bar graph of the ten largest counts stored in a JSON file.
class SaveBar
  def initialize(file_name, out, title: nil)
    @title = title
    @out = out
    @g = Gruff::Bar.new(1000)
    @file_name = file_name
    # Keep only the ten entries with the highest counts.
    @j = JSON.parse(File.read(@file_name)).sort_by { |k, v| -v }.first(10).to_h
  end

  def color
    ["#0000FF", "#f50f81", "#e36c9f", "#de92d6", "#92f568", "#FF0000", "#e2dc54", "#947cac", "#657f3b", "#3ba1fb", "#14521a"]
  end

  def create_bar
    @g.title = @title
    @g.colors = color
    # Each entry becomes its own bar: data[0] is the label, data[1] the count.
    @j.each do |data|
      @g.data(data[0], data[1])
    end
    @g.write(@out)
  end
end

The code snippet above is the SaveBar class. This class uses gruff to create bar graphs of the data.

SaveBar.new("ips.json", "ips.png", title: "IPS").create_bar

The code snippet above shows the SaveBar in action. The code will read the ips.json file and create a bar graph with the data.

A bar graph showing the number of times an IP was seen.

auto_scrape.rb

require_relative 'lib'
require_relative 'gruff'
Dir['*'].each do |file_name|
  if file_name.include?("access.log")
    json = Template.new(file_name).get_ip
    # SaveFile takes the file name as its second positional argument.
    SaveFile.new(json, "ip_auto_test.json").write_json
  end
end
j = SaveFile.new(nil, "ip_auto_test.json").read_json
# prints out the Table showing the IPs
Print.new(j, width: 40).top_ten_pt
SaveBar.new("ip_auto_test.json", "ips.png", title: "IPS").create_bar

The script shown above loops through all the files in the directory. If it finds a file with “access.log” in the name, it scrapes the IPs from the file and saves the IPs and the number of times they were seen. The next part of the script reads the newly created JSON file and prints a nice table in the terminal. Lastly, the code uses the SaveBar class to create a bar graph from the data in the JSON file and writes it to a file named ips.png.
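Because the script saves counts from several log files into one JSON file, the counts have to be added together rather than overwritten. The sketch below shows that merge step in isolation, using a temporary directory and made-up counts. The merge_counts helper is a hypothetical name; the summing block follows the same idea as the merge in SaveFile#write_json.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper: add the fresh counts to the saved ones, summing shared keys.
def merge_counts(saved, fresh)
  saved.merge(fresh) { |_key, old, new| old + new }
end

result = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "counts.json")

  # Pretend an earlier run already saved counts from one log file.
  File.write(path, JSON.generate({ "203.0.113.5" => 3 }))

  # Merge in made-up counts from a second log file and save the result.
  saved = JSON.parse(File.read(path))
  merged = merge_counts(saved, { "203.0.113.5" => 2, "198.51.100.7" => 1 })
  File.write(path, JSON.generate(merged))

  result = JSON.parse(File.read(path))
end

p result
```

One consequence of merging this way is that running the script twice over the same log files counts every request twice, so the output file should be removed before a re-run.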

SaveBar class

The code snippet below shows the SaveBar class in action.

SaveBar.new("ip.json", "ips.png", title: "IPS").create_bar

The code above uses the SaveBar class and its create_bar method to read the ip.json file and write a bar graph to ips.png.

The code can be found here: https://github.com/Michael-Meade/Apache2LogViewer

A more detailed blog post can be found at https://michael-meade.github.io/Projects/apache2-log-reader.html
