Fun with Linux log files

Mike Conrad
Sep 6, 2018

In this article I’m going to demonstrate some approaches I have learned for dealing with sensitive data in log files. I run a VPN/proxy service for my business, and the security of my users’ data is very important to me. I recently realized that many of the requests passing through my server have an email address attached to them. I store a certain amount of information about each request on the server, but I don’t want those email addresses in the log files.

I considered a few ways to handle this. Here is an example line from one of my log files (I have redacted some information):

2015 05 05 22:06,user,allow,https://zm.zillow.com/web-services/GetUserStatus?zws-id=redact&email=user.email@gmail.com&locale=en_US&deviceType=null&device=redacted&client=com.zillow.ZillowMap,GET,200,application/x-protobuf,8,,zillow.com 1,real-estate 100,,,,ZillowMap/11.1.813 CFNetwork/902.2 Darwin/17.7.0,HTTP/2.0,,Darwin

As you can see, this request contains the user’s email address. I am still looking for a good technique to scrub the address and replace it with something else, but for now I am simply removing every line that contains an email address from the logs, using grep with an inverted match:

grep -Ev '\b[A-Za-z0-9._%+-]+(@|%40|%2540)[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b' /var/log/server/archive/2015/05/05.log

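The command above does a good job of printing every line except those that contain an email address (-E turns on extended regular expressions and -v inverts the match, so only non-matching lines are printed). As you can see, the pattern accepts @, %40 or %2540 as the separator between the name and the domain. That is because some requests include the address as-is (user@domain.com), while others URL-encode it: @ becomes %40, and if the encoded string is encoded a second time, its % becomes %25, giving %2540. The regex handles all of these cases. Since grep cannot edit a file in place, I write the filtered output to a temporary file and then copy it back over the original, which simulates an in-place sed edit.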
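A quick way to check that the pattern catches all three encodings is to pipe a few made-up lines through it. This is a small sketch assuming GNU grep (which understands \b in -E mode); the sample lines are invented. Note that the separators are grouped in parentheses, (@|%40|%2540), so the alternation applies only to the character between the name and the domain rather than splitting the whole pattern into three unrelated alternatives.

```shell
# Made-up sample lines: a plain address, a %40-encoded one, a double-encoded
# %2540 one, and a line with no address. Assumes GNU grep (for \b with -E).
pattern='\b[A-Za-z0-9._%+-]+(@|%40|%2540)[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'

# -v keeps only the lines that do NOT match, so only the last line survives.
printf '%s\n' \
  'GET /a?email=jane@example.com' \
  'GET /b?email=jane%40example.com' \
  'GET /c?email=jane%2540example.com' \
  'GET /d?locale=en_US' |
  grep -Ev "$pattern"
# prints: GET /d?locale=en_US
```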

mkdir -p /tmp/05
grep -Ev '\b[A-Za-z0-9._%+-]+(@|%40|%2540)[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b' /var/log/server/archive/2015/05/05.log > /tmp/05/05.log && cp /tmp/05/05.log /var/log/server/archive/2015/05/05.log
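One edge case is worth guarding against: grep exits with status 1 when it selects no lines, so if every line of a file contains an address, the `&&` skips the cp and the addresses stay in the log. The sketch below wraps the same idea in a hypothetical strip_emails function that treats status 1 as success, aborts only on status 2 (a real grep error), and uses mktemp instead of a fixed /tmp/05 directory. It assumes GNU grep and bash.

```shell
# Hypothetical helper name; assumes bash, GNU grep (for \b with -E) and mktemp.
strip_emails() {
  local log=$1 tmp status=0
  tmp=$(mktemp) || return 1
  # Keep only the lines that do not contain an address (plain, %40 or %2540).
  grep -Ev '\b[A-Za-z0-9._%+-]+(@|%40|%2540)[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b' \
    "$log" > "$tmp" || status=$?
  # Status 1 only means "no lines left"; 2 means grep itself failed.
  if [ "$status" -gt 1 ]; then
    rm -f "$tmp"
    return 1
  fi
  # Copy the filtered contents back over the original log file.
  cp "$tmp" "$log"
  rm -f "$tmp"
}
```

Usage would look like `strip_emails /var/log/server/archive/2015/05/05.log`.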

To process a whole month in one go, I can loop over the days:

mkdir -p /tmp/05
for i in {01..31}; do
  grep -Ev '\b[A-Za-z0-9._%+-]+(@|%40|%2540)[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b' /var/log/server/archive/2015/05/$i.log > /tmp/05/$i.log && cp /tmp/05/$i.log /var/log/server/archive/2015/05/$i.log
done

Easy! I have searched for a way to do this in sed but have not had much luck.
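For the sed route, GNU sed may be enough on its own: its -i option edits files in place, and the d command deletes every line that matches, so no temporary file is needed. Putting the same pattern in an s command replaces just the address and keeps the rest of the line, which is one way to do the scrub-and-replace mentioned earlier. This is a minimal sketch assuming GNU sed (BSD/macOS sed handles -i differently and may not understand \b); the helper names and the [email-redacted] marker are hypothetical.

```shell
# Assumes GNU sed: -E enables extended regexes, \b is a GNU extension, and
# -i edits each named file in place.
email_re='\b[A-Za-z0-9._%+-]+(@|%40|%2540)[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'

# Hypothetical helper: delete every line that contains an email address.
strip_emails_sed() {
  sed -E -i "/${email_re}/d" "$@"
}

# Hypothetical helper: keep each line but replace every address with a marker.
scrub_emails_sed() {
  sed -E -i "s/${email_re}/[email-redacted]/g" "$@"
}
```

With this, `strip_emails_sed /var/log/server/archive/2015/05/*.log` would cover the whole month, since GNU sed with -i treats each file separately. On the example log line above, scrub_emails_sed would turn `email=user.email@gmail.com&locale=en_US` into `email=[email-redacted]&locale=en_US`.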

Founder at Enxo. Software developer, Linux enthusiast. I develop content-filtering and accountability software for iPhones, iPads and computers.