Fun with Linux log files
In this article I'm going to demonstrate some approaches I have learned for dealing with data. I run a VPN/proxy service for my business, and the security of my users' data is super important to me. One thing I recently noticed is that a lot of requests coming through my server have an email address attached to them. I store a certain amount of information about each request on the server, but I don't want these email addresses in the log files.
I thought about different ways to do this. Here is an example of a line from one of my log files (I have redacted some information):
2015 05 05 22:06,user,allow,https://zm.zillow.com/web-services/GetUserStatus?zws-id=redact&email=user.email@gmail.com&locale=en_US&deviceType=null&device=redacted&client=com.zillow.ZillowMap,GET,200,application/x-protobuf,8,,zillow.com 1,real-estate 100,,,,ZillowMap/11.1.813 CFNetwork/902.2 Darwin/17.7.0,HTTP/2.0,,Darwin
As you can see, this request contains the user's email address. I am still trying to find a good technique to scrub the email address and replace it with something else, but for now I am simply removing every line that contains an email address from the logs.
grep -Ev '\b[A-Za-z0-9._%+-]+(@|%40|%2540)[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'
The above command does a pretty good job of printing every line except those containing an email. As you can see, I have (@|%40|%2540) in my pattern. This is because while some requests include a literal @ before the domain, others URI-encode the email, which turns @ into %40, or encode it twice, which turns it into %2540. The parentheses keep the alternation limited to the separator, so the regex takes care of all of these cases. Since grep cannot edit a file in place, I have to dump the output into a temp file and then copy it back over my original file to simulate a sed replace.
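Before pointing the pattern at real logs, it can be tried against a few throwaway lines. The following is only a minimal sketch: the sample lines are made up, the strip_email_lines name is hypothetical, and the three separators are grouped in parentheses on the assumption that the alternation is meant to apply to the separator alone. It also assumes GNU grep, since \b is a GNU extension.

```shell
#!/usr/bin/env bash
# Pattern: local part, then a literal @, a URI-encoded @ (%40),
# or a double-encoded @ (%2540), then a domain and a 2-6 letter TLD.
EMAIL_RE='\b[A-Za-z0-9._%+-]+(@|%40|%2540)[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'

# Hypothetical helper: copy stdin to stdout, minus any line that
# looks like it contains an email address.
strip_email_lines() {
  grep -Ev "$EMAIL_RE"
}

# Made-up sample lines, not real log data.
printf '%s\n' \
  'a,https://example.com/x?email=jane.doe@example.com,GET' \
  'b,https://example.com/x?email=jane.doe%40example.com,GET' \
  'c,https://example.com/x?email=jane.doe%2540example.com,GET' \
  'd,https://example.com/x?locale=en_US,GET' |
  strip_email_lines
```

Only the last sample line (the one starting with d) should come out the other end, since the other three each carry one of the encodings.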
grep -Ev '\b[A-Za-z0-9._%+-]+(@|%40|%2540)[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b' /var/log/server/archive/2015/05/05.log > /tmp/05/05.log && cp /tmp/05/05.log /var/log/guardrails/archive/2015/05/05.log
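One catch with chaining on &&: grep exits with status 1 when it prints no lines at all, so if every line in a file contains an email the copy is skipped and the original stays untouched. The sketch below is one possible way around that, not a definitive fix. It assumes the goal is to overwrite a single file in place, uses mktemp instead of a fixed /tmp path, and the scrub_log name is hypothetical.

```shell
#!/usr/bin/env bash
EMAIL_RE='\b[A-Za-z0-9._%+-]+(@|%40|%2540)[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'

# Hypothetical helper: remove email-bearing lines from one log file in place.
scrub_log() {
  local src=$1 tmp status=0
  tmp=$(mktemp) || return 1
  # grep exits 0 when some lines were kept, 1 when every line was
  # dropped, and 2 or more on a real error such as a missing file.
  grep -Ev "$EMAIL_RE" "$src" > "$tmp" || status=$?
  if [ "$status" -le 1 ]; then
    # Writing through cat keeps the original file's owner and permissions.
    cat "$tmp" > "$src"
    status=0
  fi
  rm -f "$tmp"
  return "$status"
}
```

It would be called with a single path, for example scrub_log /var/log/server/archive/2015/05/05.log, and it leaves the file alone if grep reports an actual error.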
Now, if I wanted to run this as a batch over every day of the month, I could do this:
for i in {01..31}; do grep -Ev '\b[A-Za-z0-9._%+-]+(@|%40|%2540)[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b' "/var/log/server/archive/2015/05/$i.log" > "/tmp/05/$i.log" && cp "/tmp/05/$i.log" "/var/log/guardrails/archive/2015/05/$i.log"; done
Easy! I have searched for a way to do this in sed but have not had much luck.
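For what it's worth, here is a rough sketch of how the same thing might look in sed. It assumes GNU sed (for -i, -E and \b), groups the three separators in parentheses, and has only been reasoned through against made-up lines, so treat it as a starting point rather than a tested solution. The function names and the REDACTED placeholder are hypothetical.

```shell
#!/usr/bin/env bash
# Assumes GNU sed: -E enables extended regexes, -i edits the file in place.
EMAIL_RE='\b[A-Za-z0-9._%+-]+(@|%40|%2540)[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'

# Hypothetical helper: delete every line that contains an email,
# which should have the same effect as grep -Ev plus the temp file.
delete_email_lines() {
  sed -E -i "/$EMAIL_RE/d" "$1"
}

# Hypothetical helper: keep the line but swap each address for a placeholder.
redact_emails() {
  sed -E -i "s/$EMAIL_RE/REDACTED/g" "$1"
}
```

With redact_emails, a query string such as email=user.email@gmail.com&locale=en_US should come out as email=REDACTED&locale=en_US, which keeps the rest of the request in the log. Running either helper on a copy of a log file first seems wise, since -i overwrites the file.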