The hidden value of your server logs

Nelson Gomes
Published in Pipedrive R&D Blog
6 min read · Apr 12, 2022
Log data provides more valuable information than you might think.

Are you looking into your server log data? If not, you'd better start! While most people don't pay much attention to logs, they can provide plenty of valuable information that allows you to distinguish good actors from bad ones, so you can react when your server is under attack.

Last December, I was one of the lucky people who attended OWASP Global AppSec US 2021. This event helped me realize just how valuable log data can be. Take, for example, the “404 — Not Found” error:

  • It could stem from incorrect links on a page, in which case you can locate the error source by inspecting the page’s “referer” header. Another common cause for such errors is an outdated DNS entry with your server IP that someone forgot to erase. Either way, it’s a common error that isn’t critical, though it should be fixed.
  • 404s can also be used to detect bad actors, such as probes trying to identify server vulnerabilities (e.g., malicious scripts). If you start seeing a growing number of 404s from a specific IP address probing entries like “/PHPMyAdmin/” or “/admin/,” you can easily block them on your TCP layer 4 firewall (block TCP communications for some time) or block them on the request level (layer 7) in your web app.

With this simple log information, you are now ready to tackle some bad actors. Awesome!
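As a quick illustration, here is a minimal sketch of the 404-counting idea in TypeScript. The threshold, the time window and the in-memory store are illustrative assumptions; in production you would more likely run this aggregation in your log pipeline:

```typescript
// Flag IPs that generate too many 404s within a time window (a probe signal).
type Hit = { ip: string; status: number; path: string; ts: number };

const WINDOW_MS = 60_000; // look at the last minute (assumption)
const MAX_404S = 20;      // more than this per window looks like a probe (assumption)

function findProbingIps(hits: Hit[], now: number = Date.now()): string[] {
  const counts = new Map<string, number>();
  for (const h of hits) {
    // Only count recent 404 responses.
    if (h.status === 404 && now - h.ts <= WINDOW_MS) {
      counts.set(h.ip, (counts.get(h.ip) ?? 0) + 1);
    }
  }
  // Return every IP over the threshold, ready to feed into a firewall rule.
  return [...counts.entries()]
    .filter(([, n]) => n > MAX_404S)
    .map(([ip]) => ip);
}
```

The output of a function like this is exactly what you would feed into a layer 4 or layer 7 block rule.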

But you can extract much more from your server. Bad actors can probe valid URLs trying to trick your server by adding query parameters, request headers, body parameters, etc. Assuming your server is hardened properly and returns “400 Bad Request” whenever an invalid attempt is made, those responses also provide data that helps you keep bad actors at bay.

Let’s list some useful codes:

  • “400 Bad Request” (attempts to do something you’re not supposed to)
  • “401 Unauthorized” (failed logins, inaccessible pages)
  • “403 Forbidden” (you’re doing something you’re not allowed to)
  • “404 Not Found” (could be someone fishing for an open door)
  • (…)
  • “500 Internal Server Error” (a nasty bug crashed the request; you should look into this)
  • “502 Bad Gateway” (some issue with your server)

By reviewing specific status codes, you can obtain useful data. You can also use other sources. Let me list a few:

  • TCP fingerprinting — Similar to what nmap does, this technique allows you to determine the client’s operating system by inspecting the packets received during the TCP connection handshake (SYN/SYN-ACK/ACK). Note that this information is lost during TCP routing in cloud setups, because you need access to the original TCP connection to fetch it.
  • SSL fingerprinting — This feature allows you to generate a signature that identifies SSL uniqueness, which can identify numerous botnets since different SSL client implementations have different signatures, ciphers, extensions and elliptic curves. Here’s a sample:
    769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0
  • HTTP header order — There is a known header order for each browser. If you correlate browsers to their header order, you can identify any bad actor trying to (poorly) impersonate a browser. Many notable companies, like Akamai, use HTTP header order checks to block abnormal requests.
  • User-agent header — This feature can provide details about browser family, version, operating system and other correlative data.
  • IP address — Can be used to determine the country, region, reverse DNS, ASN and network ownership of the client, although these lookups should be done offline because they take some time. You can then infer the timezone and compare it with the client-side (browser) timezone to detect proxies and VPNs.
  • Has the user passed 2FA/Captcha? Are they logged in? — Answers to these questions allow you to identify bad requests based on outliers, so it’s advisable to always have this data available.
  • Client-side fingerprinting — This allows you to distinguish fake browsers from real ones based on browser behavior.
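To make the header-order idea concrete, here is a small sketch of how a request’s header order can be encoded as a signature string. The single-letter alphabet below is an arbitrary assumption; any stable encoding works, as long as the wire order is preserved:

```typescript
// Each known header gets a letter; unknown headers map to "?".
const HEADER_LETTERS: Record<string, string> = {
  host: "h",
  "user-agent": "u",
  accept: "a",
  "accept-language": "l",
  "accept-encoding": "e",
  connection: "c",
  referer: "r",
  cookie: "k",
};

// Turn the raw header name list (in wire order) into a compact signature.
function headerOrderSignature(rawHeaderNames: string[]): string {
  return rawHeaderNames
    .map((name) => HEADER_LETTERS[name.toLowerCase()] ?? "?")
    .join("");
}
```

A Chrome-like request might yield “hual” while a crude bot sending headers alphabetically would yield a different string, which is exactly the mismatch you want to log.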

Now, let’s get to the juicy part: an illustration of the above using a plugin for Fastify:

The code below creates a Fastify plugin, which we can use to gather information about requests. You can then send this information to a logging server (Logstash server, for example) to data-mine on top of it, in real time or offline.

First, you just need to register the plugin:

import requestInfoPlugin from "./plugins/request-info-plugin";

(...)

// serviceName: the service that generates the info
// publicPrefix: the public path used to call this service
// when behind a load balancer
fastifyInstance.register(requestInfoPlugin, { serviceName, publicPrefix });

Plugin source code:
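Below is a simplified, dependency-free sketch of what such a plugin can look like. Header names such as “x-ssl-fingerprint” depend on how your proxy is configured and are assumptions here; a real implementation would also parse the user agent with ‘ua-parser-js’ for the uai field and compute the header-order string:

```typescript
import { createHash } from "crypto";

// Short, stable hashes make the logged fields easy to group and compare.
const hash = (s: string) =>
  createHash("sha256").update(s).digest("hex").slice(0, 16);

// A minimal request-info plugin: an onRequest hook decorates each request
// with a `requestInfo` object holding the collected fields.
export default function requestInfoPlugin(
  fastify: any,
  opts: { serviceName: string; publicPrefix: string },
  done: () => void
) {
  fastify.addHook("onRequest", (req: any, _reply: any, next: () => void) => {
    const ua = String(req.headers["user-agent"] ?? "");
    const ssl = String(req.headers["x-ssl-fingerprint"] ?? ""); // injected by the proxy (assumed name)
    const ac = String(req.headers["accept"] ?? "");
    req.requestInfo = {
      serviceName: opts.serviceName,          // service that generated the info
      ip: req.socket?.remoteAddress,          // probably the load balancer IP
      realIp: req.headers["x-forwarded-for"], // set by the load balancer
      reqId: req.id,                          // Fastify's unique request ID
      method: req.method,
      path: opts.publicPrefix + req.url,
      host: req.headers.host,
      ua,
      uah: hash(ua),
      ssl,
      sslh: hash(ssl),
      referer: req.headers.referer,
      ac,
      ach: hash(ac),
    };
    next();
  });
  done();
}
```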

This code creates a variable named requestInfo in the Fastify request context, which contains all the data collected from the request:

  • serviceName: service that generated the info
  • ip: source IP of the request (probably the load balancer IP)
  • realIp: the real IP address of the remote client, obtained from the load balancer
  • reqId: unique request ID from Fastify
  • method: GET, POST, (…)
  • path: request path (prefixed with publicPrefix)
  • host: hostname from request headers
  • ua: user agent string
  • uah: user agent hash
  • uai: user agent info from ‘ua-parser-js’ component
  • ssl: SSL signature string from https://github.com/phuslu/nginx-ssl-fingerprint, which injects a header into the request
  • sslh: SSL signature hash
  • referer: referer string
  • ac: accept header string
  • ach: accept header hash string
  • ho: header order string, with a letter representing each header found

On top of this data, you should add the response status code, reverse IP info, ASN info, whether the user is logged in, whether they passed a Captcha, (…) and whatever other information allows you to determine which requests are good and which are bad (be creative!).

Once you start logging and analyzing this data, you should find some decent patterns and, hopefully, reach a point where you can react to threats in real time, which is the ultimate goal!

Use this data to identify patterns and draw conclusions. Ask yourself:

  • Which browser families/versions are hitting my server? Are there multiple SSL fingerprints per browser? Do all have the same header order?
  • How many requests do I usually have per country? Any abnormal spikes in the above variables?

After answering these questions, create allowlist and blocklist rule sets for your servers.

Having this data available is the first step toward building your own WAF (web application firewall): blocking abnormal requests dynamically, if possible (and only once you have your allowlist tuned).

On top of the above guidelines, you should apply rate limits:

  • Per IP address
  • Per authenticated user
  • Limit login attempts to avoid brute force attacks
  • Use the logs triggered by rate limits to learn more about the bad actors causing them, and block incoming traffic if needed based on the blocked requests’ info

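For illustration, here is a minimal sliding-window rate limiter, the kind of logic a plugin such as @fastify/rate-limit implements for you. The class name, limits and in-memory store are illustrative; keyed per IP or per authenticated user, denied calls are the events you want to log:

```typescript
// A per-key sliding-window rate limiter: at most `max` hits per `windowMs`.
class RateLimiter {
  private hits = new Map<string, number[]>();

  constructor(private max: number, private windowMs: number) {}

  /** Returns true if the request identified by `key` is allowed. */
  allow(key: string, now: number = Date.now()): boolean {
    // Keep only timestamps still inside the window.
    const recent = (this.hits.get(key) ?? []).filter(
      (t) => now - t < this.windowMs
    );
    if (recent.length >= this.max) {
      this.hits.set(key, recent);
      return false; // over the limit: log this event and consider blocking
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

The same structure works per IP address, per user, or per login endpoint; only the key changes.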
Next, you can proactively protect your service from external attacks and make it more secure. Building a WAF (web application firewall) matters because, due to SSL encryption, traditional firewalls can no longer protect servers from web request attacks. To build one, you first need to collect the necessary data.

I hope this helps you understand the importance of data and inspires you to try new things. If you want to dig into this topic further, I’ve listed a few projects and sources from open source projects below. Thank you!

References:

SSL fingerprinting:
https://github.com/salesforce/ja3
https://github.com/fooinha/nginx-ssl-ja3 (Nginx SSL fingerprinting)

About http headers order:
https://sansec.io/research/http-header-order-is-important

Protecting Nginx server:
https://www.nginx.com/blog/nginx-app-protect-denial-of-service-blocks-application-level-dos-attacks/

Client Side fingerprinting:
https://abrahamjuliot.github.io/creepjs/
https://github.com/abrahamjuliot/creepjs (source code)

Interested in working at Pipedrive?

We’re currently hiring for several positions across multiple countries and cities.

Take a look and see if something suits you

Positions include:

  • Junior Developer
  • Software Engineer in DevOps Tooling
  • Backend, Full-Stack, iOS Engineers
  • Automation Engineer
  • And several more

Nelson Gomes

Works at Pipedrive as a senior SRE; holds a degree in Informatics from the University of Lisbon and a postgraduate degree in project management.