© Illustration: Adriano García Suárez/Billie

Beckenbauer: How We Built Our Own IP Firewall at Billie

Nikolai Besschetnov
Billie Engineering Crew

--

As the business grows, network traffic increases. Some of it comes from new customers, some from the merely curious, and some from dirty minds contemplating how to abuse Billie’s infrastructure from this or that IPv4 address.

This post describes a custom solution for being selective about IPv4 traffic. It features HAProxy, GoLang, and OSI Layer 4.

Let’s roll, or use the table of contents to scroll 😏

Table of Contents

  • Why Beckenbauer?
  • The Idea Behind a Firewall
  • Contributors to the Latency
  • HAProxy and GoLang. IP Address Blocking Solution
  • Summary

Why Beckenbauer?

To avoid the hassle of signing new contracts, reduce governance processes, and save on integration time, we decided to go for our own quick & robust solution — Beckenbauer.

Franz Beckenbauer was a German footballer who made his name as a central defender. As a company established in Germany, we felt this had to be the name.

The Idea Behind a Firewall

The idea is to control which IP addresses may pass and which may not, according to our magic. Once there is a verdict, we take a blocking or an allowing action!

In other words, it is a kind of mediator that decides whether to let the user pass.

As Billie is hosted on Amazon's cloud, one could also enhance product security by using the AWS Web Application Firewall (WAF).

Amazon WAF

AWS WAF is a web application firewall that helps protect your web applications or APIs against common web exploits that may affect availability, compromise security, or consume excessive resources.

https://aws.amazon.com/waf/

It comes in handy, e.g., in the context of the OWASP Top 10. This enterprise-level service gives control over how traffic reaches the applications by letting you define security rules. In addition to the rules, the following common harmful behaviors can be put under control:

  • DDoS
  • Scanners and Probes
  • IP Blacklists
  • Bots and Scrapers

One of the security rules is the “IP Match Conditions”. Sadly:

An IP set can hold up to 10,000 IP addresses or IP address ranges to check.

https://docs.aws.amazon.com/waf/latest/developerguide/waf-rule-statement-type-ipset-match.html

We, however, want to supply an enormous number of individual IP addresses dynamically, based on our magic, and block them for good (with some exceptions, e.g., an expiration time)! That makes the WAF a suboptimal solution for us.

The Middleware

HAProxy should not encapsulate the entire logic. That is why we introduced a middleware, which is another microservice.

The question, though, is how the two (the proxy and Beckenbauer) would communicate with each other. In most architectures, the network contributes far more latency than other components (e.g., memory). How fast will customers get their content when there is another mediator between the browser and our services?

Contributors to the Latency

As the system sits between the user and our services, it inevitably adds a lag of some milliseconds to the response: the latency. How can engineers keep this factor to a minimum?

There are different reasons why the delay between a request and a response may vary. De-/serialization of the data is the least of the problems. So what may cause a severe or a minor delay? Let’s think it through as if we were querying a modern HTTPS API endpoint.

Domain API Endpoint

Typically, when you surf the internet, you transfer data to and from a resource over the HTTPS protocol. Not only do you spend time on the TLS handshake, but before you can even initiate it, you have to resolve the domain name. Both steps cost extra network round trips, which is a lot of time when every millisecond counts.

And even once the connection is established, all of the headers and other protocol shenanigans still have to be exchanged. That makes three nails in the coffin, adding up to severe latency.

Direct Request by IP

Behind each hostname is the IP address of a server. Poking the resource in the face, i.e., connecting to that address directly, saves us a lot of time. And HTTP itself is just an add-on on top of TCP; it is TCP that establishes the connection and ensures data delivery.

Of course, you also want to ensure the shortest network distance. Avoid extra nodes along the way to reduce the time packets spend travelling across the network.

HAProxy and GoLang. IP Address Blocking Solution

Technology stack

  • HAProxy
    High availability load balancer and proxy server for TCP and HTTP-based applications. To be used as a reverse proxy;
  • Lua
    Lightweight, high-level programming language, to invoke custom logic at HAProxy’s frontend;
  • GoLang
    For the lightweight concurrent TCP server;
  • Redis
    Fast in-memory key-value storage, to be located close to the GoLang app for fast IP data retrieval;
  • AWS DynamoDB
    NoSQL database for persistent storage of IP addresses;
  • Magic Sauce
    Collection of services used to evaluate the IP address, e.g., AbuseIPDB.

Beckenbauer & HAProxy Integration

Every request lands first on HAProxy. There you can define so-called frontends (accept requests) and backends (fulfill requests) and execute Lua scripts.

HAProxy Invocation Scheme

There’s an example on the HAProxy blog. We basically copy-pasted it with a few alterations, e.g.:

-- Get the response body, if any (a single byte is expected)
local content = socket:receive(1)

-- Without a reply there is nothing to decide on, so stop here
if content == nil or content == "" then
    core.Alert('Got no content from server')
    socket:close()
    return
end

-- "1" means the request is allowed, "0" means it is blocked
if content == "1" then
    txn:set_var('req.blocked', false)
elseif content == "0" then
    txn:set_var('req.blocked', true)
else
    core.Alert(string.format("Unknown content was gotten: %s\n", content))
end

The payload is just an IPv4 string. The response is a single byte, 1 or 0, indicating whether the reverse proxy should process the request further (1) or not (0).

If Beckenbauer fails (e.g., because of the timeout, which is one second), the client is let through at the edge.
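To make the wire protocol concrete, here is a minimal Go sketch of what the proxy side does: send the IPv4 string over plain TCP, read a single byte, and fall back to allowing the request when Beckenbauer fails or times out. In our setup this logic lives in the Lua script inside HAProxy, so treat the Go version purely as an illustration. The function names `interpret` and `askBeckenbauer` and the address `127.0.0.1:9000` are assumptions, not part of the real project.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// interpret maps the one-byte reply to a decision: true lets the client pass.
// '1' allows and '0' blocks, mirroring the Lua snippet above. Errors and
// unexpected replies are treated as a failure, so the client passes (fail-open).
func interpret(reply []byte, err error) bool {
	if err != nil || len(reply) != 1 {
		return true
	}
	return reply[0] != '0'
}

// askBeckenbauer (hypothetical name) sends the client's IPv4 address over
// plain TCP and waits at most timeout for the one-byte verdict.
func askBeckenbauer(addr, clientIP string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return interpret(nil, err)
	}
	defer conn.Close()

	// A single deadline covers both the write and the read.
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return interpret(nil, err)
	}
	if _, err := conn.Write([]byte(clientIP)); err != nil {
		return interpret(nil, err)
	}
	reply := make([]byte, 1)
	n, err := conn.Read(reply)
	return interpret(reply[:n], err)
}

func main() {
	// Assumed address; if nothing is listening there, the call fails open.
	allowed := askBeckenbauer("127.0.0.1:9000", "203.0.113.7", time.Second)
	fmt.Println("allowed:", allowed)
}
```

Keeping the decision in a small pure function such as `interpret` makes the fail-open rule easy to check without any network involved.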

GoLang Project

The language is very versatile and offers plenty of packages. It’s ridiculously simple to create a basic concurrent TCP server in it using goroutines.
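As an illustration of how little code such a server needs, here is a minimal sketch, not our production code. It accepts connections, handles each one in its own goroutine, reads the IPv4 string, and answers with a single byte. The in-memory `blocked` map is a stand-in for the real lookup, and the names `verdict`, `handle`, and `serve` are made up for this example.

```go
package main

import (
	"fmt"
	"net"
	"strings"
	"time"
)

// verdict returns the byte sent back to the proxy: '1' allows, '0' blocks.
func verdict(payload string, blocked map[string]bool) byte {
	ip := net.ParseIP(strings.TrimSpace(payload))
	if ip == nil || ip.To4() == nil {
		return '1' // not a valid IPv4 string: fail open (an assumption)
	}
	if blocked[ip.String()] {
		return '0'
	}
	return '1'
}

// handle serves one connection. An IPv4 string is at most 15 bytes,
// so a single Read is assumed to be enough for this sketch.
func handle(conn net.Conn, blocked map[string]bool) {
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(time.Second))
	buf := make([]byte, 64)
	n, err := conn.Read(buf)
	if err != nil {
		return
	}
	conn.Write([]byte{verdict(string(buf[:n]), blocked)})
}

// serve accepts connections until the listener is closed.
func serve(ln net.Listener, blocked map[string]bool) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		go handle(conn, blocked) // one goroutine per connection
	}
}

func main() {
	blocked := map[string]bool{"203.0.113.7": true} // example data only
	ln, err := net.Listen("tcp", "127.0.0.1:0")     // port 0: the OS picks a free port
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer ln.Close()
	go serve(ln, blocked)

	// Ask the server about two addresses, the way the proxy would.
	for _, ip := range []string{"203.0.113.7", "198.51.100.1"} {
		conn, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			fmt.Println("dial:", err)
			return
		}
		fmt.Fprint(conn, ip)
		reply := make([]byte, 1)
		if _, err := conn.Read(reply); err == nil {
			fmt.Printf("%s -> %c\n", ip, reply[0])
		}
		conn.Close()
	}
}
```

The map is only read after start-up, so sharing it between goroutines is safe here; a real store that changes at runtime would need its own synchronization.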

The project’s structure is a monolithic package:

  • /cmd/server/main.go
    The server itself;
  • /go.mod
    Go Modules handle external packages;
  • /Makefile
  • /pipelines.go
    Defines the Frontend (FE) and Backend (BE) Pipeline;
  • /file.go
    Rest of GoLang files. E.g., stages for the pipelines and services.

We use pipelines to run through the stages. There are four stages:

  • Find the address in Redis;
  • Validate the address (with either light or heavy checks);
  • Persist the check results;
  • Notify about the results.

The Frontend Pipeline consists of the first two stages (with light checks); the Backend Pipeline runs all four with the heavy-checks flag set.
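The two pipelines can be pictured as ordered lists of stage functions. The sketch below shows that idea under several assumptions: the type and function names (`Request`, `Stage`, `Pipeline`, `Services`, `RunFrontend`, `RunBackend`) are invented for illustration, a map stands in for Redis, a slice stands in for notifications, and the `Check` callback stands in for the services that evaluate an address.

```go
package main

import "fmt"

// Request carries one IP address through the stages.
type Request struct {
	IP      string
	Heavy   bool     // run heavy checks instead of light ones
	Known   bool     // a stored verdict was found
	Blocked bool     // current verdict
	Trace   []string // names of the stages that ran
}

// Stage is one step; a Pipeline runs its stages in order.
type Stage func(*Request)
type Pipeline []Stage

func (p Pipeline) Run(r *Request) *Request {
	for _, stage := range p {
		stage(r)
	}
	return r
}

// Services bundles hypothetical stand-ins for the real dependencies.
type Services struct {
	Cache map[string]bool                  // stand-in for Redis: IP -> blocked
	Check func(ip string, heavy bool) bool // true means "block"
	Notes []string                         // stand-in for notifications
}

func (s *Services) Find(r *Request) {
	r.Trace = append(r.Trace, "find")
	r.Blocked, r.Known = s.Cache[r.IP]
}

func (s *Services) Validate(r *Request) {
	r.Trace = append(r.Trace, "validate")
	if !r.Known { // assumption: a stored verdict is trusted as-is
		r.Blocked = s.Check(r.IP, r.Heavy)
	}
}

func (s *Services) Persist(r *Request) {
	r.Trace = append(r.Trace, "persist")
	s.Cache[r.IP] = r.Blocked
}

func (s *Services) Notify(r *Request) {
	r.Trace = append(r.Trace, "notify")
	if r.Blocked {
		s.Notes = append(s.Notes, "blocked "+r.IP)
	}
}

// RunFrontend runs the first two stages with light checks.
func RunFrontend(s *Services, ip string) *Request {
	return Pipeline{s.Find, s.Validate}.Run(&Request{IP: ip})
}

// RunBackend runs all four stages with the heavy-checks flag set.
func RunBackend(s *Services, ip string) *Request {
	p := Pipeline{s.Find, s.Validate, s.Persist, s.Notify}
	return p.Run(&Request{IP: ip, Heavy: true})
}

func main() {
	s := &Services{
		Cache: map[string]bool{},
		// Example rule: only the heavy check rejects this one address.
		Check: func(ip string, heavy bool) bool { return heavy && ip == "203.0.113.7" },
	}
	fe := RunFrontend(s, "203.0.113.7")
	be := RunBackend(s, "203.0.113.7")
	fmt.Println(fe.Trace, fe.Blocked)
	fmt.Println(be.Trace, be.Blocked, s.Notes)
}
```

Because a pipeline is just a slice of functions, adding or reordering stages does not touch the stages themselves.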

Middleware’s high-level flow-chart

Passive and Active Working Mode

In the passive scenario, the decision is taken as soon as the Frontend pipeline has run. We close the TCP connection the moment the Redis records tell us whether the client may pass. This means that the first connection from an unknown or expired IP address will always be allowed. The average lag added to a user request is about 17 ms in this case.

The Backend pipeline, of course, takes much longer, about 250 ms. When Beckenbauer needs to be extremely cautious (the active mode), we skip FE and run only BE, so a decision is already there on the first request.
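The difference between the two modes can be sketched as follows. This is an interpretation rather than our actual code: it assumes that in passive mode the heavy checks run in the background after the answer has been sent, so that a later request sees their result. `Mode`, `Checker`, and `newDemoChecker` are hypothetical names, and a mutex-protected map replaces the real storage.

```go
package main

import (
	"fmt"
	"sync"
)

// Mode selects how much work happens before the proxy gets its answer.
type Mode int

const (
	Passive Mode = iota // answer after the fast lookup
	Active              // answer only after the heavy checks
)

// Checker is a hypothetical wrapper around the two pipelines.
// Both functions report whether the address should be blocked.
type Checker struct {
	Mode     Mode
	Frontend func(ip string) bool // fast lookup of a stored verdict
	Backend  func(ip string) bool // heavy checks; stores the verdict
	wg       sync.WaitGroup
}

// Allowed returns the answer for the proxy: true lets the client pass.
func (c *Checker) Allowed(ip string) bool {
	if c.Mode == Active {
		return !c.Backend(ip)
	}
	blocked := c.Frontend(ip)
	c.wg.Add(1)
	go func() { // assumption: heavy checks follow in the background
		defer c.wg.Done()
		c.Backend(ip)
	}()
	return !blocked
}

// Wait blocks until all background checks have finished.
func (c *Checker) Wait() { c.wg.Wait() }

// newDemoChecker wires a Checker to an in-memory map in which
// the address bad is the only one the heavy checks reject.
func newDemoChecker(mode Mode, bad string) *Checker {
	var mu sync.Mutex
	verdicts := map[string]bool{}
	return &Checker{
		Mode: mode,
		Frontend: func(ip string) bool {
			mu.Lock()
			defer mu.Unlock()
			return verdicts[ip]
		},
		Backend: func(ip string) bool {
			mu.Lock()
			defer mu.Unlock()
			verdicts[ip] = ip == bad
			return verdicts[ip]
		},
	}
}

func main() {
	c := newDemoChecker(Passive, "203.0.113.7")
	first := c.Allowed("203.0.113.7")
	c.Wait() // let the background checks finish
	second := c.Allowed("203.0.113.7")
	c.Wait()
	fmt.Println("first allowed:", first, "second allowed:", second)
}
```

In the passive demo the first request from the bad address passes and the second one is rejected, which matches the trade-off described above: lower latency in exchange for one free pass.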

And last but not least, once the client is blocked, we notify our Ops in Slack.

Summary

Sometimes you need to assemble your own bicycle from components. With GoLang it’s possible to prototype, build an MVP, and polish it into a working solution in a minimal number of working hours. And it just works!

Credits

  • To our Infrastructure Team Lead Aleksei Vesnin for bringing some sanity to my ideas of using SQS & AWS Lambda;
  • To our CTO Artem Demchenkov for the inspiration and support.

Cheers, Nikolai.
