Brex Tech Blog
Announcing Substation

Introduction

Why We Built Substation

The overarching goal of Substation is to empower security and observability teams to take control of their event logs with minimal operational overhead. Most security platforms (e.g., SIEMs) provide few options for teams that want to manage data quality. At best, they let users apply simple transformations after events are ingested into the platform. Teams that want high-quality data typically need to invest engineering and operational resources in deploying and maintaining complex distributed systems.

Our team has used Substation in production for more than 15 months, and we feel that now is the right time to share it with the open source security community. Since every team has a different definition of “at scale,” below are some numbers from our internal deployment to help explain how we use it:

  • Process more than 5 terabytes (TB) of data each day, with a variable cost of $30–40 per TB
  • Sustain more than 100k events per second (EPS), with peaks of more than 200k EPS
  • Deploy new data pipelines with 100s of unique resources in minutes
  • Manage 30 data pipelines, each with unique designs and configurations
  • Spend less than 1 hour per week on operations and maintenance

Differentiating Benefits

  • Fully serverless — you’ll never manage a server or think about sizing a cluster
  • Designed for scale — the system scales automatically from tens to hundreds of thousands of events per second, with no intervention required by an engineer
  • Infrastructure and configs as code — we use Terraform, Jsonnet, and AWS AppConfig, which means you can deploy unique, reusable data pipelines in minutes
  • Cost efficient — we went the extra mile to make things affordable, including creating a compliant, minimal version of the Kinesis Producer and Client Libraries in Go
  • Extensible — we expose the Substation core as Go packages, so you can build your own custom data processing systems
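Because the core is exposed as Go packages, custom systems can compose their own transforms. The `Processor` interface and `Chain` helper below are hypothetical stand-ins for illustration only, not the actual Substation API; they sketch the pluggable-processor pattern the project enables.

```go
package main

import (
	"fmt"
	"strings"
)

// Processor is an illustrative stand-in for the kind of pluggable
// transform interface a toolkit like Substation exposes.
type Processor interface {
	Apply(data []byte) ([]byte, error)
}

// Lowercase is a hypothetical processor that normalizes case.
type Lowercase struct{}

func (Lowercase) Apply(data []byte) ([]byte, error) {
	return []byte(strings.ToLower(string(data))), nil
}

// Chain runs each processor in order, feeding output to input.
func Chain(data []byte, procs ...Processor) ([]byte, error) {
	var err error
	for _, p := range procs {
		if data, err = p.Apply(data); err != nil {
			return nil, err
		}
	}
	return data, nil
}

func main() {
	out, _ := Chain([]byte("HELLO"), Lowercase{})
	fmt.Println(string(out)) // hello
}
```

Adding a new processor is just another type satisfying the interface, which is what makes this style of system easy to extend.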

Use Cases

As a data pipeline system, Substation has these features:

  • Real-time event filtering and processing
  • Cross-dataset event correlation and enrichment
  • Concurrent event routing to downstream systems
  • Runs on containers, built for extensibility
    - Support for new event filters and processors
    - Support for new ingest sources and load destinations
    - Support for custom applications (e.g., multi-cloud)

As a package, Substation can:

  • Evaluate and filter structured and unstructured data
  • Convert data to and from JSON, and modify it in place

Most importantly, the project makes no assumptions about the design of a data pipeline. At Brex, we’re fans of the two-phase design pattern (similar to what is described in Google’s SRE workbook) and use it for nearly all of our pipelines:

Figure: Two-phase data processing design pattern used by the Security team at Brex — source data is ingested, stored as streaming data, transformed, stored again as streaming data, and loaded into downstream systems.

This design pattern supports storage of raw and processed data, and concurrently loads processed data to multiple destinations.

The project supports other use cases, too. Every data pipeline shown below is an example of what can be built today:

Figure: Example use cases supported by the toolkit — each data pipeline shown is an example of what can be deployed with Substation today.

There are hundreds of unique design permutations, all of which are fully controlled by users through Terraform and Jsonnet configurations. Today the project offers these data ingest and load options:

  • AWS API Gateway (ingest)
  • AWS DynamoDB (load)
  • AWS Kinesis Data Streams (ingest and load)
  • AWS Kinesis Firehose (load)
  • AWS S3 (ingest and load)
  • AWS S3 via SNS (ingest)
  • AWS SNS (ingest)
  • AWS SQS (ingest and load)
  • HTTP (load)
  • Print to standard output (load)
  • Sumo Logic (load)

Achieving Scale, Engineers Not Invited

Figure: Event volume over a normal week, with consistent peaks and valleys.

A normal week at Brex creates peaks and valleys of activity that our resources scale with, but sometimes unplanned things happen and the peaks get really, really tall. One day in mid-June I checked our event processing metrics dashboard and saw this:

Figure: An anomalously large, sudden spike in data volume, followed by a sustained 10x increase.

It’s difficult to see at this scale, but our event volume increased 10x over a period of a few hours, including a 4x jump in under one hour. An increase like this would wreak havoc on a traditional data pipeline system, likely causing an incident as memory, CPU, and event loss alarms all trigger at the same time. For us it was a nonevent, and we were left wondering what fun we had missed out on.

Future Work

  • Data enrichment functions for both the data pipeline system and public package
  • Support for multiple cloud service providers, including cross-cloud functionality
  • Configuration verification and automated rollback in AppConfig
  • Automated recovery for failures during asynchronous data ingest (e.g., S3)
  • Scalable, idempotent producers for sinking data

We’re always on the lookout for new data sources, sinks, and processing functions, so if there’s something you would like to see added (or to add yourself), then check out the project on GitHub and provide feedback in our Discussions forum.

Open source projects this large aren’t created in isolation, so thank you to everyone at Brex* who has contributed to the project and made it possible, and thank you to all the future collaborators we haven’t met yet!

* Special thanks to the Brex project team and contributors:

Josh Liburdi