Introducing CertStream

Any day that you see one of your projects make it to public release is a good day, and today is a damn good day. Today I introduce the world to CertStream — a free service and simple libraries for getting data from the Certificate Transparency Log (CTL) network in real time. This allows anyone to write extremely simple code (or even a bash script) to react to SSL certificates being issued, as they’re issued.

If you’re impatient like me, feel free to skip the whys and the hows of this article and just go to https://certstream.calidog.io/ to start hacking away!

Some Background

The past year or so has been an exciting rollercoaster for me, from running the security consultancy side of my company (email me if we can help!) to working hard to get our flagship product, PhishFinder, to its general release target next week. Like most engineers, I find myself only able to stare at tools like Burp or do frontend development for a finite amount of time before I get the thousand yard stare of a man whose computer is, in fact, staring back at him. When these moments strike I try to redirect my energy to build something completely different in order to keep my technical muscles sharp, give back to the open source community, and remember why I love doing development in the first place.

In the early stages of designing the analysis pipeline for PhishFinder, I was eager to integrate a new data source that I had learned about — the Certificate Transparency Log. Unfortunately, I found some significant friction when integrating the data source, so I figured that spending some of my free time building a nice developer ecosystem on top of the amazing work that folks like Paul Hadfield and Ryan Hurst have done over at Google would be an excellent use of my time, and enable more folks to stand on the shoulders of giants.

Why?

It’s definitely not a secret that Certificate Transparency Logs are my personal favorite data source that our platform uses today (and one of my favorite sources to do OSINT work with). I think that it’s an amazing idea with some awesome engineering behind it, but I felt a few pain points that I thought I could address with a few simple tools.

Firstly, the CTL network is designed to be decentralized, which means there are a lot of different logs to monitor. Some are large, some are small, and it’s somewhat hard to tell which logs do what or which to monitor for your needs (especially because there are 13 different CTLs from Google alone). It’s not a bad thing inherently, but from an engineering perspective it’s another hurdle to get over before you can accomplish what you’re trying to do.

Secondly, the logs emit the certificate data in a fairly opaque binary format, which I personally think is the biggest hurdle to its adoption. Not everyone is well-versed in parsing binary streams or even in working with SSL certificates in general. Simple interface abstractions are the reason someone can drive a car without knowing how its engine works, and a developer-friendly service is no different!

Third, the logs all use HTTP as their data exchange mechanism, which means lots of repeated polling from lots of clients, and every time a service relies on client polling, a puppy dies. TL;DR: save some puppies, use CertStream 🐶.

The goal for this project is pretty straightforward — provide an ecosystem of libraries and services that address these particular pain points and enable developers to build awesome things that react to certificate changes in real(ish) time with as little code as possible.

In other words…

How?

CertStream sets out to address all of those issues by aggregating, parsing, and streaming all the certificate updates for every CTL in real(ish) time over websockets. We provide nice libraries in Python, Go, JavaScript, and Java to enable developers to build awesome things in just a few lines of their preferred language.

At its core, the most important bit is the monitor server, whose job is to poll all the CTLs for changes, pull down the certificate data, parse it, and push it out over a websocket (if you want to test, just point any websocket client to wss://certstream.calidog.io). We plan to divvy the data up into separate channels you can subscribe to, because sometimes you only want a leaf cert or the DNS names without the actual certificate data as well!
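To give a feel for what consuming the stream might look like, here is a minimal Python sketch. The message shape it expects (a `message_type` field plus `data.leaf_cert.all_domains`) is an assumption made for illustration, not something spelled out in this post, and `extract_domains` and `watch` are hypothetical helper names. The `watch` function leans on the `listen_for_events` entry point of the Python library, so double-check its signature against the library docs before relying on it.

```python
import json


def extract_domains(raw_message):
    """Return the domain names carried by one CertStream message.

    Assumed message shape (an illustration, not taken from this post):
    {"message_type": "certificate_update",
     "data": {"leaf_cert": {"all_domains": ["example.com", ...]}}}
    """
    # Accept either the raw JSON text or an already-decoded dict.
    message = json.loads(raw_message) if isinstance(raw_message, str) else raw_message
    # Ignore heartbeats and anything else that is not a certificate update.
    if message.get("message_type") != "certificate_update":
        return []
    leaf_cert = message.get("data", {}).get("leaf_cert", {})
    return list(leaf_cert.get("all_domains", []))


def watch(keyword, url="wss://certstream.calidog.io"):
    """Print every newly logged domain containing `keyword` (needs network)."""
    import certstream  # third-party: pip install certstream

    def on_message(message, context):
        for domain in extract_domains(message):
            if keyword in domain:
                print(domain)

    # Blocks until interrupted, invoking on_message for each update.
    certstream.listen_for_events(on_message, url=url)
```

Calling something like `watch("paypal")` would then print matching domains as certificates are logged. Keeping the parsing in its own function means it can be exercised offline against a saved message.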

We’ve also built a CLI, bundled with the Python library, that lets sysadmins and non-developers alike interact with CertStream as well! You can install it with a simple pip install certstream, and the default run of the CLI looks like this:

The standard invocation — good for grepping

That’s fine and dandy, but what if you wanted to do something more complicated? How about having it emit raw JSON that can then be parsed using jq to output a CSV? We’ve got you covered, using the --json flag!

CT Certificates -> CSV, yes please!
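If you would rather stay in Python than reach for jq, the same JSON-to-CSV idea can be sketched in a few lines. This assumes the CLI prints one JSON object per line when run with --json, and that each object has the `message_type` and `data.leaf_cert.all_domains` fields; both are assumptions for illustration, and `json_lines_to_csv` and the column names are hypothetical.

```python
import csv
import io
import json


def json_lines_to_csv(lines, out):
    """Write one CSV row per certificate update and return the row count.

    `lines` is any iterable of JSON strings (for example sys.stdin) and
    `out` is any writable text stream (for example sys.stdout).
    """
    writer = csv.writer(out)
    writer.writerow(["primary_domain", "domain_count", "all_domains"])
    rows = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between messages
        message = json.loads(line)
        if message.get("message_type") != "certificate_update":
            continue  # skip heartbeats and other message types
        domains = message.get("data", {}).get("leaf_cert", {}).get("all_domains", [])
        if not domains:
            continue
        writer.writerow([domains[0], len(domains), " ".join(domains)])
        rows += 1
    return rows
```

Saved in a small script that calls `json_lines_to_csv(sys.stdin, sys.stdout)`, this could sit at the end of a pipe fed by the CLI's JSON output.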

These examples are just the beginning. We’re going to be expanding this CLI in the coming weeks with a slew of fancy upgrades: keyword searching, Slack/webhook integrations, a nice curses interface, and some larger-scale goals like figuring out if we can leverage something like SNS in a public way for free so this can do some serverless stuff (#webscale 🤖). Do you have a good idea for something you’d like to see in the CLI? File a bug and let us know!

Otherwise, you can get started by visiting certstream.calidog.io! Happy hacking!


Also, I would be remiss if I didn’t give a huge thanks to Jessica Weiller for her magic eyeballs and design chops, Philip Martin for the seemingly endless stream of great advice, Josh Hight for saving me from having to write Java, Paul Hadfield/Ryan Hurst & the entire CT team at Google for being some unsung heroes making the internet a safer place, and everyone who helped me beta test, provided feedback, or generally listened to me blather on about this project for the past few months!