0 to 400 Million Events per Day in 3 Months 🚀

How a team at Axel Springer in Berlin built an insanely capable realtime-analytics pipeline in just under three months

Jonas Peeck
Axel Springer Tech

--

NMT (“National Media Tech”) is the tech division behind many of Axel Springer’s German news media brands. As part of the Axel Springer Group, it’s also a member of the Global Axel Springer Developer Community.

“I actually don’t know how many events per day that is, but I can tell you that we’re processing around 2000–4000 events per second,” Philip told me in his charming, slightly accented English that instantly tells you of his Nigerian roots.

When I tried to calculate how many events per day that would add up to, I landed at 5 million events per day. Hanna, the PO on the team, laughed and pointed out that it actually would be “slightly” more than that. I had forgotten to multiply by 60: their system actually processes over 400 million records every day.
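For anyone who wants to redo that math, here is a minimal sketch of the calculation. The function names are made up for illustration. Note that the quoted 2000–4000 events per second is a rough figure: 400 million events a day works out to an average of about 4,630 events per second, so peaks presumably run higher than the range Philip mentioned.

```typescript
// Seconds in a day: 24 hours * 60 minutes * 60 seconds = 86,400.
const SECONDS_PER_DAY = 24 * 60 * 60;

// Correct conversion from a per-second rate to a per-day total.
function eventsPerDay(eventsPerSecond: number): number {
  return eventsPerSecond * SECONDS_PER_DAY;
}

// The slip described above: one factor of 60 is missing,
// so the result is 60 times too small.
function eventsPerDayMissingFactor(eventsPerSecond: number): number {
  return eventsPerSecond * 60 * 24;
}

console.log(eventsPerDayMissingFactor(4000)); // 5760000 (the "5 million" estimate)
console.log(eventsPerDay(2000)); // 172800000
console.log(eventsPerDay(4000)); // 345600000
```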

When I tried to estimate how many events “Jetstream” (the very fitting name of the tool they built) processes every month, the calculator on my phone simply refused to display it. Instead, I was presented with the kind of scientific notation that instantly tells you that you’re looking at a large number with a lot of digits.

But that isn’t even the wildest part about Jetstream. The system is designed to deliver realtime insights: it only takes around 2 minutes (or the equivalent of 480,000 incoming events) for new data to be added to the day’s numbers.

Oh, and they built the complete system in just three months. Maybe that is actually the wildest part about it.

Analyzing 13.65 million unique users a day in realtime

It was one of those moments where it was just GO time. Their previously used analytics platform was being replaced, but the new platform lacked business-critical realtime analytics.

The deadline was a hard one. Because of an expiring contract, the old tooling would be gone in a few months, and without a replacement for the realtime analytics, the editors-in-chief at BILD and Welt would have to fly blind while running news websites with millions of visitors every day.

In the world of printed newspapers, the decision about which story goes on the cover to grab the reader’s attention is made once a day. On modern news websites that decision is made constantly throughout the day, and a good or bad placement can make a huge difference in ad revenues and conversions of readers to digital subscribers, the economic lifeblood of newspapers today.

Jetstream, the realtime analytics solution the Content Intelligence team built, breaking down the most viewed articles of the day by traffic source (web, app, search, social, and other)

Getting users used to new systems is always a challenge, especially when so much is at stake. So after evaluating different options, the team around Hanna & Philip (Team “Content Intelligence”) eventually landed on a custom-made solution: they would build their own realtime aggregator on top of an event stream from their new analytics provider.
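To make the idea of such an aggregator a bit more concrete, here is a minimal sketch of counting page views per article and traffic source, which is the kind of breakdown the Jetstream screenshot shows. Everything in it is an assumption for illustration: the article doesn’t describe the event schema or Jetstream’s internals, so the `PageViewEvent` shape and the `DailyAggregator` class are hypothetical.

```typescript
// Hypothetical traffic sources, taken from the categories named in the caption.
type TrafficSource = "web" | "app" | "search" | "social" | "other";

// Hypothetical event shape; the real stream's schema is not described.
interface PageViewEvent {
  articleId: string;
  source: TrafficSource;
}

type SourceCounts = Record<TrafficSource, number>;

interface ArticleRow {
  articleId: string;
  total: number;
  bySource: SourceCounts;
}

class DailyAggregator {
  // Running view counts for the current day, keyed by article.
  private readonly counts = new Map<string, SourceCounts>();

  // Fold one incoming event into the running counts.
  add(event: PageViewEvent): void {
    let perSource = this.counts.get(event.articleId);
    if (perSource === undefined) {
      perSource = { web: 0, app: 0, search: 0, social: 0, other: 0 };
      this.counts.set(event.articleId, perSource);
    }
    perSource[event.source] += 1;
  }

  // Return the most viewed articles, each with its per-source breakdown.
  topArticles(limit: number): ArticleRow[] {
    const rows: ArticleRow[] = [];
    this.counts.forEach((c, articleId) => {
      const total = c.web + c.app + c.search + c.social + c.other;
      rows.push({ articleId, total, bySource: { ...c } });
    });
    rows.sort((a, b) => b.total - a.total);
    return rows.slice(0, limit);
  }
}
```

Keeping only running counters in memory, rather than every raw event, is one simple way to keep a view like this cheap to update. Whether Jetstream works this way is not something the article states.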

Everything was going fairly smoothly, when suddenly…

3 months from POC to Production

…unforeseen delays in the integration with the event stream cost the team a full month.

The hard deadline for the shutdown of the old analytics system was now only three months away. The Content Intelligence team knew that their custom aggregator was the only way that the newsrooms would still get their much-needed realtime insights into the behavior of millions of users every day.

So the team went into overdrive. The project went from “not on their roadmap at all” to the one thing they focused on completely. The switch came very fast, and the team had to drop their usual processes for a while.

Instead of their digital Jira board and the usual Scrum planning processes, it was now post-its on the wall and a mad dash to meet the deadline.

The team made it! And they got the details right. The night before the launch, they deactivated all frontend tests to roll out a last-minute overhaul of the UI design (which you can see in the cover image of this blog post).

“Just Start” & Listen

When asked what they learned from the project, both Hanna and Philip answered with pretty much the same two words: “Just start”.

Chart showing “events per second” that are processed by Jetstream

For Hanna (as the PO of the team), “just starting” had a positive impact on how they interacted with their users: by being forced to go live with a reduced feature set (a classic MVP, or Minimum Viable Product), they in turn had to get feedback from their users in the newsroom to understand which features they needed next.

In pre-corona times that meant being physically present in the newsroom, proactively prompting their users to critique their latest product iteration. One mistake people can make — according to Hanna — is rolling out early, but then not actively pulling feedback out of their users. Many incredibly valuable insights can silently slip through the cracks if teams don’t make a conscious and persistent effort to get feedback out of their users.

The way that the team interacted with their users in this early phase also had completely unexpected benefits. The close collaboration enabled editors to see how their feedback was quickly translated into changes in the Jetstream system and user interface, building a good working relationship between the users in the newsroom and the Content Intelligence team. The strong relationship that was developed early on serves them incredibly well now that the team is forced to work remotely due to COVID-19.

Good Monitoring saves the day

One thing that became increasingly clear when I talked to Hanna & Philip was how their “Start & Listen” approach not only helped them with their product development — it also enabled them to build the technology behind Jetstream.

“There’s a bit of mystery, and you don’t know exactly what’s happening,” Philip says with a wide grin, explaining the challenge of building such a highly scalable event processor. The way he describes it reminds me of mechanics tenderly fixing some machinery, expertly listening to its rattles and sounds as they work to identify and fix the problem.

Hanna makes a good point: monitoring is usually treated as more of a “hygiene” thing, almost like an afterthought to building systems. In Jetstream’s case, however, it was critical to getting the system into a reliable state.

Monitoring everything from CPU usage and RAM levels to AWS account limits helped the team get the tool into shape. Just like the features they shipped to editors, their tech and the monitoring also evolved over time. It’s a testament to the team’s approach that they even managed to detect and fix Node.js memory leaks that way.

One of their alarms is triggered if Jetstream doesn’t receive a single event within a 5-second window. Yup, that’s the kind of alarm people will set up when 5 seconds of no data means 10,000–25,000 missed events.
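A “no events for 5 seconds” alarm can be sketched in a few lines. This is only an illustration under stated assumptions: the article doesn’t say how the team implemented their alarm, and the `SilenceWatchdog` class and `missedEvents` helper are hypothetical names. Timestamps are passed in explicitly so the logic stays easy to test.

```typescript
// Tracks when the last event arrived and reports whether the stream
// has been silent for at least the configured window.
class SilenceWatchdog {
  private readonly windowMs: number;
  private lastEventAtMs: number;

  constructor(windowMs: number, startedAtMs: number) {
    this.windowMs = windowMs;
    this.lastEventAtMs = startedAtMs;
  }

  // Call this for every incoming event.
  recordEvent(nowMs: number): void {
    this.lastEventAtMs = nowMs;
  }

  // True once no event has been seen for the whole window.
  shouldAlarm(nowMs: number): boolean {
    return nowMs - this.lastEventAtMs >= this.windowMs;
  }
}

// Rough estimate of how many events a silent period would have contained.
function missedEvents(eventsPerSecond: number, silentSeconds: number): number {
  return eventsPerSecond * silentSeconds;
}

const watchdog = new SilenceWatchdog(5000, 0);
watchdog.recordEvent(1000);
console.log(watchdog.shouldAlarm(5999)); // false: 4.999 s since the last event
console.log(watchdog.shouldAlarm(6000)); // true: a full 5 s of silence
console.log(missedEvents(2000, 5)); // 10000
console.log(missedEvents(5000, 5)); // 25000
```

In a running Node.js service, `shouldAlarm(Date.now())` could be checked from a periodic timer. The 10,000–25,000 range from the article corresponds to rates of 2,000 to 5,000 events per second.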

Connect to Philip & Hanna:
Hanna on LinkedIn
Philip on LinkedIn

💼 Curious about working at Axel Springer Tech?
👉🏻 Check out our open positions

— — —

Hi, I’m Jonas and I’m currently building up a Global Axel Springer Developer Community. Subscribe to our Weekly Tech Newsletter to get a fresh story like this every Monday and to receive details on how you can join our global developer community ✌🏻

Want me to help tell your story? Reach out to me 👇🏻

🚀 Subscribe to the Weekly Tech Newsletter

✉️ Get in touch with us
