Killing Kafka: The Pitfalls of Over-architecting

Published in Spaceship · 6 min read · Oct 15, 2019

The road to hell is paved with good intentions

A large part of a developer’s career is spent enhancing or fixing an application already in production. Every once in a while, maybe more often if you work for a start-up, you get the chance to start completely from scratch. The excitement can be overwhelming. A chance to fix everything you hate about the current application!

It is very easy to go over the top and start thinking about all the new technologies you could use: the ones you’ve been experimenting with at home, the ones you just read about in a new article, or the ones that worked really well at your previous job. This is the point at which you might want to stop for a moment, take a deep breath, and remember that this is something that will (hopefully) be around for a while and will have to be maintained by you and the rest of your team.

This is sometimes how I feel when I hear someone mention Kafka.

So now you must be thinking to yourself, “What’s with the title? Why do you hate Kafka so much?” I don’t hate Kafka. Okay, I do, but only because my experience with it has been sub-par and we were using it to solve problems we didn’t have in the first place.

Whether you already follow KISS* and YAGNI* or are still deciding whether they are worth following, hopefully you can learn from our mistakes and see why you should consider minimalism when you begin your next project.

Background

Here at Spaceship, the first iteration of our backend for the Voyager app relied heavily on Kafka. The intentions were honourable: to create an application that would provide auditability and stability and handle long-term load as our customer base grew.

Kafka ended up taking a fairly simple process and complicating it. The real guts of Voyager is the investment system, the majority of which doesn’t interact directly with members. It just wasn’t necessary to build a system for extremely heavy load simply to let people sign up to Voyager. The team was small, the timeline was ambitious, and the extra overhead was unnecessary for the first release of a product. In most cases, by the time you need to handle an extremely large volume of requests, you’ve already rewritten things anyway. It is a good problem to have; it means you are becoming successful.

Other issues that were probably not considered at the time were the cost of infrastructure and, even more importantly, the cost of maintenance and support.

The Issues

It takes time to get your head around new technologies

Every technology has its own way of doing things. There is a “right” way. If no one on your team already has experience with the new technology, it is very likely you won’t discover that right way until part or all of the way through the project. By then, you’ll already be dreaming about rewriting it properly.

Even if you do have team members who have spent significant time with the technology, it can take time to get the others up to speed. Sometimes the switch is necessary, but get the whole team together, discuss why you want to switch, and make sure it is for all the right reasons. Also, make sure you aren’t trying to add too many new technologies too quickly.

Simple apps are quicker and easier when onboarding new engineers

Using a straightforward technology stack that is commonly used in the industry means new team members can get up to speed much more quickly than with a more complicated setup. Often it takes long enough just to learn the business domain of the company. Adding extra complication through the tech stack means the overall output of the team will be reduced for longer than necessary. In a start-up, where there can be high turnover and you are trying to grow quickly, this can really add up.

Easier to support

Debugging a production application can be difficult at the best of times. When the application is complicated, problems are harder to trace and the support burden grows.

One common issue is that only a single person on the team knows how the application works and ends up supporting it all the time. This knowledge silo can end up hurting the team if the engineer is on leave, ill or leaves the company, and no one else knows what to do. Support can also become boring for the person doing it, and the business can lose a quality team member because they want to work on new things.

Another problem is that an issue can take much longer to debug, which can make for unhappy customers. Especially when your company is new, the last thing you need is customers being upset, leaving bad reviews, or having bad things to say about you to their friends.

Easier to add on and modify

As with initial development, a simpler application allows fixes, updates and new features to be added more quickly. Sometimes updates to a legacy application that is too complicated get put in the “too hard” basket. This is simply because no one knows enough about it, and the worry is that if you touch it, you might never get it started again. In other words, don’t push your luck and upset the beast or it might come back and bite you.

How We Solved It

The quick answer is that we moved to gRPC microservices with protobuf schemas. Kafka was cut completely from the stack. Explaining how we did it is a story for another day, but it took a lot of planning, and all hands on deck for three to four weeks. Luckily, we were given the time to take care of what we considered massive technical debt.

The benefits have been considerable. Onboarding time has dropped: people are up and running on the technology quickly, and are able to learn the business domain by tagging along with the support engineer for their first week or two. Support can be rotated through the whole team, as anyone with a bit of coding knowledge and the ability to search logs can at least track down where the issue lives. Support SLA times have dropped significantly, and rotating support limits knowledge silos and boredom, as you know you’ll be back on the fun stuff after you do your time. Oh, and did I mention we reduced our server count by about 50%?

Every company has its own issues to solve, and sometimes complicated technology is required. Just make sure that technology is solving problems and not creating them. Less code, fewer bugs.

Interested in learning more about our tech stack? Have a look at these links:
https://golang.org/
https://developers.google.com/protocol-buffers/
https://grpc.io/
https://www.postgresql.org/

*KISS: Keep It Simple, Stupid
*YAGNI: You Aren’t Gonna Need It

By the way, Spaceship is currently hiring. Have a look at our job board to see what is available.