Why does Bolt use Node.JS?
The most frequently asked candidate question during interviews is, “What is Bolt’s tech stack?”
You might be surprised, but we wrote all 1000+ microservices at Bolt with Node.JS. Yes, that’s right — all 600 developers at Bolt use Node.JS.
To answer why, let me dive into our engineering philosophy.
We view technology as a means for solving business problems. Business problems may not sound fancy to many engineers, but being operationally efficient resulted in Bolt’s success. So it’s good to be guided by business needs when approaching technical problems. Some problems might even be solved manually, without using any technology.
We also follow the “one way of doing things” rule, meaning that for a group of problems, we try to use the same tool as much as possible. So, for example, we use Node.JS with TypeScript entirely for the backend. There are, of course, a few exceptions, but they’re related to specific, primarily computation-heavy tasks.
Now, fasten your seatbelts, and without further ado, let’s understand why this approach is helping us be highly efficient.
Bolt uses a microservices model. At our scale, we have 1000+ unique services (at the time of writing), each running several instances in different AWS availability zones.
Services are primarily busy processing API requests, processing various events, and running cron jobs.
The business logic is usually quite simple: check some “if” statements, call other services, query the database, etc. You don’t need a heavy JVM with gigabytes of RAM for such tasks.
Here comes the first advantage of Node.JS: it’s lightweight. One EC2 machine can host a few dozen Node.JS processes. In fact, we have machines running 40 or more Node.JS microservices.
Once one instance has too much load, it’s easy to scale the service horizontally. Usually, it is the databases or external services that give out under load first. I’ll talk more about scaling in the next section.
What’s important about being lightweight is that it saves money on AWS costs. It might sound like a boring reason from an engineering perspective, but it’s pretty essential business-wise.
However, there are exceptions. For example, ML models, geo-specific tasks (maps, routing), and data transformations don’t run on Node.JS but on technologies that significantly benefit these specific use cases. Python has extensive library support for data science, and Spark shines at data transformations.
But most of the time, the benefit of having a single technology outweighs the cost of supporting a broader tech stack, even if some tasks would run better on, let’s say, Golang. So although Node might not be the best choice for heavy in-memory computations, most of the time it’s possible to get away with just adding more instances.
One of the main complexities of developing high-load systems is concurrency. But Node.JS is single-threaded on execution, which makes most of the issues related to concurrency go away. So threads, monitors, fork-join pools — you can forget about those.
Some may remember how requests used to be handled: each request had a separate thread or process, which consumed significant resources. Node.JS instead keeps all in-flight requests on a single thread and parks each one whenever it makes an async call, letting the event loop resume it when the result arrives. So a request doesn’t occupy processor time while it’s in the waiting state.
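To make the waiting state concrete, here is a minimal sketch of three requests sharing one thread. The names fakeQuery, handleRequest, and simulate are made up for illustration, and the timer only stands in for a real database or service call.

```typescript
// Stand-in for an I/O call (a DB query or a request to another service).
function fakeQuery(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Hypothetical request handler: it gives up the thread at every await.
async function handleRequest(id: number, queryMs: number, log: string[]): Promise<void> {
  log.push(`start ${id}`);
  await fakeQuery(queryMs); // parked here; other requests can run meanwhile
  log.push(`end ${id}`);
}

// Three requests in flight at once on a single thread.
async function simulate(): Promise<string[]> {
  const log: string[] = [];
  await Promise.all([
    handleRequest(1, 60, log),
    handleRequest(2, 20, log),
    handleRequest(3, 40, log),
  ]);
  return log;
}

simulate().then((log) => console.log(log.join(", ")));
// start 1, start 2, start 3, end 2, end 3, end 1
```

All three handlers start before any of them finishes, and they complete in the order their simulated queries return, without any threads or locks in the code.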
To understand how the event loop works, check out this interactive tool — I recommend it to every new joiner in my team.
Although working with the DB and external services still requires handling concurrency issues, the event loop makes code significantly flatter and easier to read.
You might think that a single-threaded service won’t be able to serve many requests. But as I mentioned already, most business services are lightweight, and the database usually does the heavy lifting. So even at our scale, with services handling hundreds of requests per second, most services could, at least until recently, still handle the load on one instance (which still doesn’t make it a good idea to keep only one process running).
Of course, there is a drawback. You must be careful with the event loop, as you have only one execution thread. Whenever there is any long-running function, all other requests are waiting, and the situation could quickly escalate with a snowball effect.
For example, a reckless map().reduce() over a massive array is an easy way to overload a service and cause failed requests.
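One common way around this is to process a large array in slices and hand control back to the event loop between slices. The sketch below is only an illustration of that idea, not a description of Bolt's code; sumInChunks, yieldToEventLoop, and the default chunk size are assumptions.

```typescript
// Hands control back to the event loop so pending requests can run.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => setImmediate(resolve));
}

// Sums a large array slice by slice instead of in one long synchronous reduce().
async function sumInChunks(values: number[], chunkSize = 10_000): Promise<number> {
  let total = 0;
  for (let start = 0; start < values.length; start += chunkSize) {
    for (const value of values.slice(start, start + chunkSize)) {
      total += value;
    }
    await yieldToEventLoop(); // other callbacks get a turn between slices
  }
  return total;
}

sumInChunks([1, 2, 3, 4, 5], 2).then((total) => console.log(total)); // 15
```

The total work is the same, but no single stretch of synchronous code runs longer than one slice. For truly heavy computations, moving the work out of the request-serving process is usually the better option.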
Another example comes from my team, with Prometheus metrics scraping. Prometheus pulls metrics from the service every 15 seconds, and in our case, metrics offloading had grown to take more than 300 ms, blocking the event loop long enough to create a spike on the DB that eventually brought the service down.
To prevent such cases, we keep track of event loop lag and alert the team if a service is getting slow.
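As a rough idea of what tracking event loop lag can look like, Node.JS ships a built-in sampler, monitorEventLoopDelay, in the perf_hooks module. The sketch below uses it; the 100 ms threshold, the readLag helper, and the LagReport shape are assumptions for illustration and say nothing about how Bolt's own monitoring is built.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Hypothetical alerting threshold; a real value depends on the service.
const LAG_THRESHOLD_MS = 100;

// Samples event loop delay in the background (values are in nanoseconds).
const histogram = monitorEventLoopDelay({ resolution: 10 });
histogram.enable();

interface LagReport {
  p99Ms: number;
  maxMs: number;
  breached: boolean;
}

// Reads the current sampling window and resets it; meant to be called periodically.
function readLag(): LagReport {
  const p99Ms = histogram.percentile(99) / 1e6;
  const maxMs = histogram.max / 1e6;
  histogram.reset();
  return { p99Ms, maxMs, breached: p99Ms > LAG_THRESHOLD_MS };
}
```

A service could call readLag() on a timer and export the numbers as metrics, raising an alert when breached is true.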
Sometimes, it takes effort to get around the single-threaded limitation. For example, one of our most loaded services runs on 27 machines with close to 400 processes.
There are several important factors when choosing a programming language. For one, you need something robust enough to build a business for years ahead. On the other hand, engineers love to try novel tools, and hiring people for some ‘rusty’… sorry, outdated technology would be hard.
The NPM registry and an extensive community provide access to numerous open-source libraries, including drivers for MySQL and other databases, Elastic, Prometheus, and Kafka, as well as AWS support. IDE support is great too, be it IntelliJ or VS Code.
Let me repeat it — for ~99% of backend-related systems, we use Node.JS as a solution. This has several implications. First, to be onboarded, you must know only one language. Then you can easily contribute to other projects and understand dependencies, so engineers can focus on solving business problems over debugging another shiny new tool.
Still, finding great engineers is always a struggle. Fortunately, we don’t require previous Node.JS exposure: it’s quick to pick up from a general OOP background, since programming languages are heading towards similar syntax and borrowing concepts from one another.
We build on top of it
Here’s the most crucial part. Language is a good foundation, but its use is arguably more important.
To be fair, Node.JS just happened to be around, gaining popularity at the time of Bolt’s initial architecture layout.
And while we’ve changed the DB engine at least three times already, Node.JS still keeps serving us well.
We’re trying to keep our stack up to date: we started using TypeScript when it came out. Later, we rewrote our code base to use async/await, so there’s no more “callback hell” and code stays flat and readable.
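For readers who haven't seen the difference, here is a small sketch of the same two-step lookup written both ways. Every name in it (findRider, findLastOrder, toPromise, and so on) is hypothetical, and the timers stand in for real database calls.

```typescript
type Callback<T> = (err: Error | null, result: T | null) => void;

// Hypothetical callback-style data access functions.
function findRider(id: number, cb: Callback<string>): void {
  setTimeout(() => cb(null, `rider-${id}`), 1);
}
function findLastOrder(rider: string, cb: Callback<string>): void {
  setTimeout(() => cb(null, `order-of-${rider}`), 1);
}

// Before: every dependent step nests one level deeper.
function lastOrderWithCallbacks(id: number, cb: Callback<string>): void {
  findRider(id, (err, rider) => {
    if (err || rider === null) {
      cb(err ?? new Error("rider not found"), null);
    } else {
      findLastOrder(rider, (err2, order) => {
        cb(err2, order);
      });
    }
  });
}

// Small adapter that turns a callback-style function into a promise-returning one.
function toPromise<A, T>(fn: (arg: A, cb: Callback<T>) => void): (arg: A) => Promise<T> {
  return (arg) =>
    new Promise<T>((resolve, reject) => {
      fn(arg, (err, result) => {
        if (err || result === null) {
          reject(err ?? new Error("empty result"));
        } else {
          resolve(result);
        }
      });
    });
}

const findRiderAsync = toPromise(findRider);
const findLastOrderAsync = toPromise(findLastOrder);

// After: the same flow reads top to bottom.
async function lastOrderWithAwait(id: number): Promise<string> {
  const rider = await findRiderAsync(id);
  return findLastOrderAsync(rider);
}
```

Both versions do the same work; the async/await one keeps error handling in ordinary try/catch and stays flat no matter how many steps are chained.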
And as we’ve been using this stack for over five years, significant internal infrastructure has been built around it. The most notable would be the internal microservice framework.
Imagine you have to create a new microservice. Aside from writing down business logic, you have to think about everything from deployment and monitoring to connection pools, along with authentication to the database, queues, cache, and communicating with other microservices. And then, you have to write tests that work with these components. And as your system becomes more loaded, you also need scaling, alerting, circuit breakers, and even more tests.
There has been an entire genre of books and conferences on how to do microservices ‘right’, as they do introduce a lot of overhead. But what if it all could be solved for backend engineers?
Managing this routine is exactly what our internal framework does.
For example, let’s say you need a new microservice. All you have to do is create a Bootstrap.ts file and mention it in the package config, and the service is ready to be deployed. Numerous monitoring dashboards and alerts come with just a couple of lines of code. To add a DB to the service, you only specify the DB name in the service’s config; credentials and a connection pool with some safety features like buffering are handled automagically.
But the case I like the most is around request validation.
Every technology has its downsides, but when you have just one technology to support, it’s easy to find and mitigate them.
For JS, one downside is that it’s not type-safe at runtime, so each request and response has to be validated. Luckily, as we use TypeScript, all interfaces have defined types, so validators for all requests can be generated from these interfaces. And that’s what our framework does.
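The internal framework isn't public, so the following is only a sketch of the general idea: a schema object that mirrors an interface, plus a generic runtime check. The CreateRideRequest interface, the schema format, and the validate function are all invented for illustration; in a generated setup, a build step would emit the schema from the interface instead of someone writing it by hand.

```typescript
// Hypothetical request type, as a service author would declare it.
interface CreateRideRequest {
  riderId: number;
  pickup: string;
  scheduled?: boolean;
}

type FieldType = "string" | "number" | "boolean";
interface FieldRule {
  type: FieldType;
  optional?: boolean;
}

// Written by hand here; a code generator would derive it from the interface.
const createRideSchema: Record<keyof CreateRideRequest, FieldRule> = {
  riderId: { type: "number" },
  pickup: { type: "string" },
  scheduled: { type: "boolean", optional: true },
};

// Checks an untrusted payload against a schema and returns a list of problems.
function validate(schema: Record<string, FieldRule>, input: unknown): string[] {
  if (typeof input !== "object" || input === null) {
    return ["body must be an object"];
  }
  const record = input as Record<string, unknown>;
  const errors: string[] = [];
  for (const [key, rule] of Object.entries(schema)) {
    const value = record[key];
    if (value === undefined) {
      if (!rule.optional) errors.push(`${key} is required`);
    } else if (typeof value !== rule.type) {
      errors.push(`${key} must be a ${rule.type}`);
    }
  }
  return errors;
}

console.log(validate(createRideSchema, { riderId: "1" }));
// [ 'riderId must be a number', 'pickup is required' ]
```

Because the schema is keyed by keyof CreateRideRequest, the compiler flags a schema that misses a field of the interface, which is the kind of drift that generation avoids entirely.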
Moreover, most of the code in our backend repository is auto-generated.
Combining this with the fact that we use one monorepo for all backend services, we can generate API definitions of other backend services, making requests to other services look like a method call. We can also generate API mocks for them, so writing tests becomes significantly easier.
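To illustrate what "a request that looks like a method call" can mean, here is a hedged sketch with a typed client and a matching mock. The PricingApi contract, the Transport signature, and every function name are assumptions made up for this example; the real generated code is not shown in this article.

```typescript
// Hypothetical API contract of another backend service.
interface EstimateRequest {
  distanceKm: number;
}
interface EstimateResponse {
  priceCents: number;
}
interface PricingApi {
  estimate(req: EstimateRequest): Promise<EstimateResponse>;
}

// Hypothetical transport: sends a call to a named service and returns the raw reply.
type Transport = (service: string, method: string, body: unknown) => Promise<unknown>;

// What a generated client could look like: a thin typed wrapper over the transport.
function createPricingClient(transport: Transport): PricingApi {
  return {
    estimate: async (req) =>
      (await transport("pricing", "estimate", req)) as EstimateResponse,
  };
}

// What a generated mock could look like: same interface, canned answer.
function createPricingMock(fixed: EstimateResponse): PricingApi {
  return { estimate: async () => fixed };
}

// Business code depends only on the interface, so it works with either one.
async function quoteRide(pricing: PricingApi, distanceKm: number): Promise<string> {
  const { priceCents } = await pricing.estimate({ distanceKm });
  return `${(priceCents / 100).toFixed(2)} EUR`;
}

quoteRide(createPricingMock({ priceCents: 1250 }), 3).then((quote) => console.log(quote));
// 12.50 EUR
```

In a test, quoteRide receives the mock; in production, it would receive the client wired to a real transport, with no change to the business logic.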
All of this made it possible to have an extremely rapid development cycle, as most infrastructure issues are solved once and for all.
To sum up
Using Node.JS allowed Bolt to grow rapidly. The secret is not in Node.JS itself but in leveraging this technology by building on top of it, covering its downsides. And we focus on one solution by limiting the tech stack we use.
I’ve tried to keep this article as a showcase of Bolt’s internal architecture. However, if you’re still not convinced about Node.JS, many excellent articles have been written about it. Feel free to check them out!
We offer a unique opportunity for individuals to learn and develop while making a meaningful impact on millions of people across the globe in a hyper-growth environment.
If you feel inspired by our approach and want to join our journey, check out our vacancies.
We have hundreds of open roles, and if you’re ready to work in an exciting, dynamic, fast-paced industry and are not afraid of a challenge, we’re waiting for you!