Eleven Tips to Scale Node.js

Published in Microsoft Azure on Sep 18, 2018 (12 min read)

This article was written in collaboration with Node.js developers and architects from nearForm.

Photo by Clément Chéné, licensed under CC BY 2.0

Node.js is an amazingly productive platform and allows previously frontend-only devs to hop into the backend and start writing code. Node.js is also capable of running at world-class scale, as evidenced by the deployments at companies like Netflix, Reddit, Walmart, and eBay. However, Node.js has its own set of scaling challenges, both in terms of scaling the number of people working on a single code base and in terms of scaling vertically and horizontally in the cloud. In addition to my own experience scaling Node.js at Reddit and Netflix, I talked to some of the experts working on Microsoft Azure and came up with a few tips for you to scale Node.js at your company.

Write quality Node.js

The sooner you start linting, formatting, and type-checking your code, the better. These things can be difficult to introduce mid-project because of the large amount of refactoring they can require and the way they can pollute your git history, but in the end they will help you write consistently readable code.

If you are not using them already, consider adding ESLint and Prettier to your code base immediately. ESLint is a code linting tool that will prohibit bad patterns from being checked in, while Prettier is an automated code formatter that removes all the bikeshedding in your pull requests.

A more substantial undertaking is adding a tool like Flow or TypeScript to your codebase. These tools will catch subtle bugs like calling a function with a number instead of a string or trying to call .filter on an object instead of an array. While adoption is difficult and comes with a learning curve for your team, these tools merit your consideration because they can speed up development thanks to IntelliSense and prevent runtime bugs thanks to type safety.
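To make the two bugs mentioned above concrete, here is a minimal sketch in TypeScript. The names (Product, formatPrice, inStock) are hypothetical and exist only for illustration; the commented-out calls at the bottom show the kind of mistake the compiler would reject before the code runs.

```typescript
// Hypothetical names for illustration; not from any particular code base.
interface Product {
  name: string;
  priceInCents: number;
  available: boolean;
}

// The parameter type documents that a number is expected.
function formatPrice(priceInCents: number): string {
  return `$${(priceInCents / 100).toFixed(2)}`;
}

// Accepting Product[] guarantees that .filter exists on the argument.
function inStock(products: Product[]): Product[] {
  return products.filter((product) => product.available);
}

const catalog: Product[] = [
  { name: "mug", priceInCents: 1250, available: true },
  { name: "poster", priceInCents: 800, available: false },
];

const labels: string[] = inStock(catalog).map(
  (product) => `${product.name}: ${formatPrice(product.priceInCents)}`
);

// Both of the following calls are rejected at compile time:
// formatPrice("1250");          // a string is not assignable to a number parameter
// inStock({ mug: catalog[0] }); // a plain object is not assignable to Product[]
```

In plain JavaScript both of the commented-out calls would only misbehave at runtime, which is the class of bug these tools are meant to surface earlier.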

Write a gradient of tests

Tests are a tricky subject for developers. Some believe thoroughly in the gospel of test-driven development while others rarely write any tests at all. There is a middle ground here that could be a sweet spot.

Identify key modules and flows and write exhaustive unit tests for these areas. Pay special attention to “happy paths”, edge cases, and any scenarios where bugs are prone to emerge. For other modules, write a unit test or two to cover a “happy path” and perhaps common edge cases you may have identified.
Do minimal UI testing. The UI is constantly in flux, and it’s often not useful to spend a lot of time writing tests for code that’s going to change frequently.
Write tests for bug fixes. Whenever you find and fix a bug, write a unit test that would catch that bug in the future.
Write a few integration tests to make sure all the pieces fit together.
Write even fewer end-to-end tests. Cover the key paths in your site; for instance, if you’re creating an e-commerce site, perhaps write tests for login, add-to-cart, and checkout. These tests are expensive to maintain, so consider keeping only a small core of tests you’re motivated to maintain.

The point of writing tests is to be able to deploy new code with confidence. Write no fewer tests than it takes to achieve that feeling, and try not to write many more than that either.
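As an illustration of the "write tests for bug fixes" habit from the list above, here is a small sketch. The cartTotal function and the bug it once had are imagined for this example, and the tiny expectEqual helper stands in for whatever test framework your project already uses.

```typescript
// Hypothetical module under test: sums a shopping cart in cents.
interface CartItem {
  priceInCents: number;
  quantity: number;
}

function cartTotal(items: CartItem[]): number {
  // The imagined bug: an earlier version ignored `quantity` and summed
  // only the unit prices. The fix multiplies by quantity.
  return items.reduce((sum, item) => sum + item.priceInCents * item.quantity, 0);
}

// Minimal assertion helper so the sketch needs no test framework.
function expectEqual(actual: number, expected: number, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

// Happy path: one of each item.
expectEqual(
  cartTotal([
    { priceInCents: 500, quantity: 1 },
    { priceInCents: 250, quantity: 1 },
  ]),
  750,
  "one of each item"
);

// Regression test for the imagined bug: quantity must be respected.
expectEqual(cartTotal([{ priceInCents: 500, quantity: 3 }]), 1500, "quantity is applied");

// Edge case: an empty cart totals zero.
expectEqual(cartTotal([]), 0, "empty cart");
```

The regression test is the one that matters here: it fails against the buggy version and keeps failing if anyone reintroduces the same mistake.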

Design for stateless

A key to writing scalable Node.js is keeping your endpoints stateless. Your server cannot keep state for someone or something on the server itself. Doing so would prevent you from scaling horizontally, which means throwing more servers at the problem and letting a load balancer distribute calls among them. Think about this early; it is very difficult to unravel later. It will also help if you ever decide to decompose a monolith into microservices.
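A minimal sketch of what stateless can look like in practice, assuming the state lives in a store shared by every server instance. VisitStore, FakeSharedStore, and createHandler are hypothetical names; the in-memory fake exists only so the sketch runs, and in a real deployment it would be replaced by an external database or cache.

```typescript
// State lives behind this interface, outside of any one server process.
interface VisitStore {
  increment(sessionId: string): Promise<number>;
}

// In-memory stand-in used only so the sketch runs. In production every
// server instance would talk to the same external service instead.
class FakeSharedStore implements VisitStore {
  private counts = new Map<string, number>();

  async increment(sessionId: string): Promise<number> {
    const next = (this.counts.get(sessionId) ?? 0) + 1;
    this.counts.set(sessionId, next);
    return next;
  }
}

// The handler keeps nothing between calls. Everything it needs arrives in
// its arguments or comes from the shared store, so any instance behind the
// load balancer can answer any request.
function createHandler(store: VisitStore) {
  return async function handleVisit(sessionId: string): Promise<string> {
    const visits = await store.increment(sessionId);
    return `visit ${visits}`;
  };
}

// Two "servers" backed by the same store see the same session.
const sharedStore = new FakeSharedStore();
const serverA = createHandler(sharedStore);
const serverB = createHandler(sharedStore);
```

If the counter were a variable inside one server process instead, a request routed to serverB would not see the visit recorded by serverA, which is exactly the problem that shows up once a load balancer is involved.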

Serve static from Node.js in dev, serve it from a CDN in prod

I wish I saw companies make this mistake less often. Serving your static assets from your web application (particularly through something like webpack-dev-server or Parcel’s dev server) is a great developer experience since it shortens the feedback loop when you’re writing code. However, you should never serve your static assets via Node.js in production. They should be compiled separately and served via a CDN, like Azure CDN. Serving them from Node.js is unnecessarily slow, since CDNs are more dispersed and therefore normally physically closer to the end user, and CDN servers are highly optimized for serving small assets. Serving assets from Node.js is also unnecessarily expensive, since Node.js server time is far more expensive than CDN server time.
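One way to keep the dev and prod behavior apart is a small helper that builds asset URLs from configuration. This is only a sketch: the AssetConfig shape, the /static/ dev path, and the cdn.example.com host are all placeholders, not values from any real setup.

```typescript
// Hypothetical settings; the CDN host used below is a placeholder.
interface AssetConfig {
  environment: "development" | "production";
  cdnBaseUrl: string;
}

// Build the URL for a compiled static asset.
function assetUrl(config: AssetConfig, assetPath: string): string {
  // Drop leading slashes so both "app.js" and "/app.js" are accepted.
  const cleanPath = assetPath.replace(/^\/+/, "");
  if (config.environment === "production") {
    // In production the browser fetches assets from the CDN, not from Node.js.
    const base = config.cdnBaseUrl.replace(/\/+$/, "");
    return `${base}/${cleanPath}`;
  }
  // In development the local dev server serves assets for a fast feedback loop.
  return `/static/${cleanPath}`;
}

const devUrl = assetUrl(
  { environment: "development", cdnBaseUrl: "https://cdn.example.com" },
  "/app.js"
);
const prodUrl = assetUrl(
  { environment: "production", cdnBaseUrl: "https://cdn.example.com/" },
  "app.js"
);
```

Templates then call assetUrl everywhere instead of hardcoding paths, so switching environments is a configuration change rather than a code change.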

Deploy early, deploy often

I don’t know about you, but the first time I deploy something it never works. Usually I’m forgetting to send the right secrets or I’ve hardcoded a localhost path somewhere. These are usually small issues that let the app work locally but not remotely. However, if they are not dealt with on a regular cadence during development, they pile up, and what could have been a simple fix if caught early can turn into a rat’s nest to pull apart, especially if you have made critical errors in your architecture.

Visual Studio Code makes this easy to do. It allows you to deploy straight to Azure by right-clicking the app and clicking “Deploy to Azure”. This is an easy way to validate that everything works when deployed into a different environment. It’s also a great way to get a publicly shareable link so someone else can check out your progress.

Deploy two servers right away

This comes from hard-won knowledge and a lot of heartache. There is little difference between deploying two servers and ten servers, and there’s little difference between deploying ten servers and one hundred servers. However, there is a massive difference between deploying one server and two servers. Similar to the point on deploying stateless servers, starting right away with two servers (and never going beneath that) will quickly surface your issues with horizontal scaling, so that when it comes time to scale up due to an unexpected spike in traffic (like a viral tweet or being on the front page of Hacker News) you are already ready to scale out to meet the demand.

Don’t fear the queue

Modern databases deal with a certain amount of reading and writing scale by themselves with no help. When you’re proving out your idea, feel free to rely on your database to handle a small to medium size load. Premature scaling is more likely to kill you than save you. That being said, at some point you will outgrow the contract of your app writing directly to the database. For some that may come later, because you have a light write load or you chose a database like Cassandra that handles massive scale by itself. For others it will come sooner, because you have a write-intensive app or you want to do some off-loaded additional processing on your data. In any case, a messaging queue is a tool you want to be aware of, along with how it can help you.

You have many options here in terms of what tech to go with. The de facto standard at the moment is Apache Kafka, which allows you to organize your messages into topics that other applications are then free to subscribe to. Your data scientists can pick off data for their own uses, you can transform pieces of data in that pipeline, you can feed different pieces of data into different data stores, and ultimately at the end of the topic you can batch together writes to your database so that it’s not being hammered all the time. Kafka is really easy to get running on Azure.
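The idea of batching writes at the end of a topic can be sketched without any queue at all. The class below is an in-process buffer, not a Kafka client, and writeBatch is a hypothetical callback standing in for a bulk insert into your database.

```typescript
// Collects consumed records and hands them to the database in batches.
class BatchingWriter<T> {
  private buffer: T[] = [];
  private readonly batchSize: number;
  private readonly writeBatch: (records: T[]) => void;

  constructor(batchSize: number, writeBatch: (records: T[]) => void) {
    this.batchSize = batchSize;
    this.writeBatch = writeBatch;
  }

  // Called once per consumed message.
  add(record: T): void {
    this.buffer.push(record);
    if (this.buffer.length >= this.batchSize) {
      this.flush();
    }
  }

  // Writes whatever is buffered. Also call this on a timer or at shutdown
  // so a partial batch is not left waiting indefinitely.
  flush(): void {
    if (this.buffer.length === 0) return;
    const records = this.buffer;
    this.buffer = [];
    this.writeBatch(records);
  }
}

// Record the size of every bulk write so the effect is visible.
const batchSizes: number[] = [];
const writer = new BatchingWriter<string>(3, (records) => {
  batchSizes.push(records.length);
});

for (let i = 0; i < 7; i++) {
  writer.add(`event-${i}`);
}
writer.flush(); // write the final partial batch
```

Seven messages turn into three database writes (of 3, 3, and 1 records) instead of seven. A real consumer would also need to decide when to acknowledge messages relative to the write, which this sketch leaves out.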

Another tool you may consider using is Azure Event Hubs. If you’re already invested in the Azure ecosystem, this tool is incredibly easy to use to connect the disparate parts of Azure together. It uses a familiar pub-sub mechanism to connect different things, and it’s really easy to plug in things like Azure Functions or Azure Logic Apps for quick-and-dirty processing of your data using JavaScript or a visual flow editor, respectively.

Microservices and containers at scale

As you grow your application, natural divisions of logic begin to appear. This part of the app may process payments while that other part serves the necessary API data to the front end. Embrace these divisions and consider making them separate microservices. A microservice is really just another app that you’re running that does a smaller job than a monolithic app would do. Do be careful, because introducing microservices introduces a lot of complexity as well. However, don’t underestimate the gains: separate teams can work on different parts of the app without having to coordinate a lot. Furthermore, the services can be wired up for different metrics, one can go down without taking down the whole app, and you can scale them independently.

Microservices can make running the app locally hard and coordinating deploys even harder. This is where tools like Docker and Kubernetes can come in super handy. You can think of a container as a mini instance of Linux or Windows you can run your app in (Docker helps you do that) and Kubernetes as the tool that plugs all your containers together out in the cloud.

Kubernetes can be a complicated beast. It solves a complicated problem. For someone not experienced in DevOps sorcery, it can be difficult to get started in the space. I suggest starting with Draft. If you’re familiar with Yeoman for JavaScript projects, Draft is that for Kubernetes projects: a tool that will create the scaffold for your project so you can start hacking on it, based on blueprints (called packs). From there, you can use a tool called Helm to install additional pieces of architecture you need to get going (like NGINX, more Node.js servers, MongoDB, Kafka, etc.), almost like npm for Kubernetes.

Once you’re invested in the Kubernetes ecosystem, it’s child’s play to get that into the cloud. All the big cloud providers are doubling down on Kubernetes. Azure has a service called Azure Kubernetes Service (AKS) that’s optimized for such a strategy. It’s a fun way to make and share infrastructure. Give it a shot.

Gather them metrics

If you don’t know how to answer the question “How’s my app doing?” then you have big problems, or you will soon. Metrics gathered over time will help you continually improve the state of your app, both by lowering the cost of running it and by improving the user experience through better response times. You should definitely be staying on top of metrics like slow-running paths, time-to-first-byte, page views, session times, and other key metrics that are important to your core business.
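To show what tracking slow-running paths involves at its simplest, here is an in-process sketch that records durations per route and reports a percentile. RouteTimings and timed are hypothetical names, and a hosted monitoring service would normally do this work for you.

```typescript
// Stores response-time samples per route.
class RouteTimings {
  private samples = new Map<string, number[]>();

  record(route: string, durationMs: number): void {
    const list = this.samples.get(route) ?? [];
    list.push(durationMs);
    this.samples.set(route, list);
  }

  // Nearest-rank percentile of the recorded durations for one route.
  // Returns undefined when nothing has been recorded for that route.
  percentile(route: string, p: number): number | undefined {
    const list = this.samples.get(route);
    if (!list || list.length === 0) return undefined;
    const sorted = [...list].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
    return sorted[rank - 1];
  }
}

// Runs a handler and records how long it took, even if it throws.
function timed<T>(timings: RouteTimings, route: string, handler: () => T): T {
  const start = Date.now();
  try {
    return handler();
  } finally {
    timings.record(route, Date.now() - start);
  }
}

const timings = new RouteTimings();
for (const durationMs of [120, 80, 300, 95, 110]) {
  timings.record("/checkout", durationMs);
}

const median = timings.percentile("/checkout", 50); // 110
const p95 = timings.percentile("/checkout", 95); // 300
```

The gap between the median and the 95th percentile is the interesting part: an average would hide the one slow request that some of your users actually experienced.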

There are many ways to gather these metrics. Services like New Relic and AppDynamics will offer you invaluable insights on how to improve your app and help you prevent regressions between deployments. If you’re working with Azure, Application Insights covers this need well too and it’s easy to plug into other tools like CI/CD so you can prevent deployments if it sees too much of a regression.

CI/CD will save you so much pain

How many times have you messed up an FTP deploy and brought your server down for a clenching few minutes? I certainly have. You should never, ever trust yourself to deploy production code. The way we talked about deploying from Visual Studio Code is pretty cool, but it’s mostly for development or demo purposes. Once you are ready to have a production-level system, you should be using continuous integration and continuous deployment (often abbreviated CI/CD).

Continuous integration is where you will validate code going into your code base. You can have any number of things kick off CI, but the one I prefer is to have it run any time someone checks code into your master branch. You will run all your linting, type checking, testing, and whatever other validation you need here to give yourself a high level of confidence that you are not about to cause downtime. If the code doesn’t pass CI, you are prevented from deploying a broken release.

Continuous deployment will take your code that passed CI, run whatever build steps you need, containerize or package it, and send it out to a server. It’s a great idea to have multiple layers of validation here. Perhaps you first go to an internal dev server so you can see it in a low-risk environment. You can validate it there before sending it on to a QA environment, where your QA engineers or maybe an external service will validate that everything works as expected. From there you can go to a staging environment, where your app is still internal only but running with production data and settings, so you can verify it in the most production-like environment before sending it off to be canaried. A canary is where you have a small group of servers running your new code and you send only a small percentage of real traffic to those servers to validate that nothing breaks with real users. If it does break, you spin down the canary servers and find the issue. If it doesn’t break, you slowly ramp from a small group of users to everyone. You keep the old servers running and warm until you feel confident everything is working, just in case you need to roll back quickly, and then spin the old servers down too.
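The traffic split behind a canary can be sketched as a small routing decision. In practice this usually lives in your load balancer or deployment tooling rather than in application code, so treat the following as an illustration of the idea only; hashToBucket and routeRequest are hypothetical names.

```typescript
// Maps a user ID to a stable bucket from 0 to 99 using an FNV-1a style
// hash over the string's UTF-16 code units.
function hashToBucket(userId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  // Convert to an unsigned 32-bit value before taking the remainder.
  return (hash >>> 0) % 100;
}

// Sends roughly canaryPercent of users to the canary servers.
function routeRequest(userId: string, canaryPercent: number): "canary" | "stable" {
  return hashToBucket(userId) < canaryPercent ? "canary" : "stable";
}

// Ramp the canary from 0% to 100% and see where one user lands at each step.
const rampSteps = [0, 5, 25, 50, 100];
const routesForUser = rampSteps.map((percent) => routeRequest("user-42", percent));
```

Hashing the user ID instead of picking randomly per request means a given user stays on the same side of the split, and because the bucket never changes, a user who reaches the canary at one percentage stays on it as the percentage ramps up.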

Many providers and open source projects address these needs. Jenkins, Travis, and CircleCI are all great options for CI. Azure has its own CI/CD service called Azure Pipelines; it’s pretty intuitive to use, and again it plugs easily into a cohesive Azure ecosystem. It’s even free for open source projects! The pattern described above is baked in and easy to set up.

Keeping secrets

Any application inevitably has secrets of some sort. These will be keys and strings like database credentials, Twitter or Facebook secrets, session keys, or any number of other things that would be really bad if they made it into the wrong hands. However, they are essential to running an application. So what do we do? Commonly, in development, we’ll use tools like dotenv to keep a config file locally and read it in via process.env in Node.js. This is great for dev but terrible for production. These credentials should never be checked into source control, and if an attacker gets ahold of a server with such a config file on it, they are instantly granted the keys to the whole kingdom.

Instead, it’s good to use some sort of secrets management tool. Fortunately, Kubernetes has this built in and it’s pretty straightforward to use. You provide Kubernetes the secrets on the container side, and it will then provide them to your app as environment variables, which makes them much harder for an attacker to get to.
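However the secrets reach the environment, the application side can stay the same: read them once at startup and fail fast if any are missing. A minimal sketch follows; the variable names DATABASE_URL and SESSION_KEY are hypothetical, and the environment is passed in as a plain object so the function is easy to test (in a Node.js app you would pass process.env).

```typescript
// Shape of an environment such as process.env in Node.js.
type Environment = Record<string, string | undefined>;

interface Secrets {
  databaseUrl: string;
  sessionKey: string;
}

// Reads the required secrets and fails fast if any is missing.
// The variable names are hypothetical; use whatever your deployment provides.
function loadSecrets(env: Environment): Secrets {
  const read = (name: string): string => {
    const value = env[name];
    if (value === undefined || value === "") {
      // Report the variable name only, never a value, so nothing leaks into logs.
      throw new Error(`Missing required secret: ${name}`);
    }
    return value;
  };

  return {
    databaseUrl: read("DATABASE_URL"),
    sessionKey: read("SESSION_KEY"),
  };
}

// Example with placeholder values standing in for a real environment.
const secrets = loadSecrets({
  DATABASE_URL: "postgres://placeholder",
  SESSION_KEY: "placeholder-session-key",
});
```

Because the code only depends on the environment, the same build runs against a local dotenv file in development and against secrets injected by the platform in production.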

Another tool worthy of your consideration is Key Vault from Azure. What’s cool about Key Vault is that, despite the fact that Microsoft cannot read your keys (only you have the ability to decrypt them), Azure will keep an eye on your logs and monitoring to watch for any unsavory uses of your keys and warn you of any compromises. It also lets you avoid using connection strings inside of Azure by only providing keys to machines and instances (called Service Principals) that you pre-authorize to see those keys, avoiding the chicken-and-egg problem of needing a private key to access your private keys. Lastly, it’s really cool because you can give devs a different connection string to the key vault so they can develop against a separate set of credentials (Twitter secrets, database credentials, and session keys) and seamlessly switch to the production keys in production without any additional code. It’s a security tool that actually makes development easier.

Conclusion

Node.js is being used more and more in the enterprise, where it is proving that it can scale as well as any other platform. While it has its own scaling challenges, it also has its own peculiar boons that will benefit your users and your developers. Investing in Node.js is a great choice, and it is becoming easier to find developers for it. At Azure we’re working on making it even easier to scale your Node.js apps by providing you with intelligent tools that assist you in building your app as your traffic grows. Please reach out to us if you have any questions, whether those questions have to do with Azure or any Node.js cloud deployment.