How Did We Reduce Node Docker Image Size In 3 Steps

Soner Çökmen
Published in Trendyol Tech
7 min read · Jan 22, 2020

Dockerizing an application is simple. There is plenty of documentation, along with tutorials and examples, for almost every tech stack. Simple as it is, you should not be surprised when the following two problems show up after a while.

  • Long build durations
  • Large docker image sizes

In this article, we will focus on how we reduced the node docker image size at Trendyol in three simple steps.

Why Is The Image Size Important?

Why is the image size important? What are the real advantages of a smaller one?

  1. First of all, bigger images take up more disk space, which makes them more expensive to store. You are probably using a central repository manager for your Docker images; as image sizes grow, storing these images and their older versions becomes difficult.
  2. Bigger images take a longer time to transfer over the network. This latency badly affects the performance of the CI/CD pipelines.

Now that we agree that docker image size is important, let’s see how we can reduce node docker image sizes in three simple steps.

Let’s Start The Optimization

The demo application was built using the NestJS framework. NestJS is a progressive NodeJS framework that comes with lots of dependencies.

Here is our initial Dockerfile.

Initial Dockerfile
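A Dockerfile along the following lines would produce the layers listed in the history output further down. It is a sketch reconstructed from that output, not necessarily the exact file we used. The `node:12` tag, the `package.json yarn.lock` file names, and the `./dist/main.js` entry point are assumptions (the real entry path is truncated in the history). The history shows `yarn lint & yarn test`; the sketch uses `&&` so that a lint failure stops the build.

```shell
# Write the sketch to its own file so an existing Dockerfile is not overwritten.
cat > Dockerfile.initial <<'EOF'
# Assumption: the full Debian-based image (the history shows NODE_VERSION=12.14.0).
FROM node:12

WORKDIR /usr/src/app

# Copy the manifests first so the dependency layer is cached between builds.
# Assumption: these are the files behind the "COPY multi:..." layer.
COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile

# Copy the sources, then lint, test, and build.
COPY . .
RUN yarn lint && yarn test
RUN yarn build

EXPOSE 3030
# Hypothetical entry point: the real path is truncated in the history output.
CMD ["node", "./dist/main.js"]
EOF

# Build it with: docker build -f Dockerfile.initial -t api .
```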

It seems simple, right? Let’s analyze the output.

$ docker images | grep api
api    latest    c56ed4431a7f    22 minutes ago    1.52GB

It is 1.52GB! That is more than we expected. We can inspect the layers of the image with the following command. Alternatively, you can use one of my favorite tools, dive.

$ docker history --human --format "{{.CreatedBy}}: {{.Size}}" api
/bin/sh -c #(nop)  CMD ["node" "./dist/Main.… : 0B
/bin/sh -c #(nop) EXPOSE 3030 : 0B
/bin/sh -c yarn build : 193kB
/bin/sh -c yarn lint & yarn test : 128kB
/bin/sh -c #(nop) COPY dir:04c91384841be9403… : 4.41MB
/bin/sh -c yarn --frozen-lockfile : 605MB
/bin/sh -c #(nop) COPY multi:009be4c25183643… : 354kB
/bin/sh -c #(nop) WORKDIR /usr/src/app : 0B
/bin/sh -c #(nop) CMD ["node"] : 0B
/bin/sh -c #(nop) ENTRYPOINT ["docker-entry… : 0B
/bin/sh -c #(nop) COPY file:238737301d473041… : 116B
/bin/sh -c set -ex && for key in 6A010… : 5.48MB
/bin/sh -c #(nop) ENV YARN_VERSION=1.21.1 : 0B
/bin/sh -c ARCH= && dpkgArch="$(dpkg --print… : 72MB
/bin/sh -c #(nop) ENV NODE_VERSION=12.14.0 : 0B
/bin/sh -c groupadd --gid 1000 node && use… : 333kB
/bin/sh -c set -ex; apt-get update; apt-ge… : 562MB
/bin/sh -c apt-get update && apt-get install… : 142MB
/bin/sh -c set -ex; if ! command -v gpg > /… : 7.81MB
/bin/sh -c apt-get update && apt-get install… : 23.3MB
/bin/sh -c #(nop) CMD ["bash"] : 0B
/bin/sh -c #(nop) ADD file:8f7dc710e276f54a3… : 101MB

The output shows that two huge layers exist in our image. Also, we can see that our base image takes up 913 MB of disk space. Therefore, as the first step of the optimization, we can choose a smaller base image.
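The base image figure can be checked by adding up the layers below `WORKDIR` in the history output. The snippet below is just that arithmetic; it assumes Docker's human-readable sizes use decimal units (1MB = 1000kB), and it skips the zero-byte metadata layers.

```shell
# Sizes of the base-image layers, copied from the history output above.
total_mb=$(printf '%s\n' 116B 5.48MB 72MB 333kB 562MB 142MB 7.81MB 23.3MB 101MB | awk '
  /kB$/ { sub(/kB$/, ""); sum += $0 / 1000;    next }   # kilobytes -> MB
  /MB$/ { sub(/MB$/, ""); sum += $0;           next }   # already MB
  /GB$/ { sub(/GB$/, ""); sum += $0 * 1000;    next }   # gigabytes -> MB
  /B$/  { sub(/B$/,  ""); sum += $0 / 1000000; next }   # bytes -> MB
  END   { printf "%.1f", sum }
')
echo "base image layers: ${total_mb} MB"
```

The result lands just under 914MB, which matches the 913 MB quoted above and leaves roughly 610MB for the layers our own Dockerfile adds.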

1) Choose The Smallest Possible Base Image

When dockerizing a Node application, there are lots of base image options to choose from.

  • jessie-*
  • buster-*
  • stretch-*
  • alpine-*

The jessie-*, buster-*, and stretch-* images are based on Debian, while the alpine-* images are based on Alpine Linux.

Unless you need a specific Linux distro, it is better to prefer an Alpine image.

Let’s build our image with the node:12-alpine base image.

Dockerfile with Alpine
$ docker build -t api .
error /usr/src/app/node_modules/couchbase: Command failed.
Exit code: 1
Command: prebuild-install || node-gyp rebuild
Arguments:
Directory: /usr/src/app/node_modules/couchbase
Output:
prebuild-install WARN install No prebuilt binaries found (target=12.14.0 runtime=node arch=x64 libc=musl platform=linux)
gyp info it worked if it ends with ok
gyp info using node-gyp@5.0.5
gyp info using node@12.14.0 | linux | x64
gyp ERR! find Python
gyp ERR! find Python Python is not set from command line or npm configuration
gyp ERR! find Python Python is not set from environment variable PYTHON
gyp ERR! find Python checking if "python" can be used
gyp ERR! find Python - "python" is not in PATH or produced an error
gyp ERR! find Python checking if "python2" can be used
gyp ERR! find Python - "python2" is not in PATH or produced an error
gyp ERR! find Python checking if "python3" can be used
gyp ERR! find Python - "python3" is not in PATH or produced an error
gyp ERR! find Python

Oops, it failed because of the Couchbase SDK requirements. No prebuilt binary is available for Alpine (libc=musl), so the SDK has to be compiled, and the node:12-alpine image does not contain the Python, make, and g++ packages required for that. To solve this problem, we can add these packages in our Dockerfile.

Couchbase SDK Requirements
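A sketch of the Alpine-based Dockerfile with the build tools added, consistent with the history output below. The `apk` package list follows the text (the history truncates it after `python make`), and package names may differ on newer Alpine releases. The `package.json yarn.lock` file names and the `./dist/main.js` entry point are assumptions.

```shell
# Write the sketch to its own file so an existing Dockerfile is not overwritten.
cat > Dockerfile.alpine <<'EOF'
FROM node:12-alpine

# Build tools node-gyp needs to compile the Couchbase SDK.
RUN apk update && apk add python make g++

WORKDIR /usr/src/app

# Assumption: these are the files behind the "COPY multi:..." layer.
COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile

COPY . .
RUN yarn build

EXPOSE 3030
# Hypothetical entry point: the real path is truncated in the history output.
CMD ["node", "./dist/main.js"]
EOF

# Build it with: docker build -f Dockerfile.alpine -t api .
```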
$ docker images | grep api
api    latest    836d38527adc    14 seconds ago    903MB

$ docker history --human --format "{{.CreatedBy}}: {{.Size}}" api
/bin/sh -c #(nop)  CMD ["node" "./dist/Main.… : 0B
/bin/sh -c #(nop) EXPOSE 3030 : 0B
/bin/sh -c yarn build : 195kB
/bin/sh -c #(nop) COPY dir:84d26c512953ced47… : 4.58MB
/bin/sh -c yarn --frozen-lockfile : 606MB
/bin/sh -c #(nop) COPY multi:009be4c25183643… : 354kB
/bin/sh -c #(nop) WORKDIR /usr/src/app . : 0B
/bin/sh -c apk update && apk add python make… : 206MB
/bin/sh -c #(nop) CMD ["node"] : 0B
/bin/sh -c #(nop) ENTRYPOINT ["docker-entry… : 0B
/bin/sh -c #(nop) COPY file:238737301d473041… : 116B
/bin/sh -c apk add --no-cache --virtual .bui… : 5.35MB
/bin/sh -c #(nop) ENV YARN_VERSION=1.21.1 . : 0B
/bin/sh -c addgroup -g 1000 node && addu… : 74.2MB
/bin/sh -c #(nop) ENV NODE_VERSION=12.14.0 . : 0B
/bin/sh -c #(nop) CMD ["/bin/sh"] : 0B
/bin/sh -c #(nop) ADD file:36fdc8cb08228a870… : 5.59MB

It is better than before, but still not satisfying. Two big layers remain in our latest image:

  • Python, make, g++ layer (which we just added for the Couchbase SDK)
  • Node dependencies layer

2) Use Multi-Stage Docker Builds

Multi-stage builds make it easy to optimize Docker images by using multiple intermediate images in a single Dockerfile. Detailed information can be found in the Docker documentation. Using multi-stage builds, we can install all dependencies in the build image and copy only what is needed to the runtime image. This way, the runtime image will not contain the “Python”, “make”, and “g++” dependencies.

Multi-Stage Dockerfile
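A minimal multi-stage sketch, assuming the compiled output lives in `dist` and the entry point is `./dist/main.js` (both hypothetical; the real path is truncated in the history output). It covers the two recognizable `COPY` layers in the history below, the build output and `node_modules`, and leaves out the third small `COPY` and the `ls` step.

```shell
# Write the sketch to its own file so an existing Dockerfile is not overwritten.
cat > Dockerfile.multistage <<'EOF'
# Build stage: contains the compilers and the yarn cache, and is never shipped.
FROM node:12-alpine AS build
RUN apk update && apk add python make g++
WORKDIR /usr/src/app
COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile
COPY . .
RUN yarn build

# Runtime stage: starts from a clean base and copies only the results.
FROM node:12-alpine
WORKDIR /usr/src/app
COPY --from=build /usr/src/app/dist ./dist
COPY --from=build /usr/src/app/node_modules ./node_modules
EXPOSE 3030
# Hypothetical entry point.
CMD ["node", "./dist/main.js"]
EOF

# Build it with: docker build -f Dockerfile.multistage -t api .
```

Only the layers of the last stage end up in the tagged image, so anything left behind in the build stage costs nothing at runtime.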
$ docker images | grep api
api    latest    369b27345377    8 minutes ago    375MB

$ docker history --human --format "{{.CreatedBy}}: {{.Size}}" api
/bin/sh -c #(nop)  CMD ["node" "./dist/Main.… : 0B
/bin/sh -c #(nop) EXPOSE 3030 : 0B
/bin/sh -c ls : 0B
/bin/sh -c #(nop) COPY dir:513b34f89adcbc3a6… : 630B
/bin/sh -c #(nop) COPY dir:2809a54b360d448e9… : 289MB
/bin/sh -c #(nop) COPY dir:52d84e1dbbcb0afee… : 215kB
/bin/sh -c #(nop) WORKDIR /usr/src/app : 0B
/bin/sh -c #(nop) CMD ["node"] : 0B
/bin/sh -c #(nop) ENTRYPOINT ["docker-entry… : 0B
/bin/sh -c #(nop) COPY file:238737301d473041… : 116B
/bin/sh -c apk add --no-cache --virtual .bui… : 5.35MB
/bin/sh -c #(nop) ENV YARN_VERSION=1.21.1 . : 0B
/bin/sh -c addgroup -g 1000 node && addu… : 74.2MB
/bin/sh -c #(nop) ENV NODE_VERSION=12.14.0 . : 0B
/bin/sh -c #(nop) CMD ["/bin/sh"] : 0B
/bin/sh -c #(nop) ADD file:36fdc8cb08228a870… : 5.59MB

375MB? We were expecting an improvement of only 206MB from eliminating the “Python, make, g++” layer, yet the image shrank by 528MB. Let’s inspect the previous image with the dive tool to find out where the extra savings come from.

We found that our node_modules folder is only 289MB, but the previous image also included a 311MB yarn cache folder. By using multi-stage builds, we eliminated this cache as well.

At the end of step 2, we have a 375MB image. That is roughly a quarter of the initial 1.52GB, a reduction of about 75% so far.

3) Shrink Dependencies

Before shrinking the dependencies, we must organize them in the package.json file. Most of the time, dependencies fall into two groups.

  • dependencies: Packages that are required at runtime, such as express, mongoose, body-parser, NestJS, and SQL connectors.
  • devDependencies: Packages that are required only at development time, such as test frameworks, typing files, and eslint packages.

We don’t need the development dependencies in our runtime image.

3.1) Install Production Dependencies Only

Installing only production dependencies is a way of eliminating development dependencies.

$ yarn install --production=true

But if you want to:

  • run static code analyzers,
  • run test suites,
  • compile TypeScript files

you need to install the development dependencies, too. In such cases, this solution won’t work.

3.2) Remove Development Dependencies Manually

The second option is to remove development dependencies manually.

$ npm prune --production

This command removes all packages specified in the devDependencies section. You can find more details in the npm documentation.

Dockerfile with Npm Prune
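A sketch of how the prune step can fit into the multi-stage build: the build stage installs everything, runs the build, and then drops the development dependencies before `node_modules` is copied to the runtime image. The `dist` folder and the `./dist/main.js` entry point are assumptions, as before.

```shell
# Write the sketch to its own file so an existing Dockerfile is not overwritten.
cat > Dockerfile.prune <<'EOF'
# Build stage: needs devDependencies to lint, test, and compile.
FROM node:12-alpine AS build
RUN apk update && apk add python make g++
WORKDIR /usr/src/app
COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile
COPY . .
RUN yarn build
# Drop devDependencies once the build no longer needs them.
RUN npm prune --production

# Runtime stage: receives the already pruned node_modules.
FROM node:12-alpine
WORKDIR /usr/src/app
COPY --from=build /usr/src/app/dist ./dist
COPY --from=build /usr/src/app/node_modules ./node_modules
EXPOSE 3030
# Hypothetical entry point.
CMD ["node", "./dist/main.js"]
EOF

# Build it with: docker build -f Dockerfile.prune -t api .
```

The prune has to run after `yarn build`, otherwise the compiler and other build-time packages would already be gone.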

And the result:

$ docker images | grep api
api    latest    f1adbbffb504    13 seconds ago    145MB

Only 145MB. We can reduce it even more.

3.3) Use Node Prune Tool

node-prune is an open-source tool for removing unnecessary files from the node_modules folder. Maintainers may forget to exclude test files, markdown files, typing files, and *.map files from their npm packages. By using node-prune, we can delete them.
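To make the idea concrete, here is a rough approximation of that cleanup using plain `find`. It is not node-prune itself and does not use its file list; it only removes the kinds of files named above, inside a throwaway demo tree (the package name `demo-pkg` is made up).

```shell
# Demo tree standing in for a real node_modules folder.
demo=$(mktemp -d)
pkg="$demo/node_modules/demo-pkg"
mkdir -p "$pkg/test"
touch "$pkg/index.js" "$pkg/index.js.map" "$pkg/README.md" \
      "$pkg/index.d.ts" "$pkg/test/index.test.js"

# Remove source maps, markdown files, and typing files.
find "$demo/node_modules" -type f \
  \( -name '*.map' -o -name '*.md' -o -name '*.d.ts' \) -exec rm -f {} +

# Remove test directories. -prune keeps find from descending into
# a directory that is about to be deleted.
find "$demo/node_modules" -type d \
  \( -name test -o -name tests -o -name __tests__ \) -prune -exec rm -rf {} +

remaining=$(find "$demo/node_modules" -type f | wc -l | tr -d ' ')
echo "files left: $remaining"
```

In the demo tree only `index.js` survives. On a real project the same caveat applies as with node-prune: run the tests afterwards, because some packages do load files that look disposable.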

Dockerfile with node-prune
$ docker images | grep api
api    latest    33d086802e50    22 seconds ago    127MB

It reduces image size by 18MB. Better than nothing.

3.4) Delete Remaining Dependencies Manually

After the node-prune step, some large, unnecessary dependencies may still remain in node_modules. You can find the largest packages with the following command.

$ du -sh ./node_modules/* | sort -nr | grep '[0-9]M'
11M     ./node_modules/rxjs
8.7M ./node_modules/swagger-ui-dist
7.2M ./node_modules/couchbase

The output depends on your dependencies, and there is no guarantee that the listed packages are unnecessary. Here is our final Dockerfile.

Dockerfile with rm -rf
$ docker images | grep api
api    latest    3eed30c8606b    13 seconds ago    118MB

Conclusion

By applying three simple steps, we reduced our Docker image size by a factor of about 13, from 1.52GB to 118MB.
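The progress after each step can be summarized with a little arithmetic on the sizes reported by `docker images` throughout the article. The step labels are ours, and 1.52GB is taken as 1520MB (decimal units assumed).

```shell
initial=1520   # 1.52GB expressed in MB

# Print size, shrink factor, and percentage saved for every step.
for step in "initial:1520" "alpine:903" "multi-stage:375" \
            "npm-prune:145" "node-prune:127" "manual-cleanup:118"; do
  name=${step%%:*}   # text before the colon
  size=${step##*:}   # text after the colon
  awk -v n="$name" -v s="$size" -v i="$initial" 'BEGIN {
    printf "%-16s %5d MB  %4.1fx smaller  (%.1f%% saved)\n", n, s, i / s, (1 - s / i) * 100
  }'
done

final_ratio=$(awk -v s=118 -v i="$initial" 'BEGIN { printf "%.1f", i / s }')
echo "overall: ${final_ratio}x smaller"
```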

I strongly recommend running unit tests and performance tests between these steps, because:

  • Changing the base image may slow down your application because some operating-system-level packages are missing.
  • By mistake, some runtime dependencies may be specified as development-time dependencies.
  • Dependency files you consider unnecessary may be required by another dependency you are not aware of.
