Published in Trendyol Tech

How We Reduce Node Docker Image Size In 3 Steps

Dockerizing an application is simple. There is plenty of documentation, along with tutorials and examples, for almost every tech stack. However, although Docker is simple, you should not be surprised when the following two problems appear after a while.

  • Long build durations
  • Large docker image sizes

In this article, we will focus on how we reduce the node docker image size at Trendyol in three simple steps.

Why Is The Image Size Important?

Why is the image size important? What are the real advantages of a smaller one?

  1. First of all, bigger images take up more disk space, which makes them more expensive to keep. You are probably using a central repository manager to store your docker images. As image sizes increase, it becomes harder to store these images along with their older versions.
  2. Bigger images take longer to transfer over the network. This latency hurts the performance of CI/CD pipelines.

Now that we agree that docker image size is important, let’s see how we can reduce node docker image sizes in three simple steps.

Let’s Start The Optimization

The demo application was built using the NestJS framework. NestJS is a progressive NodeJS framework that comes with lots of dependencies.

Here is our initial Dockerfile.
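
The embedded listing is not reproduced in this text. A minimal sketch that is consistent with the layer history shown further down might look like the following. The node:12 base tag and the ./dist/main.js entry point are assumptions (the real entry point is truncated in the history output), and lint and test are chained with && here so that a lint failure stops the build.

```dockerfile
# Reconstructed sketch, not the original file.
# Assumption: the Debian-based node:12 image (the history reports NODE_VERSION=12.14.0).
FROM node:12

WORKDIR /usr/src/app

# Copy only the manifests first so the dependency layer can be cached
COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile

# Copy the rest of the sources
COPY . .

# The history shows "yarn lint & yarn test"; a single & would run lint in the
# background, so && is used here to fail the build when lint fails.
RUN yarn lint && yarn test
RUN yarn build

EXPOSE 3030

# Hypothetical entry point; the real path is truncated in the history output.
CMD ["node", "./dist/main.js"]
```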

Initial Dockerfile

It seems simple, right? Let’s analyze the output.

$ docker images | grep api
api    latest    c56ed4431a7f    22 minutes ago    1.52GB

It is 1.52GB! That is larger than we expected. We can inspect the layers of the image with the following command. Alternatively, you can use one of my favorite tools, dive.

$ docker history --human --format "{{.CreatedBy}}: {{.Size}}" api
/bin/sh -c #(nop)  CMD ["node" "./dist/Main.… : 0B
/bin/sh -c #(nop) EXPOSE 3030 : 0B
/bin/sh -c yarn build : 193kB
/bin/sh -c yarn lint & yarn test : 128kB
/bin/sh -c #(nop) COPY dir:04c91384841be9403… : 4.41MB
/bin/sh -c yarn --frozen-lockfile : 605MB
/bin/sh -c #(nop) COPY multi:009be4c25183643… : 354kB
/bin/sh -c #(nop) WORKDIR /usr/src/app : 0B
/bin/sh -c #(nop) CMD ["node"] : 0B
/bin/sh -c #(nop) ENTRYPOINT ["docker-entry… : 0B
/bin/sh -c #(nop) COPY file:238737301d473041… : 116B
/bin/sh -c set -ex && for key in 6A010… : 5.48MB
/bin/sh -c #(nop) ENV YARN_VERSION=1.21.1 : 0B
/bin/sh -c ARCH= && dpkgArch="$(dpkg --print… : 72MB
/bin/sh -c #(nop) ENV NODE_VERSION=12.14.0 : 0B
/bin/sh -c groupadd --gid 1000 node && use… : 333kB
/bin/sh -c set -ex; apt-get update; apt-ge… : 562MB
/bin/sh -c apt-get update && apt-get install… : 142MB
/bin/sh -c set -ex; if ! command -v gpg > /… : 7.81MB
/bin/sh -c apt-get update && apt-get install… : 23.3MB
/bin/sh -c #(nop) CMD ["bash"] : 0B
/bin/sh -c #(nop) ADD file:8f7dc710e276f54a3… : 101MB

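The output shows that our image contains two huge layers: the 605MB layer created by yarn --frozen-lockfile and a 562MB apt-get layer that comes from the base image. We can also see that the base image alone takes up 913MB of disk space. Therefore, as the first step of the optimization, we can choose a smaller base image.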

1) Choose The Smallest Possible Base Image

When dockerizing a node application, there are many base image variants to choose from.

  • jessie-*
  • buster-*
  • stretch-*
  • alpine-*

The jessie-*, buster-*, and stretch-* images are based on Debian, while the alpine-* images are based on Alpine Linux.

Unless you need a specific Linux distribution, it is better to prefer an Alpine image.

Let’s build our image with the node:12-alpine base image.

Dockerfile with Alpine

$ docker build -t api .
error /usr/src/app/node_modules/couchbase: Command failed.
Exit code: 1
Command: prebuild-install || node-gyp rebuild
Arguments:
Directory: /usr/src/app/node_modules/couchbase
Output:
prebuild-install WARN install No prebuilt binaries found (target=12.14.0 runtime=node arch=x64 libc=musl platform=linux)
gyp info it worked if it ends with ok
gyp info using node-gyp@5.0.5
gyp info using node@12.14.0 | linux | x64
gyp ERR! find Python
gyp ERR! find Python Python is not set from command line or npm configuration
gyp ERR! find Python Python is not set from environment variable PYTHON
gyp ERR! find Python checking if "python" can be used
gyp ERR! find Python - "python" is not in PATH or produced an error
gyp ERR! find Python checking if "python2" can be used
gyp ERR! find Python - "python2" is not in PATH or produced an error
gyp ERR! find Python checking if "python3" can be used
gyp ERR! find Python - "python3" is not in PATH or produced an error
gyp ERR! find Python

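Oops, it failed because of the Couchbase SDK's requirements. As the log shows, no prebuilt binary is available for Alpine's musl libc, so the installer falls back to compiling the SDK with node-gyp. The Alpine-based node image does not contain the Python, make, and g++ packages that this compilation requires. To solve this problem, we can add these packages in our Dockerfile.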
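
The embedded listing is missing from this text, so here is a hedged sketch of what an Alpine-based Dockerfile with the build tools could look like. The apk line follows the truncated command visible in the layer history; the entry point is an assumption, and on newer Alpine releases the Python package may be named python3 instead.

```dockerfile
# Reconstructed sketch, not the original file.
FROM node:12-alpine

# Toolchain that node-gyp needs to compile the Couchbase SDK from source.
# Package names follow the history output for this image generation.
RUN apk update && apk add python make g++

WORKDIR /usr/src/app

COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile

COPY . .
RUN yarn build

EXPOSE 3030

# Hypothetical entry point; the real path is truncated in the history output.
CMD ["node", "./dist/main.js"]
```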

Couchbase SDK Requirements

$ docker images | grep api
api    latest    836d38527adc    14 seconds ago    903MB

$ docker history --human --format "{{.CreatedBy}}: {{.Size}}" api
/bin/sh -c #(nop)  CMD ["node" "./dist/Main.… : 0B
/bin/sh -c #(nop) EXPOSE 3030 : 0B
/bin/sh -c yarn build : 195kB
/bin/sh -c #(nop) COPY dir:84d26c512953ced47… : 4.58MB
/bin/sh -c yarn --frozen-lockfile : 606MB
/bin/sh -c #(nop) COPY multi:009be4c25183643… : 354kB
/bin/sh -c #(nop) WORKDIR /usr/src/app . : 0B
/bin/sh -c apk update && apk add python make… : 206MB
/bin/sh -c #(nop) CMD ["node"] : 0B
/bin/sh -c #(nop) ENTRYPOINT ["docker-entry… : 0B
/bin/sh -c #(nop) COPY file:238737301d473041… : 116B
/bin/sh -c apk add --no-cache --virtual .bui… : 5.35MB
/bin/sh -c #(nop) ENV YARN_VERSION=1.21.1 . : 0B
/bin/sh -c addgroup -g 1000 node && addu… : 74.2MB
/bin/sh -c #(nop) ENV NODE_VERSION=12.14.0 . : 0B
/bin/sh -c #(nop) CMD ["/bin/sh"] : 0B
/bin/sh -c #(nop) ADD file:36fdc8cb08228a870… : 5.59MB

Better than before, but not satisfying yet. Two big layers remain in our latest image:

  • Python, make, g++ layer (which we just added for the Couchbase SDK)
  • Node dependencies layer

2) Use Multi-Stage Docker Builds

Multi-stage builds make it easy to optimize Docker images by using multiple intermediate images in a single Dockerfile. Detailed information is available in Docker's documentation on multi-stage builds. By using multi-stage builds, we can install all dependencies in the build image and copy only the results to the runtime image. In this way, the runtime image will not contain the “Python”, “make”, and “g++” dependencies.
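
Since the embedded listing is missing here, the following is a rough sketch of the idea rather than the original file. The stage name build and the entry point are hypothetical, and the sketch does not try to match every layer in the history output.

```dockerfile
# ---- Build stage: compilers plus the full dependency tree ----
# "build" is a hypothetical stage name.
FROM node:12-alpine AS build

RUN apk update && apk add python make g++

WORKDIR /usr/src/app

COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile

COPY . .
RUN yarn build

# ---- Runtime stage: starts from a clean base image ----
FROM node:12-alpine

WORKDIR /usr/src/app

# Only the compiled output and node_modules are carried over; the build
# tools and the yarn cache stay behind in the build stage.
COPY --from=build /usr/src/app/dist ./dist
COPY --from=build /usr/src/app/node_modules ./node_modules

EXPOSE 3030

# Hypothetical entry point; the real path is truncated in the history output.
CMD ["node", "./dist/main.js"]
```

Because the final image is built from the last FROM onward, anything that is not explicitly copied with COPY --from is left out of it.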

Multi-Stage Dockerfile

$ docker images | grep api
api    latest    369b27345377    8 minutes ago    375MB

$ docker history --human --format "{{.CreatedBy}}: {{.Size}}" api
/bin/sh -c #(nop)  CMD ["node" "./dist/Main.… : 0B
/bin/sh -c #(nop) EXPOSE 3030 : 0B
/bin/sh -c ls : 0B
/bin/sh -c #(nop) COPY dir:513b34f89adcbc3a6… : 630B
/bin/sh -c #(nop) COPY dir:2809a54b360d448e9… : 289MB
/bin/sh -c #(nop) COPY dir:52d84e1dbbcb0afee… : 215kB
/bin/sh -c #(nop) WORKDIR /usr/src/app : 0B
/bin/sh -c #(nop) CMD ["node"] : 0B
/bin/sh -c #(nop) ENTRYPOINT ["docker-entry… : 0B
/bin/sh -c #(nop) COPY file:238737301d473041… : 116B
/bin/sh -c apk add --no-cache --virtual .bui… : 5.35MB
/bin/sh -c #(nop) ENV YARN_VERSION=1.21.1 . : 0B
/bin/sh -c addgroup -g 1000 node && addu… : 74.2MB
/bin/sh -c #(nop) ENV NODE_VERSION=12.14.0 . : 0B
/bin/sh -c #(nop) CMD ["/bin/sh"] : 0B
/bin/sh -c #(nop) ADD file:36fdc8cb08228a870… : 5.59MB

375MB? We were expecting an improvement of only 206MB from eliminating the “Python, make, g++” layer, yet the image shrank from 903MB to 375MB. To find the reason for this unexpected improvement, let’s inspect the previous image with the dive tool.

We found that our node_modules folder is only 289MB, but the previous image also included 311MB of yarn cache. By using multi-stage builds, we eliminated this cache folder as well.

At the end of step 2, we have a 375MB image. That means we have shrunk the initial 1.52GB image to roughly a quarter of its original size so far.

3) Shrink Dependencies

Before shrinking the dependencies, we must organize them in the package.json file. Most of the time, a dependency belongs to one of two groups.

  • dependencies: Packages that are required at runtime, such as express, mongoose, body-parser, NestJS, and SQL connectors.
  • devDependencies: Packages that are required only during development, such as test frameworks, typing files, and eslint packages.

We don’t need the development dependencies in our runtime image.

One way to eliminate the development dependencies is to install only the production dependencies.

$ yarn install --production=true

But if you want to:

  • run static code analyzers,
  • run test suites,
  • compile TypeScript files,

you need to install the development dependencies too. In such cases, this solution won’t work.

The second option is removing development dependencies manually.

$ npm prune --production

This command removes all packages listed in the devDependencies section. You can find more detail in the npm documentation for npm prune.
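
The embedded listing is missing from this text. As a hedged sketch, the prune step can run at the end of the build stage, after the TypeScript build no longer needs the development dependencies. The stage name and entry point below are hypothetical.

```dockerfile
# ---- Build stage ("build" is a hypothetical stage name) ----
FROM node:12-alpine AS build

RUN apk update && apk add python make g++

WORKDIR /usr/src/app

COPY package.json yarn.lock ./
# Full install: devDependencies are still needed to compile the project
RUN yarn --frozen-lockfile

COPY . .
RUN yarn build

# The build is done, so devDependencies can be dropped from node_modules
RUN npm prune --production

# ---- Runtime stage ----
FROM node:12-alpine

WORKDIR /usr/src/app

COPY --from=build /usr/src/app/dist ./dist
# node_modules now contains production dependencies only
COPY --from=build /usr/src/app/node_modules ./node_modules

EXPOSE 3030

# Hypothetical entry point; the real path is truncated in the history output.
CMD ["node", "./dist/main.js"]
```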

Dockerfile with Npm Prune

And the result:

$ docker images | grep api
api    latest    f1adbbffb504    13 seconds ago    145MB

Only 145MB. I think we can reduce it much more.

node-prune is an open-source tool for removing unnecessary files from the node_modules folder. For example, many package authors leave test files, markdown files, typing files, and *.map files in their npm packages. By using node-prune, we can delete them.

Dockerfile with node-prune

$ docker images | grep api
api    latest    33d086802e50    22 seconds ago    127MB

It reduces image size by 18MB. Better than nothing.

After the node-prune step, some large and unnecessary files may still remain in node_modules. You can detect the biggest packages manually by using the following command.

$ du -sh ./node_modules/* | sort -nr | grep '[0-9]M'
11M     ./node_modules/rxjs
8.7M    ./node_modules/swagger-ui-dist
7.2M    ./node_modules/couchbase

The output depends on your dependencies and it is not guaranteed that listed dependencies are unnecessary. Here is our final Dockerfile.
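
The final listing is not reproduced in this text, so the following is only an illustrative sketch. The removed paths are hypothetical examples inside the packages listed above, not the ones the original file deleted, and the node-prune step is left out because its installation is not shown here. Verify any such removal with your own test suite.

```dockerfile
# ---- Build stage ("build" is a hypothetical stage name) ----
FROM node:12-alpine AS build

RUN apk update && apk add python make g++

WORKDIR /usr/src/app

COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile

COPY . .
RUN yarn build
RUN npm prune --production

# Hypothetical examples of files that a CommonJS runtime may not need:
# TypeScript sources, browser bundles, and source maps.
RUN rm -rf node_modules/rxjs/src \
           node_modules/rxjs/bundles \
           node_modules/swagger-ui-dist/*.map

# ---- Runtime stage ----
FROM node:12-alpine

WORKDIR /usr/src/app

COPY --from=build /usr/src/app/dist ./dist
COPY --from=build /usr/src/app/node_modules ./node_modules

EXPOSE 3030

# Hypothetical entry point; the real path is truncated in the history output.
CMD ["node", "./dist/main.js"]
```

The deletions happen in the build stage, before node_modules is copied, so the removed files never reach a layer of the runtime image.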

Dockerfile with rm -rf

$ docker images | grep api
api    latest    3eed30c8606b    13 seconds ago    118MB

Conclusion

By applying three simple steps, we reduced our docker image from 1.52GB to 118MB, which makes it roughly 13 times smaller.

I strongly recommend running unit tests and performance tests between these steps, because:

  • Changing the base image may slow down your application because of the lack of some operating-system-level packages.
  • By mistake, some runtime dependencies may have been specified as development-time dependencies.
  • Files you consider unnecessary may be required by another dependency that you are not aware of.