How We Reduced Node Docker Image Size In 3 Steps
Dockerizing an application is simple. Plenty of documentation, tutorials, and examples are available for almost every tech stack. Simple as it is, though, you should not be surprised when the following two problems show up after a while.
- Long build durations
- Large docker image sizes
In this article, we will focus on how we reduced the node docker image size at Trendyol in three simple steps.
Why Is The Image Size Important?
What are the real advantages of a smaller image?
- First of all, bigger images take up more disk space, which makes them more expensive to keep. You are probably using a central repository manager to store your Docker images. As image sizes grow, storing these images and their older versions becomes harder.
- Bigger images take longer to transfer over the network. This latency slows down CI/CD pipelines.
Now that we agree that docker image size is important, let’s see how we can reduce node docker image sizes in three simple steps.
Let’s Start The Optimization
The demo application was built using the NestJS framework. NestJS is a progressive NodeJS framework that comes with lots of dependencies.
Here is our initial Dockerfile.
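The Dockerfile itself is not reproduced in this text. As a minimal sketch only, a file consistent with the layer history shown below could look like the following. The `node:12` tag, the copied file names, the `&&` between lint and test (the history prints a single `&`), and the `dist/main.js` entry point are all assumptions; the real entry-point path is truncated in the history output.

```dockerfile
# Sketch reconstructed from the layer history; not the original file.
# Assumption: the full Debian-based Node 12 image.
FROM node:12

WORKDIR /usr/src/app

# Copy only the manifests first so the dependency layer can be cached.
# Assumption: the "COPY multi" layer corresponds to these two files.
COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile

# Copy the rest of the source code.
COPY . .

# Assumption: "&&" so that the build stops when lint or tests fail.
RUN yarn lint && yarn test
RUN yarn build

EXPOSE 3030

# Placeholder entry point; the real path is truncated in the history output.
CMD ["node", "./dist/main.js"]
```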
It seems simple, right? Let’s analyze the output.
$ docker images | grep api
api latest c56ed4431a7f 22 minutes ago 1.52GB
It is 1.52GB, which is more than we expected. We can inspect the layers of the image by using the following command. Alternatively, you can use one of my favorite tools, dive.
$ docker history --human --format "{{.CreatedBy}}: {{.Size}}" api
/bin/sh -c #(nop) CMD ["node" "./dist/Main.… : 0B
/bin/sh -c #(nop) EXPOSE 3030 : 0B
/bin/sh -c yarn build : 193kB
/bin/sh -c yarn lint & yarn test : 128kB
/bin/sh -c #(nop) COPY dir:04c91384841be9403… : 4.41MB
/bin/sh -c yarn --frozen-lockfile : 605MB
/bin/sh -c #(nop) COPY multi:009be4c25183643… : 354kB
/bin/sh -c #(nop) WORKDIR /usr/src/app : 0B
/bin/sh -c #(nop) CMD ["node"] : 0B
/bin/sh -c #(nop) ENTRYPOINT ["docker-entry… : 0B
/bin/sh -c #(nop) COPY file:238737301d473041… : 116B
/bin/sh -c set -ex && for key in 6A010… : 5.48MB
/bin/sh -c #(nop) ENV YARN_VERSION=1.21.1 : 0B
/bin/sh -c ARCH= && dpkgArch="$(dpkg --print… : 72MB
/bin/sh -c #(nop) ENV NODE_VERSION=12.14.0 : 0B
/bin/sh -c groupadd --gid 1000 node && use… : 333kB
/bin/sh -c set -ex; apt-get update; apt-ge… : 562MB
/bin/sh -c apt-get update && apt-get install… : 142MB
/bin/sh -c set -ex; if ! command -v gpg > /… : 7.81MB
/bin/sh -c apt-get update && apt-get install… : 23.3MB
/bin/sh -c #(nop) CMD ["bash"] : 0B
/bin/sh -c #(nop) ADD file:8f7dc710e276f54a3… : 101MB
The output shows that two huge layers exist in our image: the 605MB dependency installation and a 562MB apt-get layer that comes from the base image. In total, the base image takes up 913 MB of disk space. Therefore, as the first step of the optimization, we can choose a smaller base image.
1) Choose The Smallest Possible Base Image
When dockerizing a Node application, there are lots of base image variants to choose from.
- *-jessie
- *-buster
- *-stretch
- *-alpine
The *-jessie, *-buster, and *-stretch images are based on Debian; the *-alpine images are based on Alpine Linux.
Unless you need a specific Linux distribution, prefer the Alpine image.
Let’s build our image with the node:12-alpine base image.
$ docker build -t api .
error /usr/src/app/node_modules/couchbase: Command failed.
Exit code: 1
Command: prebuild-install || node-gyp rebuild
Arguments:
Directory: /usr/src/app/node_modules/couchbase
Output:
prebuild-install WARN install No prebuilt binaries found (target=12.14.0 runtime=node arch=x64 libc=musl platform=linux)
gyp info it worked if it ends with ok
gyp info using node-gyp@5.0.5
gyp info using node@12.14.0 | linux | x64
gyp ERR! find Python
gyp ERR! find Python Python is not set from command line or npm configuration
gyp ERR! find Python Python is not set from environment variable PYTHON
gyp ERR! find Python checking if "python" can be used
gyp ERR! find Python - "python" is not in PATH or produced an error
gyp ERR! find Python checking if "python2" can be used
gyp ERR! find Python - "python2" is not in PATH or produced an error
gyp ERR! find Python checking if "python3" can be used
gyp ERR! find Python - "python3" is not in PATH or produced an error
gyp ERR! find Python
Oops, it failed because of Couchbase SDK requirements. The node:alpine image does not contain Python, make, or g++ packages required for compiling Couchbase SDK. To solve this problem, we can manually add these packages to our Dockerfile.
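A minimal sketch of that change might look like this. The `apk` line follows the layer history below, which truncates the package list after `make`, so `g++` is taken from the paragraph above. The copied file names and the `dist/main.js` entry point are placeholders, not the original values.

```dockerfile
# Sketch only; not the original file.
FROM node:12-alpine

# Build tools that node-gyp needs to compile the Couchbase SDK.
# Package names follow the history output; newer Alpine releases
# may require python3 instead of python.
RUN apk update && apk add python make g++

WORKDIR /usr/src/app

# Assumption: the manifests are copied first to cache the install layer.
COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile

COPY . .
RUN yarn build

EXPOSE 3030

# Placeholder entry point; the real path is truncated in the history output.
CMD ["node", "./dist/main.js"]
```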
$ docker images | grep api
api latest 836d38527adc 14 seconds ago 903MB
$ docker history --human --format "{{.CreatedBy}}: {{.Size}}" api
/bin/sh -c #(nop) CMD ["node" "./dist/Main.… : 0B
/bin/sh -c #(nop) EXPOSE 3030 : 0B
/bin/sh -c yarn build : 195kB
/bin/sh -c #(nop) COPY dir:84d26c512953ced47… : 4.58MB
/bin/sh -c yarn --frozen-lockfile : 606MB
/bin/sh -c #(nop) COPY multi:009be4c25183643… : 354kB
/bin/sh -c #(nop) WORKDIR /usr/src/app : 0B
/bin/sh -c apk update && apk add python make… : 206MB
/bin/sh -c #(nop) CMD ["node"] : 0B
/bin/sh -c #(nop) ENTRYPOINT ["docker-entry… : 0B
/bin/sh -c #(nop) COPY file:238737301d473041… : 116B
/bin/sh -c apk add --no-cache --virtual .bui… : 5.35MB
/bin/sh -c #(nop) ENV YARN_VERSION=1.21.1 : 0B
/bin/sh -c addgroup -g 1000 node && addu… : 74.2MB
/bin/sh -c #(nop) ENV NODE_VERSION=12.14.0 : 0B
/bin/sh -c #(nop) CMD ["/bin/sh"] : 0B
/bin/sh -c #(nop) ADD file:36fdc8cb08228a870… : 5.59MB
It's better than before, but still not satisfying. Two big layers remain in our latest image:
- Python, make, g++ layer (which we just added for the Couchbase SDK)
- Node dependencies layer
2) Use Multi-Stage Docker Builds
Multi-stage builds make it easy to optimize Docker images by using multiple intermediate images in a single Dockerfile. Detailed information can be found in the Docker documentation. Using multi-stage builds, we can install all dependencies in the build image and copy only what the application needs to the runtime image. In this way, the runtime image will not contain the “Python”, “make”, and “g++” dependencies.
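The idea can be sketched as a two-stage Dockerfile. This is an illustration under assumptions, not the original file: the stage name `build`, the copied paths, and the `dist/main.js` entry point are hypothetical.

```dockerfile
# ---- Build stage: compilers and all dependencies live only here ----
# Assumption: the stage name "build" is arbitrary.
FROM node:12-alpine AS build

# Needed by node-gyp for the Couchbase SDK; never reaches the final image.
RUN apk update && apk add python make g++

WORKDIR /usr/src/app
COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile

COPY . .
RUN yarn build

# ---- Runtime stage: starts again from a clean base image ----
FROM node:12-alpine

WORKDIR /usr/src/app

# Copy only the compiled output and the installed packages.
# Assumption: the build emits its output into ./dist.
COPY --from=build /usr/src/app/dist ./dist
COPY --from=build /usr/src/app/node_modules ./node_modules

EXPOSE 3030

# Placeholder entry point; the real path is truncated in the history output.
CMD ["node", "./dist/main.js"]
```

Because the final `FROM` starts from a fresh base, anything not copied explicitly with `COPY --from=build` is left behind in the build stage.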
$ docker images | grep api
api latest 369b27345377 8 minutes ago 375MB
$ docker history --human --format "{{.CreatedBy}}: {{.Size}}" api
/bin/sh -c #(nop) CMD ["node" "./dist/Main.… : 0B
/bin/sh -c #(nop) EXPOSE 3030 : 0B
/bin/sh -c ls : 0B
/bin/sh -c #(nop) COPY dir:513b34f89adcbc3a6… : 630B
/bin/sh -c #(nop) COPY dir:2809a54b360d448e9… : 289MB
/bin/sh -c #(nop) COPY dir:52d84e1dbbcb0afee… : 215kB
/bin/sh -c #(nop) WORKDIR /usr/src/app : 0B
/bin/sh -c #(nop) CMD ["node"] : 0B
/bin/sh -c #(nop) ENTRYPOINT ["docker-entry… : 0B
/bin/sh -c #(nop) COPY file:238737301d473041… : 116B
/bin/sh -c apk add --no-cache --virtual .bui… : 5.35MB
/bin/sh -c #(nop) ENV YARN_VERSION=1.21.1 : 0B
/bin/sh -c addgroup -g 1000 node && addu… : 74.2MB
/bin/sh -c #(nop) ENV NODE_VERSION=12.14.0 : 0B
/bin/sh -c #(nop) CMD ["/bin/sh"] : 0B
/bin/sh -c #(nop) ADD file:36fdc8cb08228a870… : 5.59MB
375MB? We were expecting an improvement of only 206MB from eliminating the “Python - make - g++” layer, which would have left us at roughly 697MB. Let’s inspect the previous image with the dive tool to find out the reason for the unexpected improvement.
We found that our node_modules folder is only 289MB, but the previous image also included 311MB of yarn cache folders. By using multi-stage builds, we eliminated these cache folders as well.
At the end of step 2, we have a 375MB image. That is roughly 75% smaller than the initial 1.52GB image, or about a quarter of its size.
3) Shrink Dependencies
Before shrinking the dependencies, we must organize them in the package.json file. Most of the time, dependencies fall into two groups.
- dependencies: Packages that are required at runtime, such as express, mongoose, body-parser, NestJS, and SQL connectors.
- devDependencies: Packages that are required only at development time, such as test frameworks, typing files, and eslint packages.
We don’t need the development dependencies in our runtime image.
3.1) Install Production Dependencies Only
Installing only production dependencies is a way of eliminating development dependencies.
$ yarn install --production=true
But if you want to:
- run static code analyzers,
- run test suites,
- compile Typescript files
you need to install the development dependencies, too. In such cases, this solution won’t work.
3.2) Remove Development Dependencies Manually
The second option is to remove development dependencies manually.
$ npm prune --production
This command removes all packages specified in the devDependencies section. You can find more details in the npm documentation.
And the result:
$ docker images | grep api
api latest f1adbbffb504 13 seconds ago 145MB
Only 145MB. We can reduce it even more.
3.3) Use Node Prune Tool
node-prune is an open-source tool for removing unnecessary files from the node_modules folder. Maintainers may forget to exclude test files, markdown files, typing files, and *.map files from their npm packages. By using node-prune, we can safely delete them.
$ docker images | grep api
api latest 33d086802e50 22 seconds ago 127MB
It reduces image size by 18MB. Better than nothing.
3.4) Delete Remaining Dependencies Manually
After the node-prune step, some large, unnecessary files may still remain in node_modules. You can find the biggest packages manually by using the following command described here.
$ du -sh ./node_modules/* | sort -nr | grep '\dM.*'
11M ./node_modules/rxjs
8.7M ./node_modules/swagger-ui-dist
7.2M ./node_modules/couchbase
The output depends on your dependencies, and there is no guarantee that the listed packages are unnecessary. Here is our final Dockerfile.
$ docker images | grep api
api latest 3eed30c8606b 13 seconds ago 118MB
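The final listing is not reproduced in this text. As a hedged sketch, the three steps could be combined as follows. The stage name, the copied paths, the `dist/main.js` entry point, and the manually deleted path are hypothetical, and the node-prune installer line is an assumption taken from that tool's README; swap in whatever install method works in your environment.

```dockerfile
# ---- Build stage ----
FROM node:12-alpine AS build

# Build tools for the Couchbase SDK, plus curl for the installer below.
RUN apk update && apk add python make g++ curl

# Assumption: installer command as published in the node-prune README.
RUN curl -sf https://gobinaries.com/tj/node-prune | sh

WORKDIR /usr/src/app
COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile

COPY . .
RUN yarn lint && yarn test
RUN yarn build

# Step 3.2: drop devDependencies once lint, tests, and build are done.
RUN npm prune --production

# Step 3.3: remove unnecessary files from the remaining packages.
RUN node-prune

# Step 3.4: hypothetical manual cleanup; verify every path for your own app.
RUN rm -rf node_modules/swagger-ui-dist/*.map

# ---- Runtime stage ----
FROM node:12-alpine

WORKDIR /usr/src/app
COPY --from=build /usr/src/app/dist ./dist
COPY --from=build /usr/src/app/node_modules ./node_modules

EXPOSE 3030

# Placeholder entry point; the real path is truncated in the history output.
CMD ["node", "./dist/main.js"]
```

Pruning happens in the build stage on purpose: the runtime stage copies node_modules only after it has been reduced, so the removed files never become part of a layer in the final image.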
Conclusion
By applying three simple steps, we reduced our Docker image size by a factor of about 13, from 1.52GB to 118MB.
I strongly recommend running unit tests and performance tests between these steps, because:
- Changing the base image may slow down your application or break it if some operating-system-level packages are missing.
- By mistake, some runtime dependencies may be specified as development-time dependencies.
- Dependency files that you think are unnecessary may be required by another dependency that you are not aware of.