Marriage of Docker and CodeChain

Seon Pyo Kim
Published in CodeChain, May 3, 2019

Why Docker?

Image by Bo-Yi Wu from Flickr

Docker builds an image that contains an application’s binary and its runtime environment, and running that image starts the application in an isolated environment called a ‘container’. Although CodeChain does not distribute a Windows build, you can still run a node through the Windows version of Docker. Containers share many functional similarities with ‘virtual machines’. However, a virtual machine virtualizes a complete computer, including the operating system, so it takes a heavy toll on performance and is disadvantageous in terms of image size and distribution management when you only need to run a single application.

The advantage of Docker is that images can be distributed at a much smaller size, and the performance gap between the host and the container is minimal. You can also easily manage versions of your images through Docker Hub, much as you manage versions of your code through GitHub.

Why does the commit hash error occur only in the Docker image?

Among the JSON-format remote procedure calls (RPCs) currently supported by CodeChain, there is one called commitHash. It returns the Git commit hash from which the currently running CodeChain binary was built. However, unlike binaries built and run directly from a local repository, binaries distributed via the Docker image could not return a commit hash from this call.
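As a rough illustration, a commitHash call could be issued from the command line as sketched below. The request shape follows JSON-RPC 2.0. The address in NODE_URL is an assumption (use whatever address your node's RPC server listens on), and rpc_body is a hypothetical helper, not part of CodeChain.

```shell
# Hypothetical helper: build a JSON-RPC 2.0 request body for a method
# that takes no parameters.
rpc_body() {
  printf '{"jsonrpc":"2.0","method":"%s","params":[],"id":1}' "$1"
}

# Send the request only when NODE_URL is set, for example:
#   NODE_URL=http://localhost:8080   (assumed address, adjust to your node)
if [ -n "${NODE_URL:-}" ]; then
  curl -s -H 'Content-Type: application/json' \
       -d "$(rpc_body commitHash)" "$NODE_URL"
fi
```

Keeping the body in a small function makes it easy to inspect the exact request before sending it to a node.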

The cause was simpler than expected: the .git directory was listed in the .dockerignore file, which excluded it from the Docker build. Developers customarily list IDE-related directories such as .idea and .vscode in .gitignore, and excluding .git from the Docker build may seem just as natural, since it contains the edit history and could cause security issues. However, CodeChain handles the commit hash remote procedure call with a library called vergen, which was introduced in a separate article. Because the library needs the information in .git at build time to record the commit hash, the call could not work properly in a Docker image built without that information.
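The effect can be reproduced outside Docker with plain Git. The sketch below is not how vergen works internally; it only illustrates that a source tree copied without .git has no commit hash to read. commit_hash_of is a hypothetical helper.

```shell
# Hypothetical helper: print the commit hash visible from a directory,
# or "unknown" when no Git metadata can be found there.
commit_hash_of() {
  git -C "$1" rev-parse --verify HEAD 2>/dev/null || echo "unknown"
}

# An empty scratch directory stands in for a build context from which
# .dockerignore has filtered out .git. The result is "unknown" unless a
# parent directory happens to be a Git repository.
context_dir=$(mktemp -d)
commit_hash_of "$context_dir"
rmdir "$context_dir"
```

Running commit_hash_of on a normal checkout prints the full hash of HEAD instead.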

Troubleshooting with multi-stage builds

The same issue has been raised in the go-ethereum project (Issue 15346). But is it possible to include .git’s information in a build without revealing it to external sources? To accomplish this, you can use Docker’s multi-stage build. A multi-stage build lets you divide the build into stages and copy only what you need from one stage to the next. With a two-stage build, the .git directory is no longer listed in .dockerignore, so it is available in the first stage, but the second stage does not copy it. As a result, .git’s information can be used at build time without being included in the distributed image. The completed Docker images built from the multi-stage Dockerfile can be found on the CodeChain Docker Hub.
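A two-stage build along these lines might look like the sketch below, which writes an illustrative Dockerfile to a scratch directory. The base images, the cargo build command, the binary path, and the file name Dockerfile.sketch are all assumptions made for illustration; they are not taken from CodeChain's actual Dockerfile.

```shell
# Write an illustrative two-stage Dockerfile into a scratch directory.
sketch_dir=$(mktemp -d)
cat > "$sketch_dir/Dockerfile.sketch" <<'EOF'
# Stage 1 (builder): the whole build context is copied in, .git included,
# so the build can read the commit hash.
FROM rust:latest AS builder
WORKDIR /build
COPY . .
RUN cargo build --release

# Stage 2 (runtime): only the compiled binary is copied over,
# so .git never reaches the distributed image.
FROM ubuntu:18.04
WORKDIR /app/codechain
COPY --from=builder /build/target/release/codechain ./
CMD ["./codechain"]
EOF

echo "Sketch written to $sketch_dir/Dockerfile.sketch"
# To try it from a source checkout (assumed layout):
#   docker build -f "$sketch_dir/Dockerfile.sketch" -t codechain-sketch .
```

The point to notice is the COPY --from=builder line: it is the only path from the first stage into the final image, so anything not named there, including .git, is left behind.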

How can we maintain data persistence?

CodeChain depends on a locally stored database and keys. However, when an application is launched in a container from a Docker image, its data is stored in the container’s writable layer, and the environment is isolated from the host that runs the container. If you pull the image of a new CodeChain version and run a container, or even if you run a new container from the same image, the database and keys that CodeChain relies on will not persist. The data in the writable layer of the container is also tightly coupled to the host machine, which makes it difficult to export the container’s data.

Mounting space on a host

Docker provides three ways to mount host storage into a container: Volumes, Bind mounts, and tmpfs mounts. A tmpfs mount is stored only in the host’s memory. It is never written to the host’s file system, so nothing persists on disk, which makes it unfit for keeping CodeChain’s data persistent. A bind mount can expose any location on the host system inside the container. That location may be a system file or directory of the host, and it can be modified at any time by any process, whether or not it is a Docker process. While bind mounts perform very well, one side effect is that processes running within the container can potentially delete or modify important files or directories on the host system. Thus, CodeChain recommends that you use Volumes to manage your data persistently.
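For reference, the three mount types map onto the --mount flag of docker run roughly as follows. mount_flag is a hypothetical helper that only prints the flag, and the Volume name and paths in the usage lines are placeholders.

```shell
# Hypothetical helper: print the --mount flag for one of Docker's three mount types.
# Usage: mount_flag <volume|bind|tmpfs> <container_path> [source]
mount_flag() {
  case "$1" in
    volume) printf '%s\n' "--mount type=volume,source=$3,target=$2" ;;
    bind)   printf '%s\n' "--mount type=bind,source=$3,target=$2" ;;
    tmpfs)  printf '%s\n' "--mount type=tmpfs,target=$2" ;;
    *)      echo "unknown mount type: $1" >&2; return 1 ;;
  esac
}

mount_flag volume /app/codechain/db codechain-db-vol   # Docker-managed, persistent
mount_flag bind   /app/codechain/db /srv/codechain/db  # arbitrary host path
mount_flag tmpfs  /app/codechain/db                    # memory only, not persistent
```

Only the volume and bind forms take a source; a tmpfs mount has nothing on disk to point at, which is why it cannot hold the database or keys.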

Utilizing Volumes

A Volume is storage under /var/lib/docker/volumes, a part of the host file system managed by Docker, that is mounted over a directory in the container. A Volume can be specified in two main ways: before the build, or when running the container. If you specify it before the build, nothing is known about the host it will run on, so you cannot give the Volume a name in advance. The Volume name is often needed for maintenance, such as attaching the data to a new CodeChain container or backing it up. Volumes declared in the Dockerfile before the build are named with generated hashes, which makes the names long and hard to work with. Therefore, CodeChain’s Docker image does not declare a Volume before deployment. Instead, the Volume can be specified when the container is run, in the form -v volume_name:container_dir_path, where the Volume (volume_name) is stored under /var/lib/docker/volumes on the host and the path in the container (container_dir_path) is an isolated path within the container. With this option, the Volume is mounted over that directory in the container.

Because the database and keys are recorded on the host’s file system, the data survives when a new version is released or a new container is started: CodeChain simply runs with the existing Volumes mounted. More specifically, if codechain-db-vol and codechain-keys-vol already exist among this Docker host’s Volumes when you execute the command shown below, the existing data will be loaded; otherwise, new Volumes will be created.

$ docker run -it -v codechain-db-vol:/app/codechain/db -v codechain-keys-vol:/app/codechain/keys codechain-io/codechain:branch_or_tag_name

The container paths are specified this way because the default working directory is set to /app/codechain in the build configuration of the Docker image, and CodeChain uses the db and keys directories beneath it by default. You can change these locations with CodeChain’s execution options, as described below.

1. Utilizing the base-path option

Among the command-line execution options in CodeChain, there is an option to specify the base-path, which allows you to write the database and keys to a Volume as follows:

$ docker run -it -v codechain-data-vol:custom_base_path codechain-io/codechain:branch_or_tag_name --base-path custom_base_path

2. Utilizing db-path and keys-path

Alternatively, db-path and keys-path can be specified separately and written to different Volumes:

$ docker run -it -v codechain-db-vol:custom_db_path -v codechain-keys-vol:custom_keys_path codechain-io/codechain:branch_or_tag_name --db-path custom_db_path --keys-path custom_keys_path

After specifying the Volume, you can check the currently registered Volumes by using docker volume ls as shown below:

$ docker volume ls
DRIVER              VOLUME NAME
local               codechain-db-vol
local               codechain-keys-vol

You can also check the details of each Volume with docker volume inspect as shown below:

$ docker volume inspect codechain-db-vol
[
    {
        "CreatedAt": "2019-04-29T14:48:41+09:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/codechain-db-vol/_data",
        "Name": "codechain-db-vol",
        "Options": null,
        "Scope": "local"
    }
]
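To find out in advance whether a run will reuse existing data or create fresh Volumes, you can check for each Volume by name. The sketch below relies on docker volume inspect exiting with a non-zero status for an unknown Volume; volume_exists and report_volumes are hypothetical helpers.

```shell
# Hypothetical helper: succeed if a Docker Volume with the given name exists.
volume_exists() {
  docker volume inspect "$1" >/dev/null 2>&1
}

# Hypothetical helper: report which Volumes a run would reuse and which it would create.
report_volumes() {
  for name in "$@"; do
    if volume_exists "$name"; then
      echo "$name: existing data will be reused"
    else
      echo "$name: a new Volume will be created"
    fi
  done
}

# Example: report_volumes codechain-db-vol codechain-keys-vol
```

Note that a missing or stopped Docker daemon also makes the check fail, so a "new Volume" report is only meaningful when Docker itself is reachable.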

Currently, it is easy to run CodeChain via the Docker image available on Docker Hub. By following the instructions in the README.md in the CodeChain repository, anyone can operate a CodeChain node using the Docker image.
