Native command-line experience for containerized tools

Published in OUSPG, Oct 27, 2020

Container handling in the Port of Oulu (Courtesy of Port of Oulu)

TL;DR

Installation and security problems of command-line tools can be mitigated with containers. However, that introduces a new problem of how to arrange access to input and output files. Our cincan-command provides a solution while preserving the native command-line experience as far as possible.

What are containerized tools?

A container is packaged software ready to be executed inside a sandbox. The software does not need to be installed, just downloaded, provided that you have the containerization system itself installed, e.g. Docker (https://www.docker.com/). A containerized program is also isolated from the host system, which provides some security against unwanted side effects.

You can run different kinds of programs in a container. In this blog post, we talk about command-line tools run inside containers. These tools can serve any purpose, but in our research group, we focus on tools used for security analysis, incident response, and incident investigation. Command-line tools are popular in the security community, as there are many open source ones for various tasks. Some of the well-known tools are Wireshark, Volatility, Nikto, and Ghidra. Most of these tools do have a GUI, but many people prefer to use them from the command line, especially when automating tasks with scripts.

But as said, this blog post is not limited to security tools; the following applies to any command-line tool that reads its input from files and writes its results to files.

Reasons to use containers

Installing a set of tools can be difficult, as tools have conflicting installation requirements, such as different Python versions, different Java versions, different versions of third-party libraries, etc. It may not be easy to compile the tools, and downloading pre-built binaries from a shady site is not a good option. Some security-sensitive users may also worry that a tool might somehow attempt to compromise the running environment. For these reasons, one may want to run a tool in a container.

You can build the container yourself, or use a pre-built container image. In the CinCan project, we have packaged a set of security-related tools into containers for your convenience (https://gitlab.com/CinCan/tools).

Tool input and output

For best isolation, all files outside the container are off-limits for the containerized tool. This is a show stopper for a command-line tool that needs to access input files and/or produce output files. There are several solutions to this input/output file problem. The approaches and their pros and cons are listed below:

Add files to the container in the build phase

Pro: Safe, as input files are well defined
Con: Not a run-time solution, input must be known at container build time, and output files remain in the container

Copy input files to the container before running the tool, and copy output files from the container after the tool is completed

Pro: Safe, as input and output files are explicitly defined
Con: Needs extra code to do the copying. May be slow if there are large files and/or a lot of files

Create a volume which can be accessed by the container

Pro: A safe solution, and a volume can be shared by many containers
Con: Does not solve the underlying problem, as the files must still be transferred between the host and the volume

Mount host directories into the container

Pro: Relatively easy to access the files
Con: Mounting is an extra step. Exposes the host file system to the container, which raises concerns about access rights, confidential files, and unauthorized modifications

Pipe input to the standard input of the container and read results from the standard output of the container

Pro: Safe, as input and output are well defined
Con: Only works when standard input/output is a sufficient way to interact with the tool

Cincan command-line experience

In the CinCan project, we created the command cincan to run containerized tools without the need to explicitly transfer files to or from the container (https://pypi.org/project/cincan-command/). We call this the native command-line experience. Our chosen approach is to 1) copy input files into the container, 2) run the tool inside the container, and 3) copy output files out of the container.
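The three steps can be sketched with the standard Docker CLI commands docker create, docker cp, docker start and docker rm. This is only an illustration under assumptions, not how cincan-command is implemented: the function name plan_copy_run_copy and the container name tool-run are hypothetical, and the output files must be listed up front here, whereas cincan detects them. The function only builds the command lines, so you can inspect them before running anything.

```python
import shlex
from typing import List


def plan_copy_run_copy(image: str, tool_args: List[str], inputs: List[str],
                       outputs: List[str], workdir: str = "/",
                       name: str = "tool-run") -> List[List[str]]:
    """Build the docker CLI calls for one copy-in / run / copy-out cycle.

    Assumes `workdir` is the working directory of the image and that the
    input and output names are plain relative file names.
    """
    base = workdir.rstrip("/")  # "/" becomes "", "/data" stays "/data"
    # Create the container without starting it, so files can be copied in first.
    steps = [["docker", "create", "--name", name, image, *tool_args]]
    # 1) Copy input files into the container working directory.
    for path in inputs:
        steps.append(["docker", "cp", path, f"{name}:{base}/{path}"])
    # 2) Run the tool and wait for it to finish.
    steps.append(["docker", "start", "--attach", name])
    # 3) Copy output files back to the host.
    for path in outputs:
        steps.append(["docker", "cp", f"{name}:{base}/{path}", path])
    # Finally remove the stopped container.
    steps.append(["docker", "rm", name])
    return steps


plan = plan_copy_run_copy(
    "busybox", ["sh", "-c", "md5sum input.txt > result.md5"],
    inputs=["input.txt"], outputs=["result.md5"])
for step in plan:
    print(shlex.join(step))
```

Each printed line could be executed as is, for example with subprocess.run(step, check=True), on a host where Docker is installed.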

To illustrate the approach, let’s run a simple one-liner script as our “command”, first natively without a container. This one-liner calculates the MD5 sum of a file. The input file is input.txt. The created result file is result.md5. The command line is the following:

$ sh -c "md5sum input.txt > result.md5"

We can check that the command does what it is supposed to do like this:

$ cat result.md5
746308829575e17c3331bbcb00c0898b input.txt

In the following, we use the Docker image busybox, which is directly available in the Docker container repository.

Just running the command with docker does not work, as the input file is not in the container:

$ docker run busybox sh -c "md5sum input.txt > result.md5"
md5sum: can't open 'input.txt': No such file or directory

Mounting the working directory into the container directory /data solves the problem, but it requires extra parameters and lets the tool access the host file system:

$ docker run -v "$(pwd)":/data busybox sh -c "md5sum /data/input.txt > /data/result.md5"

Cincan-command does not need any mounts to run the command successfully. It copies input.txt into the container before the tool runs and copies result.md5 out from there afterward.

$ cincan run busybox sh -c "md5sum input.txt > result.md5"
busybox: <= input.txt
busybox: => result.md5

As shown, the command prints the files it copies into the container and the result files it copies out. This printout can be disabled with the option --quiet.

But how does the cincan-command know which files to move?

Detecting input and output files

The cincan-command uses the following heuristics to detect which files must be moved into the container:

Break down the command-line into a list of potential input file names
Check which potential files are real files in the host
Copy the files into the container

For the example command, the heuristics try the following candidate input file names. Just one of them, the file named input.txt, is found:

sh -c "md5sum input.txt > result.md5"  | File not found
sh                                     | File not found
-c                                     | File not found
md5sum input.txt > result.md5          | File not found
md5sum                                 | File not found
input.txt                              | FOUND
result.md5                             | File not found
>                                      | File not found
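The candidate search can be sketched roughly as follows. This is a simplified illustration, not the actual cincan-command code: the function names candidate_names and detect_input_files are hypothetical, and the splitting rule (each argument as given, plus each whitespace-separated word inside it) is an assumption based on the table above.

```python
import os
from typing import List


def candidate_names(args: List[str]) -> List[str]:
    """Collect strings from the command line that might name input files."""
    seen: List[str] = []
    for arg in args:
        # Try the argument as a whole, then each word inside it.
        for name in [arg, *arg.split()]:
            if name not in seen:
                seen.append(name)
    return seen


def detect_input_files(args: List[str], host_dir: str = ".") -> List[str]:
    """Keep only the candidates that exist as regular files on the host."""
    return [name for name in candidate_names(args)
            if os.path.isfile(os.path.join(host_dir, name))]


if __name__ == "__main__":
    command = ["sh", "-c", "md5sum input.txt > result.md5"]
    for name in candidate_names(command):
        status = "FOUND" if os.path.isfile(name) else "File not found"
        print(f"{name:38} | {status}")
```

Checking every candidate against the host file system is cheap, which is why trying obviously wrong names such as sh or > does no harm.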

The actual tool is then executed inside the container. After that, the result files to move out of the container must be detected. The heuristics for this are the following:

Use the Docker API to get the list of modified files in the container (only in the working directory, as explained later)
Check which files are missing or different on the host
Copy the files to the host
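A minimal sketch of these steps is shown below. It assumes the change list has the shape returned by the Docker API for container filesystem changes (entries with Path and Kind, where Kind is 0 for modified, 1 for added and 2 for deleted). The function names are hypothetical, and the real cincan-command logic may differ, for example in how it handles directories.

```python
import os
from typing import Dict, List, Union

DELETED = 2  # Docker API change kinds: 0 = modified, 1 = added, 2 = deleted


def changed_paths_in_workdir(changes: List[Dict[str, Union[str, int]]],
                             workdir: str) -> List[str]:
    """Return added or modified paths below workdir, relative to workdir."""
    prefix = workdir.rstrip("/") + "/"
    result = []
    for change in changes:
        path = str(change["Path"])
        if change["Kind"] == DELETED or not path.startswith(prefix):
            continue
        relative = path[len(prefix):]
        if relative:  # skip the working directory entry itself
            result.append(relative)
    return result


def needs_copy(host_path: str, container_bytes: bytes) -> bool:
    """True when the host file is missing or its content differs."""
    if not os.path.isfile(host_path):
        return True
    with open(host_path, "rb") as f:
        return f.read() != container_bytes


if __name__ == "__main__":
    sample = [
        {"Path": "/home/appuser", "Kind": 0},
        {"Path": "/home/appuser/result.md5", "Kind": 1},
        {"Path": "/tmp/scratch", "Kind": 1},
    ]
    print(changed_paths_in_workdir(sample, "/home/appuser"))
```

Note that with a working directory of / every changed path in the container would pass the prefix check.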

Note that for container security, the output files are never copied outside the current working directory of the host. This is to make sure the containerized tool cannot overwrite system files, e.g. in /usr (well, unless you run in /usr or in the / directory, which is not advised).
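One common way to enforce such a restriction is to resolve the destination path and refuse anything that lands outside the working directory. The sketch below illustrates the idea only. The function name host_destination is hypothetical, and this is not taken from the cincan-command source.

```python
import os


def host_destination(workdir: str, relative_path: str) -> str:
    """Resolve an output path, refusing anything outside workdir."""
    base = os.path.realpath(workdir)
    # realpath collapses ".." segments and follows symbolic links.
    target = os.path.realpath(os.path.join(base, relative_path))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"refusing to write outside {base}: {relative_path}")
    return target


if __name__ == "__main__":
    print(host_destination(".", "result.md5"))
    try:
        host_destination(".", "../escape.txt")
    except ValueError as error:
        print(error)
```

Because the check compares against the working directory itself, running in / would accept every path, which is why running there is not advised.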

When the detection fails

Often the heuristics we described work well, and you can enjoy the native command-line experience using the cincan-command for running containerized tools. However, there are cases when the heuristics fail.

First, input files are not detected properly when their names do not appear explicitly on the command line. In that case, the input files are not copied into the container and the command fails.

Secondly, the logic does not know whether a file on the command line is for input or output. So, it unnecessarily copies output files from a previous tool run into the container. For example, if you run the example command a second time, you get the following:

$ cincan run busybox sh -c "md5sum input.txt > result.md5"
busybox: <= input.txt
busybox: <= result.md5
busybox: => result.md5

Thirdly, for large files and/or when there are a lot of files, the file copying may become slow. Some tools also create many irrelevant intermediate files, which get copied out from the container. For example, a set of Python .pyc files may be generated when a tool runs.

Additional options to specify input/output

When the input or output file detection fails, or when you absolutely want to be sure that only the correct input and output files are copied, you can use any of the following cincan-command features:

You can use the options --in-filter and --out-filter to filter the files to transfer into and out of the container
You can use the option --mkdir to indicate a result directory which is output-only and should not be copied into the container
You can place the special file .cincanignore into the container to specify which generated files should not be copied out as result files
Finally, you can use the options --in and --out to explicitly give input files in a tar archive, and get output files in a tar archive

These options, and many others, are described in the cincan-command documentation. You can get short help with the option --help.

And remember, if all else fails, you can still use the native docker-command and mount the host directories into the container.

Container recommendations

Cincan-command does not have any strict requirements for the containers or the containerized tools. However, as the tool inside the container is executed by the container default user and in the default working directory, we do recommend defining a non-root user (USER) and a working directory (WORKDIR) in the container build script.

The problem is that the container user is often root, with the working directory defaulting to the root directory /. With this, the tool can easily read or overwrite any file in the container, either by mistake or intentionally. Even with access limited to the container, this seems risky, as container system files may get overwritten and some system files may be interpreted as tool input. Most annoyingly, the cincan-command may believe some of the updated system files are results and copy them out of the container and into the host. As we said earlier, cincan-command never writes results outside the host working directory, so the danger of the copied container files messing up the host system should nevertheless be small.

For these reasons, we recommend that a well-behaving container have a non-root user and a dedicated working directory. For example, a user ‘appuser’ with home at /home/appuser.
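To check whether an existing image follows this recommendation, you can look at its configuration. The sketch below assumes the dictionary is the Config section of the JSON printed by docker image inspect, which includes the User and WorkingDir fields. The function name config_warnings and the warning texts are hypothetical.

```python
from typing import Dict, List

ROOT_USERS = {"", "0", "root"}  # an empty User field means the default, root


def config_warnings(config: Dict[str, object]) -> List[str]:
    """Report which of the two recommendations an image config misses."""
    warnings = []
    # User may be "name", "uid", "name:group" or "uid:gid".
    user = str(config.get("User") or "").split(":")[0]
    if user in ROOT_USERS:
        warnings.append("tool runs as root; define USER in the build script")
    # An empty WorkingDir means the container starts in /.
    if str(config.get("WorkingDir") or "/") == "/":
        warnings.append("working directory is /; define WORKDIR in the build script")
    return warnings


if __name__ == "__main__":
    print(config_warnings({"User": "", "WorkingDir": ""}))
    print(config_warnings({"User": "appuser", "WorkingDir": "/home/appuser"}))
```

The check is deliberately conservative: it only flags the defaults and does not verify that the named user actually exists in the image.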

Get started

Cincan-command is available in the Python Package Index (https://pypi.org/project/cincan-command/). Cincan-command should work with most containerized command-line tools. See the cincan-command help for more information.

There is a set of containerized security tools packaged by the CinCan project for your convenience (https://gitlab.com/CinCan/tools).