Testing Docker CVE Scanners. Part 2: How good is package detection?

Gabor Matuz
5 min read · Apr 21, 2020


There are several scanners I could not test. If you have access to Twistlock, Nexus IQ, Black Duck or anything else not included: please ping me!

It is pretty hard to get useful information about the efficacy of CVE scanners. Vendors are not exactly specific about what you can expect, and the common answer “The CVE was not listed, as the vulnerability is not exploitable in your case” sounds reasonable and reassuring, but it is costly to validate. I decided to do this research to get more data, to understand for what purpose and under which circumstances I could recommend using such a product.

In Part 1 we scanned exploitable images to see if the tools find the related CVEs. The low detection rates surprised me, so I decided to take a deeper look. There is a lot that goes into a false negative: is the package even detected? Is the right version recognised? Is the CVE debated, or is it otherwise unclear whether it applies to that version? Does the tool mistakenly remove the CVE as a false positive?

In Part 2 I decided to test whether CVE scanners even detect the presence of packages if they are not installed via OS package managers. This is likely the biggest contributor to the false negatives from Part 1.

From a security perspective I’m a huge fan of image hardening: you reduce the attack surface by removing everything you don’t need from an image. So I also looked at false positive detection: for each image I created a version where I deleted the package. Additionally, I took a normal Debian image and ran a hardening script on it that deleted most binaries. I added this case so the tests would not only cover deleting something I had specifically installed in an earlier step, which isn’t an entirely realistic scenario.
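To give a rough idea of what such a deletion variant looks like, here is a minimal sketch; the package choice and paths are illustrative and not the exact Dockerfiles from the test repository:

    # Dockerfile for the benchmark image: an (outdated) package installed
    # through the OS package manager
    FROM debian:buster
    RUN apt-get update && apt-get install -y nginx

    # Dockerfile for the deletion variant: same install, but the files are
    # removed again, so a scanner that only reads the dpkg database would
    # still report the package and its CVEs
    FROM debian:buster
    RUN apt-get update && apt-get install -y nginx \
     && rm -rf /usr/sbin/nginx /etc/nginx /usr/share/nginx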

Test images

I created several test images with an outdated package added using the OS package manager. I used these as a benchmark, as all the scanners detected the package. Then for the test I added variations and built the same image in different ways: building the package from source, building from source using Docker multistage builds, or downloading an executable. I added one case where I used a different package manager (pip).
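The variants look roughly like the sketch below; the versions, download URL and paths are only meant to illustrate the idea, not to reproduce the exact test Dockerfiles:

    # Benchmark: installed with the OS package manager, visible to every scanner
    FROM debian:buster
    RUN apt-get update && apt-get install -y redis-server

    # Multistage variant: built from source; only the binary is copied into the
    # final image, so there is no package manager record of it
    FROM debian:buster AS build
    RUN apt-get update && apt-get install -y build-essential curl \
     && curl -fsSL http://download.redis.io/releases/redis-5.0.0.tar.gz | tar xzf - \
     && make -C redis-5.0.0

    FROM debian:buster
    COPY --from=build /redis-5.0.0/src/redis-server /usr/local/bin/redis-server

    # Different package manager: installed via pip instead of apt
    FROM python:3.7
    RUN pip install gunicorn==19.6.0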

I picked components that you would not run through a dedicated software dependency scanner. Additionally, I focused on packages that are usual targets for exploitation in case they have a critical CVE:

nginx, tomcat, haproxy, gunicorn, redis, ruby, node

I have to admit that the selection probably made it harder for the scanners. In my experience, they mostly look at the file system of the image, and examining a C++ executable tells you much less than what you can find out from a jar file.

I tried to pick existing Dockerfiles from popular repositories to make the examples more realistic. Some images are Debian based, others Alpine based.

You can check the test cases on GitHub or directly pull the images from Docker Hub.

Detecting detection

Most tools won’t give you an inventory of the packages they found unless those are installed with a package manager. So I checked detection by first picking a CVE that the tool found in the package in the benchmark image. Then I looked at whether it identified the same CVE when I installed the package in a different way.

When counting the scores I did not include all test cases, as most of them did not seem to make a difference.

WhiteSource did identify tomcat in all cases. However, it gave different lists of CVEs for the exact same version. I did not subtract points for this, as I was looking purely at detection rates.

I included 2 cases for java/tomcat (source and binary), 2 cases for python/gunicorn (pip, source) and 1 each for everything else (redis, nginx, haproxy, nodejs, ruby). From the latter group, only Xray and WhiteSource detected a single instance out of the 6 cases. For the tomcat and gunicorn examples, Anchore, Xray and WhiteSource detected some of the cases. Clair, Trivy and Snyk did not identify any of the examples.

Only Anchore seems to take into account deletion of files. Even for Anchore the results are not clear due to the low detection rate. It did identify the deletion of tomcat and gunicorn installed via pip. I have to add that Snyk didn’t even manage to scan the image once I ran the hardening script.

The last row is an example of the number of unique CVEs detected by each scanner in one benchmark (gunicorn installed through apt). This aims to illustrate that in almost all cases there was disagreement between scanners. A higher number does not necessarily mean better coverage! Some scanners try to get rid of false positives to help you look at only relevant issues. However, the difference clearly shows that the tools have significantly different opinions.

Take-home

If you depend on the results of Docker image CVE scanners, try to install everything important with the OS package manager. This is especially important if you are using Trivy, Clair or Snyk.

If you are running Anchore, Xray or WhiteSource you might get better results, but I would still be cautious with important packages. For languages that do not have an explicit dependency file, I would recommend double checking whether the package is detected correctly. Currently I have only found an option for this in WhiteSource.

If you go YOLO and recklessly decide not to use the OS package manager, there is some good news: the method of installation does not make a difference. Multistage builds do not seem to make detection less likely; there was only 1 case where they resulted in a false negative compared to building in the same image. There does not seem to be any difference between curl/wget/ADD/COPY either. In cases where I specifically downloaded a tar archive, I experimented with creating a layer with the archive saved, hoping that this might result in better identification, but there was no noticeable change. Results for Alpine and Debian do not show a significant difference either. I reckon these results are less firm, since I picked only a handful of test cases to start with and the tools only detected anything in a fraction of those.
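For the tar archive experiment mentioned above, the difference was essentially the following two Dockerfile steps; the URL and version are placeholders, not the actual test downloads:

    # Download piped straight into tar: the archive itself never ends up in a layer
    RUN curl -fsSL https://example.org/node-v10.0.0-linux-x64.tar.gz | tar xzf - -C /opt

    # Archive saved as its own layer first, extracted in a later step, hoping that
    # the tarball itself gives the scanner something easier to identify
    RUN curl -fsSL -o /tmp/node.tar.gz https://example.org/node-v10.0.0-linux-x64.tar.gz
    RUN tar xzf /tmp/node.tar.gz -C /opt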

Yes, you can help!!

Please contact me if there is any other scanner that you have access to for which you want to know the results!

Thanks for the support from Snyk and Clair, and especially to Anchore and WhiteSource for actively helping out with the research.

Vendor response

During earlier discussions, WhiteSource had already signalled that the tool does not detect deletion. However, they are currently working on it.
