Testing Docker CVE scanners. Part 1: false negatives and what they mean for your security

Gabor Matuz
Apr 9, 2020 · 10 min read


There are several scanners I could not test. If you have access to Twistlock, Nexus IQ, Black Duck or anything else not included here: please ping me!

It all started with picking the right CVE scanner. Given that most CI pipelines for application developers produce a Docker image, the dream is that you can check for issues in both your software dependencies and the underlying OS in one go. Added bonus: it helps a lot with vulnerability management. You can simply take all the images from your running environment, scan them, and get instant insight into how the system is doing in terms of patching, and possibly a signal that you need to react to something immediately.
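As a minimal sketch of that workflow (a hypothetical example using the open source Trivy scanner mentioned below purely as a stand-in; the exact flags can differ between Trivy versions, so treat this as an assumption to verify against your setup):

```
# List the images of all running containers and scan each of them.
# Trivy is used only as an example; any of the scanners discussed in
# this post could be substituted with its own CLI invocation.
docker ps --format '{{.Image}}' | sort -u | while read -r image; do
  echo "=== Scanning $image ==="
  trivy image --severity HIGH,CRITICAL "$image"
done
```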

No surprise, then, that a number of players are moving into scanning Docker images for known vulnerabilities as part of CI/CD: software dependency scanning vendors such as Snyk, WhiteSource and Sonatype, security operations vendors such as Twistlock and Aqua, and CI tool vendors, for example JFrog with Xray. There are a number of open source tools as well, like Anchore, Trivy and Clair. They all provide functionality to scan your Docker images, though in all fairness most products have this as a secondary feature.

There is a lot that goes into picking a tool, but what I found hardest to gauge is how effective they are at actually finding the CVEs.

Whenever you run any of these tools against your images (be they deliberately vulnerable ones or images you actually use), you can count on getting tens if not hundreds of CVEs. Is that all of them? Are they all relevant? Does it depend on how the software was installed?

This research is about false negatives and false positives.

Strategy based on CVEs — Assumptions for your vulnerability management/patching

If your patching and vulnerability management is guided by the output of CVE scanners, you should have a pretty good picture of the following questions:

  • Does having no open CVEs mean you are doing well?
  • If I want a CI/CD quality gate, is it enough to require that there are no high severity CVEs in the image?
  • Should I apply patches and rotate even if there are no important CVEs shown by the tool?
  • Should I look out for certain patterns when building Docker images to make sure everything is picked up by the tool?
  • When the next branded vulnerability comes out am I covered just by looking at the output or do I need to do some more detailed digging?

What goes into an image

Dockerfiles are cool because they give a complete, repeatable recipe for creating an image: they contain all the information about what goes into it. The assumption is that tools scanning images will give a complete view of the vulnerabilities in everything that is present in the final image. Those things fall into three categories:

  1. OS packages/libraries; things close to the kernel but also common utilities like ssh
  2. Dependencies of software you develop yourself; think of things you add to a package.json, Gemfile or pom.xml
  3. and every other component/program you install; think Tomcat, Django, MongoDB or Postgres

I separated this final category on purpose because I'm not sure whether these belong to the OS, even though you might install them through an OS package manager.
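To make the three categories concrete, here is a small hypothetical Dockerfile (not one of the test images; the package names and the download URL are placeholders) with one example of each:

```
# Hypothetical example, only to illustrate the three categories above
FROM python:3.8-slim-buster

# 1. OS packages/libraries, installed through the OS package manager
RUN apt-get update \
 && apt-get install -y --no-install-recommends openssh-client curl \
 && rm -rf /var/lib/apt/lists/*

# 2. Dependencies of the software you develop yourself, declared in a manifest
COPY requirements.txt .
RUN pip install -r requirements.txt

# 3. "Everything else": a server installed outside the OS package manager
#    (placeholder URL, purely for illustration)
RUN curl -fsSL https://example.com/apache-tomcat-9.0.30.tar.gz -o /tmp/tomcat.tar.gz \
 && tar -xzf /tmp/tomcat.tar.gz -C /opt \
 && rm /tmp/tomcat.tar.gz
```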

I would argue that the most important category for us when it comes to vulnerability management is "everything else": the web frameworks, database servers, web servers and applications. These are usually the ones that are exploited en masse, as exploits for them work reliably across different environments.

Previously you would try to catch these different kinds of vulnerabilities with different diagnostic tools: for the OS you would check patch levels, for the things you build yourself you would use a software dependency scanner, and you would hope to catch the rest with a vulnerability scanner or an agent that enumerates what is running on the system. The hope/expectation is that you can do all of this in one shot on the image, similarly to how software dependency scanning works on your build description.

Most of the scanners are not forthcoming about which of these categories they can or aim to tackle. Clair is one of the exceptions: it claims to only look for issues in OS packages. However, there is no clear definition of what that means. Is Apache or Jenkins an OS package? For Snyk you can also find some description saying that they check OS package manifest files and search for some key binaries. In that case the question is what you can expect for the key binaries.

To give you an example: when you pull sonatype/nexus3:3.14.0, which has a big fat remote code execution vulnerability, CVE-2019-7238 (affecting all versions up to and including 3.14), you could reasonably expect the tools to give you a high risk issue for it. (!SPOILER ALERT! None of them do.)
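If you want to try this particular check yourself, one quick way (again using Trivy only as a stand-in for whichever scanner you are evaluating; output format and flags may vary by version) would be:

```
# Pull the image with the known RCE and check whether the scanner reports it
docker pull sonatype/nexus3:3.14.0
trivy image sonatype/nexus3:3.14.0 | grep CVE-2019-7238 \
  || echo "CVE-2019-7238 not reported"
```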

Let’s take a deeper look

I have tested Anchore, Aqua, Clair, Snyk and Xray, as those are the ones I was able to get trial licenses for. Hoping that with your help I can test further tools, I'm currently not disclosing everything about the images I used for testing, to make sure all the tools are compared on an equal footing. I will include a full description of the images in Part 3.

I tested on 73 images, all with actually exploitable and important CVEs (remote code execution, directory traversal, command execution, etc.), mostly based on Debian, CentOS, Ubuntu and Alpine. Some of the issues are in a library or OS package, a bunch of them in web servers, web frameworks, queues and databases, and some in application dependencies pulled in with pip, npm, gem, Maven, etc. Installations were done through package managers, by building from source, by downloading and installing, or by just copying in the executables: basically the stuff you would expect people to do.
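Since I'm not yet publishing the actual test Dockerfiles, here is a purely illustrative sketch of the kinds of installation methods they cover (the package names, versions and URL below are placeholders, not the real test images):

```
# Illustrative only: the same kind of component can enter an image in very different ways
FROM debian:buster

# through the OS package manager
RUN apt-get update \
 && apt-get install -y --no-install-recommends nginx python3-pip curl \
 && rm -rf /var/lib/apt/lists/*

# through a language package manager
RUN pip3 install flask==1.0.2

# downloaded and unpacked (or built from source)
RUN curl -fsSL https://example.com/some-server-1.2.3.tar.gz | tar -xz -C /opt

# or simply copied in as a prebuilt executable
COPY ./some-prebuilt-binary /usr/local/bin/some-server
```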

The question is: do they find the CVEs I was looking for?

*In the meantime I have received some responses from vendors; I include them at the end of this article.

The first row tells you in how many of the 73 test images the scanner found the CVE I was looking for. The second row tells you how many of those CVEs were not labeled high/critical by the scanner; here I only counted RCE, directory traversal, command execution and the like (there were a few XSS, open redirect, etc. that are not clearly high severity). Obviously this should not be used for comparison, due to the different base. The third row gives you an idea of the extent to which you could even expect to find something: if no CVEs were found at all, there is clearly something wrong with my setup, for example the distribution is not supported. The last row tells you that, on average, a large number of CVEs is found in the images and that different scanners will have different opinions; higher is not necessarily better in this case.

I tried to find setups that are not obvious blind spots for the scanners (Clair again being the exception), and indeed all scanners find CVEs in the vast majority of the images; there are very few cases where they find none.

For Anchore and Snyk you can specify a Dockerfile alongside the image when you scan; unexpectedly (to me), this did not seem to make a difference.
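For reference, this is roughly how that looks with the two CLIs (the image name and paths are placeholders, and the flags are my reading of the Snyk and anchore-cli documentation, so verify them against the versions you use):

```
# Snyk: scan the image and hand it the Dockerfile for additional context
snyk container test myregistry/myapp:1.0 --file=Dockerfile

# Anchore (anchore-cli talking to an Anchore Engine service):
# attach the Dockerfile when adding the image, wait for analysis,
# then list the vulnerabilities
anchore-cli image add myregistry/myapp:1.0 --dockerfile=./Dockerfile
anchore-cli image wait myregistry/myapp:1.0
anchore-cli image vuln myregistry/myapp:1.0 all
```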

Granted, the average number of vulnerabilities is going to be a bit higher for the test images than normal, as most of them include quite a few outdated components. Nonetheless, this gives you an idea of how big the haystack is in which you are looking for the needle.

What have we learned?!

1. You should not assume that CVE scanners will catch all important CVEs, or in some cases that they even give you a relevant amount of information, if you are only scanning Docker images. In Part 2 I'll do a more structured analysis of what you can do to get better results and more precisely understand what you can realistically expect.

2. Whenever you are building software yourself and can directly scan a dependency file, you probably should. It was not straightforward to include numbers on this, but whenever I scanned a dependency file used while building the image, Snyk would always find the vulnerability (these were not added to the results). I would expect software dependency scanners to have far fewer false negatives, for reasons described later. I reckon this could also be a way to improve accuracy when installing packages with non-OS package managers: move the list of dependencies into a file instead of literally listing them in the Dockerfile (see the sketch below). If your npm, pip, etc. global dependencies are listed that way, you can and probably should scan them with a software dependency scanner.
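A minimal before/after sketch of what I mean (hypothetical packages and versions, shown only to illustrate the pattern):

```
# Before: dependencies listed literally in the Dockerfile;
# only the image scan has a chance to see them
RUN pip install flask==1.0.2 requests==2.19.1

# After: dependencies moved to a manifest that is COPYed in,
# which a software dependency scanner can also read directly
COPY requirements.txt .
RUN pip install -r requirements.txt
```

The same requirements.txt can then also be scanned on its own in CI, for example with something like snyk test --file=requirements.txt, in addition to scanning the built image.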

3. If your boss asks "Are we at risk from this fancy new vulnerability that is actively being exploited?" and it is important enough, you should probably do some further investigation and not merely rely on the output of a CVE scanner run against your containers, as it can miss some of the things you installed.

4. For important enough containers (e.g. internet facing or privileged ones) you should probably not rely on image scanning alone. Validate whether your scanner picks up all the components, or do some other kind of scanning: run a classic vulnerability scanning tool, or maintain a list of these components yourself and pull vulnerability data directly through the API of vulners.com or from snyk.io/vuln (a rough sketch follows below). I'm currently not aware of any tool that would create and query such a list.
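As a rough sketch of what such a list plus lookup could look like (the vulners endpoint and parameters below are my reading of its public API documentation and require an API key, so treat them as assumptions to verify):

```
# components.txt is maintained by hand or generated at build time,
# one "name version" pair per line, e.g.
#   nginx 1.14.0
#   tomcat 9.0.30
while read -r name version; do
  echo "=== $name $version ==="
  # Endpoint and parameters are assumptions based on the public vulners API docs
  curl -s "https://vulners.com/api/v3/burp/software/?software=${name}&version=${version}&type=software&apiKey=${VULNERS_API_KEY}"
  echo
done < components.txt
```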

5. Do not assume that fixing high severity vulnerabilities is enough. If security is a priority for you, make it easy for yourself to roll out updates so you can do it often.

Are these false negatives relevant to you?

I do believe false negatives tell you something important and inherent about the scanners, and I hope this helps you pick the right tool for your environment and the strategy to go with it.

However, there are some limitations to how much these results reflect your environment, and to whether the false negative rate of pure Docker image scanning should be a leading factor in your tool selection:

  • The images, installation methods and distributions do not correspond to the distribution of actual Docker images in the wild, and especially not to your images, so this should not be taken as the actual false negative rate. But I would assume the tendencies are right, and you should get a picture of what each vendor is targeting. Most of the target CVEs are in the category I called "everything else". I would argue that is also where most in-the-wild exploitation happens, so focusing on it is not necessarily misguided. On the other hand, one could also argue that Debian as a distribution (which makes up the vast majority of the 73 images) makes finding the vulnerabilities harder due to its size, complexity and diversity of installation methods.
  • You probably want to do dependency scanning earlier in your lifecycle (in the IDE, in a git hook, during integration…), not just once you have built your Docker containers, in which case you would be using more specific software dependency scanner functionality. Not to mention that there are completely different strategies for vulnerability management.
  • Whenever you pick a tool, your choice comes down to a multitude of factors. Snyk has a nifty integration that opens pull requests for fixes, WhiteSource aims to identify whether the way you use a dependency actually makes you vulnerable, and you will probably pick Aqua or Twistlock more for their security operations capabilities than merely for CVE scanning. Add to that the fact that Xray and Sonatype IQ are integrated with the registry/repository they provide. Even if we look at just Docker image scanning: some tools require a service to maintain, others an agent installed in your image, and there are differences in scanning time that might limit CI integrations…

Nonetheless, I think it is important to have a clear picture of how good your measurements are and where your blind spots lie.

To get a better picture of what to expect from each scanner, Part 2 will focus on a systematic test of each tool against different installation methods.

Coming up

Part 2 analyses how the different scanners deal with specifically crafted images, answering questions like:

  • Do tools find CVEs if you curl/wget/add/copy packages or build components directly from source?
  • What happens if you harden the image and, for example, delete binaries?

Part 3 will be about the images I used for testing and the final results.

HELP!!

Please contact me if there is any other scanner that you have access to for which you want to know the results!

Thanks for the support from Snyk and Clair, and especially to Anchore and WhiteSource for actively helping out with the research.

UPDATE: Vendor Response

WhiteSource: their engineers went to great lengths to validate some of the results (I'd like to thank them especially for this). They found that in 4 cases the scanner did identify the CVE, but it was not included in the summary report that I based the research on; this happened due to a UI bug in how the summary report is created. According to the vendor this does not affect most users, since the integrations, most importantly, are based on the raw data and not on the report. There were additionally 8 cases where the issue was not in an open source component (which is what the product targets); hence, they would not consider those false negatives. I also have to highlight that WhiteSource as a product focuses on removing false positives, for example by checking the actual code for the patches. In that sense focusing on open source seems a relevant decision. That makes their score (arguably, by a liberal interpretation): 20/65
