DeepCode up to 54 times faster than comparable tools

Frank Fischer
Published in DeepCodeAI
May 11, 2020 · 7 min read

TL;DR

Static code analysis is an invaluable tool during the software development process. To make the most of it, it needs to be right at the developer's fingertips and deliver suggestions in a reasonable time on a reasonable machine. DeepCode's mission is to make AI-based analysis available to every developer, and we understood from the start that performance, both in terms of speed and resource consumption, is of high importance.

In this article, we share the first batch of results from a comparison between DeepCode and other typical static code analysis tools, with a focus on JavaScript. We will add other languages and update the data over time. We also try to be as transparent as possible so you can rerun these benchmarks yourself and compare the results; let us know if we forgot a technical detail or a process step you need to replay. Finally, we strive to be fair and not to discriminate against any participant. Obviously, due to licensing or shipping format, we face some limitations, which we state below.

Overall, the results show DeepCode's performance advantage, with DeepCode up to 54 times faster than comparable tools. Below is a boxplot on a logarithmic scale showing throughput in lines of code per second across the test field for each of the solutions reviewed. This puts DeepCode in the unique position of serving developers with highly relevant and actionable suggestions to improve their code quality via plugins right in their work environment, the Integrated Development Environment (IDE). The power of AI in real time.

Boxplot (logarithmic scale) showing first and third quartiles and whiskers (n=49 repos with around 3.8 million LoC)

Background and Results

DeepCode provides a static code analysis system based on symbolic AI. DeepCode uses vast amounts of open source repositories and their change histories. For example, DeepCode used 200,000 repositories to train for the C++ programming language, amounting to several million lines of code multiplied by their change histories. This is where the need for performance first comes into play: thanks to its proprietary engine, DeepCode is capable of crunching these large amounts of data, processing and refining them, and learning rules from the changes found in the repos.

As an example, DeepCode's engine can identify sources of external, possibly malicious data by following the data flow within an application and comparing it to other applications. In addition, DeepCode uses sources such as documentation to learn. It is a basic rule in machine learning that more and better training and test data result in better models, but also in more training effort. Here, the performance of the engine is key.
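To illustrate, consider a hypothetical JavaScript snippet (invented for this article, not drawn from our training data or the benchmark repos) showing the kind of cross-function flow such an engine follows: untrusted request data passes through two function calls before reaching a file-system sink.

```javascript
const fs = require("fs");

// Untrusted input enters here...
function handleRequest(req, res) {
  const name = req.query.file; // external, potentially malicious data
  serveFile(name, res);
}

// ...passes through an intermediate function...
function serveFile(name, res) {
  readAndSend("./uploads/" + name, res); // "../" sequences are never stripped
}

// ...and reaches a sensitive sink two calls later.
function readAndSend(path, res) {
  // Path traversal: tainted data flows into fs.readFile unchecked.
  fs.readFile(path, (err, data) => {
    if (err) return res.end("not found");
    res.end(data);
  });
}
```

Spotting this requires tracking the data across all three functions, not just scanning each line in isolation.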

Secondly, performance is important when analyzing developers' code. We were told that some developers are unable to use static code analysis on a regular basis because runtimes reach 8 hours or more on legacy codebases of several million lines of code. This is obviously not acceptable, quite apart from the fact that the suggestions of a static analysis should be available to the developer during development. If the results only become available later, during code review or even during testing, the effort to mitigate issues is much higher.

DeepCode is proud of its unique engine, which enables us to serve developers in real time, right at their fingertips. Years of research and a huge engineering effort resulted in this highly optimized engine. One part is deeply embedded in the engine and the way it represents and traverses the complex graphs used to model source code. Another part is the unique Datalog-based rule solver used to apply the model's rules to those source code graphs.
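To give a flavor of what Datalog-style rule solving over a code graph means, here is a deliberately tiny sketch in JavaScript. The facts, rules, and names are invented for illustration; DeepCode's actual engine and rule language are proprietary and operate on far richer graphs.

```javascript
// Facts: data-flow edges as they might be extracted from a code graph
// (all names here are invented for this illustration).
const flowsTo = [
  ["req.query.file", "name"], // a taint source feeds a variable
  ["name", "path"],           // the variable is combined into a path
  ["path", "fs.readFile"],    // the path reaches a sensitive sink
];
const taintSources = new Set(["req.query.file"]);
const sensitiveSinks = new Set(["fs.readFile"]);

// Datalog-style rules:
//   tainted(X) :- taintSource(X).
//   tainted(Y) :- tainted(X), flowsTo(X, Y).
// Evaluate the rules to a fixpoint, then report tainted values at sinks.
const tainted = new Set(taintSources);
let changed = true;
while (changed) {
  changed = false;
  for (const [from, to] of flowsTo) {
    if (tainted.has(from) && !tainted.has(to)) {
      tainted.add(to);
      changed = true;
    }
  }
}
for (const node of tainted) {
  if (sensitiveSinks.has(node)) {
    console.log(`Rule match: tainted data reaches ${node}`);
  }
}
```

A real engine evaluates thousands of such rules over graphs with millions of nodes, which is exactly where the solver's performance becomes decisive.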

In combination, this results in a performance advantage that makes DeepCode orders of magnitude faster than alternative offerings while providing unique quality in the results. Moreover, because DeepCode uses symbolic AI, it can reason about its findings and, using examples taken from the training set, help developers grasp the issue at hand and find a remedy quickly. This matters because some of the more complex bugs require following the application flow over several function calls and hundreds of lines of code; a major strength of static analysis is finding these not-so-obvious bugs.

So far, we have only claimed to have a record-breaking engine. Obviously, we did our internal tests and comparisons, but we were careful to establish a valid foundation before publishing data. We are now ready to start publishing: we begin with JavaScript and will extend over time with more tools, more languages, and regular data updates.

We also have a deep academic background and want to follow scientific principles regarding the fidelity of the data reported here: we want to enable interested parties to redo what we did. Therefore, we strive to provide all the details necessary to rebuild and re-measure our setup. We also want to be fair and not discriminate against tools by engineering a setup that puts them at a disadvantage; the goal was to provide a realistic and equal testbed. We are aware of limitations that result from different forms of licensing or from cloud-based versus on-premise deployment, and we report those next to the results.

Comparison of Analysis Times (s) (logarithmic scale) over Projects

The results are more than promising. We see DeepCode performing analyses in the realm of two to forty-four times faster than the other tools. On average, DeepCode achieves a throughput of 2,201 lines per second (compared to 736 and 182). The boxplot below shows the comparison (note the logarithmic scale). We are aware that throughput depends on the complexity of the code analyzed, but over millions of lines and various repository sizes, the average stabilizes and provides an idea of general performance. From this, we calculated the "DC Factor": how many times faster DeepCode is than the other product. At the median, DeepCode is between 4 and 19.7 times faster than comparable products. A small worked example of both metrics follows the charts below.

Boxplot (logarithmic scale) showing first and third quartiles and whiskers (n=49 repos with around 3.8 million LoC)

DeepCode Factor — Relation between runtimes (logarithmic scale)
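To make the derived metrics concrete, here is a small worked example in JavaScript with made-up numbers for a single hypothetical repository (these are not rows from the published CSV):

```javascript
// Derived benchmark metrics, computed on invented numbers.
const loc = 80000; // lines of code in the hypothetical repo

// Pure analysis time in seconds per tool (illustrative values only)
const timeSeconds = { deepcode: 36, toolA: 220, toolB: 700 };

for (const [tool, t] of Object.entries(timeSeconds)) {
  const throughput = loc / t;                // lines per second
  const dcFactor = t / timeSeconds.deepcode; // how many times slower than DeepCode
  console.log(`${tool}: ${Math.round(throughput)} loc/s, DC Factor ${dcFactor.toFixed(1)}`);
}
// Output:
// deepcode: 2222 loc/s, DC Factor 1.0
// toolA: 364 loc/s, DC Factor 6.1
// toolB: 114 loc/s, DC Factor 19.4
```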

As far as the generated results are concerned, SonarQube and LGTM produced more alerts than DeepCode. However, many of those alerts turned out to be noisy style warnings (lint-like results), while DeepCode detected more semantically complex and relevant issues that no other tool in the industry can detect. We will publish an in-depth report on that topic in the coming weeks. Subscribe to …, Twitter, or LinkedIn to get notified. Over the next two months, our team will focus on JavaScript, making DeepCode the most advanced and undisputed (in all categories) analyzer for JavaScript. Start using it today to experience the benefits and the progress first-hand.

Test Setup and Details

The test run was done on May 7, 2020. We used a lower-end machine to simulate a loaded work environment for our test infrastructure: a quad-core Intel(R) Core(TM) i5-3450 CPU (3.10 GHz), 8 GB of RAM, and a SATA HDD.

The test field for this first batch:

  • SonarQube, Community Edition, run on local repos
  • LGTM, community edition, based on Semmle/GitHub (cloud-hosted)
  • DeepCode, free to use for open source and for teams of up to 30 developers, commercial beyond that (free for all during COVID-19), run on local repos

For the repository selection, we took a random sample from hundreds of thousands of GitHub repositories and limited the result set to those repositories available for use in LGTM, which came out at about a third of the candidate repositories. Furthermore, we limited the selection to repositories for which DeepCode primarily detected JavaScript issues. This resulted in a set of repositories ranging from a few hundred to over 700,000 lines of code (LoC), with nearly 80,000 lines on average.

Measurement excluded repository checkout time to reduce dependence on network speed, although we did include the DeepCode CLI's code upload time, which puts DeepCode at a slight disadvantage, particularly for small repositories. We restarted the SonarQube on-premise server after each analysis because we experienced crashes due to "too many open files" and wanted to make the tests independent of the order of analysis. We also excluded the startup time of the server and measured only true analysis time.
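In essence, the per-repository timing looked like the following sketch (simplified; run-analysis.sh is a hypothetical stand-in for each tool's invocation, and the repo is assumed to be already checked out, with any server already started):

```javascript
const { spawnSync } = require("child_process");

// Times one analysis run. Checkout and server startup happen beforehand,
// so the measured interval covers analysis (and, for DeepCode, upload) only.
function timeAnalysis(repoPath) {
  const start = process.hrtime.bigint();
  const result = spawnSync("./run-analysis.sh", [repoPath], { stdio: "inherit" });
  const end = process.hrtime.bigint();
  if (result.status !== 0) throw new Error(`Analysis failed for ${repoPath}`);
  return Number(end - start) / 1e9; // seconds of pure analysis time
}

console.log(`${timeAnalysis("./repos/example").toFixed(1)} s`);
```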

Outlook and What To Expect

We will continue to extend the data. We will add more languages and therefore more repositories, add more tools, and rerun the measurements routinely. This blog will also be our publishing vehicle: we will continue to push data here.

We hope we were able to spark your interest in learning more about DeepCode. Give us a try: we are free to use for open source and for teams of fewer than 30 developers. And for a limited time, we are free for everyone, in support of all those coping with and recovering from the COVID-19 crisis.

Raw Data

(CSV File here) — Notes below

Notes:

  • loc — Lines of Code
  • Time in seconds — pure analysis time (lowest number per field marked in green)
  • Throughput in lines per second — loc / (time in seconds)
  • DC Factor — the ratio of the other tool's analysis time to DeepCode's, i.e., how many times faster DeepCode is
  • Sample size is n=49 repos (JavaScript) with 3.8 million loc in total, ranging from 376 to 753,575 loc per repo
