How we were looking for a bug in PVS-Studio or 278 GB of log files

Published in PVS-Studio on Nov 3, 2022

Original: pvs-studio.com

Here is an interesting story of how our team hunted for a bug in the PVS-Studio analyzer. Yes, we make mistakes too. However, we are ready to roll up our sleeves and dive deep into the rabbit hole.

A small backstory

A colleague of ours has already written about our technical support, but tech support stories are always interesting to read. And we do have them!

If you want heavy programming stuff, you can skip straight to the next section. If you want to get to know the inner workings of our support team, keep reading :).

At the moment we have five development departments:

the C and C++ analyzer development department;
the C# analyzer development department;
Tools & DevOps department;
web development department;
CRM development department.

The first two departments, as their names suggest, are engaged in the development and support of corresponding static code analyzers. This includes:

development of the analyzer’s kernel: enhancements of the parser and the type system, improvements of data flow analysis and symbolic calculations, etc. By the way, we recently wrote several articles about various refinements: intermodular analysis, fighting legacy code in the C and C++ analyzer, improving data flow analysis in the C# analyzer;
writing new diagnostic rules and improving old ones.

The third department develops and supports all additional software for our analyzers:

integration with popular IDEs — Visual Studio 2010–2022, IntelliJ IDEA, Rider, CLion;
integration with the SonarQube continuous quality control platform;
integration with the Unreal Engine and Unity game engines;
the utility for converting the analyzer report into various formats — SARIF, TeamCity, HTML, FullHtml, etc.;
the utility that notifies the development teams about suspicious fragments in code.

In addition to development, all departments are also engaged in technical support. Every month, we select one or two people from each department to communicate with users via email. Please note: these people do not sit in a call center and do not handle first-line processing of requests. That is the job of another, very experienced department. They keep developers away from most typical user questions and pass on only the technically complex ones. This is where we, the developers, start working with support to solve such tasks. In most cases, these tasks require code fixes. We believe that this approach not only improves the quality and speed of tech support, but also shows developers how important and relevant the functionality they have built is.

Now let’s take a closer look at the support work of the C++ department. Support requests for the C and C++ analyzer can be divided into the following types:

The diagnostic rule issues a false positive. The developer is lucky if the user sends a code example that reproduces the issue. In most cases, examples sent by email are heavily simplified, and fixing the diagnostic can become an ordeal.
The analyzer does not issue a warning on the user’s code example. Here two outcomes are possible:

the analyzer does not issue a warning on purpose. Here you can learn more about the reasons why it does so in some cases;

the user is correct. We get the necessary clarification on the example from the user, and then we decide: either we fine-tune the existing diagnostic, or we write a new one.
The analyzer did not understand some construct of the C or C++ language. The grammar of these languages allows you to write extremely complicated code, and sometimes the analyzer cannot handle it. In such situations, users send us V001 errors. To fix such problems, we usually request reproducible code examples or intermediate analysis files (*.i and *.cfg files).
A crash of the C and C++ analyzer’s kernel. No one is immune to mistakes, and crashes sometimes happen to our analyzer too (V003). Our users help a lot by sending us a stack trace, a memory dump, or intermediate analysis files.
One of the many product use cases does not work. The problems of this kind are extremely varied, and it is not possible to describe them all in a sentence or two.

The story from the title of this article began with exactly such a support request. The client complained that incremental analysis froze, so from here on we will talk about the last item on the list above.

Incremental analysis that failed

The story began with the user contacting our support with the following issue:

they run the analysis in incremental mode or on a list of files;
they parallelize the analysis across N threads;
the analyzer works perfectly well in N threads up to a certain point and then “collapses” into a single thread. At the same time, a flood of V008 errors, reporting that files could not be preprocessed, begins to pour into the report.

The first action to take in this situation is to look at the log file. After looking at the analyzer’s log sent by the user, we found a lot of lines of the following kind:

Command "/usr/bin/c++ -DBOOST_ASIO_DYN_LINK ...." returned code 3.

This line means that the preprocessor was stopped due to a timeout. We run the preprocessor on the project’s source files to expand macros and substitute the contents of the files named in #include directives. Only after that do we analyze the resulting files, along with some additional information (target platform, paths to directories excluded from the analysis, etc.).

Many C++ developers know the pain of compiling projects that include Boost libraries: build time increases greatly. Preprocessing is affected as well. As the logged command shows (-DBOOST_ASIO_DYN_LINK), the user’s project uses Boost. We had previously received emails about a similar issue: under high CPU load, files could not be preprocessed in time.

We had been thinking about removing the 30-second preprocessing timeout for some time, and this case settled it: we removed the timeout, sent the user a beta, and waited for a response.

We were about to forget about the fixed bug when the user reported back about the beta:

previously the analysis made it to the end, but there was a pile of V008s in the report;
now the analysis freezes at the parsing stage of the same files (about 86% of the progress).

What parsing of the files are we talking about?

Well, the problem turned out to be more complicated. We continued to dig deeper.

Since the preprocessor timeouts were gone and it was now apparently the C and C++ analyzer’s kernel that froze, we decided to look at the generated configuration files. That turned out to be exactly the right place to look. There was nothing unusual in the client’s settings, except for one small detail:

exclude-path=*/generated/sip*
exclude-path=*/pacs/soapserver/generated/*
exclude-path=*/soap_engine/*
exclude-path=*/tech1utils/tests/googlemock/*
exclude-path=*/sdk-common/*
exclude-path=*/tech1grabbers/SDKs/*
# ....
# 200+ similar entries
# ....
exclude-path=/mnt/nvme/jenkins/workspace/..../lpr-ide.cpp

The exclude-path setting lets you suppress warnings on code from third-party libraries and tests. Usually, users either list a few specific directories or use a search pattern, and the number of entries rarely exceeds 30–40. In this case, there were more than 200 different entries, including search patterns. We suspected that our algorithm for excluding files from analysis, written over 10 years ago, simply could not handle that many entries in the configuration file quickly.

Why is it slowing down?

After we optimized the algorithm, the analyzer parsed and analyzed files several times faster on a test case with 200+ excluded paths in the configuration file. It was a clear success. All that remained was to build a beta, give it to the user, and celebrate our small victory.

The butler did it!

But it was too early to celebrate (and close the ticket): the user wrote to us again about the same freeze.

Quick fixes hadn’t helped, so we had to dig even deeper. We asked the user to run our utility under strace and send us all the generated logs. For those unfamiliar with it, strace traces every system call a program makes, and much more. By the way, we also use it in one of the ways of running the analyzer on a project (compilation tracing).

Here is the command that the user used to generate logs:

strace -y -v -s 4096 -ff -o strace-logs/log.txt -- pvs-studio-analyzer ....

The user left the program running for about 20 minutes and then terminated the process. Since strace kept writing to the logs during the freeze, and the -ff option makes it write a separate file for each traced process, the logs grew to an impressive size: 22,795 files totaling 278 GB (!) uncompressed.

First, we looked at the strace output and immediately saw a huge number of nanosleep calls. This meant that the child processes spawned by the pvs-studio-analyzer utility were waiting on something. We went through the logs from top to bottom and found the problem:

The log shows the file descriptor numbers gradually growing as files are opened. Once the number approached 1024, an attempt to allocate a new descriptor failed with the EMFILE error, and the analysis stopped. This behavior points to a file descriptor leak.

In Linux, when a file is opened, the process is given a small non-negative integer called a file descriptor. The descriptor is then used to work with the file: read, write, query attributes, and so on. The number of descriptors a process can hold at once is limited by system settings (the RLIMIT_NOFILE resource limit, whose soft limit defaults to 1024 on many distributions).

By the way, the problem is easy to reproduce. All it takes is the following CMakeLists.txt:

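cmake_minimum_required(VERSION 3.12) # file(TOUCH) needs CMake 3.12+
project(many-files LANGUAGES C CXX)

# Generate 10,001 empty C sources: src-0.c ... src-10000.c
set(SRC "")
foreach(i RANGE 10000)
  set(file "${CMAKE_CURRENT_BINARY_DIR}/src-${i}.c")
  file(TOUCH "${file}")
  list(APPEND SRC "${file}")
endforeach()

# A single static library built from all generated files
add_library(many-files STATIC ${SRC})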

Next, we configure the project from the directory containing CMakeLists.txt (exporting compile_commands.json) and run pvs-studio-analyzer version 7.18 or earlier:

cmake -S . -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=On
pvs-studio-analyzer analyze -f ./build/compile_commands.json -j -i -o pvs.log

Unfortunately, by the time we wrote this article, the original logs had sunk into oblivion, so the picture above shows a log that we reproduced ourselves.

Who was guilty?

We fixed the resource handling in the program, and the problem disappeared. We suspect that no one had run into this error before because the Linux version of the analyzer is mostly used on build servers in normal (non-incremental) mode. Incremental analysis is often used together with IDEs, and at the moment JetBrains CLion is the only IDE we fully support on Linux. Apparently, until then, no user had needed to analyze a project with a large number of files in incremental mode.

With the third beta we sent to the client, we finally fixed the analyzer freeze.

Conclusion

Unfortunately, not all problems that reach support are easy to handle. Often the most trivial bugs are buried deep and are hard to debug.

We hope that our story was interesting for you. And if you have any problems with our product, do not hesitate to contact our awesome support. We promise we will help you.

Written by Unicorn Developer