Capturing Core Dumps of Crashing Processes using Sysinternals ProcDump for Linux
Sysinternals ProcDump for Linux is a versatile tool for monitoring processes and creating diagnostics data by allowing the user to set up triggers that generate core dumps when activated. Examples of triggers include CPU, memory, thread, file descriptor, signals, .NET integration and much more.
Recently, I was approached by the Microsoft Browser Support team. They were trying to understand how to generate core dumps for Microsoft Edge when the browser was misbehaving, and they looked at ProcDump for Linux as a possible answer to the question: “How can we generate core dumps of Microsoft Edge on Linux when it crashes?”
We collaborated with the Microsoft Browser Support team to make sure they could use ProcDump for Linux to diagnose Edge problems. As part of that collaboration, we added a couple of improvements that we feel will be super useful to everyone who wants core dumps of their crashing applications. These improvements were released in ProcDump 3.1 for Linux (https://github.com/Sysinternals/ProcDump-for-Linux/blob/master/INSTALL.md).
Crashing processes
Windows engineers typically think of an application that unexpectedly crashes as having suffered an unhandled exception. On Linux, things look a little different: the operating system uses the concept of signals. Conceptually speaking, a signal is delivered to a process when something of interest has occurred. Examples of signals are SIGINT, which is delivered to a process when the user hits CTRL-C, and SIGSEGV, which is delivered when the program makes an invalid memory access, such as touching a restricted area of memory.
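To make the idea of a signal ending a process concrete, here is a minimal Python sketch. It is an illustration only and not part of ProcDump: the helper name terminating_signal and the child program are hypothetical. The child turns off core files for itself and then sends itself SIGSEGV; the parent reads the exit status to see which signal terminated it.

```python
import signal
import subprocess
import sys

# Illustrative child program: disable core files for itself, then send
# itself SIGSEGV so the kernel terminates it with that signal.
CHILD_CODE = (
    "import os, resource, signal\n"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0))\n"
    "os.kill(os.getpid(), signal.SIGSEGV)\n"
)


def terminating_signal(argv):
    """Run argv and return the name of the signal that killed it, or None."""
    result = subprocess.run(argv)
    # On POSIX, a negative return code -N means "terminated by signal N".
    if result.returncode < 0:
        return signal.Signals(-result.returncode).name
    return None


if __name__ == "__main__":
    print(terminating_signal([sys.executable, "-c", CHILD_CODE]))
```

A process that exits normally yields None here, which is the distinction a signal-based trigger relies on.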
It stands to reason that in order to generate a core dump when a process crashes, we have to decide which signals we consider to be crashing behaviors. As mentioned earlier, a SIGSEGV is typically considered a crashing behavior where we would like a core dump generated. Another example is SIGABRT, which is typically raised when a program aborts, for example after an unhandled exception. Fortunately, ProcDump for Linux can already generate a core dump when a specific signal is encountered, using the -sig switch. For example, to generate a core dump when a SIGSEGV occurs, we can issue the following command line:
$ procdump -sig 11 <pid>
11 is the numerical representation of SIGSEGV. You can find a list of signals and their corresponding numerical representations by running:
$ kill -l
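If you would rather look up a single signal than scan the whole list, the same mapping is available programmatically. The sketch below uses Python's standard signal module; the helper names signal_number and signal_name are made up for this example.

```python
import signal


def signal_number(name):
    """Return the number for a signal name such as 'SIGSEGV'."""
    return signal.Signals[name].value


def signal_name(number):
    """Return the name for a signal number such as 11."""
    return signal.Signals(number).name


if __name__ == "__main__":
    for name in ("SIGSEGV", "SIGABRT"):
        print(name, signal_number(name))
```

The values come from the platform the script runs on, so run it on the same machine where you intend to use ProcDump.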
This capability is great when you know which signal is causing the crash, but what happens when you don’t know which signal to target? This was the dilemma that faced the Microsoft Browser Support team, and it led us to add the ability to specify multiple (comma-separated) signals. For example, a helpful list of signals includes SIGSEGV and SIGABRT:
$ procdump -sig 11,6 <pid>
The above command line generates a dump when either a SIGSEGV or a SIGABRT is encountered.
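When the command is assembled by a script, it can be easier to work with signal names and let the script produce the comma-separated numbers. The following is a rough sketch under a few assumptions: build_procdump_args is a hypothetical helper, the PID 1234 is a placeholder, and only the -sig switch shown above is used.

```python
import signal


def build_procdump_args(signal_names, pid):
    """Build an argument list equivalent to: procdump -sig <n1,n2,...> <pid>."""
    # Translate each name (for example "SIGSEGV") to its number on this platform.
    numbers = [str(signal.Signals[name].value) for name in signal_names]
    return ["procdump", "-sig", ",".join(numbers), str(pid)]


if __name__ == "__main__":
    # 1234 is a placeholder PID used purely for illustration.
    print(" ".join(build_procdump_args(["SIGSEGV", "SIGABRT"], 1234)))
```

The sketch only builds the argument list; it does not launch ProcDump.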
Controlling the size of the core dump
Conceptually, a core dump is a static snapshot of the memory contents of a given process. It can contain a number of different categories of memory, such as anonymous private mappings, file-backed private mappings, huge pages and more. Most of the time, the size of the generated core dump will be manageable, but some processes use a lot of memory, and including everything in their core dumps can make the dumps prohibitively large. For example, an application that simply creates an anonymous mapping of 10 GB and never uses any of it leads to a core dump of 10 GB or more. With an excessively large core dump, there may not be enough room to store it, and transferring it from a production machine can be too expensive.
This was the exact scenario that the Microsoft Browser Support team found themselves in when troubleshooting Microsoft Edge. When Microsoft Edge crashed, the resulting core dump was very large, which made it challenging to manage. To address this issue, ProcDump now includes the -mc switch, which lets you specify the types of memory to include in the dump. The -mc switch takes a hexadecimal number representing a bitmask of the different memory categories. The current list of options is shown below, but you can also get the most up-to-date list by running man core.
bit 0 Dump anonymous private mappings.
bit 1 Dump anonymous shared mappings.
bit 2 Dump file-backed private mappings.
bit 3 Dump file-backed shared mappings.
bit 4 (since Linux 2.6.24) Dump ELF headers.
bit 5 (since Linux 2.6.28) Dump private huge pages.
bit 6 (since Linux 2.6.28) Dump shared huge pages.
bit 7 (since Linux 4.4) Dump private DAX pages.
bit 8 (since Linux 4.4) Dump shared DAX pages.
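The core man page documents these bits as the layout of the per-process file /proc/<pid>/coredump_filter, which holds the mask as a hexadecimal number. As a rough sketch for inspecting what a given mask means, the Python below parses such a value and lists the categories it enables. The names CATEGORIES, parse_filter and enabled_categories are made up for this example, and how ProcDump applies -mc internally is not assumed here.

```python
from pathlib import Path

# Bit positions as listed above (taken from the core man page).
CATEGORIES = {
    0: "anonymous private mappings",
    1: "anonymous shared mappings",
    2: "file-backed private mappings",
    3: "file-backed shared mappings",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
    7: "private DAX pages",
    8: "shared DAX pages",
}


def parse_filter(text):
    """Parse the hexadecimal text found in a coredump_filter file."""
    return int(text.strip(), 16)


def enabled_categories(mask):
    """Return the descriptions of all categories whose bit is set in mask."""
    return [desc for bit, desc in sorted(CATEGORIES.items()) if mask & (1 << bit)]


if __name__ == "__main__":
    try:
        # Inspect the filter of the current process, if the kernel exposes it.
        mask = parse_filter(Path("/proc/self/coredump_filter").read_text())
        print(f"{mask:x}", enabled_categories(mask))
    except OSError:
        print("coredump_filter is not available on this system")
```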
For example, if we wanted to include only anonymous mappings (both private and shared) and ELF headers, we would end up with the following bitmask:
000010011 (hexadecimal 13)
The ProcDump command line would now look like:
$ sudo procdump -sig 11,6 -mc 13 <pid>
The resulting core dump will be smaller than one produced with the default setting. Of course, one caveat to keep in mind is that any time you remove information from a core dump, you also lose the ability to troubleshoot a problem that relies on that diagnostic data being present.
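The bitmask arithmetic used above (bits 0, 1 and 4 giving hexadecimal 13) can be checked with a few lines of Python. This is just a sketch of the calculation; the function names memory_category_mask and mc_argument are hypothetical.

```python
def memory_category_mask(bits):
    """Combine bit positions (0 through 8) into a single integer mask."""
    mask = 0
    for bit in bits:
        if not 0 <= bit <= 8:
            raise ValueError(f"unknown memory category bit: {bit}")
        mask |= 1 << bit
    return mask


def mc_argument(bits):
    """Format the mask as the hexadecimal string passed to -mc."""
    return format(memory_category_mask(bits), "x")


if __name__ == "__main__":
    # Anonymous private (0), anonymous shared (1) and ELF headers (4).
    print(mc_argument([0, 1, 4]))  # prints 13
```

Adding file-backed private mappings (bit 2) to the same selection would give hexadecimal 17.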
Tip: Microsoft Edge creates multiple processes when browsing the web. To find out which tab belongs to which process, you can use:
…->More tools->Browser task manager
This opens a new window that lists the different tasks with their corresponding process IDs.
Thank You!
This was a fun collaboration between the Microsoft Sysinternals team and Microsoft Support that led to some cool new features that helped them diagnose issues in Edge for Linux.
We’re super excited about these new features and hope that you are as well! We’d love to get your feedback on new feature requests or bugs.
Simply head to our GitHub page: https://github.com/Sysinternals/ProcDump-for-Linux.