Memory Profiling in Python

Debugging an error in your code can be a hard or a fun task to tackle. Figuring out why your program, script or tool is consuming so much memory can be tricky. While parsing a few network captures, I came across a few options that work in different ways, and I wanted to understand the differences between those functions. Here I share a few tests I did, and how to easily do some memory profiling in Python.

While writing a tool in Python to parse network captures with Scapy, I came across a few alternatives. Because I was dealing with a performance issue on big files, I decided to compare them. The difference comes down to the way rdpcap() and PcapReader() work.

Related post: Scapy: Ways of reading pcaps (medium.com)

Some of the differences were highlighted in a previous post. While looking for alternatives, I found a pretty cool thread on Stack Overflow with a few options to try. The one I chose was Memory Profiler, because it has the key features I needed:

Easy to integrate
Easy to run
Plots

The tutorial in the GitHub project is pretty good, and the functionality was handy for what I needed to do. For me, it was as simple as adding the @profile decorator to the script's functions (here and here).

Code looks like this:
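As a minimal sketch of that pattern (a hypothetical stand-in, not the actual pcap_parsing_rdpcap.py), the snippet below decorates an illustrative function, load_all(), with memory_profiler's profile decorator. The try/except fallback to a no-op decorator is an assumption of this sketch, not something the post uses; it simply keeps the script runnable when memory_profiler is not installed.

```python
# Import memory_profiler's line-by-line decorator if it is available;
# otherwise fall back to a no-op decorator so the script still runs.
try:
    from memory_profiler import profile
except ImportError:
    def profile(func):
        return func


@profile
def load_all(n_items):
    # Hypothetical workload: build every record up front (similar in
    # spirit to loading a whole capture), then process the list.
    records = [bytes(64) for _ in range(n_items)]
    return sum(len(record) for record in records)


if __name__ == "__main__":
    print(load_all(100_000))
```

Running a script like this under mprof run, as done for the pcap script below, prints a per-line report for every decorated function.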

Memory Profiler can be run as a Python module or via the mprof command, which generates a file with all the memory samples and their timestamps. This is quite handy and easy to execute.

First, we execute mprof run <script> to generate a file with all the timestamps, and then we can plot it with mprof plot.
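mprof run works by sampling memory at a fixed interval while the script executes. To illustrate the idea without extra dependencies, here is a stdlib-only sketch that samples from a background thread using tracemalloc. This is a swapped-in technique, not what mprof does internally: tracemalloc only counts memory allocated through Python's allocator, whereas mprof records the resident memory of the whole process. sample_memory() and workload() are hypothetical names.

```python
import threading
import time
import tracemalloc


def sample_memory(func, *args, interval=0.1, **kwargs):
    """Run func(*args, **kwargs) while a background thread samples memory.

    Returns (result, samples, peak_mib), where samples is a list of
    (elapsed_seconds, traced_mib) tuples taken every `interval` seconds.
    """
    samples = []
    done = threading.Event()
    tracemalloc.start()
    start = time.perf_counter()

    def sampler():
        # Always record at least one sample, then keep sampling until
        # the main thread signals that func has finished.
        while True:
            current, _peak = tracemalloc.get_traced_memory()
            samples.append((time.perf_counter() - start, current / 2**20))
            if done.wait(interval):
                break

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        result = func(*args, **kwargs)
    finally:
        done.set()
        thread.join()
        _current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, samples, peak / 2**20


def workload():
    # Hypothetical workload: hold ~20 MiB of buffers for a moment.
    buffers = [bytes(1024) for _ in range(20_000)]
    time.sleep(0.3)
    return len(buffers)


if __name__ == "__main__":
    result, samples, peak = sample_memory(workload)
    for elapsed, mib in samples:
        print(f"{elapsed:6.2f}s {mib:8.2f} MiB")
    print(f"peak: {peak:.2f} MiB")
```

The 0.1 second default mirrors the sampling interval mprof reports in the run below.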

Running this tool against a pcap file I downloaded from Malware Traffic Analysis was as simple as this:

mprof run -o sample_run_rdpcap.dat pcap_parsing_rdpcap.py --pcap pcaps/2021-06-04-part-02-after-reboot-Qakbot-with-Cobalt-Strike-and-spambot-activity.pcap --output urls_testing_rdpcap.txt
mprof: Sampling memory every 0.1s
running new process
running as a Python program...
Filename: pcap_parsing_rdpcap.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    21    100.0 MiB    100.0 MiB           1   @profile
    22                                         def parse_pcap(pcap_path, urls_file):
    23    441.6 MiB    341.6 MiB           1       pcap_flow = rdpcap(pcap_path)
    24    442.7 MiB      1.2 MiB           1       sessions = pcap_flow.sessions()
    25    442.7 MiB      0.0 MiB           1       urls_output = open(urls_file, "wb")
    26    442.7 MiB      0.0 MiB        1775       for session in sessions:
    27    442.7 MiB      0.0 MiB       49194           for packet in sessions[session]:
    28    442.7 MiB      0.0 MiB       47420               try:
    29    442.7 MiB      0.0 MiB       47420                   if packet[TCP].dport == 80:
    30                                                             payload = bytes(packet[TCP].payload)
    31                                                             url = get_url_from_payload(payload)
    32                                                             urls_output.write(url.encode())
    33    442.7 MiB      0.0 MiB        1300               except Exception as e:
    34    442.7 MiB      0.0 MiB        1300                   pass
    35    442.7 MiB      0.0 MiB           1       urls_output.close()
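The Increment column attributes memory growth to individual source lines; here, the rdpcap() call alone accounts for 341.6 MiB. For a rough stdlib analogue, tracemalloc can take a snapshot before and after a call and group the difference by line with Snapshot.compare_to(). This is a sketch with a different tool, not memory_profiler's method, and build_payloads() and top_increments() are hypothetical names.

```python
import tracemalloc


def build_payloads(n):
    # Hypothetical workload: allocate n 512-byte buffers in one list.
    return [bytes(512) for _ in range(n)]


def top_increments(func, *args, limit=3):
    """Run func(*args) and return (result, the `limit` biggest per-line diffs)."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = func(*args)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Group allocations by file and line number; compare_to() sorts the
    # differences so the largest changes come first.
    stats = after.compare_to(before, "lineno")
    return result, stats[:limit]


if __name__ == "__main__":
    payloads, stats = top_increments(build_payloads, 10_000)
    for stat in stats:
        # Each entry shows the source line, its new size and the size change.
        print(stat)
```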

The file sample_run_rdpcap.dat contains the timestamps and the memory consumption, sampled every 0.1 seconds (as the mprof output above shows). Now we can plot this and save it as an image:

Now we have a good idea of how much memory rdpcap() can use when it reads a file. In this particular case the pcap file was 28 MB. Once loaded into memory it consumed ~450 MB (442.7 MiB in the report above), and the script ran for more than 200 seconds.
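The peak and the duration can also be read straight from the .dat file. The sketch below assumes the plain-text layout that mprof writes, where each sample is a line of the form MEM <MiB> <Unix timestamp> and every other line (such as CMDLINE) is skipped; parse_mprof() and summarize_mprof_file() are hypothetical helper names.

```python
def parse_mprof(lines):
    """Return (peak_mib, duration_s) from the lines of an mprof .dat file.

    Assumes sample lines look like "MEM <MiB> <unix timestamp>";
    every other line (CMDLINE, FUNC, CHLD, ...) is ignored.
    """
    samples = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[0] == "MEM":
            samples.append((float(fields[1]), float(fields[2])))
    if not samples:
        raise ValueError("no MEM samples found")
    peak_mib = max(mem for mem, _ts in samples)
    # Elapsed time between the first and the last sample.
    duration_s = samples[-1][1] - samples[0][1]
    return peak_mib, duration_s


def summarize_mprof_file(path):
    # Convenience wrapper for a file on disk, e.g. "sample_run_rdpcap.dat".
    with open(path) as fh:
        return parse_mprof(fh)
```

Calling summarize_mprof_file() on the rdpcap and PcapReader runs gives two numbers per run that can be compared directly, alongside the plots.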

With PcapReader() the results are different: memory consumption is lower and the script's execution time is considerably shorter:

Finally, we can plot both executions in the same graph, which is a nice way to compare how the scripts behave and decide which one is more effective for the purpose I was after:

The differences between the scripts are fairly self-explanatory. rdpcap() loads and parses the whole file into memory, which gives more information and flexibility when working with a network capture. PcapReader(), on the other hand, reads the file sequentially, which is faster but less flexible.
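To see why loading everything up front costs more memory than streaming, here is a small stdlib-only experiment in the same spirit. It does not use Scapy: fake_records() is a hypothetical generator standing in for packets read from a capture, eager_total() mimics the rdpcap() approach of building the full list first, and lazy_total() mimics iterating over a PcapReader() one record at a time. Peak memory is measured with tracemalloc rather than Memory Profiler.

```python
import tracemalloc


def fake_records(n, size=256):
    # Hypothetical stand-in for packets read from a capture file.
    for _ in range(n):
        yield bytes(size)


def eager_total(n):
    # rdpcap()-style: materialize every record before processing.
    records = list(fake_records(n))
    return sum(len(r) for r in records)


def lazy_total(n):
    # PcapReader()-style: process one record at a time.
    return sum(len(r) for r in fake_records(n))


def peak_mib(func, *args):
    """Run func(*args) and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


if __name__ == "__main__":
    for func in (eager_total, lazy_total):
        result, peak = peak_mib(func, 50_000)
        print(f"{func.__name__}: {result} bytes processed, peak {peak:.2f} MiB")
```

Both functions produce the same total, but the eager version's peak grows with the number of records while the lazy version's peak stays roughly flat, the same trade-off between the two Scapy readers.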

In the rdpcap graph we can see memory consumption climb and then reach a plateau: packets are only processed once the whole file has been read.

Out of curiosity, I ran both scripts against multiple pcap files to see how each approach could be identified in the plots. The graph looks like this:

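Each sample parsed with rdpcap() has a different memory consumption, and each shows a slope at the start while the file is loaded. PcapReader()'s memory consumption peaks and then remains roughly constant at around 100 MB, close to the 100.0 MiB the process was already using before rdpcap() was called in the report above.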

What’s the point here?

Test, test and test. Sometimes it’s useful to test, probe and figure out the best way to write a piece of code for a given tool or purpose. I had never tried memory profiling in Python before, and this only scratches the surface.

Having options to test, compare and decide on the best approach to a problem is key when trying to figure out how something works. I’m not done with the script/tool I’m building, and I definitely have a few more questions about profiling in Python.

Hopefully this simple way of testing things out is helpful for someone else. There are a few more tricks that can be implemented, and they are useful for many more use cases in Python.

Written by Pablo Ramos