Pablo Ramos
A bit off
5 min read · Sep 9, 2021

Tips and Tricks for Memory Profiling your Python Code

Memory Profiling in Python

Debugging an error in your code can be a hard or a fun task to tackle, and figuring out why your program, script, or tool is consuming so much memory can be tricky. While parsing a few network captures I came across several options that work in different ways, and I wanted to understand the differences between them. Here I share a few tests I ran and show how to easily do some memory profiling in Python.

While writing a tool to parse network captures in Python with Scapy, I ran into a performance issue with big files and decided to compare a few alternatives. The issue is related to the way rdpcap() and PcapReader() work.

Some of the differences were highlighted in a previous post. Looking for ways to measure them, I found a pretty cool thread on Stack Overflow with a few options to try. The one I chose is Memory Profiler, which has the key features I needed:

  • Easy to integrate
  • Easy to run
  • Plots

The tutorial in the GitHub project is pretty good, and the functionality was handy for what I needed to do. For me, it was as simple as adding the @profile decorator to the functions in both scripts.

The code looks like this:
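A minimal sketch of the decorated function, reconstructed from the line-by-line profiler output shown later in this post. The body of `get_url_from_payload` does not appear anywhere in the post, so the version here is a hypothetical stand-in. The no-op `profile` fallback and the import of Scapy inside the function are also assumptions, added only so the helper can be tried without either library installed.

```python
try:
    from memory_profiler import profile
except ImportError:
    # Assumption: fall back to a no-op decorator when Memory Profiler is missing.
    def profile(func):
        return func


def get_url_from_payload(payload):
    """Hypothetical helper: rebuild the URL from a raw HTTP request."""
    lines = payload.decode("latin-1").split("\r\n")
    # Raises ValueError for payloads that do not start with a request line.
    _method, path, _version = lines[0].split(" ", 2)
    host = ""
    for line in lines[1:]:
        if line.lower().startswith("host:"):
            host = line.split(":", 1)[1].strip()
            break
    return f"http://{host}{path}\n"


@profile
def parse_pcap(pcap_path, urls_file):
    # Imported here so the helper above can be tried without Scapy installed.
    from scapy.all import TCP, rdpcap

    pcap_flow = rdpcap(pcap_path)    # loads every packet into memory
    sessions = pcap_flow.sessions()  # groups the packets by session
    urls_output = open(urls_file, "wb")
    for session in sessions:
        for packet in sessions[session]:
            try:
                if packet[TCP].dport == 80:
                    payload = bytes(packet[TCP].payload)
                    url = get_url_from_payload(payload)
                    urls_output.write(url.encode())
            except Exception:
                pass  # no TCP layer or no parsable request: skip the packet
    urls_output.close()
```

With the decorator in place, Memory Profiler reports memory usage for every line of `parse_pcap` each time the function is called.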

Memory Profiler can be run as a Python module or via the mprof command, which generates a file with timestamped memory samples. This is quite handy and easy to execute.

First we execute mprof run <script> to generate the file with the timestamped samples, and then we plot it with mprof plot.

Running this tool against a pcap file I downloaded from Malware Traffic Analysis was as simple as this:

mprof run -o sample_run_rdpcap.dat pcap_parsing_rdpcap.py --pcap pcaps/2021-06-04-part-02-after-reboot-Qakbot-with-Cobalt-Strike-and-spambot-activity.pcap --output urls_testing_rdpcap.txt
mprof: Sampling memory every 0.1s
running new process
running as a Python program...
Filename: pcap_parsing_rdpcap.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    21    100.0 MiB    100.0 MiB           1   @profile
    22                                         def parse_pcap(pcap_path, urls_file):
    23    441.6 MiB    341.6 MiB           1       pcap_flow = rdpcap(pcap_path)
    24    442.7 MiB      1.2 MiB           1       sessions = pcap_flow.sessions()
    25    442.7 MiB      0.0 MiB           1       urls_output = open(urls_file, "wb")
    26    442.7 MiB      0.0 MiB        1775       for session in sessions:
    27    442.7 MiB      0.0 MiB       49194           for packet in sessions[session]:
    28    442.7 MiB      0.0 MiB       47420               try:
    29    442.7 MiB      0.0 MiB       47420                   if packet[TCP].dport == 80:
    30                                                             payload = bytes(packet[TCP].payload)
    31                                                             url = get_url_from_payload(payload)
    32                                                             urls_output.write(url.encode())
    33    442.7 MiB      0.0 MiB        1300               except Exception as e:
    34    442.7 MiB      0.0 MiB        1300                   pass
    35    442.7 MiB      0.0 MiB           1       urls_output.close()

The file sample_run_rdpcap.dat contains the timestamps and the memory consumption, sampled every 0.1 seconds. Now we can plot this and save it into an image:

Memory Profiler — rdpcap
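The .dat file can also be read directly when only a couple of numbers are needed instead of a plot. The sketch below assumes each sample is stored as a line of the form `MEM <memory in MiB> <Unix timestamp>` and ignores every other line. That layout, the function names, and the sample data are assumptions for illustration, not values taken from the run above.

```python
def read_mprof_samples(text):
    """Return (timestamp, MiB) tuples from the MEM lines of an mprof .dat file."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: "MEM <memory in MiB> <Unix timestamp>"
        if len(fields) == 3 and fields[0] == "MEM":
            samples.append((float(fields[2]), float(fields[1])))
    return samples


def summarize(samples):
    """Return (peak MiB, elapsed seconds) for a non-empty list of samples."""
    peak = max(mem for _timestamp, mem in samples)
    elapsed = samples[-1][0] - samples[0][0]
    return peak, elapsed


# Made-up sample data in the assumed format.
example = """CMDLINE python pcap_parsing_rdpcap.py
MEM 100.0 1000.0
MEM 250.5 1000.1
MEM 442.7 1000.2
"""
peak, elapsed = summarize(read_mprof_samples(example))
print(f"peak: {peak:.1f} MiB over {elapsed:.1f} s")
```

For a real run, pass the contents of the file instead, for example `read_mprof_samples(open("sample_run_rdpcap.dat").read())`.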

Now we have a good idea of how much memory rdpcap can take when it reads a file. In this particular case the pcap file was 28 MB. Once loaded into memory it consumed ~450 MB, and the script ran for more than 200 seconds.

With PcapReader() we get different results: memory consumption is lower and the execution time is considerably shorter:

Memory Profiler — PcapReader
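The PcapReader script itself is not shown in this post, so the following is only a sketch of what a streaming variant of the profiled function might look like. It assumes the same port 80 filter as the rdpcap version and takes the URL helper as a parameter, a hypothetical change to keep the block self-contained. The no-op `profile` fallback and the import inside the function are likewise assumptions.

```python
try:
    from memory_profiler import profile
except ImportError:
    # Assumption: fall back to a no-op decorator when Memory Profiler is missing.
    def profile(func):
        return func


@profile
def parse_pcap(pcap_path, urls_file, get_url_from_payload):
    # Imported here so the sketch can be loaded without Scapy installed.
    from scapy.all import PcapReader, TCP

    # PcapReader yields one packet at a time instead of loading the whole file.
    with PcapReader(pcap_path) as reader, open(urls_file, "wb") as urls_output:
        for packet in reader:
            try:
                if packet[TCP].dport == 80:
                    payload = bytes(packet[TCP].payload)
                    urls_output.write(get_url_from_payload(payload).encode())
            except Exception:
                pass  # no TCP layer or no parsable request: skip the packet
```

Because nothing here keeps a reference to earlier packets, memory use should stay roughly flat regardless of the capture size. The trade-off is that there is no session grouping like the one `sessions()` provides.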

Finally, we can plot both executions in the same graph, which is a nice way to compare how the scripts behave and to decide which one is more effective for the purpose I was after:

Memory Profiler — Multiple Plots

The differences between the scripts are fairly self-explanatory. rdpcap loads and parses the whole file in memory, which gives more information and capabilities when reading a network capture. PcapReader, in contrast, reads the file sequentially, which is faster but less flexible.

In the graph for rdpcap we can see the memory consumption go up and then reach a plateau. Only once the whole file has been read does it start processing packets.
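The same load-everything versus read-sequentially pattern can be reproduced without Scapy. The sketch below swaps in `tracemalloc` from the standard library instead of Memory Profiler, and uses synthetic byte records as a stand-in for packets. All names in it are hypothetical, and the absolute numbers will differ from a real capture.

```python
import tracemalloc


def make_records(count, size=100):
    """Yield `count` synthetic byte records, standing in for packets."""
    for index in range(count):
        yield bytes([index % 256]) * size


def peak_bytes(func, *args):
    """Run func(*args) and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        func(*args)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def process_eagerly(count):
    records = list(make_records(count))  # everything is in memory first
    return sum(len(record) for record in records)


def process_streaming(count):
    # Only one record is alive at a time.
    return sum(len(record) for record in make_records(count))


eager_peak = peak_bytes(process_eagerly, 20_000)
streaming_peak = peak_bytes(process_streaming, 20_000)
print(f"eager: {eager_peak} bytes, streaming: {streaming_peak} bytes")
```

The eager version peaks at roughly the size of the whole data set, while the streaming version stays small no matter how many records pass through it.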

Out of curiosity I ran the scripts against multiple pcap files to see how the two approaches could be told apart. The graph looks like this:

Memory Profiler — Multiple Pcaps

Each sample parsed with rdpcap has a different memory consumption, and each shows a slope at the start while the file is loaded. PcapReader memory consumption peaks and then remains constant at around 100 MB.

What’s the point here?

Test, test and test. Sometimes it’s useful to test, probe and figure out the best way to write a piece of code for a tool or a purpose. I had never tried any memory profiling in Python before, and this only scratches the surface.

Nevertheless, having options to test, compare and decide on the best approach to a problem is key when trying to figure out how something works. I’m not done with the script/tool I’m building, and I definitely have a few more questions about profiling in Python.

This simple way to test things out may be helpful for someone else. There are a few more tricks that can be implemented and that are useful for many more use cases in Python.


Infosec Researcher, traveller, kitesurfing enthusiast. I just like to think out loud