Published in Analytics Vidhya

Memory Management in Python

Photo by Hu Chen on Unsplash

Memory has always been an elusive topic for me. All I knew was that terms such as ‘pass by value’, ‘pass by reference’, ‘destructor’, and ‘segfault’ were going around in my C++ class, which I spectacularly failed. When I retook the class a year later, this time offered in Java, I did fine and bumped up the crazy GPA. ‘Memory leak’ was another term I came across in a later class, and it sounded ominous. Before that I had only come across ‘gas leak’ and ‘water leak’; how bad could this ‘memory leak’ be?

Here we’ll learn about variables, memory addresses, reference counting, memory leaks, and garbage collection in Python. As a bonus for reaching the end of the article, there’s tracemalloc.

A variable in Python is a reference to a memory address. What is a memory address then? Well, a memory address is like a unique location where an object is kept.

Schematic diagram of objects in a heap with labelled memory address

Let’s check this variable assignment:

our_var = 12

Here, the variable our_var actually references a memory address in the heap, which is where objects are kept.

schematic diagram showing a variable referencing a memory address

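We can use the built-in id() function, which takes an object (here, via its variable name) and, in CPython, returns its memory address as a base-10 integer; loosely speaking, this is the pass-by-reference case. To get the memory address in hexadecimal, use the built-in hex() function, which takes the value returned by id() as its parameter; loosely speaking, this is the pass-by-value case.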

In Python, we cannot access memory directly. However, we have the ctypes module to get the value stored at a memory address.

Here’s an example:
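A minimal sketch of such a script, assuming CPython (where id() returns the object's memory address); the name `address` is illustrative and not necessarily what the original file used:

```python
import ctypes

our_var = 12

# In CPython, id() returns the object's memory address as a base-10 integer.
address = id(our_var)
print(f"id = {address}")
print(f"hex id: = {hex(address)}")

# Reinterpret the integer address as a pointer to a Python object and read it back.
value = ctypes.cast(address, ctypes.py_object).value
print(f"Value from address {address} is {value}")
```

The address itself will differ from run to run and from machine to machine.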

On running the file, we get this:

id = 140684875197072
hex id: = 0x7ff3c002ea90
Value from address 140684875197072 is 12

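Caution: When using ctypes.cast, make sure the address you pass is the base-10 integer returned by id(), not the hexadecimal string returned by hex(). If you pass the hexadecimal string, ctypes ends up treating unrelated memory as a Python object, and you’ll get a bus error.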

To keep track of references to an object, Python has a mechanism called reference counting. It keeps tabs on how many variables are referencing the same memory address.

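In Python, we can get the reference count using two modules: sys and ctypes. The count from ctypes is 1 less than the count from sys, because sys.getrefcount() takes the object itself (via the variable name) as input, and passing it as an argument temporarily adds one more reference, while ctypes reads the count directly from the memory address.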

Can you tell which one is pass by reference and which one is pass by value?

Let’s see an example:
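A minimal sketch of such a script, assuming CPython, where the reference count is stored at the start of every object. The helper `refcount_via_ctypes` is a hypothetical name, and a fresh list is used instead of the original object so the counts stay small; the absolute numbers depend on the object and the Python version, so what matters is the +1 and -1 pattern:

```python
import ctypes
import sys


def refcount_via_ctypes(address):
    # Assumption: CPython keeps the reference count at the start of the object.
    return ctypes.c_ssize_t.from_address(address).value


our_var = ["a fresh list object"]
address = id(our_var)

before_sys = sys.getrefcount(our_var)
before_ctypes = refcount_via_ctypes(address)
print(f"our_var object reference count using sys module: {before_sys}")
print(f"our_var object reference count using ctypes module: {before_ctypes}")

# A second variable referencing the same object adds one reference.
my_var = our_var
added_sys = sys.getrefcount(our_var)
added_ctypes = refcount_via_ctypes(address)
print(f"After adding my_var: sys={added_sys}, ctypes={added_ctypes}")

# Deleting the second variable removes that reference again.
del my_var
after_sys = sys.getrefcount(our_var)
after_ctypes = refcount_via_ctypes(address)
print(f"After deleting my_var: sys={after_sys}, ctypes={after_ctypes}")
```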

On running the file, we get:

our_var object reference count using sys module: 4
our_var object reference count using ctypes module:3
🌼 Adding variable `my_var` reference to `our_var` object 🌼
our_var object reference count using sys module: 5
our_var object reference count using ctypes module: 4
🌺 Deleting variable `my_var` reference to `our_var` object 🌺
our_var object reference count using sys module: 4
our_var object reference count using ctypes module: 3

You can see that the initial reference count for the our_var object was 4 and 3 using the sys and ctypes modules respectively. On adding the variable my_var, which references the same memory address as our_var, the reference count increased by 1. On deleting my_var, we get back to the original reference count.

Why does the original reference count from the sys module give 4 instead of 2? We would expect 2 references: the first from the variable our_var, and the second created while passing our_var to the function for printing. Good question! That’s because Python itself might be holding other references to that same String object, for example from the constants of the compiled code.

Now let’s come to garbage collection.

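Garbage collection’s job is to reclaim the memory of objects that are no longer in use. In CPython, an object’s memory is reclaimed as soon as its reference count hits 0; the garbage collector additionally looks for objects that keep each other alive through reference cycles.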

Python has a module called gc that lets us inspect and configure the garbage collector. Let’s use its get_objects() function to count the objects it is tracking.

Let’s see an example:
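A minimal sketch of such a script, assuming CPython. The class bodies and the helper `refcount` are illustrative guesses rather than the original code, and the object totals will differ on every machine:

```python
import ctypes
import gc


class Serval:
    def __init__(self, name):
        self.name = name


class Cat:
    def __init__(self, name):
        self.name = name


def refcount(address):
    # Assumption: CPython keeps the reference count at the start of the object.
    return ctypes.c_ssize_t.from_address(address).value


# Number of container objects the collector is tracking right now.
before = len(gc.get_objects())
print(f"BEFORE: {before}")

servy = Serval("servy")
whiky = Cat("whiky")

servy_refs = refcount(id(servy))
whiky_refs = refcount(id(whiky))
print("REFERENCE COUNT")
print(f"servy's reference count: {servy_refs}")
print(f"whiky's reference count: {whiky_refs}")

after_creation = len(gc.get_objects())
print(f"AFTER CREATION: {after_creation}")

# Each instance has a single reference, so deleting the names frees them.
del servy, whiky
after_deletion = len(gc.get_objects())
print(f"AFTER DELETION: {after_deletion}")
```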

Looking at the classes schematically:

Instances of the classes Serval and Cat

On running the file, we get:

BEFORE: 5199
REFERENCE COUNT
servy's reference count: 1
whiky's reference count: 1
AFTER CREATION: 5218
AFTER DELETION: 5219

I was a bit surprised to see the object count go up by just 1 (5219 - 5218) after deleting two objects, servy and whiky. Keep in mind that gc.get_objects() lists every container object the collector is currently tracking, not the garbage it has found, so the number also moves with whatever else the interpreter allocates or frees between the two calls. Experiment with it if you want. In this case, the reference counts of servy and whiky reach 0 on deletion, since their initial reference count was 1.

However, that’s not the case all the time. Sometimes we end up with circular references.

Take this as an example:
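A minimal sketch of such a script, assuming CPython. The `cousin` attribute follows the article; the class bodies and the helper `refcount` are illustrative:

```python
import ctypes


class Serval:
    cousin = None


class Cat:
    cousin = None


def refcount(address):
    # Assumption: CPython keeps the reference count at the start of the object.
    return ctypes.c_ssize_t.from_address(address).value


servy = Serval()
whiky = Cat()

# Each instance now holds a reference to the other: a reference cycle.
servy.cousin = whiky
whiky.cousin = servy

servy_refs = refcount(id(servy))
whiky_refs = refcount(id(whiky))
print(f"servy's refcount: {servy_refs}")
print(f"whiky's refcount: {whiky_refs}")

# Deleting the name does not free the Serval instance,
# because whiky.cousin still refers to it.
del servy
print("AFTER DELETION OF servy")
whiky_refs_after = refcount(id(whiky))
print(f"whiky's refcount: {whiky_refs_after}")
```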

Looking at the classes schematically:

Circular reference between the two classes Serval and Cat

On running the file, we get:

servy's refcount: 2
whiky's refcount: 2
AFTER DELETION OF servy
whiky's refcount: 2

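Do you see something awry? After deleting servy, the variable for the instance that whiky points to, whiky's reference count is still 2. Deleting the name does not free the Serval instance, because whiky.cousin still references it, and that instance in turn still references whiky. This is called a circular reference, and it can lead to a memory leak!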

Since we wrote this code purposefully to demonstrate it, we can get out of this situation by setting whiky's cousin to None.

whiky.cousin = None

This frees the Serval instance, which in turn drops its reference to whiky, so whiky's reference count becomes 1. After that, we can bring it down to 0 by deleting the whiky variable.
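The effect of breaking the cycle by hand can be checked with a short sketch, assuming CPython. The weak reference `serval_probe` and the helper `refcount` are hypothetical additions used only to observe what happens; a weak reference does not add to the reference count:

```python
import ctypes
import weakref


class Serval:
    cousin = None


class Cat:
    cousin = None


def refcount(address):
    # Assumption: CPython keeps the reference count at the start of the object.
    return ctypes.c_ssize_t.from_address(address).value


servy = Serval()
whiky = Cat()
servy.cousin = whiky
whiky.cousin = servy

# Watch the Serval instance without keeping it alive.
serval_probe = weakref.ref(servy)
del servy

count_in_cycle = refcount(id(whiky))
print(f"whiky's refcount inside the cycle: {count_in_cycle}")

# Break the cycle: the Serval instance loses its last reference and is freed,
# which also releases the reference it held to whiky.
whiky.cousin = None
count_after_break = refcount(id(whiky))
serval_freed = serval_probe() is None
print(f"whiky's refcount after breaking the cycle: {count_after_break}")
print(f"Serval instance freed: {serval_freed}")
```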

In a large codebase, figuring out where a memory leak is happening can be a tedious task. Luckily, we have a memory leak debugging tool called tracemalloc.

You can tell it how many stack frames to save, take snapshots around the area of code you are interested in, and compare them. We can even group the results by lineno, which tells us which line in the codebase is using a lot of memory. This helps us narrow the search space and ultimately hit our target for a fix.

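From Python 3.4 onward, the garbage collector is able to clean up objects caught in reference cycles like these, including ones that define a __del__ finalizer, which earlier versions left uncollected.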
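Here is a small sketch of the collector reclaiming a cycle, assuming CPython. Automatic collection is switched off only to make the demonstration predictable, and the weak reference `probe` is a hypothetical helper for observing whether the object is still alive:

```python
import gc
import weakref


class Serval:
    cousin = None


class Cat:
    cousin = None


gc.disable()  # keep automatic collection out of the way for this demo

servy = Serval()
whiky = Cat()
servy.cousin = whiky
whiky.cousin = servy

probe = weakref.ref(servy)  # does not keep the instance alive

# Both names are gone, but the two instances still reference each other.
del servy, whiky
alive_before_collect = probe() is not None
print(f"Alive before gc.collect(): {alive_before_collect}")

# The cyclic garbage collector finds the unreachable cycle and frees it.
gc.collect()
alive_after_collect = probe() is not None
print(f"Alive after gc.collect(): {alive_after_collect}")

gc.enable()
```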

In case you want to see me tinkering around, here is a script that uses the tracemalloc module:
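A minimal sketch of such a script, assuming CPython. The classes, the `grace` variable, and the helper `refcount` are guesses based on the output shown in this article, not the original file, and the sizes and paths that tracemalloc reports will differ on your machine:

```python
import ctypes
import tracemalloc


class Serval:
    cousin = None


class Cat:
    cousin = None


def refcount(address):
    # Assumption: CPython keeps the reference count at the start of the object.
    return ctypes.c_ssize_t.from_address(address).value


tracemalloc.start(10)  # save up to 10 stack frames per allocation
snapshot_before = tracemalloc.take_snapshot()

servy = Serval()
whiky = Cat()
servy.cousin = whiky
whiky.cousin = servy
grace = whiky  # one more variable referencing the Cat instance

print(f"servy's refcount: {refcount(id(servy))}")
print(f"whiky's refcount: {refcount(id(whiky))}")

del grace
print(f"whiky's refcount after deleting grace: {refcount(id(whiky))}")

# Break the circular reference from both sides.
servy.cousin = None
whiky.cousin = None
whiky_final = refcount(id(whiky))
servy_final = refcount(id(servy))
print(f"whiky's refcount after deleting circular reference: {whiky_final}")
print(f"servy's refcount after deleting circular reference: {servy_final}")

# Compare the snapshots, grouped by the line that allocated the memory.
snapshot_after = tracemalloc.take_snapshot()
top_stats = snapshot_after.compare_to(snapshot_before, "lineno")
for stat in top_stats[:3]:
    print(stat)

tracemalloc.stop()
```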

The output looks something like this:

servy's refcount: 2
whiky's refcount: 3
whiky's refcount after deleting grace: 2
whiky's refcount after deleting circular reference: 1
servy's refcount after deleting circular reference: 1
.../medium/memory_management/circular_reference_2.py:19: size=424 B (+424 B), count=1 (+1), average=424 B
.../medium/memory_management/circular_reference_2.py:18: size=424 B (+424 B), count=1 (+1), average=424 B
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/tracemalloc.py:423: size=88 B (+88 B), count=2 (+2), average=44 B

I hope this was a helpful article and that you are ready to dive into the deeper layers of high-level code transformation!

Congratulations and thank you for reading! I’ll be out soon with my next article. 🐛
