Python Memory Management: Reference Counting, Garbage Collection, and Circular References

Paritosh Sharma Ghimire
Published in Nerd For Tech
7 min read · Jul 11, 2024
Picture Credit: Analytical Vidhya

Introduction

Python’s memory management is a crucial aspect of the language that often goes unnoticed, especially by beginners. In this article, we’ll explore how Python manages memory, focusing on reference counting and garbage collection. We’ll use practical code examples to demonstrate these concepts and explain how Python handles memory allocation and deallocation.

Understanding Reference Counting

Python (more precisely CPython, the reference implementation) uses reference counting as its primary memory management technique. Each object has an associated reference count: the number of references currently pointing to it. When this count drops to zero, the object’s memory is freed immediately.

Let’s start by looking at two methods to check reference counts:

1. Using sys.getrefcount():

import sys

my_list = [1, 2, 3]
print(sys.getrefcount(my_list)) # Typically prints 2

This method is safe and easy to use, but the value it reports is usually one higher than you might expect, because passing the object to getrefcount() creates a temporary reference for the duration of the call.
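To see the count move, the following minimal sketch compares readings before and after creating and deleting an alias. The variable names (data, alias, base) are illustrative and not part of the original example. Comparing differences rather than absolute values sidesteps the extra temporary reference.

```python
import sys

data = [1, 2, 3]
base = sys.getrefcount(data)          # baseline reading (includes the temporary reference)

alias = data                          # a second name bound to the same list
after_alias = sys.getrefcount(data)   # one more reference than the baseline

del alias                             # remove the second name
after_del = sys.getrefcount(data)     # back to the baseline

print(after_alias - base)  # 1
print(after_del - base)    # 0
```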

2. Using ctypes for direct memory access:

import ctypes

def get_ref_count(address):
    # In CPython the reference count (ob_refcnt, a Py_ssize_t) is stored at the start of the object
    return ctypes.c_ssize_t.from_address(address).value

my_list = [1, 2, 3]
print(get_ref_count(id(my_list))) # Typically prints 1

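This lower-level approach reads the reference count straight from the memory where CPython stores it, relying on the fact that id() returns the object’s address in CPython. It avoids the temporary reference, so it is more accurate for our demonstration, but it must be used with care: reading the address of an object that has already been freed is undefined behavior and can crash the interpreter.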

Setting Up the Environment

Let’s set up our environment for the main demonstration:

import sys
import ctypes
import gc

# Disable garbage collection for demonstration purposes
gc.disable()

def get_ref_count(address):
    """
    Get the reference count of an object at a given memory address.

    Args:
        address (int): Memory address of the object.

    Returns:
        int: Reference count of the object.
    """
    # ob_refcnt is a Py_ssize_t stored at the start of every CPython object
    return ctypes.c_ssize_t.from_address(address).value

We import necessary modules, disable garbage collection (to isolate reference counting behavior), and define our get_ref_count function.
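Because gc.disable() changes global interpreter state, it can be convenient to restore the previous setting once the experiment is over. The sketch below is one possible way to do that; gc_paused is a hypothetical helper name, not something defined in the original code or in the standard library.

```python
import contextlib
import gc

@contextlib.contextmanager
def gc_paused():
    """Hypothetical helper: switch the cyclic collector off inside a with-block."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Only re-enable if the collector was on before we paused it
        if was_enabled:
            gc.enable()

before = gc.isenabled()
with gc_paused():
    inside = gc.isenabled()   # False while the block runs
after = gc.isenabled()        # same as before the block

print(inside, after == before)
```

For a short demonstration script a plain gc.disable() call is enough; the context manager mainly helps when the experiment lives inside a larger program.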

Creating Classes with Circular References

A circular reference occurs when two or more objects reference each other in a way that creates a closed loop. This situation can prevent Python’s reference counting system from automatically freeing memory, potentially leading to memory leaks.

Let’s examine our example in detail:

class A:
    """
    Class A with a circular reference to an instance of class B.
    """
    def __init__(self):
        self.b = B(self)
        print(f"A: {hex(id(self))}, b:{hex(id(self.b))}")

class B:
    """
    Class B with a circular reference to an instance of class A.
    """
    def __init__(self, obj):
        self.a = obj
        print(f"B: {hex(id(self))}, A:{hex(id(self.a))}")

The Creation of a Circular Reference

1. Instance Creation:

  • When we execute my_var = A(), Python creates an instance of class A.

2. A’s Initialization:

  • Inside A’s __init__ method, it creates an instance of B.
  • It passes self (the current A instance) to B’s constructor.
  • The newly created B instance is stored as self.b.

3. B’s Initialization:

  • B’s __init__ method receives the A instance as obj.
  • It stores this A instance as self.a.

4. The Circular Link:

  • Now, the A instance has a reference to the B instance (via self.b).
  • The B instance has a reference to the A instance (via self.a).
  • This creates a circular reference: A → B → A
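The steps above can be checked with an identity test. This is a minimal sketch using hypothetical Parent and Child classes that mirror A and B without the print statements: following the link out and back should land on the original object.

```python
class Parent:
    """Hypothetical stand-in for class A."""
    def __init__(self):
        # Hand this instance to the child so it can point back at us
        self.child = Child(self)

class Child:
    """Hypothetical stand-in for class B."""
    def __init__(self, owner):
        self.owner = owner

parent = Parent()

# Parent -> Child -> Parent: the loop is closed if we arrive back where we started
loop_closed = parent.child.owner is parent
print(loop_closed)  # True
```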

Reference Count Analysis

Let’s analyze the reference counts at this point:

my_var = A()

id_a = id(my_var)
id_b = id(my_var.b)

print(get_ref_count(id_a)) # Expected: 2
print(get_ref_count(id_b)) # Expected: 1

1. For the A instance:

  • One reference from my_var
  • One reference from B’s self.a
  • Total: 2 references

2. For the B instance:

  • One reference from A’s self.b
  • Total: 1 reference

The Problem Emerges

The issue becomes apparent when we remove the external reference:

my_var = None
print(get_ref_count(id_a)) # Expected: 1
print(get_ref_count(id_b)) # Expected: 1

After setting my_var to None:

  1. The A instance loses one reference (from my_var), but still has one from B.
  2. The B instance maintains its reference from A.

Both objects now have a reference count of 1, but they’re only referencing each other.

This creates a situation where:

  • The objects are still in memory.
  • They’re inaccessible from the rest of the program.
  • The reference counting system can’t detect that they’re no longer needed.

Why Reference Counting Fails Here

Reference counting works by tracking how many references point to an object. It assumes that when the count reaches zero, the object is no longer needed.

In a circular reference:

  1. Each object in the loop maintains at least one reference.
  2. The reference count never reaches zero.
  3. The memory is never automatically freed.
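The contrast with an ordinary object can be made visible with a finalizer. In this sketch, Tracked is a hypothetical class (not part of the article’s example) whose __del__ method records when an instance is destroyed. It assumes CPython, where an object is destroyed as soon as its count reaches zero.

```python
import gc

finalized = []  # names are appended here as instances are destroyed

class Tracked:
    """Hypothetical class that records its own destruction."""
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        finalized.append(self.name)

was_enabled = gc.isenabled()
gc.disable()  # keep the cyclic collector out of the picture
try:
    # No cycle: the count drops to zero and the object is destroyed at once
    lone = Tracked("lone")
    del lone
    after_lone = list(finalized)

    # Cycle: each object keeps the other's count at one, so nothing is destroyed
    left, right = Tracked("left"), Tracked("right")
    left.partner, right.partner = right, left
    del left, right
    after_cycle_del = list(finalized)

    # Only an explicit collection reclaims the pair
    gc.collect()
    after_collect = sorted(finalized)
finally:
    if was_enabled:
        gc.enable()

print(after_lone)       # ['lone']
print(after_cycle_del)  # ['lone']
print(after_collect)    # ['left', 'lone', 'right']
```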

Checking Object Existence

Let’s define a function to check if objects still exist in memory:

def get_object_status_by_id(obj_id):
    """
    Check if an object with the given id is still tracked by the garbage collector.

    Args:
        obj_id (int): The id of the object to check.

    Returns:
        str: 'Object exists' if the object is found, 'Object does not exist' otherwise.
    """
    # gc.get_objects() lists the container objects the collector currently tracks
    if any(id(obj) == obj_id for obj in gc.get_objects()):
        return 'Object exists'
    return 'Object does not exist'

print(get_object_status_by_id(id_a)) # Expected: Object exists
print(get_object_status_by_id(id_b)) # Expected: Object exists

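This function checks whether any object tracked by the garbage collector has the given id. We can see that both objects still exist, even though we can’t access them through my_var anymore. One caveat: an id is only unique among objects that are alive at the same time, so this check is reliable for a short demonstration but not as a general-purpose tool.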
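A weak reference is another way to ask whether an object is still alive, and it does not depend on id values. The sketch below is an alternative to the id-based check rather than the article’s method; Payload and is_alive are hypothetical names introduced for illustration.

```python
import weakref

class Payload:
    """Hypothetical stand-in for the article's classes."""

def is_alive(ref):
    # Calling a weak reference returns the object, or None once it has been destroyed
    return ref() is not None

obj = Payload()
ref = weakref.ref(obj)        # does not increase the reference count

alive_before = is_alive(ref)  # the object still has a strong reference
del obj                       # last strong reference removed
alive_after = is_alive(ref)   # destroyed immediately, since there is no cycle

print(alive_before, alive_after)  # True False
```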

The Role of Garbage Collection

This is where Python’s garbage collector becomes crucial.

The garbage collector:

  1. Runs automatically once the number of newly allocated container objects exceeds a threshold.
  2. Examines the container objects it tracks (lists, dicts, class instances, and so on) and the references between them.
  3. Identifies groups of objects that reference each other but aren’t accessible from the root set (i.e., active parts of the program).
  4. Removes these inaccessible object groups from memory.

In our case, it detects that the A and B instances form a closed loop that’s not reachable from any active part of the program and removes them.

gc.collect()

print(get_object_status_by_id(id_a)) # Expected: Object does not exist
print(get_object_status_by_id(id_b)) # Expected: Object does not exist
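gc.collect() also returns the number of unreachable objects it found, which gives a quick confirmation that a cycle was picked up. The sketch below uses a hypothetical Link class. The exact number can vary between Python versions (instance dictionaries may be counted separately), so it only checks for at least two.

```python
import gc

class Link:
    """Hypothetical class whose instances can point at each other."""
    def __init__(self):
        self.other = None

was_enabled = gc.isenabled()
gc.disable()
try:
    gc.collect()                 # clear any garbage left over from earlier code

    x, y = Link(), Link()
    x.other, y.other = y, x      # build a two-object cycle
    del x, y                     # the cycle is now unreachable

    found = gc.collect()         # number of unreachable objects found
finally:
    if was_enabled:
        gc.enable()

print(found >= 2)  # True
```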

Garbage Collection: Automatic by Default

It’s essential to highlight that in normal Python operations, garbage collection occurs automatically. The reason it didn’t happen automatically in our example is that we deliberately disabled it for demonstration purposes. Let’s clarify this:

Why We Disabled Garbage Collection

The reason for disabling garbage collection in our example was to isolate and demonstrate the behavior of reference counting and the specific issue of circular references. By turning off the automatic garbage collection:

  1. We could observe how objects with circular references persist in memory even when they become unreachable.
  2. We could demonstrate the limitation of the reference counting system in handling circular references.
  3. We could then show the explicit effect of running the garbage collector on these otherwise uncollectable objects.

In Practice

In your day-to-day Python programming:

  1. You don’t need to disable garbage collection.
  2. You rarely, if ever, need to call gc.collect() manually.
  3. Python’s automatic memory management, combining reference counting and periodic garbage collection, efficiently handles most memory management tasks, including the cleanup of circular references.
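If you are curious about what the automatic collector is doing in a normal program, the gc module exposes a few read-only views. This is a small sketch; the printed values depend on the Python version and on what the program has allocated so far, so no specific output is shown.

```python
import gc

enabled = gc.isenabled()         # True unless something has called gc.disable()
thresholds = gc.get_threshold()  # allocation thresholds that trigger automatic collections
counts = gc.get_count()          # current collection counters, one per generation

print(enabled, thresholds, counts)
```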

Conclusion: The Power of Python’s Automated Memory Management

Throughout this article, we’ve delved into the intricacies of Python’s memory management, exploring reference counting, circular references, and garbage collection.

However, it’s crucial to emphasize why, in most cases, Python developers don’t need to actively manage memory:

  1. Abstraction and Simplicity: Python’s design philosophy emphasizes simplicity and readability. By automating memory management, Python allows developers to focus on solving problems rather than dealing with low-level memory details.
  2. Efficiency: Python’s combination of reference counting and garbage collection is highly efficient for most applications. It quickly frees memory when objects are no longer needed and periodically cleans up more complex scenarios like circular references.
  3. Error Prevention: Manual memory management is prone to errors such as memory leaks, double frees, and use-after-free bugs. Python’s automated system significantly reduces these risks.
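  4. Cross-Platform Consistency: The same code runs without manual memory handling on every platform. Note, however, that the mechanisms described in this article are specific to CPython; other implementations such as PyPy use different collectors and may free objects at different times.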
  5. Scalability: From small scripts to large applications, Python’s memory management scales well without requiring changes to how developers write code.
  6. Focus on Higher-Level Concepts: By abstracting memory management, Python encourages developers to think in terms of higher-level concepts and design patterns, leading to more maintainable and robust code.

While understanding these underlying mechanisms is valuable, especially for optimizing performance-critical applications or debugging complex memory issues, it’s important to remember that Python’s strength lies in its ability to handle these details automatically.

This automation is a key factor in Python’s popularity and productivity benefits. In practice, Python developers should trust the language’s memory management system and only intervene in exceptional cases. By doing so, they can write cleaner, more efficient code and focus on creating value through their applications rather than getting bogged down in memory management details.

The knowledge we’ve explored in this article serves not as a guide for daily coding practices, but as a foundation for understanding Python’s inner workings.

This understanding can be invaluable when you need to optimize performance, debug memory-related issues, or work on systems with limited resources.

However, for the vast majority of Python development tasks, letting Python handle memory management automatically is not just sufficient — it’s optimal.

Thank you.
