How Python’s Memory Management Works

Alexander Obregon
10 min read · Jul 7, 2024


Introduction

Python’s memory management is an important part of its performance and efficiency. Understanding how Python handles memory can help developers write better code and optimize their applications. This beginner-friendly article explores Python’s memory management, covering reference counting, garbage collection, and memory allocation strategies.

Reference Counting

What is Reference Counting?

Reference counting is a memory management technique used by Python to keep track of the number of references to each object. When an object’s reference count drops to zero, it means that the object is no longer in use and can be safely deallocated. This method is simple yet effective for many memory management scenarios and forms the backbone of Python’s memory management system.

How Reference Counting Works

In Python, every object has an associated reference count. This count increases whenever a new reference to the object is created and decreases when a reference is deleted. The reference count is an integer value stored in the object’s header. Here is a simple example to show reference counting:

a = [1, 2, 3]  # Reference count of the list object is 1
b = a # Reference count increases to 2
c = a # Reference count increases to 3

del b # Reference count decreases to 2
del c # Reference count decreases to 1
del a # Reference count decreases to 0, and the list object is deallocated

This mechanism makes sure that the memory is reclaimed immediately after it is no longer needed, leading to efficient memory usage.

Reference Counting in Depth

Every Python object includes a reference count as part of its metadata. When you create a new object, its reference count is initialized to one. As you assign this object to different variables or data structures, the reference count increases. Conversely, when you delete these references, the reference count decreases.

For example, consider the following code:

import sys

a = [1, 2, 3]
print(sys.getrefcount(a)) # Outputs 2 because the reference count includes the argument to getrefcount

b = a
print(sys.getrefcount(a)) # Outputs 3

c = a
print(sys.getrefcount(a)) # Outputs 4

del b
print(sys.getrefcount(a)) # Outputs 3

del c
print(sys.getrefcount(a)) # Outputs 2

del a
# The last reference is now gone: the count drops to 0 and the list is deallocated

Handling Circular References

One of the main limitations of reference counting is its inability to handle circular references, where two or more objects reference each other, forming a cycle. In such cases, the reference count never drops to zero, leading to memory leaks.
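The leak can be observed directly by pausing the cycle collector and using a weak reference as a probe (a minimal sketch; `Pair` is an illustrative class):

```python
import gc
import weakref

class Pair:
    def __init__(self):
        self.partner = None

gc.disable()                  # pause the cycle collector to expose the leak
a, b = Pair(), Pair()
a.partner, b.partner = b, a   # the two objects now reference each other

probe = weakref.ref(a)        # a weak reference does not keep `a` alive
del a, b                      # refcounts stay at 1 because of the cycle
leaked = probe() is not None  # True: the pair is still in memory

gc.enable()
gc.collect()                  # the cycle collector reclaims the pair
reclaimed = probe() is None   # True: the weak reference was cleared
```

Pure reference counting would never free these two objects; only the cycle collector (covered below) does.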

Advantages of Reference Counting

  1. Immediate Reclamation: Memory is freed immediately when the reference count drops to zero, which can lead to lower memory usage and less overhead compared to other garbage collection strategies that may only reclaim memory at certain intervals.
  2. Simplicity: The implementation of reference counting is straightforward, making it easier to understand and debug.
  3. Deterministic Destruction: Objects are destroyed as soon as they are no longer needed, which can be particularly useful for managing resources like file handles or network connections.
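Deterministic destruction can be demonstrated with a `__del__` method: in CPython the destructor runs the moment the last reference disappears (a sketch; `Handle` is an illustrative stand-in for a real resource, and this timing is a CPython implementation detail, not a language guarantee):

```python
events = []

class Handle:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        # In CPython this runs as soon as the refcount hits zero
        events.append(f"closed {self.name}")

h = Handle("conn-1")
h = None                        # last reference dropped: __del__ fires here
events.append("next statement ran")
# events is now ["closed conn-1", "next statement ran"]
```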

Disadvantages of Reference Counting

  1. Cyclic References: As mentioned, reference counting cannot handle cyclic references, which can lead to memory leaks.
  2. Performance Overhead: Incrementing and decrementing reference counts adds overhead to every assignment and deletion operation, which can impact performance, especially in programs with many short-lived objects.
  3. Memory Overhead: Each object must store its reference count, adding to the memory footprint of objects.

Strategies to Mitigate Cyclic References

To mitigate the issues caused by cyclic references, developers can take several approaches:

  • Weak References: Python’s weakref module allows the creation of weak references, which do not increase the reference count of the objects they refer to. This is useful for caching and other applications where circular references might otherwise occur.
import weakref

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

node1 = Node(1)
node2 = Node(2)
node1.next = weakref.ref(node2)  # call node1.next() to dereference
node2.next = weakref.ref(node1)  # returns None once the target is gone
  • Manual Breakage: Developers can manually break reference cycles by setting references to None before an object is deleted.
node1.next = None
node2.next = None
del node1
del node2
  • Garbage Collection: Python’s garbage collector complements reference counting by detecting and collecting objects involved in reference cycles. The garbage collector is part of the gc module, which provides the functionality to tune and control garbage collection.
import gc

gc.collect() # Forces garbage collection

Garbage Collection

What is Garbage Collection?

Garbage collection (GC) is a mechanism that complements reference counting to manage memory more effectively. It is used to detect and reclaim memory occupied by objects that are no longer accessible, even if they are part of a reference cycle. This is crucial for preventing memory leaks in applications where circular references might occur.

How Garbage Collection Works

Python uses a cyclic garbage collector to detect and break reference cycles. The garbage collector periodically scans objects in memory, identifies cycles, and removes them. The GC mechanism is part of Python’s gc module, which provides tools to inspect and manipulate the garbage collection process.
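Not every object is scanned: only container objects that could participate in a cycle are tracked by the collector. This can be checked with `gc.is_tracked` (the results below reflect CPython, including its optimization for dicts that hold only atomic values):

```python
import gc

# Atomic objects cannot form cycles, so the collector skips them entirely
atomic_int = gc.is_tracked(42)         # False
atomic_str = gc.is_tracked("hello")    # False

# Containers can refer back to themselves, so they are tracked
plain_list = gc.is_tracked([1, 2])     # True

# CPython leaves dicts untracked until they hold a trackable object
flat_dict = gc.is_tracked({"a": 1})    # False: only atomic contents
deep_dict = gc.is_tracked({"a": []})   # True: now holds a container
```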

The garbage collection process involves several steps:

  1. Generation-based Collection: Python’s garbage collector divides objects into three generations based on their lifespan. Newly created objects are placed in the first generation (young generation). Objects that survive garbage collection cycles are promoted to the second generation (middle generation) and eventually to the third generation (old generation). This approach is based on the empirical observation that most objects die young.
  2. Cycle Detection: CPython's collector does not perform a classic mark-and-sweep from global roots. Instead, for each tracked container object in the generation being collected, it subtracts the references held by other tracked objects from that object's reference count. Any object whose count remains above zero is reachable from outside the candidate set, and everything reachable from those survivors is also kept. Whatever remains forms unreachable cycles, and their memory is reclaimed.
  3. Thresholds and Triggers: Garbage collection is triggered based on certain thresholds, which are defined by the number of allocations and deallocations. When these thresholds are exceeded, the garbage collector runs to clean up unused objects. These thresholds can be adjusted using the gc module.

Here’s an example of a reference cycle and how garbage collection deals with it:

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

node1 = Node(1)
node2 = Node(2)
node1.next = node2
node2.next = node1 # Creates a reference cycle

del node1
del node2 # Both objects are still in memory due to the cycle

import gc
gc.collect() # Forces garbage collection, breaking the cycle and reclaiming memory

In this example, the gc.collect() function is used to manually trigger the garbage collection process, making sure that the cyclic references are detected and cleaned up.
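`gc.collect()` also returns the number of unreachable objects it found, which gives a quick way to confirm that a cycle was actually reclaimed (a small self-contained sketch):

```python
import gc

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

gc.collect()                  # start from a clean slate
n1, n2 = Node(1), Node(2)
n1.next, n2.next = n2, n1     # reference cycle
del n1, n2                    # refcounts never reach zero on their own

found = gc.collect()          # returns the count of unreachable objects
# found is at least 2: the two Nodes (plus their instance __dict__s)
```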

Tuning Garbage Collection

Python’s garbage collector can be fine-tuned to optimize performance and memory usage. The gc module provides several functions to control and inspect the garbage collection process:

  • Enabling and Disabling GC: Garbage collection can be enabled or disabled using the gc.enable() and gc.disable() functions. Disabling GC can be useful in performance-critical sections of code where the overhead of garbage collection is undesirable.
import gc

gc.disable() # Disable garbage collection
# Perform performance-critical operations
gc.enable() # Re-enable garbage collection
  • Adjusting Thresholds: The thresholds that trigger garbage collection can be adjusted using the gc.set_threshold() function. This allows developers to control how frequently garbage collection runs, balancing between memory usage and performance.
import gc

gc.set_threshold(700, 10, 10) # Adjust thresholds for garbage collection
  • Inspecting GC Statistics: The gc module provides functions to inspect the state of the garbage collector, such as gc.get_count() and gc.get_stats(). These functions can help developers understand the behavior of the garbage collector and identify potential memory management issues.
import gc

print(gc.get_count()) # Get the number of objects in each generation
print(gc.get_stats()) # Get detailed statistics about garbage collection
  • Manually Triggering GC: Developers can manually trigger garbage collection using the gc.collect() function. This can be useful for forcing a cleanup at specific points in the application.
import gc

gc.collect() # Force a garbage collection cycle

Example of Tuning Garbage Collection

Here is an example of how to use the gc module to control and inspect garbage collection in a Python application:

import gc

# Disable automatic garbage collection
gc.disable()

# Perform operations that generate a lot of temporary objects
temp_list = [i for i in range(10000)]
temp_list = None # Remove the reference to the list

# Manually trigger garbage collection
gc.collect()

# Re-enable automatic garbage collection
gc.enable()

# Inspect the state of the garbage collector
print("Garbage collection thresholds:", gc.get_threshold())
print("Number of objects in each generation:", gc.get_count())

In this example, garbage collection is temporarily disabled to prevent interference with performance-critical operations. After the operations are complete, garbage collection is manually triggered to clean up unused objects, and then automatic garbage collection is re-enabled.

Best Practices for Garbage Collection

To effectively manage memory and avoid performance issues related to garbage collection, developers should follow these best practices:

  1. Minimize Cyclic References: Avoid creating unnecessary cyclic references in your code. When possible, design your data structures to be acyclic.
  2. Use Weak References: Utilize weak references (weakref module) for objects that should not extend the lifetime of other objects. This is particularly useful for caches and other temporary data structures.
  3. Profile and Tune GC: Use profiling tools to understand the memory usage patterns of your application. Adjust the garbage collection thresholds based on the profiling results to optimize performance.
  4. Explicitly Manage Resources: For resources that require deterministic cleanup (e.g., file handles, network connections), use context managers or explicitly release resources to avoid relying solely on garbage collection.
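Practice 2 can be seen in action with `weakref.WeakValueDictionary`, a ready-made cache whose entries disappear once the last strong reference to a value is gone (a sketch; `Blob` is a hypothetical placeholder for an expensive object):

```python
import gc
import weakref

class Blob:
    """Placeholder for an expensive cached object (illustrative)."""
    def __init__(self, key):
        self.key = key

cache = weakref.WeakValueDictionary()  # holds only weak references
blob = Blob("a")
cache["a"] = blob

in_cache_before = "a" in cache   # True while `blob` is strongly referenced
del blob                          # last strong reference gone
gc.collect()                      # not strictly needed in CPython
in_cache_after = "a" in cache    # False: entry was dropped automatically
```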

Memory Allocation Strategies

Python’s Memory Allocators

Python employs several memory allocators to manage memory requests efficiently. These allocators are designed to handle different types of memory requests, optimizing performance and minimizing fragmentation. The primary memory allocators used by Python are:

  1. Raw Memory Allocator: This allocator interacts directly with the underlying operating system to allocate and deallocate memory blocks. It is responsible for managing large memory requests and freeing memory back to the system.
  2. Object-Specific Allocators: These allocators manage memory for specific types of objects, such as integers, strings, and other built-in types. By tailoring the allocation strategy to the needs of specific objects, Python can reduce fragmentation and improve performance.
  3. Pymalloc: Pymalloc is a specialized allocator designed for small objects (less than 512 bytes). It enhances performance by reducing the overhead associated with frequent memory allocation and deallocation, which is common in Python programs.
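The object-specific allocators can be glimpsed through CPython's small-integer cache: integers from -5 through 256 are preallocated once and reused, so "allocating" one of them just hands back an existing object (a CPython implementation detail, not guaranteed by the language):

```python
# Both names are bound to the same preallocated object from the cache
a = 256
b = 256
same_small = a is b        # True: 256 is within the cached range

x = 1000
y = int("1000")            # built at runtime, bypassing constant folding
same_large = x is y        # False: two distinct integer objects
```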

Memory Pools

To further optimize memory management, Python groups small objects into memory pools. Each pool is a fixed-size block of memory used exclusively for objects of a specific size. This pooling mechanism minimizes fragmentation and improves allocation speed.

Here’s an example illustrating how memory pools work:

import sys

# Creating multiple small objects
small_int1 = 10
small_int2 = 20
small_int3 = 30

print(sys.getsizeof(small_int1)) # Size of the integer object
print(sys.getsizeof(small_int2)) # Size of the integer object
print(sys.getsizeof(small_int3)) # Size of the integer object

In this example, sys.getsizeof reports the memory footprint of each integer object. Small objects like these are served from pymalloc's memory pools, which reduces fragmentation and improves allocation efficiency (and integers in the range -5 to 256 are additionally drawn from a preallocated cache rather than freshly allocated).

Arenas, Pools, and Blocks

Python’s memory management system organizes memory into arenas, pools, and blocks to manage different sizes of memory requests:

  1. Arenas: Large contiguous memory regions (256 KB each in classic pymalloc; recent CPython releases use larger arenas) allocated by the raw memory allocator. Each arena is subdivided into smaller pools.
  2. Pools: Fixed-size memory blocks (4 KB each, one memory page) within an arena, used to manage small objects. Each pool is dedicated to objects of a specific size class.
  3. Blocks: The smallest units of memory within a pool, where individual objects are allocated. Each block is of a fixed size, corresponding to the size class of its pool.

This hierarchical organization allows Python to manage memory efficiently, balancing between large and small memory requests.

Example of Memory Allocation

Here’s an example illustrating how Python allocates memory for objects and manages them using its hierarchical memory management system:

# Creating a large list
large_list = [i for i in range(10000)] # Allocates memory for 10,000 integers

# Creating a small string
small_string = "Hello, World!" # Uses pymalloc for small objects

# Deleting objects
del large_list # Frees memory for the list
del small_string # Frees memory for the string

In this example, Python allocates memory for a large list using the raw memory allocator, while the small string is managed by pymalloc.
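The split can be made visible with `sys.getsizeof`: pymalloc only handles requests up to 512 bytes, so the list's pointer array falls to the raw allocator while the string stays in the small-object pools (a sketch of that boundary):

```python
import sys

large_list = list(range(10000))
small_string = "Hello, World!"

big = sys.getsizeof(large_list)      # ~80 KB of pointer storage on 64-bit:
                                     # far above pymalloc's 512-byte cutoff
small = sys.getsizeof(small_string)  # a few dozen bytes: served by pymalloc
```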

Memory Management in Practice

Effective memory management involves understanding how Python allocates and deallocates memory and using that knowledge to write efficient code. Here are some practical tips for managing memory in Python:

  • Avoid Unnecessary Object Creation: Reuse objects whenever possible to reduce the overhead of memory allocation. For example, instead of creating a new list inside a loop, append to an existing list.
# Inefficient: a new list is allocated on every iteration
for i in range(100):
    new_list = [i]
    # Process new_list

# Efficient: one list is reused
reusable_list = []
for i in range(100):
    reusable_list.append(i)
# Process reusable_list
reusable_list.clear()
  • Use Generators: Generators produce items on-the-fly instead of storing them in memory. This is especially useful for processing large datasets or streams of data.
def my_generator():
    for i in range(10000):
        yield i  # values are produced one at a time, never stored as a whole

for item in my_generator():
    pass  # Process item
  • Profile Memory Usage: Use profiling tools like memory_profiler to identify memory-hungry parts of your code. This helps in optimizing memory usage and identifying memory leaks.
from memory_profiler import profile

@profile
def my_function():
    large_list = [i for i in range(10000)]
    return large_list

my_function()
  • Explicitly Manage Resources: Use context managers to manage resources like file handles and network connections, making sure they are properly closed and released rather than waiting on garbage collection.
with open("data.txt", "w") as f:  # "data.txt" is a hypothetical file name
    f.write("example")
# The file is closed here, deterministically, when the with block exits
  • Optimize Data Structures: Choose appropriate data structures based on the use case. For example, use sets for membership testing and dictionaries for fast lookups.
my_set = {1, 2, 3, 4}
if 3 in my_set:
    print("Found 3")

Conclusion

Understanding Python’s memory management system is key for writing efficient and optimized code. By comprehending how reference counting works, how the garbage collector handles cyclic references, and the memory allocation strategies employed by Python, we can avoid common issues and improve the performance of our applications. Utilizing techniques like weak references, manual garbage collection tuning, and memory profiling can further improve memory management practices.

Thank you for reading! If you find this article helpful, please consider highlighting, clapping, responding or connecting with me on Twitter/X as it’s very appreciated and helps keep content like this free!
