Delete delete delete! How Python collects your garbage for you

Daniel Tooke
Aug 22, 2019


Happily there are no recycle bins in Python

In higher-level languages like Python we have the luxury of not managing memory ourselves: Python takes care of it for us. But what actually happens to objects we’re no longer using? Will Python quietly get rid of them while we’re not looking? And how does it know what to delete?

In my last few posts I’ve been musing on the way Python places objects at particular memory addresses. When you assign an object to a variable, you’re really just creating a reference to the memory address where that object lives. Effectively, variables are signposts pointing to our objects in memory so we can find them. In CPython we can check the memory address of a particular object with id():

>>> x = 'a string I just created'
>>> id(x)
4381608752

If we now run del(x), we’re really just deleting the variable x, not the object it points to. (del is a statement rather than a function, so the parentheses are optional and del x works just as well.) It’s deleting one of the signposts we were using to find our string object, but we can still retrieve the object itself if we have another variable that points to where it is in memory:

>>> x = 'a string I just created'
>>> y = x
>>> id(x) == id(y)
True
>>> del(x)
>>> y
'a string I just created'

It’s possible to see how many things are referencing a particular object. One way is to use sys.getrefcount(), but the number it reports is generally one higher than you might expect, because passing the object in as an argument temporarily creates an extra reference to it. Another, more direct way is to pinch a trick from deep in Python’s internal workings with the ctypes module and read the counter straight out of memory. (This is, of course, assuming you’re using the standard CPython, which stores the reference count at the start of every object; it will all work differently for, say, Jython or IronPython.) If we pass a memory address (an integer we save as the variable i) into the call below, it will tell us how many references point to that memory address:

>>> import ctypes
>>> x = 'a string I just created'
>>> y = x
>>> i = id(x)
>>> ctypes.c_long.from_address(i).value
2
>>> del(x)
>>> ctypes.c_long.from_address(i).value
1
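To see the two counting approaches side by side, here is a minimal sketch. It makes a few assumptions that are not in the text above: it runs on a standard CPython build, it uses a list built at runtime (so no compiled constant holds an extra reference), and it reads the counter with ctypes.c_ssize_t rather than ctypes.c_long, since CPython declares the counter as a Py_ssize_t and the two types differ in size on some platforms. The names data and alias are made up for the example.

```python
import ctypes
import sys

# Built at runtime, so nothing but our own names refers to this list.
data = ['a', 'list', 'I', 'just', 'created']
alias = data  # a second name bound to the same object

# Assumption: standard CPython build, where the reference count sits at
# the very start of the object, i.e. at the address that id() returns.
raw_count = ctypes.c_ssize_t.from_address(id(data)).value

# sys.getrefcount() may also count the temporary reference created by
# passing the object in as an argument.
reported_count = sys.getrefcount(data)

print(raw_count, reported_count)
```

On the CPython versions this was sketched against, the second number comes out one higher than the first, but the exact figures are an implementation detail and can shift between releases.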

As we’d expect, when we delete x, the number of things pointing at 'a string I just created' goes down from 2 to 1, as y is now the only reference to it in the whole program. So what happens if we delete y? Then we’d have no signpost to it at all. Our object would surely be unreachable, somewhere in memory where we can’t find it…!

Well, happily Python knows what to do here. In every program you run, there’s only a finite amount of memory available to store objects and data. Any programming language will have tools to manage this memory and keep as much of it as possible free and ready for use. An unreachable object that sits there taking up space but isn’t available for use is a waste: a memory leak!

As a result, Python does the responsible thing. It keeps a count of the references made to each object in memory, and if that count makes it to zero, it removes the object from memory entirely, freeing up that space for other objects. Makes sense, doesn’t it? Bravo, Python.
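We can watch this happen with a small sketch. It assumes CPython, where an object is freed the moment its count reaches zero, and it uses weakref.finalize from the standard library to record when the object is destroyed. The Payload class and the events list are hypothetical names invented for the example.

```python
import weakref


class Payload:
    """Hypothetical stand-in for any object we want to watch being freed."""


events = []

obj = Payload()
# weakref.finalize registers a callback that runs when obj is destroyed.
# It does not keep obj alive itself.
weakref.finalize(obj, events.append, 'freed')

second = obj   # two names now point at the same object
del obj        # one name remains, so the object survives
events_after_first_del = list(events)

del second     # last reference gone: CPython frees the object right away
events_after_second_del = list(events)

print(events_after_first_del, events_after_second_del)
```

The callback only fires after the second del, which is the point at which no signposts are left.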

This isn’t a foolproof garbage collection method, however. There’s still one more trap that your objects could fall into, for which Python runs a special garbage collector service. It’s possible, if you have a couple of objects set up in a particular way, that you could end up with unreachable objects which reference each other in a circular way:

>>> l1 = []
>>> l2 = []
>>> i = id(l1)
>>> ctypes.c_long.from_address(i).value
1
>>> l1.append(l2)
>>> l2.append(l1)
>>> ctypes.c_long.from_address(i).value
2

These two lists now reference each other in a (very!) circular way, and as we can see, these references add to the total kept by Python. If we now run del(l1) and del(l2), our two lists would be unreachable as none of our variables point to them, but since they still refer to each other, each reference count would stay above zero, so reference counting alone would never delete them. They’d be stuck in memory, taking up space, with no way for us to reach them.
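Here is a rough sketch of that trap in action, assuming CPython. Plain lists can’t be weakly referenced, so it swaps in a small hypothetical Node class and uses weakref.ref as a probe to check whether an object is still in memory. It also leans on the gc module, which the next paragraph introduces, to pause automatic collection so that the timing is predictable.

```python
import gc
import weakref


class Node:
    """Hypothetical container; plain lists cannot be weakly referenced."""

    def __init__(self):
        self.other = None


gc.disable()               # pause automatic collection for the demo

a, b = Node(), Node()
a.other, b.other = b, a    # a and b now reference each other
probe = weakref.ref(a)     # lets us check whether a is still in memory

del a, b                   # no names left, but each object still holds the other
alive_before = probe() is not None

gc.collect()               # run the cycle collector by hand
alive_after = probe() is not None

gc.enable()                # switch automatic collection back on
print(alive_before, alive_after)
```

The pair outlives both del statements and only disappears once the collector has had a look at it.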

Thankfully, Python has a built-in garbage collector (exposed through the gc module) to take care of this. It isn’t run on a timer: by default CPython triggers it automatically once the number of newly allocated container objects, minus those already freed, passes a threshold. When it runs, it hunts down any groups of objects that are kept alive only by circular references and deletes them. It’s set up to run automatically, but you can also interact with it through the gc module if you’d like to invoke it manually, or even turn it off altogether. For most purposes though, it’s perfectly fine left just as it is. Programmers in lower-level languages often need to think much more deliberately about how memory is allocated and freed. Python being a higher-level language, this is one of the things that we generally allow to be taken off our hands so we can devote our energy to other things. Lucky us!
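For the curious, a short sketch of what interacting with the gc module can look like. Every call used here (gc.get_threshold, gc.isenabled, gc.disable, gc.enable, gc.collect) is part of the standard library; the self-referencing lists are just made-up garbage for the collector to find, and the threshold values printed will depend on your Python version.

```python
import gc

# The allocation thresholds that decide when automatic collection kicks in.
thresholds = gc.get_threshold()
was_enabled = gc.isenabled()

gc.disable()                 # switch automatic cycle collection off
disabled_state = gc.isenabled()

# Make some garbage that only the cycle collector can reclaim.
for _ in range(3):
    cycle = []
    cycle.append(cycle)      # a list that contains itself
del cycle

# Invoke the collector by hand; it returns the number of unreachable
# objects it found.
unreachable = gc.collect()

if was_enabled:
    gc.enable()              # restore the original setting

print(thresholds, disabled_state, unreachable)
```

The count returned by gc.collect() should be at least 3 here, one for each self-referencing list, and may be higher if other garbage happened to be lying around.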

If you’re interested in a deeper dive into Python’s garbage collector, Artem Golubin has a good overview for you.

This is the fourth of a series of articles I wrote on memory addresses in Python; here are the others:

I. Variables and memory addresses in Python

II. Singletons and interning in Python

III. Will Python intern my string?

IV. Delete delete delete! How Python collects your garbage for you (this article)
