Python Objects Part II: Demystifying CPython Shared Objects

Brennan D Baraban
7 min read · Jan 13, 2019


In my first article on Python objects, I referred in passing to a feature of the CPython implementation that I called shared objects. Behind the scenes, shared objects are instrumental to how certain Python objects are allocated and referenced in memory: specifically, integers, Unicode characters, and empty tuples. Despite this, resources documenting the implementation are sparse, keeping the behavior shrouded in mystery. In this article, the second part of my Python Objects series, I’d like to demystify it.

FIRST THINGS FIRST — A REFRESHER

Before we dive in, a few preliminaries. To refresh:

  • Everything in Python is an object, something that a variable can refer to.
  • Objects are characterized by their value, type, and identity (in CPython, the memory address).
  • The value of an immutable (unchangeable) object is tied to its identity — if the value changes, the object changes.
  • The value of a mutable (changeable) object is not tied to its identity — identity is retained across changes made to the object.
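
A minimal sketch of these points, using the built-in type() and id() functions. The variable names are illustrative only, and id() returning a memory address is a CPython detail rather than a language guarantee.

```python
# Every object has a value, a type, and an identity.
nums = [1, 2]
print(nums, type(nums), id(nums))

# Mutable: the list keeps its identity when its value changes.
identity_before = id(nums)
nums.append(3)
list_kept_identity = id(nums) == identity_before

# Immutable: "changing" a tuple really builds a new object.
point = (1, 2)
alias = point        # second reference to the same tuple
point += (3,)        # rebinds point to a brand-new tuple
tuple_kept_identity = point is alias

print(list_kept_identity)   # True
print(tuple_kept_identity)  # False
```

The alias is what makes the tuple case visible: it still refers to the original (1, 2), while point now refers to a different object.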

Also, note that the behavior discussed in this article is specific to CPython versions 3.0 and above. You are not guaranteed the same behavior on different implementations or versions of Python.

SHARED OBJECTS — INTRODUCTION, WITH INTEGERS

Now, to revisit the offending example. Recall the following:

>>> x = 89
>>> y = 98
>>> x is y
False

Here, in the first line, the variable x is assigned to refer to the int object 89, while y is assigned to refer to a separate int object, 98. The two identities differ, so x is y evaluates to False. In contrast:

>>> x = 98
>>> y = 98
>>> x is y
True

The above is true because x and y refer to the same object.

Now, in part one of this series, I described this behavior as relating to mutability. Since immutable objects cannot change, when instructed to store an int, Python attempts to save space by first checking if one such object of the same value already exists.

However, this idea collapses with the following example:

>>> x = 400
>>> y = 400
>>> x is y
False

I apologize. I admit, my earlier explanation was not entirely true. Yet, it was not entirely false, either. What is happening here? The answer is shared objects.

I do not intend to contradict myself, so allow me to clarify — for immutable types, Python does search for identical pre-existing objects before instantiating new objects. However, it does not just search for any pre-existing object; it specifically searches for what I’ll [unofficially] refer to as shared objects.

(To clarify — the CPython virtual machine is the original and most commonly-used implementation of Python.)

Shared objects are ranges and specific instances of certain immutable types that CPython keeps in memory for the lifetime of the interpreter, whether it is running an interactive session or a script. These objects are static global variables within the interpreter, so all code executed in a given Python process refers to the same ones.

The idea behind this implementation is that certain data values are used far more often than others in programming. By caching these commonly used objects, the CPython virtual machine saves time and memory. Instead of creating a new object every time one of these values is needed, variables can simply refer to a pre-existing shared object.

For integers, the range -5 to 256, inclusive, is loaded as shared objects. Within CPython, the size of this range is set by the macros NSMALLPOSINTS (defined as 257, covering 0 to 256) and NSMALLNEGINTS (defined as 5, covering -5 to -1).

GitHub: python/cpython/Objects/longobject.c
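
The boundaries of the shared range can also be probed from Python itself. The sketch below is an experiment, not CPython's own logic: is_shared_int and shared_int_range are hypothetical helper names, and the values are built with int(str(n)) on the assumption that this forces a fresh lookup at run time instead of reusing a compiled constant. On the CPython versions this article covers, I would expect it to report -5 and 256, but the result is an implementation detail and may differ elsewhere.

```python
def is_shared_int(n):
    """Return True if two ints built at run time for value n are one object."""
    # int(str(n)) builds the value at run time, so the compiler cannot
    # reuse a single constant for both names.
    a = int(str(n))
    b = int(str(n))
    return a is b


def shared_int_range(limit=5000):
    """Walk outward from 0 until the first value that is not shared."""
    low = 0
    while low > -limit and is_shared_int(low - 1):
        low -= 1
    high = 0
    while high < limit and is_shared_int(high + 1):
        high += 1
    return low, high


low, high = shared_int_range()
print("shared ints run from", low, "to", high)
```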

Thus, in our equivalency example:

>>> x = 98
>>> y = 98
>>> x is y
True

No new objects are created. In the first line, Python checks its shared values, determines that 98 already exists, and refers x to that loaded value. The same goes for y.

It is only when we go beyond the range of shared integer values that Python begins to allocate new objects. So, revisiting our earlier, offending example:

>>> x = 400
>>> y = 400
>>> x is y
False

The behavior makes more sense: since 400 is not within the range of shared integer values, Python allocates a separate object for each 400 entered at the interactive prompt, just as it would when creating two different mutable objects.
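
One caveat is worth checking for yourself: the result above depends on each statement being compiled on its own, as happens at the >>> prompt. The sketch below imitates both situations with the built-in compile() and exec(). The helper names run_separately and run_together are hypothetical, and "exec" mode is used here as a stand-in for what the prompt does. On the standard CPython builds I am aware of, the first call typically reports False and the second True, because equal constants inside one compiled unit are stored once. That is a compiler detail and separate from the shared-object cache.

```python
SOURCE_LINES = ["x = 400", "y = 400"]


def run_separately(lines):
    """Compile and run each line on its own, much as the prompt does."""
    namespace = {}
    for line in lines:
        exec(compile(line, "<prompt>", "exec"), namespace)
    return namespace["x"] is namespace["y"]


def run_together(lines):
    """Compile all lines as one unit, as a script file would be."""
    namespace = {}
    exec(compile("\n".join(lines), "<script>", "exec"), namespace)
    return namespace["x"] is namespace["y"]


print("separate statements:", run_separately(SOURCE_LINES))
print("single compile unit:", run_together(SOURCE_LINES))
```

So when comparing values, use ==. Reserve is for questions about identity.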

With this understanding of shared objects, we can revise our understanding of the process of Python object instantiation:

  1. When Python is instructed to create a reference to an object, it first checks its mutability. If the object is mutable, it will immediately allocate a new object for the value — mutable objects must have unique identities.
  2. If the object is immutable, Python will check if the given object matches any shared objects already cached in memory. In the case that a shared object is matched, the variable is directly referred to that pre-allocated object.
  3. Otherwise, a new object is created.
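
Steps 2 and 3 above can be modeled in a few lines of Python. This is a toy sketch and not CPython's code: Boxed, SHARED, and make_boxed are hypothetical names, and the range -5 to 256 is borrowed from the integer case purely for illustration.

```python
class Boxed:
    """Hypothetical stand-in for an immutable object such as an int."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


# Built once, up front, like the shared objects described above.
SHARED = {n: Boxed(n) for n in range(-5, 257)}


def make_boxed(value):
    """Return the shared object for value if there is one, else a new one."""
    shared = SHARED.get(value)
    if shared is not None:   # step 2: a shared object matches
        return shared
    return Boxed(value)      # step 3: otherwise, create a new object


x, y = make_boxed(98), make_boxed(98)
print(x is y)   # True: both names refer to the pre-built object

x, y = make_boxed(400), make_boxed(400)
print(x is y)   # False: 400 is outside the shared range
```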

OTHER SHARED OBJECTS — UNICODE CHARS AND EMPTY TUPLES

Thus far I’ve focused exclusively on integers. Shared objects are also relevant to two other Python types, however: Unicode characters (single-character strings) and tuples.

Within the CPython source code, strings are represented by PyUnicodeObject, and a single character is simply a string of length one. CPython keeps a shared cache of the single-character strings in the Latin-1 character set (Unicode code points 0 to 255, inclusive), so every one-character string in that range refers to the same object.

GitHub: cpython/Objects/unicodeobject.c

This is why, in the following example, two variables holding the copyright character, Unicode code point 169, refer to the same object (the chr() function used here returns the one-character string for a Unicode code point):

>>> a = chr(169)
>>> a
'©'
>>> b = chr(169)
>>> b
'©'
>>> a is b
True

But two variables holding the Greek capital omega, Unicode code point 937, do not:

>>> a = chr(937)
>>> a
'Ω'
>>> b = chr(937)
>>> b
'Ω'
>>> a is b
False
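
The same kind of probe used for integers works for characters. In this sketch, is_shared_char is a hypothetical helper and the upper bound 0x300 is an arbitrary choice that extends a little past Latin-1. On the CPython versions this article covers, I would expect the shared code points to run from 0 to 255, but treat the printed result as an observation about your interpreter and not a guarantee.

```python
def is_shared_char(code_point):
    """Return True if chr() hands back one object for this code point."""
    first = chr(code_point)
    second = chr(code_point)
    return first is second


# Collect every code point below 0x300 whose one-character string is shared.
shared = [cp for cp in range(0x300) if is_shared_char(cp)]
print("lowest:", min(shared), "highest:", max(shared), "count:", len(shared))
```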

Last but not least, CPython caches one more value: the empty tuple.

GitHub: cpython/Objects/tupleobject.c

Recall that although tuples are immutable, separately created tuples generally receive unique identities, since they can contain potentially changing values:

>>> t1 = (1, 2)
>>> t2 = (1, 2)
>>> t1 is t2
False

Variables referring to empty tuples, however, will always refer to the same shared empty tuple object:

>>> t1 = ()
>>> t2 = ()
>>> t1 is t2
True

The logic here is that empty tuples, being immutable and containing no potentially changing values, will always be identical, and accordingly only need a single instantiation in memory.
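
A short check of the same idea with tuples built at run time instead of typed as literals. The variable names are illustrative, and the results describe CPython behavior, not a language rule.

```python
pair = tuple([1, 2])

# Three different ways of arriving at an empty tuple at run time.
empties = [tuple(), tuple([]), pair[2:]]
all_same_empty = all(t is empties[0] for t in empties)

# Two non-empty tuples with equal contents, each built separately.
first = tuple([1, 2])
second = tuple([1, 2])
non_empty_shared = first is second

print(all_same_empty)     # True: every route yields the one shared empty tuple
print(non_empty_shared)   # False: equal contents, distinct objects
```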

And that’s a wrap! Hopefully, with a new understanding of shared values, you can begin making sense of those confusing, outlier object comparison cases. In the next part of this series, I’ll close the circle on Python object instantiation with one last examination, of string interning.

TL;DR:

  • CPython pre-allocates shared values: certain ranges of commonly used immutable objects.
  • When Python is instructed to instantiate a new immutable object, it first checks to see if an identical object already exists as a shared object.
  • The idea is that by caching these commonly-used immutable types, CPython can operate more efficiently in both time and memory.

CPython Shared Values Guide:

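Integers:

  • Relevant Python Object Type: int
  • Cached Object Range: -5 (inclusive) to 257 (not inclusive)
  • CPython Source File: cpython/Objects/longobject.c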

Unicode Characters:

  • Relevant Python Object Type: str
  • Cached Object Range: Latin-1 Unicode range — decimals 0 (inclusive) to 256 (not inclusive)
  • CPython Source File: cpython/Objects/unicodeobject.c

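Tuples:

  • Relevant Python Object Type: tuple
  • Cached Object: the empty tuple, ()
  • CPython Source File: cpython/Objects/tupleobject.c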

[Figure: Python Object Instantiation Chart, Version Two]
