Python variables and assignments — Variables don’t store values!

A detailed explanation on how Python assignments actually work

Tue Nguyen
8 min read · Mar 29, 2022

Why use variables?

Variables are building blocks of every program in any programming language. Imagine you have to perform an analysis that involves the current interest rate. Suppose today’s interest rate is 0.05 (or 5%). What will you do?

One approach is to use this hard-coded value of 0.05 in every formula that needs it. Suppose there are around 100 such places. What happens if you have to replicate your analysis next week and the interest rate increases to 0.07? You have to manually look for every place that uses the value 0.05 and replace it with 0.07. This process is painful, time-consuming, and error-prone.

A better approach is to define a variable named interest_rate and assign the value 0.05 to it. For the rest of your analysis, you will not use the value 0.05 but use interest_rate instead. If the interest rate changes to a new value, you only have to adjust your code in one single place.
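As a minimal sketch of this idea, the snippet below defines the rate once and reuses it in two calculations. The function future_value and the names savings and loan are made up for illustration and are not part of any library.

```python
# The rate is defined once; when it changes, only this line is edited.
interest_rate = 0.05

def future_value(principal, years, rate):
    """Compound `principal` once a year at `rate` for `years` years."""
    return principal * (1 + rate) ** years

# Every formula refers to the variable, never to the hard-coded 0.05.
savings = future_value(1_000, 10, interest_rate)
loan = future_value(5_000, 3, interest_rate)

print(round(savings, 2))
print(round(loan, 2))
```

If next week's rate is 0.07, changing the first assignment is enough; both results update on the next run.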

One common mistake is to think of the value 0.05 being stored inside the variable. That’s not true with Python, and it is crucial to understand how Python really works. In the next section, I will explain why.

How do Python assignments work?

In Python, you create a new variable through an assignment, as shown in the example below.

x = 1000
print(x)
1000

The first line is an assignment that assigns the value 1000 to a variable named x. The second line just prints out the value associated with x. When you run the above cell, you will see 1000 is printed out. And that's why many people think that x "stores" 1000.

That’s not true. Here’s what happens under the hood (see the picture and read the explanation carefully)

Step 1 — Python evaluates the right-hand side (RHS) of the assignment and recognizes the value 1000.

Step 2 — Python creates an object to store this value (the green hexagon). This object has an ID (say, A123) and occupies some slot in your computer's memory (say, at the address 002).

Step 3 — Python, like an accountant, opens a ledger to find a variable named x. For now, suppose that x has never been created before. Since Python cannot find x, it creates a new entry (a row in the ledger) to record the relationship between variable x and the corresponding object.

Thus, it is the object that contains the value, not the variable.

Step 4 — in the second line, when running the print call, Python encounters x and understands that it is a variable, not a string, because it is not wrapped in quotes. Thus, Python opens the ledger again to find x, and this time it finds the entry shown in the picture. It knows that x is now associated with the object with ID A123, so it goes to the memory slot where the object resides, retrieves the value stored in the object, and prints it out.
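The ledger in these steps corresponds to what Python calls a namespace, which behaves like a dictionary mapping names to objects. Here is a minimal sketch of the four steps, assuming a hand-made dictionary named ledger (a name chosen here for illustration) stands in for the real namespace:

```python
# An ordinary dict plays the role of the ledger (a namespace).
ledger = {}

# Run the assignment with that dict as its namespace (steps 1 to 3).
exec("x = 1000", ledger)

# The ledger now has an entry for the name "x" ...
print("x" in ledger)

# ... and following that entry leads to the object that holds the value (step 4).
obj = ledger["x"]
print(type(obj).__name__, obj)

# The object's identity; the exact number varies from run to run.
print(id(obj))
```

The name x is only a key in the dictionary. The value 1000 lives in the int object that the key leads to.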

You might think

“OK, but after all, it’s just 1000. Why make things so complicated?”

That’s a valid point. But if you understand what I’ve explained above, it will help you avoid future mistakes when things become more complicated. Now consider another example

y = 2000
print(y)
2000

The process is very similar to what happened in the first assignment. The first line creates a new object with ID B456 at the memory slot with address 006 to store the value 2000 (see the picture below).

Now what happens if we run the following code?

y = 3000

Some might think that object B456 just kicks the value 2000 out and takes the value 3000 in, but it is not the case.

Remember that, in an assignment, Python always evaluates the RHS of = first to come up with a value, and here that means creating a totally new object to hold it. Thus, a new object with ID C789 is created to hold 3000, and it occupies slot 004. Python also updates the corresponding entry in the ledger to record that y is now associated with C789, not B456 anymore (see the picture below).
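A small sketch of this rebinding follows. It assumes CPython and builds the integers with int() at run time, so that no pre-built constant object gets reused. The names old_id and new_id are illustrative.

```python
y = int("2000")          # first object
old_id = id(y)           # remember which object y is attached to

# The RHS is evaluated first, so the new object already exists
# while y is still attached to the old one.
y = int("3000")
new_id = id(y)

print(y)                 # 3000
print(old_id != new_id)  # True: y is attached to a different object
```

Because the old and the new object exist at the same moment during the assignment, their IDs cannot coincide.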

So what happens to object B456? It is destroyed. Python has an automatic garbage-collection mechanism. In CPython, the standard implementation, every object keeps a count of the references to it. As soon as that count drops to zero (here, because no variable is associated with B456 anymore), the object is destroyed and its memory becomes available again for future objects.
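This cleanup can be observed with a weak reference, which watches an object without keeping it alive. The sketch below assumes CPython, where an object is reclaimed as soon as its last reference disappears. The class Box is a hypothetical stand-in, used because plain integers do not support weak references.

```python
import weakref

class Box:
    """Hypothetical stand-in for an object that holds a value."""
    def __init__(self, value):
        self.value = value

y = Box(2000)
watcher = weakref.ref(y)   # observes the first Box without counting as a label

print(watcher() is y)      # True: the first Box is still alive

y = Box(3000)              # y moves to a new Box; the first one has no label left

print(watcher())           # None on CPython: the first Box has been destroyed
print(y.value)             # 3000
```

Other Python implementations may reclaim the object later, so the second print is tied to CPython's reference counting.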

Now consider another example

y = x + 2
print(y)
1002

As expected, we get 1002 when calling print(y). And you can easily describe the process

  • Python evaluates x + 2 and comes up with the value 1002
  • Then Python creates a new object to store that value
  • Then Python updates the entry in the ledger to account for the new association of y with the new object
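The three steps above can be checked directly. This is a small sketch using the same names as the example:

```python
x = 1000
y = x + 2        # evaluate x + 2, create an object for 1002, attach y to it

print(y)         # 1002
print(x)         # 1000: x is untouched by the assignment to y
print(y is x)    # False: y is attached to a different object than x
```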

But in the following example, things will be a bit different

y = x
print(y)
1000

If you think that Python will evaluate x to come up with the value 1000 and then create a new object to hold it, then you are wrong.

When the RHS of the assignment contains only a variable name, Python will not create a new object. Instead, variable y now points to the same underlying object that x points to. This is called aliasing. Python will open the ledger and update the second entry to record that y is now associated with A123, as shown in the picture below.
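Aliasing is easy to observe with the is operator, which tells whether two names are attached to the same object. A short sketch:

```python
x = 1000
y = x                    # no new object: y becomes a second name for x's object

print(y is x)            # True
print(id(y) == id(x))    # True

y = y + 1                # rebinding y attaches it to a new object ...

print(x)                 # 1000: ... and leaves x where it was
print(y)                 # 1001
print(y is x)            # False
```

Rebinding one alias never affects the other name; it only moves one label.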

Take away

  1. Think of an object as a box, a value as what’s inside the box, and a variable as the label pasted on the box.
  2. The box contains the value, not the label. To get the value, we use the label (because it’s easier for us as humans). But under the hood, Python finds the corresponding box in the ledger, opens it, gets the value, and returns the value to us.
  3. At one moment in time, there can be two labels pasted on the same box (as x and y in the last example). So whether we call print(x) or print(y), we get the same value back (because both variables are pointing to the same object)
  4. However, at one moment in time, it is impossible for a label to be pasted on two different boxes. If it were, Python could not decide which box to open when we call print(x).
  5. Suppose x is currently pointing to box A123 and we run x = 5000, then a new box is created to hold the value 5000 and this action is just like taking the label x from box A123 and pasting it on the newly created box.
  6. When Python notices a box without any label, it destroys that box (in CPython, this happens immediately) so that the memory slot can be reused.
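The number of labels on a box can be inspected in CPython with sys.getrefcount. The sketch below assumes CPython; the absolute numbers include temporary references made by the call itself, so only the difference is meaningful. The names box and second_label are illustrative.

```python
import sys

box = object()                    # a fresh box with one label
before = sys.getrefcount(box)

second_label = box                # paste a second label on the same box
after = sys.getrefcount(box)

print(second_label is box)        # True: both labels are on the same box
print(after - before)             # 1 on CPython: one more label, one more reference
```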

Verify the theory

Now it’s time to verify that what I have told you is indeed true.

x = 1000
y = 1000

Look at the values: they are the same.

# Print values
print(x)
print(y)
1000
1000

You might ask “how do I know they are indeed pointing to two different boxes?”

Python has an id() function that will return the ID of the object currently associated with the variable.

print(id(x))
print(id(y))
2584795105232
2584795105168

The two outputs differ, meaning that the two boxes that x and y are pointing to are different although they hold the same values.

This is like two 10-dollar notes that have the same value but are completely two different notes with different serial numbers.

We can check whether two variables are pointing to the same underlying object using the is operator.

# Compare values using ==
x == y
True
# Compare identity using is
x is y
False

As you can see, x == y returns True (same value) but x is y returns False (different objects).
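The outcome of this check can depend on how the code is run. When both assignments are compiled together, for instance in a script file, CPython may reuse a single constant object for the two equal literals. The sketch below therefore assumes CPython and builds both integers at run time with int(), which gives two separate objects regardless of how the code is executed.

```python
x = int("1000")
y = int("1000")

print(x == y)            # True: the two objects hold the same value
print(x is y)            # False on CPython: they are two separate objects
print(id(x) == id(y))    # False on CPython, for the same reason
```

Use == when the question is about values, and is only when the question is whether two names are attached to the very same object.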

Now assign a new value to y.

y = 2000
print(y)
print(id(y))
2000
2584795103408

As you can see, both the value and the box change because a totally new object is created to hold 2000. You can also notice that x and y are still pointing to different boxes. Just double-check.

x == y
False
x is y
False

Now assign x to y

y = x

According to theory, x and y should be pointing to the same object now. Let's check it.

print(x)
print(y)
print(x == y)
1000
1000
True

Looks good. They have the same value. Now check their identities.

print(id(x))
print(id(y))
2584795105232
2584795105232

Indeed, they are pointing to the same box.

Python interning

We already verified that theory for some large numbers such as 1000 and 2000. Now consider the following example.

x = 10
y = 10
print(x == y)
print(x is y)
True
True

According to theory, x is y should give False because x and y are pointing to two distinct boxes although they share the same value.

It turns out that this is not the case. x and y are indeed pointing to the same box. In CPython, the standard implementation, this happens for all small integers from -5 to 256 and for some strings (typically short ones that look like identifiers).

This behavior (called interning) is due to an optimization decision of the Python core team, which I will not discuss here. Basically, maintaining only one single copy helps save memory and makes comparisons a lot faster. Now let’s double-check in an interactive session.

# Test for -5
x = -5
y = -5
print(x is y)
True

# Test for 256
x = 256
y = 256
print(x is y)
True

# Test for -6
x = -6
y = -6
print(x is y)
False

# Test for 257
x = 257
y = 257
print(x is y)
False

For short strings

x = "Hello"
y = "Hello"
print(x == y)
print(x is y)
True
True

But for longer strings, the theory still holds

x = "Hello. How are you?"
y = "Hello. How are you?"
print(x == y)
print(x is y)
True
False
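These checks can give different answers when the code is run as a single script, because CPython may merge equal literals that are compiled together. The sketch below assumes CPython and builds every value at run time, so the results do not depend on how the code is executed. It also uses sys.intern from the standard library, which returns one shared copy for equal strings.

```python
import sys

# Small integers: CPython keeps one shared object per value from -5 to 256.
a, b = int("256"), int("256")
c, d = int("257"), int("257")
print(a is b)    # True on CPython: 256 comes from the shared cache
print(c is d)    # False on CPython: 257 is outside the cache

# Strings built at run time are not interned automatically ...
s = " ".join(["Hello.", "How", "are", "you?"])
t = " ".join(["Hello.", "How", "are", "you?"])
print(s == t)    # True: same characters
print(s is t)    # False on CPython: two separate string objects

# ... but sys.intern() maps equal strings to one shared object.
s = sys.intern(s)
t = sys.intern(t)
print(s is t)    # True
```

The cache range and the automatic interning rules are implementation details, so code should not depend on them.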

Key takeaway

  1. The theory is true in general
  2. Never use is to compare the values of integers or strings; use == instead
  3. In practice, you rarely compare identities, so it won’t matter much

Navigation

Previous article: Python’s basic syntax

Next article: Python naming rules and conventions
