Python variables and assignments — Variables don’t store values!
A detailed explanation on how Python assignments actually work
Why use variables?
Variables are the building blocks of every program in any programming language. Imagine you have to perform an analysis that involves the current interest rate. Suppose today’s interest rate is 0.05 (or 5%). What will you do?
One approach is to hard-code the value 0.05 in every formula that needs it. Suppose there are around 100 such places. What happens if you have to replicate your analysis next week and the interest rate has risen to 0.07? You have to manually find every place that uses the value 0.05 and replace it with 0.07. This process is painful, time-consuming, and error-prone.
A better approach is to define a variable named interest_rate and assign the value 0.05 to it. For the rest of your analysis, you use interest_rate instead of the value 0.05. If the interest rate changes, you only have to adjust your code in one single place.
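Here is a minimal sketch of the better approach. The helper yearly_interest and the 20,000 principal are invented for illustration. Every formula refers to interest_rate by name, so updating the rate is a one-line change:

```python
# The rate is defined in exactly one place.
interest_rate = 0.05

def yearly_interest(principal):
    """Hypothetical helper: one year of simple interest on `principal`."""
    return principal * interest_rate  # reads the current value of interest_rate

this_week = yearly_interest(20_000)

# Next week the rate rises: only this single line has to change.
interest_rate = 0.07
next_week = yearly_interest(20_000)

print(this_week, next_week)
```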
One common mistake is to think that the value 0.05 is stored inside the variable. That’s not how Python works, and it is crucial to understand what really happens. In the next section, I will explain why.
How do Python assignments work?
In Python, you create a new variable through an assignment, as shown in the example below:
x = 1000
print(x)  # Output: 1000
The first line is an assignment that assigns the value 1000 to a variable named x. The second line just prints out the value associated with x. When you run the above cell, you will see 1000 printed out, and that's why many people think that x "stores" 1000.
That’s not true. Here’s what happens under the hood (see the picture and read the explanation carefully):
Step 1: Python evaluates the right-hand side (RHS) of the assignment and recognizes the value 1000.
Step 2: Python creates an object to store this value (the green hexagon). This object has an ID (say, A123) and occupies some slot in your computer's memory (say, the slot at address 002).
Step 3: Python, like an accountant, opens a ledger to look for a variable named x. For now, suppose that x has never been created before. Since Python cannot find x, it creates a new entry (a row) in the ledger to record the relationship between the variable x and the corresponding object.
Thus, it is the object that contains the value, not the variable.
Step 4: In the second line, when running the print statement, Python encounters x and understands that it is a variable, not a string, because it is not wrapped in quotes. Python opens the ledger again to look for x, and this time it finds the entry shown in the picture. It knows that x is associated with the object with ID A123, so it goes to the memory slot where that object resides, retrieves the value stored in the object, and prints it out.
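The closest real counterpart of this ledger is a namespace: an ordinary dictionary that maps names to objects (at module level, globals() returns it). The sketch below illustrates the analogy and does not describe Python's internals in detail. It runs the assignment with exec() inside a dictionary we create ourselves, so we can open the "ledger" and look at the entry for x:

```python
# A fresh dictionary plays the role of the ledger (namespace).
ledger = {}

# Steps 1-3: evaluate the RHS, create the object, record the name "x".
exec("x = 1000", ledger)

# The ledger now maps the name "x" to an int object.
print(ledger["x"])      # the value held by that object
print(id(ledger["x"]))  # the object's ID (differs from run to run)

# Step 4: looking up x goes through the same ledger.
exec("print(x)", ledger)  # prints 1000
```

exec() also adds a `__builtins__` entry to the dictionary, which is why the ledger ends up containing more than just x.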
You might think, “OK, but after all, it’s just 1000. Why make things so complicated?”
That’s a valid point. But if you understand what I’ve explained above, it will help you avoid mistakes later, when things become more complicated. Now consider another example:
y = 2000
print(y)  # Output: 2000
The process is very similar to what happened in the first assignment. The first line creates a new object with ID B456 at the memory slot with address 006 to store the value 2000 (see the picture below).
Now what happens if we run the following code?
y = 3000
Some might think that object B456 just kicks the value 2000 out and takes the value 3000 in, but that is not the case.
Remember that, in an assignment, Python first evaluates the RHS of = to come up with a value, and then it creates a totally new object to hold that value. Thus, a new object with ID C789 is created to hold 3000, and it occupies slot 004. Python also updates the corresponding entry in the ledger to record that y is now associated with C789, not B456 (see the picture below).
So what happens to object B456? It is destroyed. Python manages memory automatically: CPython, the standard implementation, keeps track of how many references point to each object and destroys an object as soon as that count drops to zero (here, as soon as no variable is associated with it anymore). The memory slot is then free again for future objects.
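Plain integers don't announce when they are destroyed, so the sketch below uses a small hypothetical class, Box, whose `__del__` method records each destruction. It relies on CPython's reference counting. Other implementations, such as PyPy, may destroy the old object later.

```python
destroyed = []  # records the value of every Box that gets destroyed

class Box:
    """Hypothetical stand-in for a Python object holding a value."""

    def __init__(self, value):
        self.value = value

    def __del__(self):
        # Called when the object is about to be destroyed.
        destroyed.append(self.value)

y = Box(2000)
print(destroyed)  # []: the box holding 2000 still has a label (y)

y = Box(3000)     # y moves to a new box; the old box has no label left
print(destroyed)  # [2000] in CPython: the old box was destroyed right away
```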
Now consider another example:
y = x + 2
print(y)  # Output: 1002
As expected, we get 1002 when calling print(y). And you can easily describe the process:
- Python evaluates x + 2 and comes up with the value 1002.
- Then Python creates a new object to store that value.
- Then Python updates the entry in the ledger to record the new association of y with the new object.
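You can watch these steps in the bytecode that CPython generates for the assignment. The sketch below uses the standard dis module. The exact instruction names differ between Python versions (for example, BINARY_ADD before 3.11 and BINARY_OP from 3.11 on), so treat the printed listing as illustrative:

```python
import dis

# Compile the statement as module-level code, without running it.
code = compile("y = x + 2", "<example>", "exec")
ops = [(instr.opname, instr.argrepr) for instr in dis.get_instructions(code)]

for opname, arg in ops:
    # The instructions before STORE_NAME evaluate x + 2 (creating the new
    # object); STORE_NAME y records "y -> that object" in the namespace.
    print(opname, arg)
```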
But in the following example, things are a bit different:
y = x
print(y)  # Output: 1000
If you think that Python will evaluate x to come up with the value 1000 and then create a new object to hold it, you are wrong.
When the RHS of the assignment contains only a variable name, Python does not create a new object. Instead, y now points to the same underlying object that x points to. This is called aliasing. Python opens the ledger and updates the second entry to record that y is now associated with A123, as shown in the picture below.
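Comparing the bytecode makes the difference visible. In this sketch, which again uses the dis module (instruction names vary slightly across versions), y = x only loads the object that x refers to and stores that same reference under y. There is no arithmetic and no new value to construct:

```python
import dis

code = compile("y = x", "<example>", "exec")
ops = [(instr.opname, instr.argrepr) for instr in dis.get_instructions(code)]

# Expect LOAD_NAME x immediately followed by STORE_NAME y:
# the reference is copied, and no new object is built.
for opname, arg in ops:
    print(opname, arg)
```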
Takeaway
- Think of an object as a box, a value as what’s inside the box, and a variable as the label pasted on the box.
- The box contains the value, not the label. To get the value, we use the label (because it’s easier for us as humans). But under the hood, Python looks up the label in the ledger, finds the corresponding box, opens it, gets the value, and returns the value to us.
- At one moment in time, there can be two labels pasted on the same box (as with x and y in the last example). So whether we call print(x) or print(y), we get the same value back, because both variables are pointing to the same object.
- However, at one moment in time, a label cannot be pasted on two different boxes. If it could, then when we call print(x), Python could not decide which box to open.
- Suppose x is currently pointing to box A123 and we run x = 5000. Then a new box is created to hold the value 5000, and this action is just like taking the label x off box A123 and pasting it on the newly created box.
- When a box no longer has any label (or any other reference) pointing to it, Python destroys that box and frees its memory slot for reuse.
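The bullets about moving a label can be checked directly. This minimal sketch moves the label x to a new box and confirms that y, which shared the old box with x, is unaffected:

```python
x = 1000
y = x          # two labels on the same box
print(x is y)  # True

x = 5000       # peel x off and paste it on a new box
print(x, y)    # 5000 1000: y still labels the old box
print(x is y)  # False
```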
Verify the theory
Now it’s time to verify that what I have told you is indeed true:
x = 1000
y = 1000
Look at the values; they are the same:
# Print values
print(x)  # Output: 1000
print(y)  # Output: 1000
You might ask, “How do I know they are indeed pointing to two different boxes?”
Python has an id() function that returns the ID of the object currently associated with a variable. (In CPython, this ID is the object's memory address.)
print(id(x))  # Output: 2584795105232
print(id(y))  # Output: 2584795105168
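The two outputs differ (the exact numbers will be different on your machine), meaning that the boxes x and y are pointing to are different even though they hold the same value. Note that this assumes the two assignments are run separately, as in an interactive session. If x = 1000 and y = 1000 are compiled together (for example, in one script), CPython may reuse a single constant object for both, and the IDs can match.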
This is like two 10-dollar notes that have the same value but are two completely different notes with different serial numbers.
We can check whether two variables are pointing to the same object (the same box) using the keyword is.
# Compare values using ==
x == y  # True
# Compare identity using is
x is y  # False
As you can see, x == y returns True (same value), but x is y returns False (different objects).
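Lists are a handy way to repeat this check. Two list displays always create two separate objects, no matter how or where you run them. Here is a minimal sketch that mirrors the banknote analogy (the names note_a and note_b are just for illustration):

```python
note_a = [10]  # a "banknote" with value 10
note_b = [10]  # a second note with the same value

print(note_a == note_b)          # True: same value
print(note_a is note_b)          # False: two distinct objects
print(id(note_a) == id(note_b))  # False: different IDs, like serial numbers
```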
Now assign a new value to y:
y = 2000
print(y)      # Output: 2000
print(id(y))  # Output: 2584795103408
As you can see, both the value and the box change, because a totally new object is created to hold 2000. Notice also that x and y are still pointing to different boxes. Let's double-check:
x == y  # False
x is y  # False
Now assign x to y:
y = x
According to the theory, x and y should now be pointing to the same object. Let's check it:
print(x)       # Output: 1000
print(y)       # Output: 1000
print(x == y)  # Output: True
Looks good. They have the same value. Now check their identities.
print(id(x))  # Output: 2584795105232
print(id(y))  # Output: 2584795105232
Indeed, they are pointing to the same box.
Python interning
We already verified the theory for some large numbers such as 1000 and 2000. Now consider the following example:
x = 10
y = 10
print(x == y)  # Output: True
print(x is y)  # Output: True
According to the theory, x is y should give False, because x and y are pointing to two distinct boxes even though they share the same value.
It turns out that this is not the case: x and y are indeed pointing to the same box. In CPython, this happens for all small integers from -5 to 256 and for many short strings (in particular, strings that look like identifiers, such as "Hello").
This surprising behavior (called interning) comes from an optimization decision by the Python core team, which I will not discuss in detail here. Basically, keeping only one single copy of such frequently used values saves memory and makes comparisons a lot faster. Now let’s double-check:
# Test for -5
x = -5
y = -5
print(x is y)  # Output: True
# Test for 256
x = 256
y = 256
print(x is y)  # Output: True
# Test for -6
x = -6
y = -6
print(x is y)  # Output: False
# Test for 257
x = 257
y = 257
print(x is y)  # Output: False
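The False results for -6 and 257 assume each assignment is compiled separately, as in an interactive session. A more robust check computes the numbers at run time, so only CPython's small-integer cache (-5 to 256) can make two results share an object. This is a minimal sketch, and the cache is a CPython implementation detail, not a language guarantee:

```python
base = 250     # a variable, so the additions below happen at run time

a = base + 6   # 256, inside the cache
b = base + 6
print(a is b)  # True in CPython: both names get the cached 256 object

c = base + 7   # 257, outside the cache
d = base + 7
print(c is d)  # False in CPython: each addition builds a new object
print(c == d)  # True: the values are still equal
```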
For short strings:
x = "Hello"
y = "Hello"
print(x == y)  # Output: True
print(x is y)  # Output: True
But for longer strings (for example, ones containing spaces and punctuation), the theory still holds:
x = "Hello. How are you?"
y = "Hello. How are you?"
print(x == y)  # Output: True
print(x is y)  # Output: False
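If you do want one shared copy of a longer string, the standard library offers sys.intern(). In the sketch below, the strings are built at run time with str.join, so the compiler cannot merge them into one constant. The two results start out as separate objects, and interning makes both names point to a single one:

```python
import sys

parts = ["Hello.", " How are you?"]

s1 = "".join(parts)  # built at run time: a new string object
s2 = "".join(parts)  # another new object with the same value
print(s1 == s2)      # True
print(s1 is s2)      # False: two distinct objects

s1 = sys.intern(s1)  # s1 now labels the interned copy
s2 = sys.intern(s2)  # returns that same interned copy
print(s1 is s2)      # True
```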
Key takeaway
- The theory is true in general.
- Never rely on is to compare integers or strings; use == to compare values.
- In practice, you rarely need to compare identities, so interning won’t matter much.