A “reference” for Python Variables
If you were expecting a follow-up to my VirtualBox series, I apologize. I have working code for that, but it just isn’t up to my standards for release, and I’ve been too busy to clean it up. Instead, in this article I’m going to talk about some Python peculiarities that have confused many a new Python developer (myself included).
Classic programming languages typically pass variables to functions in one of two ways: pass-by-value or pass-by-reference. Pass-by-value occurs when the variable is effectively copied and the copy is provided to the function. This has the advantage that any modifications to the variable within the function do not have any so-called “side-effects” outside of it. In many languages, objects and more complicated data structures are passed by reference. Pass-by-reference means that a “reference” or pointer to the variable is passed to the function, and any actions the function performs through that reference affect the single copy of the object (which may be used by other parts of the code), so this mode of passing can have external side-effects. Sometimes these side-effects are an intended feature, and pass-by-reference can be much more efficient, since large objects do not need to be copied.
Python does things differently from more classical languages (and yet in similar ways), and these differences can cause confusion. Let me illustrate with a couple of examples:
Hopefully the above example is not too surprising to anyone. Passing a simple value such as an integer (or a bool or str) effectively behaves like pass-by-value.
Let’s change a few small things and see what happens:
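The same sketch, changed so that test declares a as global instead of taking it as a parameter (values again illustrative):

```python
a = 5


def test():
    # "global a" makes the assignment below target the module-level a
    global a
    a = 10
    print("inside test:", a)


test()
print("after test:", a)  # after test: 10
```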
Once again, hopefully this is fairly straightforward. We reference a “global” variable (simultaneously angering the programming gods), and suddenly we see side-effects from calling our test function. Still, this should make intuitive sense: we are explicitly stating that we wish to act on a “global” (to us) variable, so we would naturally expect those changes to persist outside of our current “local” scope.
Let’s see one more example before we start trying to draw any conclusions:
This result might surprise some people. Clearly not everything is passed by value. Unlike our previous examples with simple built-in data types, our custom object was effectively passed “by reference”. This may seem even stranger once you realize that “simple” data types like str (strings) have built-in methods (such as “lower”, “strip”, etc.), so even these “simple” types are objects. However, this is not the end of the story. One might reasonably conclude that only custom Python objects behave this way, or that built-ins are always passed by reference. Let’s look at a few more examples:
From the two examples above, you can see that both the Python dict and list built-ins appear to be passed by reference: for reasons explained further down, modifying their sub-elements has side-effects. For my next trick, take a look at these two examples and compare them to the above results:
Wait, what!? This might seem extremely confusing. Why did these examples not cause global side-effects, when the previous two did? Before we dive into exactly what is happening here, let’s start with a general rule of thumb you can use to avoid confusion about Python’s behavior:
In Python, if you replace the passed-in object (that is, assign a new object to the parameter name inside the function), there will be no global side-effects. However, if you modify the object itself, for example by assigning to one of its sub-items (a dict key, a list index, or an object attribute) or by calling a method that changes it in place, those changes will have global side-effects.
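The rule of thumb can be checked side by side. rebind and mutate are hypothetical helpers; mutate also calls list.append, a mutating method, which affects the caller’s list just like item assignment does:

```python
def rebind(var1):
    # Replacing: var1 now names a new list; the caller never sees it
    var1 = [1, 2, 3, 4]
    print("inside rebind:", var1)


def mutate(var1):
    # Modifying a sub-item, or calling an in-place method, changes the shared list
    var1[0] = 100
    var1.append(4)


a = [1, 2, 3]
rebind(a)
print(a)  # [1, 2, 3]
mutate(a)
print(a)  # [100, 2, 3, 4]
```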
Let’s pull back the curtain to understand why this “weird” behavior occurs. When you pass a variable to a function, Python creates a new local variable and hands it a reference to the passed-in object (no matter the type). If you then assign something else to this variable, what you are in fact doing is overwriting the local reference, not the object it pointed to. Once you wrap your mind around this, it all starts to make sense. In the case of subscriptable items, we are overwriting the reference that a sub-item (or attribute) points to, but since that reference lives inside the shared object, Python first resolves the reference to the object, then overwrites the sub-item reference inside it, and thus causes a global side-effect. I know this is confusing, so let me demonstrate with some more examples.
Python provides a built-in function called “id” which returns an integer identifying the object a variable refers to (in CPython, this is the object’s address in memory). Let’s use it to get a handle on an example similar to our very first case:
As you can see from the example, var1 starts out holding a reference to the same object as a. However, when we assign it the value ten, we aren’t really changing a value; we are pointing var1 at the object 10 instead. Thus this sequence of operations does not change the object that a refers to.
Now let’s look at one of those subscriptable examples:
As you can see from the above results, the following sequence of events occurred:
1. A reference to a was passed to test and stored in var1.
2. test attempts to assign the value “11” to the dictionary key “test” in var1… except we now know there is more to the story. Here is what really happens:
2a. Python resolves the reference in var1 (which is the same reference held by a) to the shared dictionary.
2b. Python follows the underlying data structure to the reference stored under the key “test”.
2c. Python obtains the object for the value “11” (creating it in memory if needed) and gets a reference to it.
2d. Python copies this “11” reference over the old reference that var1[“test”] and a[“test”] pointed to, thus causing the observed global side-effects.
3. Abracadabra! Suddenly the results make sense.
Hopefully this article clears up a lot of confusion around Python variable passing behavior. Before I wrap this up, I want to cover a couple more related traps for beginners.
It is relatively common to pass a Python dict to a function and then wish to modify it without causing any global side-effects. A naïve way of doing this might look like the following:
As you can see, we called the “.copy()” method to clone our dictionary before modifying the clone, and everything worked as expected: we were able to modify the cloned dictionary without affecting the original passed-in dictionary. Everything is good, right? Well, not exactly… Let’s look at another example to see the subtle problem:
If it is not immediately obvious what is happening above, let me elaborate. The “.copy()” method on Python’s built-in containers such as dict and list performs what is called a “shallow copy”: it only creates a new copy of the outermost container. For anything “deeper”, such as a nested dict, it simply copies the existing references, so modifying those nested objects through the copy also modifies the original.
In most cases, cloning an object inside a function is indicative of a bad programming pattern in Python. You are usually better off creating a brand new object and copying values into it as you loop or recurse through the original object. That said, if you really do need to fully clone an object that was passed into a function, the solution is the “copy.deepcopy” function. Below is the same example as above, using this function to fully clone our dictionary, with the expected result:
Hopefully, this article was illuminating to you if you have found Python variable passing confusing in the past.