Using Global Variables for Cleaner Python Code

Greg J
4 min read · Jun 30, 2022


Photo by Nejc Soklič on Unsplash

In a previous article we discussed how to use decorators to keep track of changes made to a dataframe over the course of a cleaning and filtering process. Here we’ll briefly reset that use case, then extend the functionality of our decorator by adding global variables. Our goal here is the cleanest and most reusable code we can manage, because remember: We’re lazy data scientists.

Reset the Use Case

Let’s say you’re a data scientist working for a retail client. They have provided historical transaction records, and they’ve asked you to build a recommendation engine to boost sales.

Let’s first take a look at some example data:

Creating some simple example data
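A minimal sketch of what such example data could look like. The values are made up for illustration, and the trans_id column name is an assumption (only sale_amt is named in the text):

```python
import numpy as np
import pandas as pd

# Hypothetical transaction records; trans_id is an assumed column name,
# and every value here is invented for illustration.
df = pd.DataFrame(
    {
        "trans_id": [101, 102, 103, 104, 105],
        "sale_amt": [19.99, np.nan, 5.49, 42.00, 12.50],
    }
)

print(df)
```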

As you can see, there is a null value in our sale_amt field. For the sake of this example, let’s assume that for now we simply want to drop rows with null values. However, we don’t just want to drop those rows; we want to keep track of the transaction IDs we’re dropping so that we can present them to our client for discussion in our next weekly meeting. Perhaps we can help them diagnose a data collection issue, or maybe we can collaboratively decide on a strategy to fill the null values.

In our previous discussion on decorators, we built a lightweight and generic function to track the set of transaction IDs we’re dropping during our filtering process:
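The snippet below is a minimal sketch of that pattern, not necessarily the exact code from the earlier article. The decorator name track_dropped and the trans_id column are assumptions made for illustration:

```python
import functools

import numpy as np
import pandas as pd


def track_dropped(func):
    """Hypothetical decorator: also return the IDs a filter removed."""

    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        filtered_df = func(df, *args, **kwargs)
        # IDs present before filtering but missing afterwards
        mask = ~df["trans_id"].isin(filtered_df["trans_id"])
        filtered_trans = df.loc[mask, "trans_id"].tolist()
        return filtered_df, filtered_trans

    return wrapper


@track_dropped
def remove_nulls(df):
    # Drop every row that contains at least one null value
    return df.dropna()


df = pd.DataFrame(
    {"trans_id": [101, 102, 103], "sale_amt": [19.99, np.nan, 5.49]}
)

# The caller has to unpack two objects
df_clean, filtered_trans = remove_nulls(df)
print(filtered_trans)  # [102]
```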

Here we’ve defined a small function, remove_nulls(), that drops nulls from our data while also returning a list of the transaction IDs that drop out of the dataset during that filtering step. This particular use case is deliberately simple, but you can extend this framework to more complex filtering, such as threshold-based filtering.

So what’s the problem?

The inconvenience in the code above, and what we’d like to fix, is that at execution our remove_nulls() function returns two objects: our filtered dataframe, and a list of transaction IDs.

Admittedly, that isn’t a major inconvenience. But imagine you change what the function or the decorator returns in the future: maybe you add a statement to return the list of revenue amounts you’re dropping as well as the transaction IDs. Now you’ll have to change the calling code everywhere you’ve used this function to handle the additional returned object. That could be a real hassle, and it’s one we can avoid.

What if we could add items to our filtered_trans list without having to return a new list?

We know that variables defined inside a Python function are local to that function. In other words, what happens in a function stays in a function. Ordinarily, only what’s explicitly passed back by a return or yield statement can be accessed outside the function. Is there any way around that requirement?
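A quick way to see local scope in action is the short sketch below. The function and variable names are hypothetical and exist only for this illustration:

```python
def drop_ids():
    # scratch_ids exists only while drop_ids() is running
    scratch_ids = [102]
    return len(scratch_ids)


count = drop_ids()

try:
    scratch_ids  # the local name is not visible at module level
except NameError as err:
    message = str(err)

print(count)    # 1
print(message)  # name 'scratch_ids' is not defined
```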

Maintaining results without return

The answer to our problem is using global variables!

Global variables are exactly what they sound like: the antithesis of local variables, defined at module level and accessible from anywhere in that module. If we can convert our filtered_trans list from a local to a global variable, we can change its contents from within our function without having to explicitly return it.

So how do we do that? It’s actually surprisingly simple! All we have to do is use the global keyword in conjunction with our variable name during the function definition:

Here we’ve only changed a few lines from our previous declaration:

  1. We added a line that declares filtered_trans as a global variable.
  2. We removed filtered_trans from our return statement.

Now when we execute this function we simply need to run:
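The call might look like the sketch below. It repeats a condensed version of the assumed track_dropped decorator so that it runs on its own, and the sample data is invented for illustration:

```python
import functools

import numpy as np
import pandas as pd


def track_dropped(func):  # hypothetical decorator name
    @functools.wraps(func)
    def wrapper(df):
        global filtered_trans
        filtered_df = func(df)
        mask = ~df["trans_id"].isin(filtered_df["trans_id"])
        filtered_trans = df.loc[mask, "trans_id"].tolist()
        return filtered_df

    return wrapper


@track_dropped
def remove_nulls(df):
    return df.dropna()


df = pd.DataFrame(
    {"trans_id": [101, 102, 103], "sale_amt": [19.99, np.nan, 5.49]}
)

# A single return value: the filtered dataframe
df = remove_nulls(df)

# The dropped IDs were stored as a side effect of the call
print(filtered_trans)  # [102]
```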

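Our filtered_trans object will still be accessible just like any other module-level object, so we can save it somewhere and eventually report back to the client. One caveat: the global keyword binds the name in the module where the decorator is defined, so if the decorator lives in an imported module, filtered_trans lives in that module’s namespace rather than in your script or notebook.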

We’ve also saved ourselves a little bit of code, and tidied up our function calls to be much neater.

Additional Ideas

I do want to call out a couple of items before we conclude:

  1. Recall that when using a decorator, we have to use the @decorator_name designation at the definition of the function that implements that decorator. If we change a decorator, we’ll also need to redefine the function that implements the decorator functionality. In our case, that simply means rerunning the code that defines our remove_nulls() function.
  2. The code we’ve written above rebinds filtered_trans to a new list every time we execute the function. If you build a series of functions that do more than just remove nulls, you’d overwrite the filtered_trans variable each time. To be more robust, it would probably make sense to adapt the logic to add to an existing list with filtered_trans.extend() (or append() for a single ID). We haven’t done so here for brevity, but keep this point in mind in your implementations.
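The second point can be sketched as follows, under the same assumptions as before (hypothetical track_dropped decorator, trans_id column, invented data). Because extend() mutates the existing module-level list rather than rebinding the name, this variant needs no global statement, and results from several filters accumulate:

```python
import functools

import numpy as np
import pandas as pd

# Module-level list that every decorated filter adds to
filtered_trans = []


def track_dropped(func):
    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        filtered_df = func(df, *args, **kwargs)
        mask = ~df["trans_id"].isin(filtered_df["trans_id"])
        # extend() changes the list in place, so earlier results survive
        filtered_trans.extend(df.loc[mask, "trans_id"].tolist())
        return filtered_df

    return wrapper


@track_dropped
def remove_nulls(df):
    return df.dropna()


@track_dropped
def remove_small_sales(df, min_amt=10.0):
    # Hypothetical threshold-based filter
    return df[df["sale_amt"] >= min_amt]


df = pd.DataFrame(
    {
        "trans_id": [101, 102, 103, 104],
        "sale_amt": [19.99, np.nan, 5.49, 42.00],
    }
)
df = remove_nulls(df)
df = remove_small_sales(df)

print(filtered_trans)  # [102, 103]
```

If you need to know which filter dropped which ID, a dictionary keyed by function name would be a natural extension of the same idea.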

Conclusion

I’ve found this to be a powerful tool in my quest for ever-cleaner and more reusable code. I’ve added this functionality to several of the functions and decorators in my generic helper code that I’ve stashed in my private repo and turned into a Python package.

I’m always trying to learn, and I’m no expert! If you have feedback or suggestions on how you’re using global variables or any other topics, I’d love to hear it!
