The Best Way to Boost Your Membership Tests In Python!

Muhammad Daif
3 min readJul 4, 2024

--

Time and again, I’ve come across intelligent developers testing for membership in Python in ways that aren’t optimal. In this post, I will explore the performance pitfalls of using lists for membership tests.

Introduction

Membership testing is a fundamental operation in programming. Whether you’re checking if a user is part of a system, if a key exists in a configuration, or if a number is in a sequence, you’re performing membership testing. While this might seem straightforward, the method you choose can greatly affect the efficiency of your code. I’ve often observed that even seasoned developers sometimes overlook the optimal ways to perform these tests.

The Basics of Membership Testing

At its core, membership testing is about determining if a particular item exists within a collection. In Python, this is often done using the in keyword. The efficiency of this operation, however, can vary based on the data structure you're working with.

I’ve seen folks doing that repeatedly to perform membership testing.

# Basic membership test
elements = [1, 2, 3, 4, 5]
print(3 in elements) # True

The Problem with Lists for Membership Testing

Lists are a go-to data structure in Python, often used to store sequences of items. However, when it comes to membership testing, they might not be the best choice. The reason? Lists check each item sequentially until a match is found. This means, in the worst-case scenario, it might have to traverse the entire list. For a list of n items, this can take O(n) time.

In simple terms, if you’re looking for an item in a list and the item isn’t actually there, it’ll need to go through all the items in the list until the test yields False. That also means the time it takes to do this check goes up as the list size goes up !

Let’s take an example.

Here, we defined two lists that represent two variations of an input. Both lists are list of integers. The difference is user_input_2 is 10 times bigger than user_input_1.

If we need to go through both of them and check if the number 1000 exists (which it doesn't), it'll take us 10 times more to go through 10 times the input size. This is what we refer to by linear growth or O(N), where N is the number of inputs.

It gets even worse when it’s 100 times bigger as in user_input_3 case.

What happens if instead of using lists, we use sets ?

yes, you’re seeing right, the mean runtime went up then down again with the input, and in all cases it’s not proportional to the input size !

As you can see, when dealing with a small number of items, the difference between lists and sets might not be noticeable. In the matter of fact using lists with a small number of items might be even favorable. However, as the collection grows, the inefficiency of lists becomes evident. For instance, checking membership in a list of a million items can be a hundred times slower than in a set of the same size.

Ummm .. WAT?

Membership testing is a common operation, and the data structure you choose can significantly impact performance. While lists are versatile, sets offer a more efficient solution for membership tests. But why exactly are sets so fast, and how do they manage to achieve such impressive performance? That’s a topic for another blog post 😎. For now, let’s just agree to use sets instead of lists when doing membership tests.

--

--

Muhammad Daif

Geek by nature — Software Engineer by profession — Expat by residence. I write about things that interest me :-)