A Beginner’s Guide to Python Data Structure

AvocadoandToast
5 min readFeb 8, 2018


What is a data structure in computer science? A data structure is a particular way of organizing data so that certain operations on it can be performed efficiently. Before introducing the categories of data structures, I want to emphasize the difference between a data type and a data structure, especially for those who learned Java first.

A Data Type is a set of values (the data) associated with a set of operations defined on the data.

A Data Structure is a particular way of organizing and storing data efficiently. A data structure can hold different data types (e.g. int, float, string).

An intuitive example: if people are waiting in a checkout line to pay for their purchases, that is a queue of people. The queue is the data structure and person is the data type. If pigs are waiting in a line for food, that is a queue of pigs. The queue is still the data structure, and pig is the data type.
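To make the distinction concrete, here is a minimal sketch that uses `collections.deque` from the standard library as the queue. The names `checkout_line` and `feeding_line`, and the idea of representing pigs by ID numbers, are made up for illustration.

```python
from collections import deque

# The data structure is the same (a queue); only the data type of the items differs.
checkout_line = deque(["Alice", "Bob", "Carol"])  # a queue of people (strings here)
feeding_line = deque([101, 102, 103])             # a queue of pigs (ID numbers here)

# A queue serves its items in first-in, first-out order.
first_person = checkout_line.popleft()
first_pig = feeding_line.popleft()

print(first_person)  # Alice
print(first_pig)     # 101
```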

The figure below shows the classification of different data structures.

From “Fundamentals of Python: Data Structures” by Kenneth A. Lambert

There are four built-in data structures in Python — tuple, list, dictionary and set. We will see when to use and how to use each of them.

Tuple: A tuple is an immutable, ordered sequence of Python objects. Those objects could be different data types.

tup1 = ('java', 'python', 2007, 2008)

Fundamental operations:

len(tup1)   # 4
max(tup1)   # needs comparable elements; in Python 3, mixing str and int as in tup1 raises TypeError
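A short sketch of these tuple operations in Python 3. The variable `years` is introduced here only so that `max()` has comparable elements to work with.

```python
tup1 = ('java', 'python', 2007, 2008)

print(len(tup1))  # 4
print(tup1[0])    # java
print(tup1[-1])   # 2008

# Tuples are immutable: assigning to an element raises TypeError.
try:
    tup1[0] = 'c++'
except TypeError as err:
    assignment_error = err
    print("cannot modify a tuple:", err)

# max() needs mutually comparable elements, so apply it to the numbers only.
years = tuple(x for x in tup1 if isinstance(x, int))
print(max(years))  # 2008
```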

List: A list is a mutable, ordered sequence of Python objects. Those objects could be different data types.

list1 = ['java', 'python', 2017, 2018]

Fundamental operations:

list1.insert(i, item)
list1.pop(i)
list1.append(item)
list1.remove(item)
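Here is how those four operations change `list1`, step by step. The inserted values ('go' and 2019) are arbitrary examples.

```python
list1 = ['java', 'python', 2017, 2018]

list1.insert(1, 'go')   # insert 'go' at index 1
# ['java', 'go', 'python', 2017, 2018]

removed = list1.pop(0)  # remove and return the item at index 0
# ['go', 'python', 2017, 2018]

list1.append(2019)      # add an item at the end
# ['go', 'python', 2017, 2018, 2019]

list1.remove(2017)      # remove the first item equal to 2017
# ['go', 'python', 2018, 2019]

print(removed)  # java
print(list1)    # ['go', 'python', 2018, 2019]
```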

Dictionary: A dictionary is a collection of elements called entries. Each entry consists of a key and an associated value. Since Python 3.7, dictionaries preserve insertion order, but entries are looked up by key, not by position.

A dictionary’s keys must be unique, but its values may be duplicated.

dict1 = {'apple': 'fruit',
         'cabbage': 'vegetable',
         'cake': 'dessert'}

Fundamental operations:

dict1.pop(key)
dict1.keys()
dict1.values()
dict1.items()
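A small sketch of these dictionary operations, assuming Python 3.7 or later so that insertion order is preserved. The keys 'carrot' and 'pizza' are made-up examples.

```python
dict1 = {'apple': 'fruit', 'cabbage': 'vegetable', 'cake': 'dessert'}

print(dict1['apple'])          # fruit

# Keys must be unique, but a value may appear more than once.
dict1['carrot'] = 'vegetable'

# pop() removes a key and returns its value.
removed = dict1.pop('cake')
print(removed)                 # dessert

# pop() with a default avoids KeyError when the key is missing.
missing = dict1.pop('pizza', None)

print(list(dict1.keys()))      # ['apple', 'cabbage', 'carrot']
print(list(dict1.values()))    # ['fruit', 'vegetable', 'vegetable']

for key, value in dict1.items():
    print(key, '->', value)
```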

Set: A set is a mutable, unordered collection of unique items (i.e. no duplicate items).

Set elements must be hashable, so a set cannot contain mutable objects such as lists or dictionaries.

Fundamental operations:

set1.add(item)
set1.remove(item)
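The text above never defines `set1`, so the sketch below assumes a small set of language names. It also shows that duplicates collapse and that a mutable element is rejected.

```python
set1 = {'java', 'python', 'python'}  # the duplicate 'python' is stored once
print(len(set1))                     # 2

set1.add('go')
set1.add('java')                     # already present, so nothing changes
set1.remove('python')                # raises KeyError if the item is missing
set1.discard('rust')                 # like remove(), but silent if missing

print(sorted(set1))                  # ['go', 'java']

# A list is mutable (unhashable), so it cannot be a set element.
try:
    set1.add(['c', 'c++'])
except TypeError as err:
    unhashable_error = err
    print("cannot add a list to a set:", err)
```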

Here is a good summary comparing the time complexity of operations on List, Dictionary, and Set.
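As a rough illustration of one row of such a comparison, the sketch below times a membership test (`in`) on a list, a set, and a dictionary using the standard `timeit` module. The size `n` and the repeat count are arbitrary choices, and the measured times will vary by machine. The point is only that the list has to scan its elements while the set and dictionary use hashing.

```python
import timeit

n = 10_000
as_list = list(range(n))
as_set = set(as_list)
as_dict = dict.fromkeys(as_list)

missing = -1  # not present, so the list must be scanned to the end

# Time 200 membership tests against each structure.
list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)
dict_time = timeit.timeit(lambda: missing in as_dict, number=200)

print(f"list: {list_time:.6f}s")
print(f"set:  {set_time:.6f}s")
print(f"dict: {dict_time:.6f}s")
```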

After familiarizing yourself with the basic data structures and their operations, you may ask: “What data structure should I choose when solving a problem?”

Here are two questions to guide the choice:

  1. To solve this problem, what operations do I use a lot?
  2. What data structure makes these operations fast?

Let’s solve a problem together to get a better sense of how to choose a data structure that makes an operation fast. Here is a classic problem: Two Sum. I changed the example from a sorted list to an unsorted list to better demonstrate the difference in efficiency between List and Dictionary on the same operation.

“Given an array of integers, return indices of the two numbers such that they add up to a specific target. You may assume that each input would have exactly one solution, and you may not use the same element twice.”

Example:

Given nums = [2, 15, 11, 7], target = 9,

Because nums[0] + nums[3] = 2 + 7 = 9,
return [0, 3].

The problem is to find two numbers whose sum equals the target. You may notice the given input is an array. Python has no built-in array type (the standard library’s array module is a separate, typed container), but it has the list, which is much more general and can be nested to act as a multidimensional array quite easily.
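For instance, a list of lists can stand in for a two-dimensional array. The name `grid` and its 2 x 3 shape are illustrative only.

```python
# A 2-row by 3-column "array" built from nested lists.
grid = [[0 for _ in range(3)] for _ in range(2)]
grid[1][2] = 7  # row 1, column 2
print(grid)     # [[0, 0, 0], [0, 0, 7]]

# The Two Sum input is just a flat list indexed by position.
nums = [2, 15, 11, 7]
print(nums[0] + nums[3])  # 9
```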

So for this problem, it’s very intuitive for us to pick list as the data structure to use. If we are going to use list, here is the solution:

Approach 1: brute-force search

Loop through each element v in the list and check whether there is another value equal to target - v. We will use an outer loop and an inner loop to implement this approach.

def twoSum(self, nums, target):
    """
    :type nums: List[int]
    :type target: int
    :rtype: List[int]
    """
    indices = []
    # check every element in the array
    for i in range(len(nums)):
        # check only the elements after position i
        for j in range(i + 1, len(nums)):
            if nums[i] == target - nums[j]:
                indices.append(i)
                indices.append(j)
    return indices

  • Time complexity : O(n²). For each of the n elements, we look for its complement by looping through the rest of the list, which takes O(n) time. Therefore, the total time complexity is O(n²).
  • Space complexity : O(1). We use no extra space beyond the given array and the two-element result.
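To see where the O(n²) comes from, the sketch below counts how many (i, j) pairs the two nested loops visit in the worst case. The helper `count_pair_checks` is hypothetical and exists only for this illustration.

```python
def count_pair_checks(n):
    """Count the (i, j) pairs visited by the nested loops for a list of length n."""
    checks = 0
    for i in range(n):
        for j in range(i + 1, n):
            checks += 1
    return checks

for n in (4, 100, 1000):
    print(n, count_pair_checks(n))
# 4 6
# 100 4950
# 1000 499500
```

The count is n(n-1)/2, so multiplying the input size by 10 multiplies the work by roughly 100.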

The O(n²) time complexity is pretty inefficient. What is a better approach? Let’s go back to the two questions:

To solve this problem, what operations do I use a lot?

Lookups. We need repeated lookups to find the value and index of each element and that of its complement in the given array.

What data structure makes lookups fast?

Dictionary. We need to return the indices of a number and its complement that satisfy the problem statement. Each element has an “element : index” mapping relationship. What is the best way to maintain a mapping from each element in the array to its index? A dictionary.

Approach 2: Use a dictionary.

  1. Create an empty dictionary, seen.
  2. Loop through nums, getting i as the index and v as the value, so that each iteration yields one value:index pair.
  3. Check whether target - v is a key in seen. If it is, return the two indices as [seen[target - v], i] right away. Otherwise, store the pair as seen[v] = i and start the next iteration.

def twoSum(self, nums, target):
    """
    :type nums: List[int]
    :type target: int
    :rtype: List[int]
    """
    # maps each value seen so far to its index
    # (named seen rather than dict to avoid shadowing the built-in type)
    seen = {}
    for i, v in enumerate(nums):
        if target - v in seen:
            return [seen[target - v], i]
        seen[v] = i

  • Time complexity : O(n). We traverse the list of n elements only once, and each lookup in the dictionary costs O(1) time on average.
  • Space complexity : O(n). The extra space depends on the number of items stored in the dictionary, which is at most n.
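To watch the dictionary approach at work, here is a sketch that prints the dictionary’s contents on each iteration for the example input. The standalone function `trace_two_sum` and the variable name `seen` are assumptions made for this illustration, not part of the LeetCode signature.

```python
def trace_two_sum(nums, target):
    """Dictionary-based Two Sum that prints its state on every iteration."""
    seen = {}  # maps each value seen so far to its index
    for i, v in enumerate(nums):
        print(f"i={i} v={v} need={target - v} seen={seen}")
        if target - v in seen:
            return [seen[target - v], i]
        seen[v] = i
    return []  # not reached when exactly one solution exists

result = trace_two_sum([2, 15, 11, 7], 9)
print(result)
# i=0 v=2 need=7 seen={}
# i=1 v=15 need=-6 seen={2: 0}
# i=2 v=11 need=-2 seen={2: 0, 15: 1}
# i=3 v=7 need=2 seen={2: 0, 15: 1, 11: 2}
# [0, 3]
```

Each element is visited once, and the complement check is a single dictionary lookup rather than a second loop.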

From this problem, we can see that time and space complexity change a lot depending on the data structure we choose. If solving a problem needs repeated lookups, choose a dictionary. If we need to sort or care about order, choose a list.

I will write a series of summaries to demonstrate use cases for other important data structures such as Tree, Queue, and Stack. For now, here is a very good reading material for Tree.

References:

  1. Leetcode Two Sum Solution
  2. “Fundamentals of Python: Data Structures” by Kenneth A. Lambert

