Data Structures In Python

Introduction

When I hear the word data structures what quickly comes to mind is algorithms, these two go hand in hand because one is dependant on the other and vice-vasa i.e an algorithm is as good as it’s data structure. Let me take a step back and demystify these two terms Algorithm and data structures.

A data structure is a data organization, management and storage format that enables efficient access and modification. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.

An algorithm is a detailed series of instructions for carrying out an operation or solving a problem. In this article, we are not going to cover algorithm in detail as we’ll majorly deal with data structures and how it can improve the efficiency of our common operations such as search, delete and inserts. We will approach it using an example python program which we will develop using the appropriate data structure.

The application that we will be developing is a command-line application that is used to manage the allocations of rooms at Andela.

Problem statement

When a new Fellow joins Andela they are assigned an office space and an optional living space if they choose to opt in. When a new Staff (anyone that is not a fellow) joins they are assigned an office space only. We are required to digitize and randomize a room allocation system for one of Andela Kenya’s facilities called The Dojo.

The Dojo has rooms, which can be offices or living spaces. An office can accommodate a maximum of 6 people. A living space can accommodate a maximum of 4 people.

A person to be allocated could be a fellow or staff. Staff cannot be allocated living spaces. Fellows have a choice to choose a living space or not.

This system will be used to automatically allocate spaces to people at random.

System design

I’ll choose to use OOP to design this system and in particular, adhere to the SOLID principles that Robert C. Martin recommended for object-oriented design. To learn more on these principles you can check out this article by Samuel Oloruntoba.

So first we have to identify which objects we have in our system. If we inspect our problem statement closely We’ll extract nouns like Staff, Fellow, Person, Room, Office Space, Living Space, and Dojo.

We can take the nouns and convert them into our classes and we will have:

  1. Class Person: This will have the attributes of a person i.e Name, id no, age, role.
  2. Class Staff: This will inherit from Class Person because a staff is a person the only difference is that he has a role of staff and that He is not allowed to have a Living space.
  3. Class Fellow: This will also inherit from Class Person since a Fellow is also a person. Fellow will have other attributes such as Cohort and wants_accomodation.
  4. Class Room: This will have attributes of a room such as id, name, purpose, capacity and occupants.
  5. Class OfficeSpace: This will inherit from class Room since an office space is a room the only difference is that an office space has a max capacity of 6 people.
  6. Class LivingSpace: This will inherit from class Room since a Living space is a room the only difference is that a Living space has a max capacity of 4 people, and It’s only allocated to fellows.
  7. Class Dojo: Dojo will be our interface class and it will be storing the Rooms created and the people registered to the system, It will be keeping track of people who are registered and have not been allocated rooms, and It will be allocating people rooms automatically as they are created depending on the priority assigned to them. It should be able to deduce relationships of people in the system ie, Fellow in the same cohort etc.

Now that we’ve planned out the system let’s get our boots wet.

Room Class Implementation

While creating this class we’ll adhere to principle no.1 of the SOLID principle which is: A class should have one and only one reason to change, meaning that a class should have only one job. Our Class Room job will be to be extended by Class OfficeSpace and Class LivingSpace, and should not be instantiated.

line 1: We import ABCMeta which will use to declare our class as an abstract class so that it prevents people from instantiating it.
line 4: We declare our class.
line 5: We make our class an abstract class.
line 9: We declare a class variable which will track the number of rooms created and will be used to assign room Id's.
line 10 -17: declare the room constructor.

The Room object should be able to add and remove occupants from itself. let’s add that functionality and wrap up the implementation of a room

line 19 - 23: define an add_occupant method that checks if the room has spaces before adding an occupant.
line 25 - 29: defines a remove_occupant method that checks if the occupant you want to remove is in the room then proceeds to remove the occupant if found in the room.
line 31 - 32: defines an is_occupant method that checks if a person is an occupant in the room an returns a bool true or false depending on weather or not the person is an occupant. 
line 34 - 35: defines an is_full method that checks weather or not the room is fool and returns a bool true or false. 
line 37 - 43: defines the __repr__ to define how our class object will be represented.
line 44 -45: defines the string representation of our class.

OfficeSpace and LivingSpace Class implementation

Sticking with the SOLID principle rule no.2 Objects or entities should be open for extension, but closed for modification. Meaning that our class should be easily extendable without modifying the class itself. Our Room class satisfies this, and our OfficeSpace and LivingSpace class can easily extend the Room class without the need to modify the Room class to accommodate either of their functionality.

Person, Fellow, Staff Class implementation

Implementing these classes will be pretty easy since we’ll follow the same convention as we did with the Room, LivingSpace, OfficeSpace classes.

Dojo Class implementation

We’ll start by implementing the create room functionality in this class. This functionality will be tasked to add rooms and make sure it doesn’t add rooms twice. This functionality will search rooms, insert rooms, and delete room. Rooms at this facility at most would range into 100s hence we can safely use lists to store the room objects. A list has a worst case time complexity for searches, insertion, and deletion of O(n). Our system may experience a worst-case time complexity of O(100) cycles which is not bad to operate from.

We’ll then implement the add person functionality in this class. This functionality will be tasked with adding people and make sure it doesn’t add people twice in the system. This functionality will search for people, insert people, and delete people. The number of people at this facility is growing rapidly and it might surpass 100,000. Due to these large numbers, we can’t use lists to store the Person objects because of it’s worst case time complexity of O(n). We don’t have to search far since there is a common data structure developed to solve this problem. This is a binary search tree (BST). BSTs have a worst case complexity of O(n) but what’s unique about them is that they have an average case complexity of O (log(n)) for search insertion and deletion. Since we will be mostly operating on the average case since the Id_no are random enough we can safely assume our time complexity will average at O(log(n)) which is pretty fast.

We’ll start by creating our BST data structure then implement our add person Functionality.

Then we use the BST we’ve created to store people registered in the system.

Now that we can create rooms and add people to our system, we are now left with the task of allocating those rooms to the people who need it. We know that offices spaces can be allocated for both fellows and staff hence the allocation is by a first come first serve basis. The best data structure for this operation is a Queue. The main operations in this queue will be dequeuing and enqueueing people on the waiting list of getting an office space. These operations have a worst case complexity of O(1) which is pretty fast. This can easily be achieved by using a List in python and pop the first element on the list to dequeue and append an element at the end of the list to enqueue.

Now, let's use the Queue we’ve created to store people on the office space waiting list and use the queue to allocate office spaces.

Another problem we are supposed to solve is the allocation of living spaces just like offices they are scarce. Living spaces are not to be allocated on a first come first serve basis but on a priority basis. Priority is given to fellows that come from outside the county since they are new to the country and don’t know where to get accommodation, second priority is given to fellows that come from the remote parts of the country and are new to the city then lastly any other fellow who need accommodation will be accommodated. This can be efficiently implemented with another type of Queue called a Priority Queue. To implement this we’ll call upon another common data structure known as a Heap. Heaps come in two different flavours: max-heap and min-heap. A max-heap is a complete binary tree in which the value in each internal node is greater than or equal to the values in the children of that node. A min-heap is a complete binary tree in which the value in each internal node is less than or equal to the values in the children of that node. What we will implement is a queue that serves people with the highest priority first and we’ll definitely use a max-heap to do that. The good thing with heaps is that they have a worst-case time complexity of O(1) for insertions and deletions which is super fast.

Now, let's use the priority Queue we’ve created to store people on the living space waiting list and use the queue to allocate living spaces.

Now that the functionality that we scoped out in the design is finished we can give our system a user-friendly panel from which It can be accessed and these functionalities executed. Since we set out to create a command line interface (CLI)system we’ll limit our selves just to that. There are are many python libraries for creating CLIs but for us, we’ll just choose Docopt. It’s not well documented but its simple to implement.

Okay now let’s take our system for a spin. This code can be found on this repository.

clone the repo

git clone https://github.com/kwahalf/data-structures.git

navigate to the root folder of the app and create a virtual environment

cd data-structures && virtualenv --python=python3 my-env

activate the virtual environment and install the requirements

source my-env/bin/activate && pip install -r requirements.txt

then start the app in interactive mode

python cli.py -i

You should be greeted by a screen similar to this

Let’s create an office using the command

create_room Office Orange

let’s add a person in the system by using the commands

add_person denis juma fellow 400466 --a=y --p=6

let’s create a living space and see that it auto allocates denis one since he was queued

Now that we can verify that our system works as expected you can use the usage pattens to be able to play around with the system to explore it further.

conclusion

In this article we have covered 4 data structures Lists, queues, priority queues (using max-heaps), and binary search trees. There are many data structures out there and scientists are working tirelessly to develop new and efficient ones everyday to speed up common operations, and we’ve barely scratched the surface. I’d encourage you to look them up and try to implement them, you never know they might speed up your system by 10 fold.


Do you need to hire top developers? Talk to Andela to help you scale
Are you looking to accelerate your career as a developer? Andela is currently hiring senior developers.
Apply now.