MSc Data Science Lecture Notes. L3: Data Structures and Algorithms — Part 1 of 2

Ivan Reznikov, PhD
8 min readFeb 5, 2024

--

My name is Dr. Ivan Reznikov.
I’m teaching MSc Data Science at Middlesex University Dubai.
This article is part of the series with my brief lecture notes.

This article will be useful, if you’re starting with data science or refreshing your memory for an interview.

In case you’ve missed it, check out the previous lecture:

· Introduction to Data Structures
· Primitive Data Types
String
Integer
Float
Boolean
· Built-in
List
Tuples
Sets
Dictionaries
Comparing Built-in Data Types
· User-defined
Queue
Stack
Linked Lists
Trees
Graphs
· Bonus: How Hashmaps Work

Introduction to Data Structures

Data structures is a fundamental concept in computer science that allows us to efficiently organize data to access, add, and modify it. This is important and often overlooked, but in Lecture 3 Part 2, where we’ll study algorithms, one will see that these operations are used most often among others.

Understanding various types of data structures, from primitive integers and booleans to more complex structures like trees and graphs, is crucial for solving computational problems effectively. Most often, in Python, data structures are divided into primitive and non-primitive.

Data Structure in Python

Primitive data types are the most basic forms of data, such as integers, floats, booleans, and strings, which are directly supported by the programming language and represent single values. Non-primitive data types are more complex forms of data. They can hold multiple combinations of primitive and non-primitive values, enabling more effective way to aggregate and manage data.

Primitive Data Types

String

Strings are sequences of characters used to store and handle text-based information. They can include letters, digits, symbols, whitespace characters, etc. Strings are immutable in Python, meaning once a string is created, its content cannot be changed. However, new strings can be constructed by manipulating the original.

String indexing and slicing

Integer

Integers are whole numbers that can be positive, negative, or zero. They are used for counting, indexing, and operations that require discrete values. In Python, integers have unlimited precision, meaning they can grow to accommodate any number as long as there’s enough memory.

Float

Floats, or floating-point numbers, represent real numbers with fractional parts. Floats are used for calculations requiring precision beyond whole numbers.

Boolean

Booleans represent truth values of logic and boolean algebra: True or False. In programming, they are used to make decisions and control the execution flow based on certain conditions. They are the basis of comparison and logical operations, conditional statements, and loops.

Built-in

There are several built-in data structures in Python: Lists, Tuples, Sets, and Dictionaries.

Several terms to avoid confusion:

  • ordered = stores the order in which the elements were added
  • sorted = stores the alphanumeric order
  • mutable = can be changed

List

Lists in Python are dynamic “containers” (arrays) that can store items of different types. They are ordered, changeable collections that allow duplicate elements.

my_list = [1, "a", True]

For lists, you can perform slicing, indexing, appending new elements, replacing old ones, and deleting them as well.

List Operations

List is often one of the most used data types in Python. If you’re coming from another programming, you might think that lists and arrays are similar. Yes and no. Python lists are more of a hybrid between an array and a linked list (more about linked lists below).

Python List

Tuples

Tuples are immutable sequences in Python that store a collection of items that are not intended to be changed. Tuples are a bit faster than lists due to their immutability. They are often used for protecting data integrity and ensuring that an object remains constant throughout the program.

my_tuple = (1, "a", True)

Tuples support operations similar to lists, such as indexing and slicing, but lack methods for modification.

Sets

Sets are unordered but sorted collections of unique elements in Python. They are very close to mathematical sets, as they support union, intersection, difference, and symmetric difference eoperations.

my_list = [8, 2, 2, False, "abc", 3, 1, 1, 4, 5.0, 5, 7, 8]
my_set = set(my_list)

>>> {False, 1, 2, 3, 4, 5.0, 7, 8, 'abc'}

Sets are invaluable for a lot of reasons. They are a common solution to remove duplicates or for performing set-theoretic operations mentioned above.

Dictionaries

Dictionaries are Python’s built-in mapping type, designed to store objects in a key-value pair format. They are ordered as of Python 3.7. This structure is ideal for fast lookups, insertions, and deletions, as you can access any element by its key without iterating through the entire structure.

my_dictionary = {
"name" : "Burj Khalifa",
"height_m" : 830,
"completed" : True
}

my_dictionary["name"]
>>> "Burj Khalifa"

my_dictionary.keys()
>>> dict_keys(['name', 'height_m', 'completed'])

my_dictionary.values()
>>> dict_values(['Burj Khalifa', 830, True])

my_dictionary['height_m'] = 829.8
my_dictionary['hashtags'] = ["#burjkhalifa", "#dubai", "#uae", "#dubaimall"]
del my_dictionary['completed']

my_dictionary
>>>{
"name" : "Burj Khalifa",
"height_m" : 829.8,
"hashtags" : ["#burjkhalifa", "#dubai", "#uae", "#dubaimall"]
}

Dictionaries in Python have a hashmap under the hood, enabling fast retrieval. Check out how hashmaps work in the bonus section below.

Comparing Built-in Data Types

Below, I’m providing you with a comparison table of built-in Python Data Structures.

Python Built-in Data Structures

As one can notice, all data types are different and all of them find their application.

User-defined

A lot of times programmers developed their own data structures to organize and store data allowing more efficient data access and manipulation. There are a couple of widely spread advanced data structures: stack, queue, trees, etc.

Queue

Queues is a data structure often used to manage sequential order. It follows a First In, First Out (FIFO) principle, where the first element added is the first one to be removed. Similar to a restaurant or shopping line — whoever is first gets served first. This data structure is used for scheduling jobs in a printer queue or managing tasks in operating systems.

Queue Animation

Python provides different types of queues, such as the deque class from the collections module, which allows for efficient insertion and removal from both ends, and the PriorityQueue class, when elements must be processed according to their priority. Queues are fundamental in breadth-first search algorithms and scenarios requiring task scheduling and event handling.

Stack

Stacks is a data structure that operate on a Last In, First Out (LIFO) principle, meaning the last element added to the stack is the first to be removed. This is similar to signing a pile of papers or washing dishes in the sink — whatever came last and is on top gets done first. This behavior makes stacks ideal for managing tasks in scenarios where you must reverse actions or navigate through previously visited points — consider this as “undo” functionality.

Stack Animation

Stacks play a critical role in algorithmic solutions, particularly in depth-first search algorithms and when parsing expressions or syntax checking in compilers.

Linked Lists

Linked Lists is a type of List, where the nodes contain not only data, but also a reference (or link) to the next node in the sequence, aka pointer.

Linked List

Linked lists makes addition, insertion and deletion much more effective, as these operations do not require shifting elements, like in a standard Array. They are come in helpful if you need to optimize memory allocation or when the size of the data collection dynamically changes.

Trees

Trees are hierarchical data structures, that consist out of nodes, that are connected with edges. Top node is called the root.

Trees are widely used to represent hierarchical relationships, implement database indexing, and facilitate efficient searching and sorting algorithms (more about that in Lecture 3 Part 2). Tree traversals (in-order, pre-order, and post-order) allow for the systematic visitation of all the nodes in the tree.

Tree Examples

Binary trees, where each node has at most two children, and binary search trees (BSTs), a variant of binary trees where the left child contains only nodes with values less than the parent node, and the right child has only nodes with values larger than the parent node, are ubiquitous.

Graphs

Graphs are a collection of nodes (or vertices) connected by edges, representing interconnections among objects in a network, such as social networks, transportation systems, and internet links. Graphs can be either directed or undirected and can include cycles.

Graphs

Bonus: How Hashmaps Work

Hashmaps, also known as hash tables is a data structure that dictionary in Python have under the hood. The beauty of a hashmap is fast data retrieval. They work by applying a hash function to the keys to compute an index where the value is stored, allowing for almost instantaneous access to the corresponding value with a given key.

How Hashmap Works (simplified)

This efficiency in data lookup makes hashmaps an indispensable tool in software development, particularly in scenarios requiring quick searches, insertions, and deletions.

This is the end of Part 1 of Lecture 3. The next part will be dedicated to Algorithms and their time complexity. I’ll try to publish it soon.

Meanwhile, check out my LangChain 101 course:

Clap and follow me, as this motivates me to write new parts and articles :) Plus, you’ll get notified when the new part will be published.

--

--