2 Data structures for Data Science in Python

Santosh J
Jul 21, 2022



There is a plethora of data science tools and programming languages for data manipulation. But when it comes to Python, two data structures solve most of one's day-to-day data problems. They are:

  1. List
  2. Dictionary

List: A list is an ordered collection of items. The data stored in it can be heterogeneous (it can hold different data types). We can add, delete, and modify items in a list, which gives us great flexibility. We can also add a whole sequence of items to a list at once. We can access a list item using its index, which is the position of the item in the list (starting at 0). Access by index takes constant time, so retrieving an item by position is fast.

An empty list is created by assigning a pair of square brackets to a variable.

L = []

A few day-to-day list operations that a data scientist works with:

# initializing an empty list
L = []

L = [1, 12, 34, 56]
a = 'x'
# adding items to a list
L.append(a)
L.append(32)
print(L)

# looping over a list
for i in L:
    print(f"List has {i}")

# removing items from a list
L.remove('x')
print(L)

# searching in a list
s = 12
if s in L:
    print("Present")
else:
    print("Not present")

# count an item in the list
print(L.count(15))

Respective Output for the operations:

[1, 12, 34, 56, 'x', 32]
List has 1
List has 12
List has 34
List has 56
List has x
List has 32
[1, 12, 34, 56, 32]
Present
0
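
The list description earlier also mentions index access, in-place modification, and adding a whole sequence, none of which the operations above exercise. A minimal sketch of those, assuming a hypothetical list named scores that is not part of the original example:

```python
# 'scores' is a hypothetical example list, not from the original post
scores = [1, 12, 34, 56]

# access by index: positions start at 0, negative indexes count from the end
first = scores[0]
last = scores[-1]

# modify an item in place
scores[1] = 15

# add a whole sequence of items at once
scores.extend([32, 48])

# delete by position; pop() also returns the removed item
removed = scores.pop(0)

print(first, last)
print(scores)
print(removed)
```

Note that remove() deletes by value while pop() deletes by position, so the choice depends on whether you know what the item is or where it sits.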

Dictionary: A dictionary is a collection of key-value pairs, and the data can be heterogeneous. Every key must be unique. We access the values using the keys. We can add, update, and delete items in a dictionary.

A few day-to-day dictionary operations that a data scientist works with:

# create a dictionary
D = {}

# sample dictionary
D = {'name': 'rob',
     'age': 20,
     'sex': 'male',
     'height': 172.3
     }
# Accessing values
print(D['age'])

# Finding the keys in a dictionary
print(D.keys())

# updating a dictionary
D.update({'education': 'bachelors'})
print(D)

# searching for a key and retrieving its value in a dictionary
a = 'sex'
if a in D:
    print(D[a])
else:
    print("key not present")

Respective Output for the operations:

20
dict_keys(['name', 'age', 'sex', 'height'])
{'name': 'rob', 'age': 20, 'sex': 'male', 'height': 172.3, 'education': 'bachelors'}
male
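
Deleting items is mentioned in the dictionary description but not shown in the operations above, and looking up a missing key with square brackets raises a KeyError. A small sketch of deletion, a safe lookup with get(), and looping over key-value pairs, assuming a hypothetical dictionary named person that mirrors the sample one:

```python
# 'person' is a hypothetical dictionary mirroring the sample above
person = {'name': 'rob', 'age': 20, 'sex': 'male', 'height': 172.3}

# delete a key and get its value back
height = person.pop('height')

# delete a key without keeping the value
del person['sex']

# safe lookup: get() returns a default instead of raising KeyError
education = person.get('education', 'unknown')

# loop over key-value pairs
for key, value in person.items():
    print(f"{key}: {value}")
```

Using get() with a default is often tidier than an if/else membership check when a fallback value is all that is needed.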

Once you are comfortable with these operations, you can do data analysis and data wrangling work better and faster. Please comment with any other operations you use in your job. Thank you.

Salud!
