Reshaping Tables With MultiIndex

Nesrin ÖZCAN
Akkim Akademi
Published in
3 min readOct 20, 2022
Image by Brian Ego from Pixabay

Data manipulation is a significant step of data analysis process. While manipulating data, ‘Multi-level Indexing’ may allow you to take an alternative approach to the data. Using multi index sometimes seriously helps you to take a different aspect to the data, but sometimes it is a must. There are various ways for creating multi index and I will show some of them.

Firstly, I will import titanic dataset from Seaborn library. Using built-in datasets is a practical way of exercising for data analysis. I want to see the situation of “age” and “survived” variables according to “sex” and “class” varibles, so I selected only those columns from the dataset. I will show the first 5 rows with head() function to have a quick insight of the dataset.

import seaborn as sns
df = sns.load_dataset("titanic")
df = df[["sex","class","age","survived"]]
df.head(5)

pandas.DataFrame.set_index()

Now, I will construct multi index with DataFrame.set_index() for “sex” and “class” columns in titanic dataset. set_index() function creates multi index, but it does not group by indexes with aggregation functions. This could be used to handle non-numeric data. If you want to apply aggregation functions on values, group_by() and pivot_table() functions will be useful.

multiindex_set = df.iloc[:10,:].set_index(["sex","class"])
multiindex_set
multiindex_set.index

DataFrame.group_by() & pandas.pivot_table()

Pandas group_by() and pivot_table() functions allow us to apply aggregation functions over values according to indexes. Furthermore we can apply multiple types of aggregations for any column with the help of group_by() and pivot_table() functions.

multiindex_groupby = df.groupby(["sex","class"]).agg({"age":"min","survived":"median"})
multiindex_groupby
multiindex_groupby.index
multiindex_pivottable = pd.pivot_table(df, values=["survived","age"], index=["sex","class"], aggfunc={"age":[min,"max",np.mean],"survived":np.mean})
multiindex_pivottable

Additionally, if we want to filter according to one or more index on a multi indexed dataframe, we can use iloc or loc for selection.

multiindex_pivottable.loc["female","First"]
multiindex_pivottable.iloc[1,2]

There are many other issues about reshaping dataframes as a part of data analysis. I tried to show how to construct multi-indexed dataframes with some simple examples in my post. Thank you for reading.

--

--