Fluent Pandas
Let’s uncover the practical details of Pandas’ Series, DataFrame, and Panel
Note to the Readers: Paying attention to comments in examples would be more helpful than going through the theory itself.
· Series (1D data structure: Column-vector of DataTable)
· DataFrame (2D data structure: Table)
· Panel (3D data structure)
Pandas is a column-oriented data analysis API. It’s a great tool for handling and analyzing input data, and many ML frameworks support pandas data structures as inputs.
Pandas Data Structures
Refer Intro to Data Structures on Pandas docs.
The primary data structures in pandas are implemented as two classes: DataFrame and Series.
Import numpy and pandas into your namespace:
import numpy as np
import pandas as pd
import matplotlib as mpl
np.__version__
pd.__version__
mpl.__version__
Series (1D data structure: Column-vector of DataTable)
CREATING SERIES
Series is a one-dimensional array of labeled elements (the labels need not be unique), capable of holding any data type. The axis labels are collectively referred to as the index. The general way to create a Series is to call:
pd.Series(data, index=index)
Here, data can be a NumPy ndarray, a Python dict, or a scalar value (like 5). The passed index is a list of axis labels.
Note: pandas supports non-unique index values. If an operation that does not support duplicate index values is attempted, an exception will be raised at that time.
If data is a list or ndarray (preferred way):
If data is an ndarray or list, then index must be of the same length as data. If no index is passed, one will be created having values [0, 1, 2, ... len(data) - 1].
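A minimal sketch of both cases (variable names are illustrative):

```python
import numpy as np
import pandas as pd

# From an ndarray, with an explicit index of the same length
s1 = pd.Series(np.array([10.0, 20.0, 30.0]), index=['a', 'b', 'c'])

# From a list, with no index: a default index 0..len(data)-1 is created
s2 = pd.Series([10.0, 20.0, 30.0])
```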
If data is a scalar value:
If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.
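A quick sketch of the scalar case:

```python
import pandas as pd

# The scalar 5.0 is repeated once per index label
s = pd.Series(5.0, index=['a', 'b', 'c'])
```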
If data is a dict:
If data is a dict and index is passed, the values in data corresponding to the labels in the index will be pulled out; otherwise, an index will be constructed from the sorted keys of the dict, if possible.
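A sketch of both dict cases (note that recent pandas versions preserve the dict’s insertion order rather than sorting the keys):

```python
import pandas as pd

d = {'b': 1, 'a': 0, 'c': 2}

# No index: the index is constructed from the dict's keys
s1 = pd.Series(d)

# Explicit index: values are pulled out by label; missing labels become NaN
s2 = pd.Series(d, index=['b', 'c', 'd', 'a'])
```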
SERIES IS LIKE NDARRAY AND DICT COMBINED
Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations like slicing also slice the index. Series can be passed to most NumPy methods expecting a 1D ndarray.
A key difference between Series and ndarray is automatic alignment of the data based on labels during Series operations. Thus, you can write computations without giving consideration to whether the Series objects involved have the same labels. The result of an operation between unaligned Series will have the union of the indexes involved. If a label is not found in one series or the other, the corresponding result will be marked as missing (NaN). For example,
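The original example appears to have gone missing here; a minimal sketch consistent with the surrounding text (s and t are illustrative names, and s deliberately has a duplicated 'b' label):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'b', 'c'])
t = pd.Series([10.0, 20.0], index=['a', 'c'])

# Operands are aligned on labels; labels present in only one operand give NaN
result = s + t
# 'a' -> 11.0, both 'b' entries -> NaN (t has no 'b'), 'c' -> 24.0
```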
Also note that if an index label such as 'b' is duplicated, then s['b'] returns a pandas.core.series.Series (one row per duplicate) rather than a scalar.
Series is also like a fixed-size dict on which you can get and set values by index label. If a label is not present when reading a value, a KeyError exception is raised. Using the get method, a missing label will return None or a specified default.
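A short sketch of the dict-like behavior:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

s['a']        # reading an existing label works like a dict lookup
s['d'] = 4    # setting a new label extends the Series
# s['e'] would raise KeyError

value = s.get('e')        # missing label -> None, no exception
fallback = s.get('e', 0)  # or a specified default
```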
SERIES’ NAME ATTRIBUTE
Series can also have a name attribute (like DataFrame has a columns attribute). This is important as a DataFrame can be seen as a dict of Series objects.
The Series’ name will be assigned automatically in many cases, in particular when taking 1D slices of a DataFrame.
For example,
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
d = pd.DataFrame(d)
d
# one two
# 0 1.0 4.0
# 1 2.0 3.0
# 2 3.0 2.0
# 3 4.0 1.0
type(d) #=> pandas.core.frame.DataFrame
d.columns #=> Index(['one', 'two'], dtype='object')
d.index #=> RangeIndex(start=0, stop=4, step=1)

s = d['one']
s
# 0 1.0
# 1 2.0
# 2 3.0
# 3 4.0
# Name: one, dtype: float64
type(s) #=> pandas.core.series.Series
s.name #=> 'one'
s.index #=> RangeIndex(start=0, stop=4, step=1)
You can rename a Series with the pandas.Series.rename() method or by just assigning a new name to the name attribute.
s = pd.Series(np.random.randn(5), name='something')
id(s) #=> 4331187280
s.name #=> 'something'

s.name = 'new_name'
id(s) #=> 4331187280
s.name #=> 'new_name'

s.rename("yet_another_name", inplace=True)  # rename() returns a copy unless inplace=True
id(s) #=> 4331187280
s.name #=> 'yet_another_name'
COMPLEX TRANSFORMATIONS ON SERIES USING SERIES.APPLY
NumPy is a popular toolkit for scientific computing, and pandas’ Series can be used as an argument to most NumPy functions.
For complex single-column transformations, you can use Series.apply. Like Python’s map function, Series.apply accepts as an argument a function (often a lambda) which is applied to each value.
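A minimal sketch (the values are illustrative):

```python
import pandas as pd

population = pd.Series([852469, 1015785, 485199])

# apply() evaluates the lambda on each value, returning a new Series
is_large = population.apply(lambda x: x > 1000000)
```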
DataFrame (2D data structure: Table)
Refer: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
DataFrame is a 2D labeled data structure with columns of potentially different types. You can think of it like a spreadsheet, a SQL table, or a dict of Series objects. Like Series, DataFrame accepts many different kinds of inputs.
Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments. Note that index can have non-unique elements (like that of Series). Similarly, column names can also be non-unique.
If you pass an index and/or columns, you are guaranteeing the index and/or columns of the resulting DataFrame. Thus, a dict of Series plus a specific index will discard all data not matching up to the passed index (similar to passing a dict as data to Series).
If axis labels (index) are not passed, they will be constructed from the input data based on common sense rules.
CREATING DATAFRAME
From a dict of ndarrays/lists:
The ndarrays/lists must all be the same length. If an index is passed, it must clearly also be the same length as the data ndarrays. If no index is passed, an implicit index range(n) will be used, where n is the array length.
For example,
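A minimal sketch of both variants:

```python
import pandas as pd

d = {'one': [1.0, 2.0, 3.0, 4.0], 'two': [4.0, 3.0, 2.0, 1.0]}

df1 = pd.DataFrame(d)                              # implicit index: range(4)
df2 = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])  # explicit index of the same length
```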
From a dict of Series (preferred way):
The resultant index will be the union of the indexes of the various Series (each Series may be of a different length and may have a different index). If there are nested dicts, these will first be converted to Series. If no columns are passed, the columns will be the list of dict keys.
The row and column labels can be accessed respectively through the index and columns attributes. For example,
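A minimal sketch, showing the union of the Series indexes and the label accessors:

```python
import pandas as pd

d = {
    'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
    'two': pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(d)

# The index is the union of the Series indexes; 'one' has no 'd', so that
# cell is NaN
df.index    # row labels
df.columns  # column labels
```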
From a list of dicts:
For example,
data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

pd.DataFrame(data2)
#    a   b     c
# 0  1   2   NaN
# 1  5  10  20.0

pd.DataFrame(data2, index=['first', 'second'])
#         a   b     c
# first   1   2   NaN
# second  5  10  20.0

pd.DataFrame(data2, columns=['a', 'b'])
#    a   b
# 0  1   2
# 1  5  10
From a Series:
The result will be a DataFrame with the same index as the input Series, and with one column whose name is the original name of the Series (only if no other column name is provided).
For example,
s = pd.Series([1., 2., 3.], index=['a', 'b', 'c'])
type(s) #=> pandas.core.series.Series

df2 = pd.DataFrame(s)
df2
#      0
# a  1.0
# b  2.0
# c  3.0
type(df2) #=> pandas.core.frame.DataFrame
df2.columns #=> RangeIndex(start=0, stop=1, step=1)
df2.index #=> Index(['a', 'b', 'c'], dtype='object')
From a Flat File
Using pandas.read_csv (preferred way):
You can read CSV files into a DataFrame using the pandas.read_csv() function. Refer to the official docs for its signature.
For example,
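A self-contained sketch (the file name 'data.csv' and its contents are illustrative; a small file is written first so the example runs as-is):

```python
import pandas as pd

# Write a tiny CSV so the example is self-contained
with open('data.csv', 'w') as f:
    f.write('name,score\nalice,90\nbob,85\n')

# The first line of the file is used as the header by default
df = pd.read_csv('data.csv')
```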
CONSOLE DISPLAY AND SUMMARY
Some helpful methods and attributes:
Wide DataFrames will be printed (print) across multiple rows by default. You can change how much to print on a single row by setting the display.width option. You can adjust the max width of the individual columns by setting display.max_colwidth.
pd.set_option('display.width', 40) # default is 80
pd.set_option('display.max_colwidth', 30)
You can also disable this line-wrapping feature via the expand_frame_repr option. This will print the table in one block.
INDEXING ROWS AND SELECTING COLUMNS
The basics of DataFrame indexing and selecting are as follows:
For example,
d = {
    'one' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'a']),
    'two' : pd.Series(['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'a'])
}
df = pd.DataFrame(d)
df
#    one two
# a  1.0   A
# b  2.0   B
# c  3.0   C
# a  4.0   D

type(df['one']) #=> pandas.core.series.Series
df['one']
# a    1.0
# b    2.0
# c    3.0
# a    4.0
# Name: one, dtype: float64

type(df[['one']]) #=> pandas.core.frame.DataFrame
df[['one']]
#    one
# a  1.0
# b  2.0
# c  3.0
# a  4.0

type(df[['one', 'two']]) #=> pandas.core.frame.DataFrame
df[['one', 'two']]
#    one two
# a  1.0   A
# b  2.0   B
# c  3.0   C
# a  4.0   D

type(df.loc['a']) #=> pandas.core.frame.DataFrame
df.loc['a']
#    one two
# a  1.0   A
# a  4.0   D

type(df.loc['b']) #=> pandas.core.series.Series
df.loc['b']
# one    2
# two    B
# Name: b, dtype: object

type(df.loc[['a', 'c']]) #=> pandas.core.frame.DataFrame
df.loc[['a', 'c']]
#    one two
# a  1.0   A
# a  4.0   D
# c  3.0   C

type(df.iloc[0]) #=> pandas.core.series.Series
df.iloc[0]
# one    1
# two    A
# Name: a, dtype: object

df.iloc[1:3]
#    one two
# b  2.0   B
# c  3.0   C

df.iloc[[1, 2]]
#    one two
# b  2.0   B
# c  3.0   C

df.iloc[[1, 0, 1, 0]]
#    one two
# b  2.0   B
# a  1.0   A
# b  2.0   B
# a  1.0   A

df.iloc[[True, False, True, False]]
#    one two
# a  1.0   A
# c  3.0   C
COLUMN ADDITION AND DELETION
You can treat a DataFrame semantically like a dict of like-indexed Series objects. Getting, setting, and deleting columns works with the same syntax as the analogous dict operations.
When inserting a Series that doesn’t have the same index as the DataFrame, it will be conformed to the DataFrame’s index. That is, only values with index labels matching the DataFrame’s existing index will be added, and the missing index labels will get NaN (of the same dtype as that particular column) as the value.
When inserting a column with a scalar value, it will naturally be propagated to fill the column.
When you insert an ndarray or list of the same length as the DataFrame, it just uses the existing index of the DataFrame. But try not to use ndarrays or lists directly with DataFrames; instead, first convert them to Series as follows:
df['yet_another_col'] = array_of_same_length_as_df
# is same as
df['yet_another_col'] = pd.Series(array_of_same_length_as_df, index=df.index)
For example,
By default, columns get inserted at the end. The insert()
method is available to insert at a particular location in the columns.
Columns can be deleted using del, like keys of a dict.
DATA ALIGNMENT AND ARITHMETIC
Arithmetic between DataFrame objects:
Data between DataFrame objects automatically align on both the columns and the index (row labels). Again, the resulting object will have the union of the column and row labels. For example,
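A minimal sketch (frame shapes are illustrative):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 3)), columns=['A', 'B', 'C'])
df2 = pd.DataFrame(np.ones((2, 2)), columns=['B', 'C'])

# Result has the union of row and column labels; non-overlapping cells are NaN
result = df1 + df2
```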
Important: You might like to try the above example with duplicate column names and index values in each individual data frame.
Boolean operators (for example, df1 & df2) work as well.
Arithmetic between DataFrame and Series:
When doing an operation between a DataFrame and a Series, the default behavior is to broadcast the Series row-wise to match rows in the DataFrame, and then the arithmetic is performed. For example,
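A minimal sketch of the default row-wise broadcasting:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# The Series index (here, df's column labels) is matched against df's
# columns, then broadcast down the rows
result = df - df.iloc[0]
```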
In the special case of working with time series data, where the DataFrame index contains dates, older versions of pandas broadcast column-wise instead; newer versions have removed this special case, so column-wise broadcasting must be requested explicitly via the axis argument of the flexible binary operations. For example,
Here pd.date_range() is used to create a fixed-frequency DatetimeIndex, which is then used as the index (rather than the default index of 0, 1, 2, ...) for a DataFrame.
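A sketch using explicit column-wise broadcasting, which works on current pandas versions (the automatic date-index special case described above was removed in newer releases):

```python
import pandas as pd

index = pd.date_range('2020-01-01', periods=3)
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]}, index=index)

# Explicit column-wise broadcasting: match the Series against df's index
result = df.sub(df['A'], axis=0)
```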
For explicit control over the matching and broadcasting behavior, see the section on flexible binary operations.
Arithmetic between DataFrame and scalars:
Operations with scalars are just as you would expect: broadcast to each cell (that is, to all columns and rows).
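For example,

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Scalar operations apply to every cell
result = df * 5 + 2
```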
DATAFRAME METHODS AND FUNCTIONS
Evaluating strings describing operations using the eval() method
Note: Rather use the assign() method.
The eval() method evaluates a string describing operations on DataFrame columns. It operates on columns only, not specific rows or elements. Note that this allows eval() to run arbitrary code, which can make you vulnerable to code injection if you pass user input into this function.
df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
df
#    A   B
# 0  1  10
# 1  2   8
# 2  3   6
# 3  4   4
# 4  5   2

df.eval('2*A + B')
# 0    12
# 1    12
# 2    12
# 3    12
# 4    12
# dtype: int64
Assignment is allowed, though by default the original DataFrame is not modified. Use inplace=True to modify the original DataFrame. For example,
df.eval('C = A + 2*B', inplace=True)
df
# A B C
# 0 1 10 21
# 1 2 8 18
# 2 3 6 15
# 3 4 4 12
# 4 5 2 9
Assigning new columns to copies in method chains using the assign() method
Inspired by dplyr’s mutate verb, DataFrame has an assign() method that allows you to easily create new columns that are potentially derived from existing columns.
The assign()
method always returns a copy of data, leaving the original DataFrame
untouched.
Note: Also check pipe()
method.
df2 = df.assign(one_ratio = df['one']/df['out_of'])
df2
#    one  two  one_trunc  out_of  const  one_ratio
# a  1.0  1.0        1.0     100      1       0.01
# b  2.0  2.0        2.0     100      1       0.02
# c  3.0  3.0        NaN     100      1       0.03
# d  NaN  4.0        NaN     100      1        NaN

id(df) #=> 4436438040
id(df2) #=> 4566906360
The above was an example of inserting a precomputed value. We can also pass in a function of one argument to be evaluated on the DataFrame being assigned to.
df3 = df.assign(one_ratio = lambda x: (x['one']/x['out_of']))
df3
#    one  two  one_trunc  out_of  const  one_ratio
# a  1.0  1.0        1.0     100      1       0.01
# b  2.0  2.0        2.0     100      1       0.02
# c  3.0  3.0        NaN     100      1       0.03
# d  NaN  4.0        NaN     100      1        NaN

id(df) #=> 4436438040
id(df3) #=> 4514692848
This way you can remove a dependency on the name of the DataFrame, since the lambda receives it as an argument.
Appending rows with the append() method
The append() method appends the rows of another DataFrame to the end of the current DataFrame, returning a new object. Columns not in the current DataFrame are added as new columns.
Its most useful syntax is:
<data_frame>.append(other_data_frame, ignore_index=False)
Here,
· other_data_frame: Data to be appended, in the form of a DataFrame, a Series/dict-like object, or a list of these.
· ignore_index: By default it is False. If it is True, then the index labels of other_data_frame are ignored.
Note: Also check the concat() function. (append() was deprecated in pandas 1.4 and removed in pandas 2.0, so prefer concat() in newer code.)
For example,
df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df
# A B
# 0 1 2
# 1 3 4
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
df2
# A B
# 0 5 6
# 1 7 8
df.append(df2)
# A B
# 0 1 2
# 1 3 4
# 0 5 6
# 1 7 8
df.append(df2, ignore_index=True)
# A B
# 0 1 2
# 1 3 4
# 2 5 6
# 3 7 8
The drop() method
Note: Rather use del as stated in the Column Addition and Deletion section, and indexing + re-assignment for keeping specific rows.
The drop() method removes rows or columns by specifying label names and the corresponding axis, or by specifying index or column names directly. When using a multi-index, labels on different levels can be removed by specifying the level.
The values attribute and copy() method
The values attribute
The values attribute returns a NumPy representation of a DataFrame’s data. Only the values in the DataFrame will be returned; the axes labels will be removed. A DataFrame with mixed type columns (e.g. str/object, int64, float32) results in an ndarray of the broadest type that accommodates these mixed types.
Check Console Display section for an example.
The copy() method
The copy() method makes a copy of the DataFrame object’s indices and data, as by default deep is True. So, modifications to the data or indices of the copy will not be reflected in the original object.
If deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).
Its syntax is:
df.copy(deep=True)
Transposing using the T attribute or transpose() method
Refer to the section Arithmetic, matrix multiplication, and comparison operations.
To transpose, you can call the method transpose(), or you can use the attribute T, which is an accessor to the transpose() method.
The result is a DataFrame that is a reflection of the original DataFrame over its main diagonal, with rows written as columns and vice-versa. Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the object dtype. In such a case, a copy of the data is always made.
For example,
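A minimal sketch (note the mixed dtypes becoming object after transposing):

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0], 'two': ['A', 'B']}, index=['a', 'b'])

# df.T is equivalent to df.transpose(); rows and columns swap
t = df.T
```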
Sorting (sort_values(), sort_index()), Grouping (groupby()), and Filtering (filter())
The sort_values() method
A DataFrame can be sorted by a column (or by multiple columns) using the sort_values() method.
For example,
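A minimal sketch (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'A'], 'col2': [2, 1, 3]})

# Sort by one column
by_col2 = df.sort_values(by='col2')

# Sort by multiple columns with per-column sort order
by_both = df.sort_values(by=['col1', 'col2'], ascending=[True, False])
```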
The sort_index() method
The sort_index() method can be used to sort by index.
For example,
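A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]}, index=['c', 'a', 'b'])

# Rows are reordered by index label
sorted_df = df.sort_index()
```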
The groupby() method
The groupby() method is used to group by a function, label, or a list of labels.
For example,
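A minimal sketch, grouping by a label and aggregating:

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'a', 'b'], 'value': [1, 2, 3, 4]})

# Sum the 'value' column within each group of 'key'
sums = df.groupby('key')['value'].sum()
```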
The filter() method
The filter() method returns a subset of rows or columns of a DataFrame according to labels in the specified index. Note that this method does not filter a DataFrame on its contents; the filter is applied to the labels of the index, or to the column names.
You can use the items, like, and regex parameters, but note that they are enforced to be mutually exclusive. The parameter axis defaults to the info axis that is used when indexing with [].
For example,
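A minimal sketch of all three parameters:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]},
                  index=['mouse', 'rabbit'])

by_items = df.filter(items=['one', 'three'])  # exact column names
by_regex = df.filter(regex='e$')              # column names ending in 'e'
by_like = df.filter(like='bbi', axis=0)       # row labels containing 'bbi'
```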
Melting and Pivoting using the melt() and pivot() methods
The idea of melt() is to keep a few given columns as id-columns and convert the rest of the columns (called variable-columns) into variable and value, where the variable tells you the original column name and the value is the corresponding value in the original column.
If there are n variable-columns which are melted, then the information from each row of the original format is spread across n rows.
The idea of pivot() is to do just the reverse.
For example,
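A minimal round-trip sketch (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ['x', 'y'], 'math': [90, 80], 'physics': [85, 75]})

# melt: keep 'name' as the id-column; 'math'/'physics' become variable/value rows
melted = pd.melt(df, id_vars=['name'], var_name='subject', value_name='score')

# pivot: the reverse -- spread the variable/value pairs back into columns
restored = melted.pivot(index='name', columns='subject', values='score')
```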
Piping (chaining) functions using the pipe() method
Suppose you want to apply a function to a DataFrame, Series or a groupby object, then apply another function to its output, and so on. One way would be to perform this operation in a “sandwich” like fashion:
Note: Also check the assign() method.
df = foo3(foo2(foo1(df, arg1=1), arg2=2), arg3=3)
In the long run, this notation becomes fairly messy. What you want to do here is to use pipe(). Pipe can be thought of as function chaining. This is how you would perform the same task as before with pipe():
(df.pipe(foo1, arg1=1)
   .pipe(foo2, arg2=2)
   .pipe(foo3, arg3=3))
This is a cleaner way that helps keep track of the order in which the functions and their corresponding arguments are applied.
Rolling Windows using the rolling() method
Use DataFrame.rolling() for rolling window calculations.
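A minimal sketch (shown on a Series; DataFrame.rolling() works the same way per column):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Mean over a sliding window of 2; the first value is NaN (incomplete window)
means = s.rolling(window=2).mean()
```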
Other DataFrame Methods
Refer to the Methods section in pd.DataFrame.
Refer Computational Tools User Guide.
Refer the categorical listing at Pandas API.
APPLYING FUNCTIONS
The apply() method: apply on columns/rows
The apply() method applies the given function along an axis (by default on columns) of the DataFrame.
Its most useful form is:
df.apply(func, axis=0, args, **kwds)
Here:
· func: The function to apply to each column or row. Note that it can be an element-wise function (in which case axis=0 or axis=1 doesn’t make any difference) or an aggregate function.
· axis: Its value can be 0 (default) or 1. 0 means applying the function to each column, and 1 means applying the function to each row. Note that this axis is similar to how axes are defined in NumPy; for a 2D ndarray, 0 means column.
· args: A tuple representing the positional arguments to pass to func in addition to the array/series.
· **kwds: The additional keyword arguments to pass as keyword arguments to func.
It returns a Series or a DataFrame.
For example,
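A minimal sketch of both axes with an aggregate function:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

col_max = df.apply(lambda col: col.max())          # per column (axis=0, default)
row_sum = df.apply(lambda row: row.sum(), axis=1)  # per row
```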
The applymap() method: apply element-wise
The applymap() method applies the given function element-wise. So, the given func must accept and return a scalar, and is applied to every element of the DataFrame.
Its general syntax is:
df.applymap(func)
For example,
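A minimal sketch (note that newer pandas versions rename applymap() to DataFrame.map(); applymap() still works but may emit a deprecation warning):

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]})

# The lambda is applied to every individual element
squared = df.applymap(lambda x: x ** 2)
```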
When you need to apply a function element-wise, you might like to check first if a vectorized version is available. A vectorized version of func often exists and will be much faster. You could square each number element-wise using df.applymap(lambda x: x**2), but the vectorized version df**2 is better.
WORKING WITH MISSING DATA
Refer SciKit-Learn’s Data Cleaning section.
Refer Missing Data Guide and API Reference for Missing Data Handling: dropna, fillna, replace, interpolate.
Also check the Data Cleaning section of the tf.feature_column API for other options.
Also go through https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/
NORMALIZING DATA
One way is to perform df / df.iloc[0], which is particularly useful while analyzing stock prices over a period of time for multiple companies.
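A minimal sketch (the tickers and prices are illustrative):

```python
import pandas as pd

prices = pd.DataFrame({'AAA': [10.0, 12.0, 11.0], 'BBB': [100.0, 90.0, 120.0]})

# Dividing by the first row rebases every column to start at 1.0, making
# relative performance comparable across very different price levels
normalized = prices / prices.iloc[0]
```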
THE CONCAT() FUNCTION
The concat() function performs concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axis.
The default axis of concatenation is axis=0, but you can choose to concatenate data frames sideways by choosing axis=1.
Note: Also check the append() method.
For example,
df1 = pd.DataFrame(
    {
        'A': ['A0', 'A1', 'A2', 'A3'],
        'B': ['B0', 'B1', 'B2', 'B3'],
        'C': ['C0', 'C1', 'C2', 'C3'],
        'D': ['D0', 'D1', 'D2', 'D3']
    }, index=[0, 1, 2, 3]
)

df2 = pd.DataFrame(
    {
        'A': ['A4', 'A5', 'A6', 'A7'],
        'B': ['B4', 'B5', 'B6', 'B7'],
        'C': ['C4', 'C5', 'C6', 'C7'],
        'D': ['D4', 'D5', 'D6', 'D7']
    }, index=[4, 5, 6, 7]
)

df3 = pd.DataFrame(
    {
        'A': ['A8', 'A9', 'A10', 'A11'],
        'B': ['B8', 'B9', 'B10', 'B11'],
        'C': ['C8', 'C9', 'C10', 'C11'],
        'D': ['D8', 'D9', 'D10', 'D11']
    }, index=[1, 2, 3, 4]
)

frames = [df1, df2, df3]

df4 = pd.concat(frames)
df4
#      A    B    C    D
# 0   A0   B0   C0   D0
# 1   A1   B1   C1   D1
# 2   A2   B2   C2   D2
# 3   A3   B3   C3   D3
# 4   A4   B4   C4   D4
# 5   A5   B5   C5   D5
# 6   A6   B6   C6   D6
# 7   A7   B7   C7   D7
# 1   A8   B8   C8   D8
# 2   A9   B9   C9   D9
# 3  A10  B10  C10  D10
# 4  A11  B11  C11  D11

df5 = pd.concat(frames, ignore_index=True)
df5
#       A    B    C    D
# 0    A0   B0   C0   D0
# 1    A1   B1   C1   D1
# 2    A2   B2   C2   D2
# 3    A3   B3   C3   D3
# 4    A4   B4   C4   D4
# 5    A5   B5   C5   D5
# 6    A6   B6   C6   D6
# 7    A7   B7   C7   D7
# 8    A8   B8   C8   D8
# 9    A9   B9   C9   D9
# 10  A10  B10  C10  D10
# 11  A11  B11  C11  D11

df5 = pd.concat(frames, keys=['s1', 's2', 's3'])
df5
#         A    B    C    D
# s1 0   A0   B0   C0   D0
#    1   A1   B1   C1   D1
#    2   A2   B2   C2   D2
#    3   A3   B3   C3   D3
# s2 4   A4   B4   C4   D4
#    5   A5   B5   C5   D5
#    6   A6   B6   C6   D6
#    7   A7   B7   C7   D7
# s3 1   A8   B8   C8   D8
#    2   A9   B9   C9   D9
#    3  A10  B10  C10  D10
#    4  A11  B11  C11  D11

df5.index
# MultiIndex(levels=[['s1', 's2', 's3'], [0, 1, 2, 3, 4, 5, 6, 7]],
#            labels=[[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2], [0, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4]])
Like its sibling function on ndarrays, numpy.concatenate(), pandas.concat() takes a list or dict of homogeneously-typed objects and concatenates them with some configurable handling of “what to do with the other axes”.
MERGING AND JOINING USING MERGE() AND JOIN() FUNCTIONS
Refer to the Merge, Join, and Concatenate official guide.
The function merge() merges DataFrame objects by performing a database-style join operation by columns or indexes.
The function join() joins columns with another DataFrame, either on the index or on a key column.
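A minimal sketch of both (the keys and values are illustrative):

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1'], 'A': ['A0', 'A1']})
right = pd.DataFrame({'key': ['K0', 'K1'], 'B': ['B0', 'B1']})

# merge(): database-style join on a key column
merged = pd.merge(left, right, on='key')

# join(): joins on the index by default
joined = left.set_index('key').join(right.set_index('key'))
```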
BINARY DUMMY VARIABLES FOR CATEGORICAL VARIABLES USING GET_DUMMIES() FUNCTION
Converting a categorical variable into a “dummy” DataFrame can be done using get_dummies():
df = pd.DataFrame({'char': list('bbacab'), 'data1': range(6)})
df
#   char  data1
# 0    b      0
# 1    b      1
# 2    a      2
# 3    c      3
# 4    a      4
# 5    b      5

dummies = pd.get_dummies(df['char'], prefix='key')
dummies
# key_a key_b key_c
# 0 0 1 0
# 1 0 1 0
# 2 1 0 0
# 3 0 0 1
# 4 1 0 0
# 5 0 1 0
PLOTTING DATAFRAME USING PLOT() FUNCTION
The plot() method makes plots of a DataFrame using matplotlib/pylab.
Panel (3D data structure)
Panel is a container for 3D data. The term panel data is derived from econometrics and is partially responsible for the name: pan(el)-da(ta)-s.
The 3D structure of a Panel is much less common for many types of data analysis than the 1D of the Series or the 2D of the DataFrame. Oftentimes, one can simply use a multi-index DataFrame for easily working with higher dimensional data. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25; refer Deprecate Panel.
Here are some related interesting stories that you might find helpful: