Dataframe in Python

Introduction

sonia jessica
Geek Culture
15 min read · Oct 5, 2021

The Python programming language, developed by Guido van Rossum in the 1990s, ranked as the third most popular programming language in the Stack Overflow Developer Survey 2021. In the same survey, the NumPy and pandas packages were the preferred choices of 33.84% and 28.12% of developers respectively; both are primarily used in data science and machine learning. There is so much you can do with Python and its frameworks.

This blog introduces the DataFrame in Python along with various examples and programs. The DataFrame is the primary data structure of the pandas package. pandas provides fast, flexible, and expressive data structures designed to make working with relational or labeled data easier. Its two primary data structures, the Series (one-dimensional, essentially a labeled array of objects) and the DataFrame (two-dimensional), are frequently used in the fields of finance and statistics.

What is a DataFrame in Python?

Source: Pandas Documentation

A DataFrame is defined as a two-dimensional, size-mutable, potentially heterogeneous tabular data structure. Size-mutable means that columns can be inserted into and deleted from a DataFrame; potentially heterogeneous means that different columns may hold data of different types. The DataFrame is the primary pandas data structure. It acts as a container for Series, and objects are inserted into and deleted from this container in a dictionary-like fashion.

The constructor for creating a DataFrame is:
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

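The parameters of the constructor are as follows:

  • data: the input data, such as an ndarray, an iterable, a dictionary, or another DataFrame.
  • index: the row labels; defaults to 0, 1, 2, ... when omitted.
  • columns: the column labels; defaults to 0, 1, 2, ... when the data carries none.
  • dtype: a single data type to force on the data; inferred when omitted.
  • copy: whether to copy the input data.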

The three main components of a pandas DataFrame are the data, the index, and the columns. The data can be a pandas Series, another pandas DataFrame, a NumPy ndarray (including a two-dimensional ndarray), a list, or a dictionary of one-dimensional ndarrays, lists, dictionaries, or Series. Basic operations on a DataFrame, such as creation, insertion, deletion, and renaming, are discussed in this blog.

Creation of DataFrame

In real-world projects, a DataFrame is usually created by loading data from a database, a CSV file, or an Excel file. As a starting point, let's see how a DataFrame can be created in other ways, apart from directly importing tabular data:

  • Creating a DataFrame using List
    A DataFrame can be created using a single list or a list of lists.

Example 1

This example uses a single list for the creation of a DataFrame.
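
A minimal sketch of what this example describes. The names in the list are illustrative assumptions, not values from the original listing.

```python
import pandas as pd

# Illustrative values (assumed for this sketch).
names = ["Alice", "Bob", "Charlie"]

# A flat list becomes a single-column DataFrame.
# No index or columns are given, so pandas assigns 0-based integer labels.
df = pd.DataFrame(names)
print(df)
```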

The above code when executed will generate a Pandas DataFrame as shown below

Explanation

In the above example, a Python list is created. The list is then converted to a DataFrame using the DataFrame() constructor of the pandas package. The DataFrame is then printed. Note that because we didn't specify a column name or row indices, pandas assigns them by default, using zero-based integer labels for the rows and for the single column.
You can also specify the index and column names by making a minor change to the code above.

Example 2
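
A sketch of the single-list example with explicit labels. The list name listOne and the index and columns arguments follow the explanation given for this example; the list contents are illustrative assumptions.

```python
import pandas as pd

# Illustrative values (assumed for this sketch).
listOne = ["Alice", "Bob", "Charlie"]

# index sets the row labels, columns sets the name of the single column.
df = pd.DataFrame(listOne, index=["i1", "i2", "i3"], columns=["Names"])
print(df)
```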

The above code when executed will generate a Pandas DataFrame as shown below:

Explanation:

In the above example, a Python list, listOne, is created and then converted to a pandas DataFrame using the DataFrame() constructor. The argument index=['i1', 'i2', 'i3'] specifies the row labels, and columns=['Names'] specifies the column name. The DataFrame is then printed. Note that the DataFrame now carries the row and column labels that we passed to the constructor.

Example 3

This example uses a multi-dimensional list with column names and index specified.
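
A sketch of the list-of-lists case. The column names and index labels follow the explanation for this example; the row values are illustrative assumptions.

```python
import pandas as pd

# Each inner list becomes one row (values assumed for illustration).
data = [["Alex", 10], ["Bob", 12], ["Clarke", 13]]

df = pd.DataFrame(data, columns=["Name", "Score"], index=["a", "b", "c"])
print(df)
```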

Its output is as follows:

Explanation:

In the above example, a list of lists is used. Each inner list is treated as an individual row of the DataFrame. The column names, columns=['Name', 'Score'], and the row labels, index=['a', 'b', 'c'], are also passed to the constructor.

Example 4

This example uses a multi-dimensional list with column name, index and dtype specified.
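
A hedged sketch of the same idea with illustrative data. Older pandas releases accepted dtype=float in the constructor here and converted only the numeric column. Recent releases may raise an error when a text column cannot be cast, so this sketch (an assumption about how to stay portable across versions) converts the Score column explicitly with astype() instead.

```python
import pandas as pd

# Row values are assumed for illustration.
data = [["Alex", 10], ["Bob", 12], ["Clarke", 13]]
df = pd.DataFrame(data, columns=["Name", "Score"], index=["a", "b", "c"])

# Cast only the numeric column to floating point.
df = df.astype({"Score": float})
print(df)
print(df.dtypes)
```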

Its output is as follows:

Notice that the dtype parameter changes the type of Score column to floating-point.

Explanation:

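In the above example, a list of lists is used, and the row indices and column names are specified when calling the constructor. In addition, a new parameter, dtype=float, is passed. It requests that the data be stored as floating-point, which here affects the Score column. Note that recent pandas versions may raise an error if a column such as Name cannot be converted to the requested type; casting a single column with astype() avoids this.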

  • Creating a DataFrame from dictionary of lists

Dictionaries are used to store data in Key, Value pairs. In the examples below, key will be the column names and the values will be the data inside those columns.

Example 1
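
A minimal sketch of a dictionary of lists turned into a DataFrame. The keys and values are illustrative assumptions.

```python
import pandas as pd

# Keys become column names; each list holds that column's values.
data = {"Name": ["Tom", "Nick", "Krish"], "Age": [20, 21, 19]}

df = pd.DataFrame(data)
print(df)
```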

The output of the above code is

Explanation

In the above example, a dictionary of lists is created. Note that the value corresponding to each key is a list. The dictionary of lists is then converted to a DataFrame using the DataFrame() constructor. The DataFrame is then printed.

Example 2:

The above example was quite straight-forward. Let’s consider another example wherein we will be given 3 lists, we will convert those lists to a dictionary and then the dictionary to a DataFrame.
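
A sketch of this three-list variant. The list names and dictionary keys follow the explanation for this example; the values themselves are illustrative assumptions.

```python
import pandas as pd

# Three separate lists of equal length (values assumed for illustration).
name = ["Tom", "Nick", "Krish"]
age = [20, 21, 19]
qualifications = ["BSc", "MSc", "PhD"]

# Combine the lists into a dictionary, then build the DataFrame from it.
data = {"Name": name, "Age": age, "Qualifications": qualifications}
df = pd.DataFrame(data)
print(df)
```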

The output of the above code is:

Explanation:

In the above example, firstly three lists are created, name, age, and qualifications respectively. A dictionary is then created with the Keys, ‘Name’, ‘Age’ and ‘Qualifications’. The values corresponding to these keys are the lists, name, age, and qualifications respectively. A dictionary of lists is thus created. This is then converted to a DataFrame using the DataFrame() constructor and printed. Note that each key of the dictionary is a column in a DataFrame. The row indexing is zero-based.

  • Creating a DataFrame from List of Dictionaries

A list of dictionaries can be passed as input data to create a DataFrame. By default, the dictionary keys will be taken as the column names.

Example 1
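
A sketch using the two dictionaries quoted in the explanation for this example.

```python
import pandas as pd

# The first dictionary has no 'c' key, so that cell becomes NaN.
data = [{"a": 1, "b": 2}, {"a": 3, "b": 4, "c": 7}]

df = pd.DataFrame(data)
print(df)
```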

Its output is as follows:

Explanation:

In the above example, a list of dictionaries is created. Within each dictionary, a colon (:) separates each key from its value. The list of dictionaries is then converted to a DataFrame using the DataFrame() constructor.
Note that on conversion to a DataFrame, each key becomes a column name and each dictionary becomes one row. Also, it is worth noticing that the first dictionary, {'a': 1, 'b': 2}, has only two key-value pairs, unlike the second dictionary, {'a': 3, 'b': 4, 'c': 7}. So in the resulting DataFrame, the first row has NaN in the third column, c. NaN stands for Not a Number and is one of the common ways to represent a missing value in the data.

It’s always recommended to specify the row indices. The row indices and column indices are very handy in other manipulation operations.

Example 2

The below example shows how to create a DataFrame with a list of dictionaries, row indices, and column indices.
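
A sketch of this variant. The index and columns arguments follow the explanation for this example; the dictionaries are assumed to be the same as in the previous example.

```python
import pandas as pd

data = [{"a": 1, "b": 2}, {"a": 3, "b": 4, "c": 7}]

# Row labels and column order are given explicitly.
df = pd.DataFrame(data, index=["first", "second"], columns=["a", "b", "c"])
print(df)
```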

Its output is as follows:

Explanation:

In the above example, a list of dictionaries is created and then converted to a DataFrame using the DataFrame() constructor. We have also specified the row indices using index=['first', 'second'] and the column names using columns=['a', 'b', 'c']. Note that on conversion to a DataFrame, each key becomes a column name and each dictionary becomes one row. The DataFrame is then printed; the column names and row indices are now the ones we specified, not the default zero-based ones.

Fundamental DataFrame Operations

  • Dimensions of DataFrame
    The shape attribute gives the height and width of the DataFrame as a (rows, columns) tuple. Note that shape is an attribute, not a method, so it is written without parentheses.
  • Head of DataFrame
    The head() method returns the first five rows of the DataFrame by default, and the tail() method returns the last five rows.
  • Locate Row
    The loc attribute returns one or more rows selected by their index labels.
  • Get the Column Names
    You can get the column names by using the columns attribute of a DataFrame object.
  • Get the Data types of all the columns
    To fetch the data type of each column in a DataFrame, use the dtypes attribute.

There are many other fundamental DataFrame operations as well. The below program illustrates the fundamental operations discussed above.
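
A sketch covering the operations listed above. The DataFrame is built from a dictionary of lists; its six rows of values are illustrative assumptions, chosen so that head() and tail() return fewer rows than the whole table.

```python
import pandas as pd

# Illustrative data (assumed for this sketch).
data = {
    "Name": ["Tom", "Nick", "Krish", "Jack", "Mia", "Zoe"],
    "Age": [20, 21, 19, 18, 22, 23],
    "Qualifications": ["BSc", "MSc", "PhD", "BA", "MA", "BSc"],
}
df = pd.DataFrame(data)

print(df)
print(df.shape)    # (rows, columns) tuple; an attribute, no parentheses
print(df.head())   # first five rows by default
print(df.tail())   # last five rows by default
print(df.loc[2])   # the row whose index label is 2
print(df.columns)  # column labels
print(df.dtypes)   # data type of each column
```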

The output of the above program is:

Explanation

In the above example, the process of creation of a DataFrame is the same as the one discussed above in the example 2 of creating a DataFrame from a dictionary of lists. The sequence of output is explained below:

  • First the DataFrame is printed, then its shape, i.e. the number of rows and columns as a tuple.
  • The head() of the DataFrame is the first five rows of the DataFrame.
  • The tail() of the DataFrame is the last five rows of the DataFrame.
  • df.loc[2] is used for printing the third row of the DataFrame. loc[] selects rows by label, and we pass 2 because the default row labels are zero-based integers unless an index is explicitly specified.
  • df.columns is used to print the column labels of the DataFrame.
  • df.dtypes is used to print the data types of all the columns in the DataFrame.

Operations on Columns in a DataFrame

In an existing DataFrame, we can rename column names, add columns, and delete columns very easily. All the three basic operations on columns are explained below:

Addition of Columns

To add a new column to a DataFrame, create a Series and assign it as a new column to the original DataFrame. The following example demonstrates this:
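
A minimal sketch of adding a column from a Series. The data, the variable name series, and the new column name Grade are illustrative assumptions; the dtype argument is left out here for simplicity.

```python
import pandas as pd

# Illustrative data (assumed for this sketch).
data = [["Alex", 10], ["Bob", 12], ["Clarke", 13]]
df = pd.DataFrame(data, columns=["Name", "Score"], index=["a", "b", "c"])
print(df)

# The Series shares the DataFrame's index labels, so values line up row by row.
series = pd.Series(["B", "A", "A"], index=["a", "b", "c"])

# Assigning to a new key adds the Series as a new column.
df["Grade"] = series
print(df)
```

pandas aligns the assigned Series to the DataFrame by index label, not by position, so a Series with the default 0, 1, 2 index would produce NaN values in this case.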

The output of the above program is:

Explanation:

In the above example, a DataFrame is first created using a list of lists; the row indices and column names are specified along with the data type in the DataFrame() constructor. The DataFrame is then printed.

A pandas Series, named series, is created. This Series is assigned as a new column to the DataFrame, with its values matched to the rows by index label. Then the modified DataFrame is printed.

Renaming of Columns

We can change the name of a single column as well as multiple columns using the .rename() method of Pandas. The following example illustrates the renaming of single and multiple columns
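
A sketch of both renames. The column names and rename mappings follow the explanation for this example; the row values are illustrative assumptions.

```python
import pandas as pd

# Illustrative data (assumed for this sketch).
df = pd.DataFrame({"Name": ["Tom", "Nick", "Krish"], "Age": [20, 21, 19]})
print(df)

# Rename a single column; inplace=True modifies df and returns None.
df.rename(columns={"Name": "FirstName"}, inplace=True)
print(df)

# Rename several columns at once with one old-name -> new-name mapping.
df.rename(columns={"FirstName": "fName", "Age": "Years"}, inplace=True)
print(df)
```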

The output is as follows:

Explanation:

In the above example, firstly a DataFrame is created using the Dictionary of Lists. The original DataFrame is then printed.

The df.rename() method is used for renaming columns by passing the original and new column names as key:value pairs of a dictionary, columns={'Name': 'FirstName'}. The argument inplace=True specifies that the data is modified in place, which means the method returns None and the original DataFrame itself is changed. The modified DataFrame is then printed.

The df.rename() method is then used again, this time to rename two columns. Both the original and new column names are passed as key:value pairs in the dictionary, columns={'FirstName': 'fName', 'Age': 'Years'}. The modified DataFrame is then printed.

The above method is quite handy when renaming a single column or few columns. However, in the case of large datasets containing 100s of columns, specifying the old and new names becomes tedious. In such cases, renaming can be done by assigning a list of new column names.
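
A sketch of renaming by assignment. The new names follow the explanation for this example; the original column names and data are illustrative assumptions.

```python
import pandas as pd

# Illustrative data (assumed for this sketch).
df = pd.DataFrame({"Name": ["Tom", "Nick", "Krish"], "Age": [20, 21, 19]})
print(df.columns)

# The list must contain exactly one new name per existing column, in order.
df.columns = ["FName", "Years"]
print(df.columns)
```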

The output is as follows:

Explanation:

In the above example, a DataFrame is created using a dictionary of lists. The call print(df.columns) prints the column names of the DataFrame.

The column names are modified by assigning a list of new column names, df.columns = ['FName', 'Years']. The modified column names are then printed.

Deletion of Columns

A column can be deleted using the del statement or the pop() method. The following example illustrates the deletion of columns.
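
A sketch of both deletion styles. The Name and Age columns follow the explanation for this example; the City column and all values are illustrative assumptions, added so that one column remains at the end.

```python
import pandas as pd

# Illustrative data (assumed for this sketch).
df = pd.DataFrame({
    "Name": ["Tom", "Nick", "Krish"],
    "Age": [20, 21, 19],
    "City": ["Pune", "Delhi", "Goa"],
})
print(df)

# del removes the column in place and returns nothing.
del df["Name"]
print(df)

# pop() also removes the column in place, and returns it as a Series.
removed = df.pop("Age")
print(df)
print(removed)
```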

Its output is as follows:

Explanation:

In the above example, the DataFrame is first created using a dictionary of lists. The original DataFrame is printed.
A single column is then deleted with del df['Name'], which takes the column name. The DataFrame is printed again.
Another column is then deleted with df.pop('Age'), again by passing the column name. The DataFrame is printed once more.

Apart from the difference in syntax, del and pop() also differ in what they return: pop() returns the removed column as a Series, whereas del does not return anything.

Operation on rows in a DataFrame

In an existing DataFrame, we can add and delete rows very easily. Both operations are discussed below:

Insertion of Rows

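We can add new rows to a DataFrame using the append() method, which adds the rows at the end and returns the result as a new DataFrame. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; on recent versions, pandas.concat() does the same job. This is illustrated in the below example: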
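
A hedged sketch of appending rows. The data and the variable name combined are illustrative assumptions. Because DataFrame.append() is not available in pandas 2.0 and later, the sketch checks for it and otherwise falls back to pd.concat(), which gives the same result.

```python
import pandas as pd

# Illustrative data (assumed for this sketch).
df1 = pd.DataFrame([["Alex", 10], ["Bob", 12]], columns=["Name", "Score"])
df2 = pd.DataFrame([["Clarke", 13], ["Dave", 9]], columns=["Name", "Score"])
print(df1)

if hasattr(df1, "append"):
    # Older pandas: append() returns a new DataFrame, df1 is unchanged.
    combined = df1.append(df2)
else:
    # pandas 2.0 and later: append() was removed, use concat() instead.
    combined = pd.concat([df1, df2])
print(combined)
```

Either way the original row labels are kept, so the labels 0 and 1 each appear twice in the result.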

The output is as follows:

Explanation

In the above example, a DataFrame, df1, is created using a list of lists, the column names are also specified. The original DataFrame is then printed.

A new DataFrame, df2, is created in a similar manner. The append() method treats the calling DataFrame as the main object and adds to it the rows of the DataFrames passed as arguments, so df1.append(df2) adds the rows of df2 after those of df1. The result is returned as a new DataFrame, which is then printed.

In addition to the append() method, the pandas concat() function can also be used for adding rows to a DataFrame. This is shown in the example below:
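
A sketch of row-wise concatenation. The data and the names stacked and renumbered are illustrative assumptions.

```python
import pandas as pd

# Illustrative data (assumed for this sketch).
df1 = pd.DataFrame([["Alex", 10], ["Bob", 12]], columns=["Name", "Score"])
df2 = pd.DataFrame([["Clarke", 13], ["Dave", 9]], columns=["Name", "Score"])
print(df1)

# axis=0 (the default) stacks the rows of df2 below those of df1.
stacked = pd.concat([df1, df2])
print(stacked)

# ignore_index=True discards the old row labels and renumbers from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered)
```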

The output is as follows:

Explanation

In the above example, first, a DataFrame, df1, is created using a list of lists. The column names are also specified. The original DataFrame is then printed. Another DataFrame, df2, is created in a similar manner.
The concat() function concatenates pandas objects along a particular axis. The two DataFrames, df1 and df2, are passed to concat() as a list and joined along the row axis. The resulting DataFrame is then printed.

Deletion of Rows

Rows can be deleted from a DataFrame using index labels: the rows corresponding to the given labels are dropped using the .drop() method. This is illustrated in the below example:
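
A sketch of dropping a row by label. The data is an illustrative assumption, and the result is assigned back to df because drop() returns a new DataFrame by default.

```python
import pandas as pd

# Illustrative data (assumed for this sketch).
df = pd.DataFrame({"Name": ["Tom", "Nick", "Krish"], "Age": [20, 21, 19]})
print(df)

# Drop the row whose index label is 0; the remaining labels are unchanged.
df = df.drop(0)
print(df)
```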

The output is as follows:

Explanation

In the above example, the DataFrame is first created using a dictionary of lists and then printed.
The df.drop() method is used to delete a row by its index label, which in this example is one of the default zero-based integer labels. The modified DataFrame is then printed.

Conclusion

This article discusses the DataFrame in Python, its implementation, and various operations on it with examples. It is highly recommended to study these operations and practically implement them on your own.

Explore other operations as well. The more you explore, the more knowledge you gain!!

If you are confused about what type of questions related to Pandas are asked in an interview, you must refer to Structure Wise Interview Questions List. They are curated by experts and cover almost all the important concepts which are asked in an interview.
