Pandas is a BSD-licensed, open-source Python library that offers high-performance, easy-to-use data structures and data analysis tools. Python with pandas is used in a wide array of disciplines, including economics, finance, statistics, and analytics. In this article, we have listed some essential pandas interview questions and NumPy interview questions that a Python learner should know.
Pandas Interview Questions & Answers
Question 1 — Define Python Pandas.
Pandas is a software library written for Python that is used to analyze and manipulate data. It is an open-source, cross-platform library created by Wes McKinney. Development began in 2008, and the library provides data structures and operations for manipulating numerical tables and time-series data. Pandas can be installed with pip or through the Anaconda distribution. It makes it easy to load, clean, and prepare tabular data, for example ahead of machine learning work.
Question 2 — What Are The Different Types Of Data Structures In Pandas?
The pandas library supports two major types of data structures, Series and DataFrame, both built on top of NumPy. A Series is one-dimensional and is the simplest data structure, while a DataFrame is two-dimensional. Older versions also offered a third structure, the Panel, a 3-dimensional container with the axes items, major_axis, and minor_axis; it was deprecated and has since been removed from pandas.
Question 3 — Explain Series In Pandas.
A Series is a one-dimensional labeled array that can hold values of any type (string, float, integer, Python objects, etc.). It is the simplest data structure in pandas; its axis labels are collectively called the index.
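As a minimal sketch of the idea, the snippet below builds a small Series with explicit labels. The values and labels are made up for illustration.

```python
import pandas as pd

# A Series holding mixed Python values; the labels "a", "b", "c" are illustrative.
s = pd.Series([10, 2.5, "text"], index=["a", "b", "c"])

print(s.index.tolist())  # the axis labels (the index)
print(s["b"])            # label-based lookup
print(s.dtype)           # object, because the values have mixed types
```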
Question 4 — Define Dataframe In Pandas.
A DataFrame is a 2-dimensional labeled data structure in which data is aligned in tabular form with rows and columns. With this structure, you can perform arithmetic operations on rows and columns.
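The following sketch shows what arithmetic on rows and columns can look like. The column names and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical data: three rows labeled x, y, z.
df = pd.DataFrame(
    {"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]},
    index=["x", "y", "z"],
)

# Element-wise arithmetic between two columns creates a new column.
df["total"] = df["price"] * df["quantity"]

# axis=0 aggregates down each column, axis=1 aggregates across each row.
column_sums = df.sum(axis=0)
row_sums = df[["price", "quantity"]].sum(axis=1)

print(df)
print(column_sums)
print(row_sums)
```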
Question 5 — How Can You Create An Empty Dataframe In Pandas?
To create an empty DataFrame in pandas, call the DataFrame constructor with no arguments:
import pandas as pd
# no data, no index, no columns
df = pd.DataFrame()
print(df.empty)
Question 6 — What Are The Most Important Features Of The Pandas Library?
Important features of the pandas library are:
- Data Alignment
- Merge and join
- Memory Efficient
- Time series
- Reshaping
Question 7 — How Will You Explain Reindexing In Pandas?
To reindex means to conform the data to a new set of labels along a particular axis.
Reindexing can be used to:
- Insert missing value (NA) markers at label locations where no data for the label existed.
- Reorder the existing data to match a new set of labels.
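Both operations can be sketched with Series.reindex. The labels and the fill value below are chosen only for illustration.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# Reorder the existing data to match a new label order.
reordered = s.reindex(["c", "a", "b"])

# Ask for a label that has no data: pandas inserts a missing-value marker (NaN).
extended = s.reindex(["a", "b", "c", "d"])

# Optionally supply a fill value instead of NaN.
filled = s.reindex(["a", "b", "c", "d"], fill_value=0.0)

print(reordered)
print(extended)
print(filled)
```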
Question 8 — What are the different ways of creating DataFrame in pandas? Explain with examples.
A DataFrame can be created from several kinds of input, including a list or a dict of lists or NumPy ndarrays.
Example 1 — Creating a DataFrame using List
import pandas as pd
# a list of strings
str_list = ['Pandas', 'NumPy']
# Calling the DataFrame constructor on the list
# (named df to avoid shadowing the built-in list)
df = pd.DataFrame(str_list)
print(df)
Example 2 — Creating a DataFrame using dict of arrays
import pandas as pd
# each key becomes a column; all value lists must have the same length
data = {'ID': [1001, 1002, 1003], 'Department': ['Science', 'Commerce', 'Arts']}
df = pd.DataFrame(data)
print(df)
Question 9 — Explain Categorical Data In Pandas?
Categorical data refers to data whose values repeat across a small set of categories; for instance, values under columns such as country, gender, or codes are usually repetitive. A categorical variable in pandas can take only a limited, and usually fixed, number of possible values.
Numerical operations cannot be performed on such data. All values of categorical data in pandas are either in the categories or np.nan.
This data type can be useful in the following cases:
- If a string variable contains only a few different values, converting it into a categorical variable can save some memory.
- It can serve as a signal to other Python libraries that the column should be treated as a categorical variable.
- When the logical order of a variable differs from its lexical order (for example "one", "two", "three"), converting it to an ordered categorical lets it sort in the logical order.
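The ordering case can be sketched as follows, assuming a hypothetical "size" variable whose logical order is small, medium, large.

```python
import pandas as pd

sizes = pd.Series(["medium", "small", "large", "small"])

# Plain strings sort lexically.
lexical = sizes.sort_values().tolist()

# An ordered categorical sorts by the declared category order instead.
size_type = pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
cat = sizes.astype(size_type)
logical = cat.sort_values().tolist()

print(lexical)
print(logical)
print(cat.cat.codes.tolist())  # integer codes stored in place of the repeated strings
```

Storing integer codes plus one copy of each category is what can make the categorical representation smaller than repeated strings.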
Question 10 — Create A Series Using Dict In Pandas.
import pandas as pd
# dict keys become the index labels, dict values become the data
data = {'a': 1, 'b': 2, 'c': 3}
ans = pd.Series(data)
print(ans)
Question 11 — How To Create A Copy Of The Series In Pandas?
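To create a copy of a Series in pandas, use the Series.copy method:
Series.copy(deep=True)
With deep=True (the default), the new Series gets its own copy of the data and the index. With deep=False, a new Series object is created, but neither the data nor the index is copied; the new object refers to the original's data and index.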
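A minimal sketch of the two modes is shown below, with made-up values. It only demonstrates behavior that does not depend on the pandas version: a deep copy is independent of the original, and a shallow copy is a distinct object with equal contents.

```python
import pandas as pd

original = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Deep copy (the default): changing the copy leaves the original untouched.
deep = original.copy()
deep["a"] = 100

# Shallow copy: a new Series object that does not duplicate the underlying data.
shallow = original.copy(deep=False)

print(original["a"])             # still 1
print(deep["a"])                 # 100
print(shallow is original)       # False, it is a separate object
print(shallow.equals(original))  # True, same contents
```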
Question 12 — How Will You Add An Index, Row, Or Column To A Dataframe In Pandas?
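To add a row to a DataFrame, assign to a new label with .loc[], which is label based. The integer-based .iloc[] can only modify existing positions and cannot add new rows, and the older .ix[] indexer, which accepted both labels and integers, has been removed from pandas. To add a column, assign to a new column name with df["name"] = values or with .loc[]. To set the index, use set_index() to promote an existing column.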
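The sketch below walks through adding an index, a row, and two columns. The column names ("Floor", "Active") and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1001, 1002], "Department": ["Science", "Commerce"]})

# Index: promote an existing column to be the index.
df = df.set_index("ID")

# Row: assigning to a label that does not exist yet appends a new row.
df.loc[1003] = ["Arts"]

# Column: plain assignment to a new column name.
df["Floor"] = [1, 2, 3]

# Column via .loc: a scalar is broadcast to every row.
df.loc[:, "Active"] = True

print(df)
```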
Question 13 — What Method Will You Use To Rename The Index Or Columns Of Pandas Dataframe?
The .rename() method can be used to rename the columns or index values of a DataFrame.
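A short sketch of .rename() with dict mappings follows. The old and new labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["r1", "r2"])

# Map old labels to new ones; labels not mentioned are left as they are.
renamed = df.rename(columns={"A": "alpha"}, index={"r1": "first"})

print(renamed)
print(list(df.columns))  # rename returns a new DataFrame, so df is unchanged
```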
Question 14 — How Can You Iterate Over Dataframe In Pandas?
To iterate over the rows of a DataFrame in pandas, a for loop can be used in combination with an iterrows() call, which yields each row as an (index, Series) pair.
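As a sketch with made-up data, the loop below uses iterrows(). It also shows itertuples(), an alternative not mentioned above that yields namedtuples and is often faster.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})

# iterrows() yields (index label, row as a Series) pairs.
lines = []
for idx, row in df.iterrows():
    lines.append(f"{idx}: {row['name']} scored {row['score']}")
print(lines)

# itertuples() yields one namedtuple per row; fields are accessed as attributes.
pairs = [(t.name, t.score) for t in df.itertuples(index=False)]
print(pairs)
```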
Question 15 — What Is Pandas Numpy Array?
Numerical Python (NumPy) is a Python package, installed separately from the standard library, for numerical computation and for processing single-dimensional and multidimensional arrays. Pandas data structures are built on top of NumPy arrays.
Calculations on NumPy arrays are typically faster than the same calculations on plain Python lists.
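The snippet below is a small sketch of array operations and of moving from pandas to NumPy with DataFrame.to_numpy(). The numbers are arbitrary.

```python
import numpy as np
import pandas as pd

# A 2-dimensional NumPy array.
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Vectorized arithmetic applies to every element without an explicit loop.
doubled = arr * 2
col_means = arr.mean(axis=0)

# A DataFrame can hand back its values as a NumPy array.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
values = df.to_numpy()

print(doubled)
print(col_means)
print(type(values), values.shape)
```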
Question 16 — How Can A Dataframe Be Converted To An Excel File?
To write a single DataFrame to an Excel file, call its to_excel() method with the target file's name. To write several DataFrames to separate sheets, create an ExcelWriter object with the target file name, then pass it to each to_excel() call together with the name of the sheet to write.
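Both cases can be sketched as two small helper functions. The function names, sheet names, and file name are hypothetical, and the code assumes an Excel engine such as openpyxl is installed alongside pandas.

```python
import pandas as pd


def write_single_sheet(df, path):
    """Write one DataFrame to an Excel file (assumes an engine such as openpyxl)."""
    df.to_excel(path, sheet_name="Sheet1", index=False)


def write_multiple_sheets(frames, path):
    """Write a dict of {sheet name: DataFrame} to one Excel file, one sheet each."""
    # The context manager saves and closes the file when the block exits.
    with pd.ExcelWriter(path) as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)
```

A call might look like write_multiple_sheets({"sales": sales_df, "costs": costs_df}, "report.xlsx"), where sales_df and costs_df are DataFrames you already have.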
Question 17 — What Is Groupby Function In Pandas?
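In pandas, the groupby() function lets programmers organize real-world data sets into groups based on the values of one or more keys. Its primary task is to split the data into groups, so that a function such as an aggregation can be applied to each group and the results combined.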
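A minimal sketch of splitting by a key and aggregating each group is shown below. The departments and scores are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Department": ["Science", "Arts", "Science", "Arts"],
        "Score": [80, 70, 90, 60],
    }
)

# Split rows by Department, then compute the mean Score within each group.
mean_by_dept = df.groupby("Department")["Score"].mean()

# Several aggregations at once produce one column per function.
summary = df.groupby("Department")["Score"].agg(["min", "max", "count"])

print(mean_by_dept)
print(summary)
```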
Conclusion
We hope the pandas and NumPy interview questions above will help you prepare for your upcoming interviews. If you are looking for courses that can help you get a hold of the Python language, upGrad can be a good platform.
This article was originally published on the upGrad blog.