23 Important Functions in Pandas

Praveena S
Published in featurepreneur
Jul 16, 2021

What is pandas used for in Python?

pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.

1. pd.DataFrame()

A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Data is aligned in rows and columns like a table, and each column can hold a different dtype.

import pandas as pd
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
                   'B': {0: 1, 1: 3, 2: 5},
                   'C': {0: 2, 1: 4, 2: 6}})
df

Output:

   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6
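The constructor also accepts a dict of lists, with one list per column. This is often more convenient than nested dicts. Below is a minimal sketch; the column names and values are illustrative only.

```python
import pandas as pd

# Each key becomes a column; each list holds that column's values, row by row.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"],
                   "age": [28, 35, 41]})

print(df.shape)   # (rows, columns)
print(df.dtypes)  # one dtype per column
```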

2. pd.melt()

pd.melt() ‘unpivots’ a DataFrame from ‘wide format’ into ‘long format’. For example, wide economic data with one column per year can be turned into long data with one row per data point. id_vars lists the identifier columns to keep as they are. value_vars lists the columns to unpivot into variable/value pairs.

pd.melt(df, id_vars=['A'], value_vars=['B'])

Output:

   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
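Here is a sketch of the year-per-column case described above, using a small hypothetical table. When value_vars is omitted, every column not listed in id_vars is unpivoted. var_name and value_name rename the two new columns.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year.
wide = pd.DataFrame({"country": ["A", "B"],
                     "2019": [1.0, 2.0],
                     "2020": [1.5, 2.5]})

# Keep "country" as the identifier and unpivot the year columns into rows.
long = pd.melt(wide, id_vars=["country"], var_name="year", value_name="gdp")
print(long)
```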

3. pd.cut()

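pd.cut() sorts values into discrete bins. Passing an integer as bins creates that many equal-width bins spanning the range of the data. Passing labels=False returns each value's integer bin number instead of its interval. A typical use is turning a continuous variable into a categorical one, such as grouping ages into age ranges.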

pd.cut([0, 1, 1, 2], bins=4, labels=False)

Output:

array([0, 1, 1, 3])
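You can also pass explicit bin edges and names. Below is a small sketch with illustrative ages and group names. By default each bin includes its right edge, so 17 falls into the first bin.

```python
import pandas as pd

ages = [5, 17, 30, 64, 80]

# Explicit bin edges give (0, 17], (17, 64] and (64, 120]; right edges are inclusive.
groups = pd.cut(ages, bins=[0, 17, 64, 120],
                labels=["child", "adult", "senior"])
print(list(groups))
```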

4. pd.qcut()

The pandas documentation describes qcut as a “Quantile-based discretization function.” cut() makes bins of equal width. qcut() instead places the bin edges at sample quantiles, so each bin holds roughly the same number of observations.

pd.qcut(range(5), 4, labels=False)

Output:

array([0, 0, 1, 2, 3])
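The difference from cut() shows up most clearly with an outlier. This sketch bins the same illustrative data both ways and counts how many values land in each bin.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# cut: 2 equal-width bins, so the outlier leaves almost everything in bin 0.
by_width = pd.cut(values, bins=2, labels=False)
# qcut: 2 quantile bins (split at the median), so each bin gets 4 values.
by_quantile = pd.qcut(values, q=2, labels=False)

print(by_width.value_counts().sort_index().tolist())
print(by_quantile.value_counts().sort_index().tolist())
```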

5. pd.Series()

A pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index. You can think of a Series as a single column of a spreadsheet.
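Here is a minimal sketch of building a Series with its own index labels. The values and labels are illustrative.

```python
import pandas as pd

# Values plus explicit index labels (the default index would be 0, 1, 2).
s = pd.Series([10.5, 20.0, 30.25], index=["a", "b", "c"], name="price")

print(s["b"])     # label-based access
print(s.index)
print(s.mean())
```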

6. pd.concat()

pd.concat() does the heavy lifting of concatenating pandas objects along an axis. Along the way it can apply optional set logic (union or intersection) to the indexes, if any, of the other axes. The original index labels are kept, which is why 0 and 1 repeat in the output below.

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])
pd.concat([s1, s2])

Output:

0    a
1    b
0    c
1    d
dtype: object
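Two variations come up often. This sketch uses the same two Series: ignore_index=True builds a fresh index, and axis=1 puts the objects side by side as columns. The key names "left" and "right" are illustrative.

```python
import pandas as pd

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])

# ignore_index=True discards the original labels and builds a fresh 0..n-1 index.
stacked = pd.concat([s1, s2], ignore_index=True)

# axis=1 places the objects side by side as columns, aligned on their index.
side_by_side = pd.concat([s1, s2], axis=1, keys=["left", "right"])
print(stacked)
print(side_by_side)
```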

8. pd.get_dummies()

pd.get_dummies() one-hot encodes categorical data. Applied to a column with one category per observation, it produces a new column (indicator variable) for each unique value. It then marks the column matching each observation's value. Since pandas 2.0 the indicators default to True/False; passing dtype=int gives 1/0.

s = pd.Series(list('abca'))
pd.get_dummies(s, dtype=int)

Output:

   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0
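On a DataFrame, the columns parameter chooses which columns to encode, and all other columns pass through unchanged. Below is a sketch with a hypothetical "color" column.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# Only "color" is encoded; "size" is kept as-is, followed by one column per color.
encoded = pd.get_dummies(df, columns=["color"], prefix="color", dtype=int)
print(encoded)
```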

9. pd.factorize()

pd.factorize() returns a numeric representation of an array by assigning an integer code to each distinct value. It returns a tuple (codes, uniques). It is available both as pd.factorize() and as the Series.factorize() method.

codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
codes

Output:

array([0, 0, 1, 2, 0])
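This sketch looks at the second return value of the same call. uniques lists the distinct values, and indexing it with codes rebuilds the input. sort=True numbers the values in sorted order instead of order of appearance.

```python
import pandas as pd

codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])

print(uniques)          # distinct values in order of first appearance
print(uniques[codes])   # indexing uniques by codes rebuilds the input

# sort=True numbers the distinct values in sorted order instead.
sorted_codes, sorted_uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
print(sorted_codes)
```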

10. pd.unique()

pd.unique() returns the unique values of a one-dimensional array-like, such as a Series, in order of appearance. It is hash-table based and therefore does NOT sort the result.

pd.unique(pd.Series([2, 1, 3, 3]))

Output:

array([2, 1, 3])
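The ordering behavior is easiest to see next to NumPy's np.unique, which sorts its result. This is a quick comparison on the same data.

```python
import numpy as np
import pandas as pd

data = [2, 1, 3, 3]

print(pd.unique(pd.Series(data)))  # order of appearance
print(np.unique(data))             # NumPy sorts its result
```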

11. pd.isna()

The isna() function is used to detect missing values for an array-like object. This function takes a scalar or array-like object and indicates whether values are missing (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike).

pd.isna('dog')

Output:

False


pd.isna(pd.NA)

Output:

True

12. pd.notna()

The notna() function is used to detect non-missing values for an array-like object. This function takes a scalar or array-like object and indicates whether values are valid (not missing, which is NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike).

pd.notna('dog')

Output:

True


pd.notna(pd.NA)

Output:

False
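isna() and notna() also work element-wise on a whole DataFrame, which is how they are usually applied in practice. This sketch uses a small illustrative frame. It counts missing values per column and keeps only the rows where one column is present.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": ["a", None, "c"]})

# Element-wise boolean masks; summing counts the missing values per column.
print(pd.isna(df).sum())

# notna() as a row filter: keep rows where "x" is present.
print(df[pd.notna(df["x"])])
```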

13. pd.to_numeric()

The to_numeric() function converts its argument to a numeric type. The default return dtype is float64 or int64, depending on the data supplied. Use the downcast parameter to obtain other dtypes.

s = pd.Series(['1.0', '2', -3])
pd.to_numeric(s)

Output:

0    1.0
1    2.0
2   -3.0
dtype: float64
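Two parameters are worth knowing. downcast picks a smaller dtype, and errors='coerce' turns unparseable entries into NaN instead of raising. The "dirty" Series below is illustrative data.

```python
import pandas as pd

s = pd.Series(['1.0', '2', -3])

# downcast='signed' picks the smallest signed integer dtype that fits.
small = pd.to_numeric(s, downcast='signed')
print(small.dtype)

# errors='coerce' turns unparseable entries into NaN instead of raising.
dirty = pd.Series(['10', 'n/a', '30'])
cleaned = pd.to_numeric(dirty, errors='coerce')
print(cleaned)
```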

14. pd.to_datetime()

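The to_datetime() function converts its argument to datetime. It accepts strings, lists and Series. It also accepts a DataFrame whose columns hold the date components (year, month, day), as in this example.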

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
pd.to_datetime(df)

Output:

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
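When parsing strings, you can state the expected layout with format. Anything that doesn't match then becomes NaT when errors='coerce' is set. Below is a sketch on illustrative strings.

```python
import pandas as pd

raw = pd.Series(['2021-07-16', '16/07/2021', 'not a date'])

# format= states the expected layout; errors='coerce' yields NaT for non-matching entries.
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
print(parsed)
```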

15. pd.to_timedelta()

The to_timedelta() function converts its argument to a Timedelta. Timedeltas are absolute differences in times, expressed in different units (e.g., days, hours, minutes, seconds). The function accepts a recognized timedelta format or value, such as the string '15.5us' (15.5 microseconds), and returns a Timedelta.

pd.to_timedelta('15.5us')

Output:

Timedelta('0 days 00:00:00.000015500')
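Numbers can be converted with an explicit unit, and the resulting Timedeltas combine with timestamps through ordinary arithmetic. The start time below is illustrative.

```python
import pandas as pd

# A list of numbers plus unit= gives one Timedelta per value.
durations = pd.to_timedelta([1, 2.5], unit='h')
print(durations)

# Timedeltas combine with timestamps through ordinary arithmetic.
start = pd.Timestamp('2021-07-16 09:00')
print(start + durations[1])
```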

16. pd.date_range()

The date_range() function returns a fixed-frequency DatetimeIndex. When only start and end are given, the frequency defaults to one day ('D').

pd.date_range(start='1/1/2021', end='1/08/2021')

Output:

DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
               '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08'],
              dtype='datetime64[ns]', freq='D')
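Two common variations, sketched below, are a fixed number of periods with a custom step and business days only. January 1, 2021 was a Friday, so the business-day range skips the following weekend.

```python
import pandas as pd

# periods= fixes how many timestamps to generate instead of giving an end date.
weekly = pd.date_range(start='2021-01-01', periods=3, freq='7D')
print(weekly)

# freq='B' keeps business days (Monday to Friday) only.
business = pd.date_range(start='2021-01-01', end='2021-01-08', freq='B')
print(business)
```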

17. pd.timedelta_range()

The timedelta_range() function returns a fixed-frequency TimedeltaIndex. Given start and periods, it generates that many values spaced by the default frequency of one day.

pd.timedelta_range(start='1 day', periods=4)

Output:

TimedeltaIndex(['1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq='D')
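Another option is to give start, end and periods together. The span is then divided into evenly spaced steps, with no frequency string needed. Here is a short sketch.

```python
import pandas as pd

# With start, end and periods, the range is split into evenly spaced steps.
steps = pd.timedelta_range(start='0 days', end='1 day', periods=5)
print(steps)
```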

18. pd.interval_range()

The interval_range() function returns a fixed-frequency IntervalIndex. By default the intervals are closed on the right, so (0, 1] includes 1 but not 0.

pd.interval_range(start=0, end=5)

Output:

IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]],
              closed='right',
              dtype='interval[int64]')
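An IntervalIndex can be passed straight to pd.cut() as its bins. This sketch builds width-10 bins with freq and assigns a few illustrative values to them.

```python
import pandas as pd

# Build bins (0, 10], (10, 20], (20, 30] and use them directly as pd.cut bins.
bins = pd.interval_range(start=0, end=30, freq=10)
binned = pd.cut([3, 15, 27], bins=bins)
print(binned)
```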

19. pd.Index()

pd.Index is the immutable array that holds the axis labels of pandas objects. Every DataFrame has one Index for its rows and another for its columns. These labels make label-based selection of rows and columns possible (indexing, also known as subset selection). Comparing an Index with a scalar returns an element-wise boolean array:

pd.Index(["foo", "bar", "baz"]) == "foo"

Output:

array([ True, False, False])
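To connect the Index to selection, this sketch builds a DataFrame whose row labels come from the same Index and selects rows with .loc. The "score" column is illustrative data.

```python
import pandas as pd

# The index holds the row labels; the columns are an Index too.
df = pd.DataFrame({"score": [90, 75, 82]},
                  index=pd.Index(["foo", "bar", "baz"], name="key"))

print(df.index)
print(df.loc["bar", "score"])     # select by label
print(df.loc[df.index != "foo"])  # boolean mask built from the Index
```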

20. pd.Categorical()

pd.Categorical(values, categories=None, ordered=None, dtype=None) represents a categorical variable. Categoricals are a pandas data type that corresponds to categorical variables in statistics. Such variables take on a fixed and limited number of possible values (for example, T-shirt sizes).
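Here is a minimal sketch of an ordered categorical, using illustrative T-shirt sizes. Each value is stored as an integer code, and comparisons follow the declared category order rather than alphabetical order.

```python
import pandas as pd

# An ordered categorical: only the listed values are allowed, and they have a rank.
sizes = pd.Categorical(["M", "S", "L", "M"],
                       categories=["S", "M", "L"], ordered=True)

print(sizes.categories)
print(sizes.codes)      # integer code of each value's category
print(sizes > "S")      # comparisons follow the category order
print(sizes.max())
```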

21. pd.crosstab()

pd.crosstab() computes a simple cross-tabulation of two (or more) factors. By default it computes a frequency table of the factors, unless an array of values and an aggregation function are passed.

foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
pd.crosstab(foo, bar)

Output:

col_0  d  e
row_0
a      1  0
b      0  1
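Crosstab is more often applied to two DataFrame columns. This sketch uses hypothetical department and level data. margins=True adds row and column totals labeled "All".

```python
import pandas as pd

df = pd.DataFrame({"dept": ["IT", "IT", "HR", "HR", "HR"],
                   "level": ["jr", "sr", "jr", "jr", "sr"]})

# Count each (dept, level) pair; margins=True adds totals named "All".
table = pd.crosstab(df["dept"], df["level"], margins=True)
print(table)
```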

22. df.pivot()


The pivot() function reshapes a DataFrame organized by given index/column values. It does not aggregate, so each index/column pair must be unique; otherwise it raises a ValueError. If values is omitted, all remaining columns are used and the result has MultiIndex columns.

import pandas as pd

# creating a dataframe
df = pd.DataFrame({'A': ['xxx', 'yyy', 'zzz'],
                   'B': ['Masters', 'Graduate', 'Graduate'],
                   'C': [27, 23, 21]})

df
# keyword arguments; recent pandas versions reject the positional form here
df.pivot(index='A', columns='B', values='C')

Output:

B    Graduate  Masters
A
xxx       NaN     27.0
yyy      23.0      NaN
zzz      21.0      NaN
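When an (index, column) pair repeats, pivot() cannot decide which value to keep. DataFrame.pivot_table() handles this case by aggregating the duplicates. Below is a sketch with hypothetical data in which ('xxx', 'Masters') appears twice.

```python
import pandas as pd

# Hypothetical data with a repeated (A, B) pair: ('xxx', 'Masters') appears twice.
df = pd.DataFrame({'A': ['xxx', 'xxx', 'yyy'],
                   'B': ['Masters', 'Masters', 'Graduate'],
                   'C': [27, 29, 23]})

# df.pivot(...) would raise ValueError here; pivot_table averages the duplicates.
table = df.pivot_table(index='A', columns='B', values='C', aggfunc='mean')
print(table)
```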

23. pd.read_csv()

pd.read_csv() reads a comma-separated values (CSV) file into a DataFrame, which makes importing data for analysis much easier. The example below loads a file named amazon.csv and displays its first five rows with head().

import pandas as pd 
data = pd.read_csv("amazon.csv")
data.head()

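Since amazon.csv isn't included, this self-contained sketch reads illustrative CSV text from memory through io.StringIO. It also shows two commonly used parameters: usecols and parse_dates.

```python
import io
import pandas as pd

# An in-memory string stands in for a file on disk (illustrative data only).
csv_text = """date,product,units
2021-07-01,pen,10
2021-07-02,book,3
2021-07-03,pen,7
"""

# usecols keeps only the listed columns; parse_dates converts "date" to datetime.
data = pd.read_csv(io.StringIO(csv_text), usecols=["date", "units"], parse_dates=["date"])
print(data.head())
print(data.dtypes)
```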

Thanks for reading!!
