Boost Your Data Analysis Efficiency: Essential NumPy and Pandas Shortcuts You Should Know

Dhanush B M
The Algorithmic Minds
4 min read · Jul 15, 2023

In this post, we will go through shortcuts for data analysis in Python using NumPy and Pandas. It is a quick reference for readers who already know Python.

1. NumPy
• import numpy as np
• ABC = np.array([1,2,3,1])
• DEF = np.array([[1,2,3,4],[5,4,6,5]])

• Useful attributes (for any array a, such as ABC or DEF above):
• a.shape, a.ndim, a.size, a.dtype, a.itemsize
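As a quick illustration of these attributes, here is a minimal sketch that reuses the ABC and DEF arrays defined above. The dtype and itemsize depend on the platform, so the code does not assume specific values for them.

```python
import numpy as np

ABC = np.array([1, 2, 3, 1])
DEF = np.array([[1, 2, 3, 4], [5, 4, 6, 5]])

print(ABC.shape, ABC.ndim, ABC.size)   # (4,) 1 4
print(DEF.shape, DEF.ndim, DEF.size)   # (2, 4) 2 8

# dtype is the element type; itemsize is the number of bytes per element.
# Both are platform dependent for integer arrays, so we only print them.
print(DEF.dtype, DEF.itemsize)
```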

• Matrix Creation:
• print(np.array([1,2,3,1]).reshape(2,2))
• print(np.matrix([[1,2],[3,4]]))
• print(np.eye(3))
• print(np.zeros((4,3)))
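The creation helpers above can be tried out as follows. This is only a small sketch; the variable names (m, identity, zeros) are made up for illustration.

```python
import numpy as np

# reshape a flat array of 4 elements into 2 rows and 2 columns
m = np.array([1, 2, 3, 1]).reshape(2, 2)

# 3x3 identity matrix (ones on the diagonal, zeros elsewhere)
identity = np.eye(3)

# 4 rows and 3 columns filled with 0.0
zeros = np.zeros((4, 3))

print(m)
print(identity)
print(zeros)
```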

• Useful Functions:
• np.sqrt(a), np.add(a,b), np.subtract(a,b), np.multiply(a,b), np.divide(a,b),
np.log(a), np.array_equal(a,b), np.roots(coefficients)  # coefficients is an array of polynomial coefficients
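Here is a short sketch of these element-wise functions on two small example arrays. The arrays a and b and the polynomial are assumptions chosen so that the results are easy to check by hand.

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([1.0, 2.0, 3.0])

square_roots = np.sqrt(a)       # [1. 2. 3.]
total = np.add(a, b)            # [ 2.  6. 12.]
difference = np.subtract(a, b)  # [0. 2. 6.]
product = np.multiply(a, b)     # [ 1.  8. 27.]
quotient = np.divide(a, b)      # [1. 2. 3.]

# True only if both arrays have the same shape and the same elements
same = np.array_equal(square_roots, b)

# roots of x^2 - 3x + 2, which factors as (x - 1)(x - 2)
poly_roots = np.roots([1, -3, 2])
```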

2. Pandas Series and DataFrame

• import pandas as pd
• new_d = {'day': [50, 21], 'Night': [121, 21]}
• pd.DataFrame(new_d)  # note the capital D and F
• pd.Series([1, 2, 3, 4, 5])  # note the capital S
• pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'],
name='Product A')  # create a Series from a Python list,
# adding row index labels and a Series name
• data = pd.read_csv('data_location')
• data.head(), data.tail(), data.head(10), data.tail(7)  # first/last 5 rows by default, or the first/last n rows
• data
• data.shape  # check the number of rows and columns in the DataFrame
• data.columns
• data['Team'][0]  # check the first entry in the "Team" column of the `data` DataFrame
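To see these pieces together without needing a CSV file, here is a minimal sketch that builds the DataFrame and Series from the snippets above in memory. The variable names df and sales are illustrative.

```python
import pandas as pd

new_d = {'day': [50, 21], 'Night': [121, 21]}
df = pd.DataFrame(new_d)

sales = pd.Series([30, 35, 40],
                  index=['2015 Sales', '2016 Sales', '2017 Sales'],
                  name='Product A')

print(df.shape)             # (2, 2)
print(list(df.columns))     # ['day', 'Night']
print(df['day'][0])         # 50
print(sales['2016 Sales'])  # 35
```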

• Indexing in Pandas:
• iloc[]: position-based selection; loc[]: label-based selection (both use square brackets, not parentheses)
• data.iloc[0], data.iloc[:, 10], data.iloc[2:4, 10]
• data.loc[0, 'team1']
• data.set_index('team1')  # change the row index to the team1 column
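The difference between iloc and loc is easier to see on a tiny table. The data below is hypothetical; only the team1 and score column names come from this post.

```python
import pandas as pd

# Hypothetical example data
data = pd.DataFrame({'team1': ['India', 'Australia', 'England'],
                     'score': [45, 60, 30]})

first_row = data.iloc[0]           # row at position 0, returned as a Series
second_col = data.iloc[:, 1]       # all rows, column at position 1
slice_vals = data.iloc[1:3, 1]     # positions 1 and 2 (the end is excluded)
label_val = data.loc[0, 'team1']   # row label 0, column label 'team1'

# set_index returns a new DataFrame; `data` itself is left unchanged
by_team = data.set_index('team1')
india_score = by_team.loc['India', 'score']
```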

• Conditional Selection:
• data.team1.unique()
• data.team1 == 'India'  # create a simple boolean filter
• data.loc[data.team1 == 'India']  # use the boolean filter to extract rows using `loc`
• | means "or", & means "and"; wrap each condition in parentheses
• data.loc[(data.team1 == 'India') | (data.score >= 50)]
• data.loc[(data.team1 == 'India') & (data.total >= 5000)]
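A small sketch of boolean filtering follows, again on made-up data. For simplicity it uses the score column in both the "or" and the "and" example.

```python
import pandas as pd

# Hypothetical example data
data = pd.DataFrame({'team1': ['India', 'Australia', 'India', 'England'],
                     'score': [45, 60, 75, 30]})

teams = data.team1.unique()        # distinct values in order of appearance
is_india = data.team1 == 'India'   # boolean Series, one value per row
india_rows = data.loc[is_india]    # keeps only the rows where the filter is True

# rows where the team is India OR the score is at least 50
either = data.loc[(data.team1 == 'India') | (data.score >= 50)]

# rows where the team is India AND the score is at least 50
both = data.loc[(data.team1 == 'India') & (data.score >= 50)]
```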

3. Pandas Functions and Maps

• data.info()
• data.column_name.describe()
• data.column_name.mean()
• data.column_name.unique()
• data.column_name.value_counts()  # get the count of each unique entry in column_name of the `data` DataFrame
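These summary functions can be checked on a few rows of hypothetical data, as in this minimal sketch.

```python
import pandas as pd

# Hypothetical example data
data = pd.DataFrame({'team1': ['India', 'Australia', 'India', 'England'],
                     'score': [45, 60, 75, 30]})

stats = data.score.describe()       # count, mean, std, min, quartiles, max
average = data.score.mean()         # (45 + 60 + 75 + 30) / 4 = 52.5
counts = data.team1.value_counts()  # how often each team appears

print(stats)
print(average)
print(counts)
```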

• map() function
• data_overallqual_mean = data.OverallQual.mean()
• data.OverallQual.map(lambda curr_val: curr_val - data_overallqual_mean)  # map the column so that each score is expressed relative to the mean
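Here is the same map() pattern on three made-up OverallQual values, as a sketch. Note that map() returns a new Series and does not modify the original column.

```python
import pandas as pd

# Hypothetical example data
data = pd.DataFrame({'OverallQual': [5, 7, 9]})

data_overallqual_mean = data.OverallQual.mean()   # 7.0

# subtract the mean from every value in the column
centered = data.OverallQual.map(lambda curr_val: curr_val - data_overallqual_mean)

print(centered.tolist())   # [-2.0, 0.0, 2.0]
```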

• apply() function
• # define a function to call on each row
def remean_lotarea(row):
    # compute the mean of the column
    data_overallqual_mean = data.OverallQual.mean()
    # get the re-meaned values
    row.OverallQual = row.OverallQual - data_overallqual_mean
    # return the newly computed row
    return row

# use the function to generate a new DataFrame
data.apply(remean_lotarea, axis='columns')

# check the column 'OverallQual' to see the re-meaned (mean-centered) values
• data.Bldgtype + '-' + data.HouseStyle  # combine two text columns element by element
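The following is a runnable sketch of the row-wise apply() pattern and the column concatenation on hypothetical data. Two things differ from the snippet above and are my own choices: the mean is computed once outside the function, and the row is copied before it is changed so the original DataFrame is clearly left untouched.

```python
import pandas as pd

# Hypothetical example data; the column names follow the snippets above
data = pd.DataFrame({'OverallQual': [5, 7, 9],
                     'Bldgtype': ['1Fam', 'Duplex', '1Fam'],
                     'HouseStyle': ['1Story', '2Story', '1Story']})

# compute the mean once instead of once per row
data_overallqual_mean = data.OverallQual.mean()

def remean_row(row):
    # work on a copy so the original row is not modified
    row = row.copy()
    row['OverallQual'] = row['OverallQual'] - data_overallqual_mean
    return row

# axis='columns' passes each row to the function
remeaned = data.apply(remean_row, axis='columns')

# string columns can be joined element by element with +
combined = data.Bldgtype + '-' + data.HouseStyle
```

For a simple subtraction like this, `data.OverallQual - data_overallqual_mean` gives the same values without apply() and is usually faster.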

4. Pandas Grouping and Sorting

• data.groupby('column_name').column_name.count()
• data.groupby('column_name1').column_name2.min()
• data.groupby('column_name').apply(lambda dataframe: dataframe.column_name2.iloc[0])  # first column_name2 value in each group
• data.groupby(['team1', 'age']).apply(lambda dataframe: dataframe.loc[dataframe.OverallCond.idxmax()])
# find the row with the best condition for each combination of team1 and age

• data.groupby('column_name1').column_name2.agg([len, min, max])
• grouped = data.groupby(['column_name1', 'column_name2']).column_name3.agg([len])
# reset_index() and sort_values() are called on the aggregated result, not on the groupby object itself
• grouped.reset_index()  # convert multi-index to regular index

• grouped.sort_values(by='len')
• grouped.sort_values(by='len', ascending=False)
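Grouping, aggregating and sorting can be sketched on a small made-up table like this. I pass the aggregations as the strings 'count', 'min' and 'max' here, which is an alternative to passing the built-in functions; the resulting columns are named after those strings.

```python
import pandas as pd

# Hypothetical example data
data = pd.DataFrame({'team1': ['India', 'India', 'Australia', 'Australia', 'Australia'],
                     'score': [45, 75, 60, 30, 90]})

counts = data.groupby('team1').score.count()   # number of rows per team
lowest = data.groupby('team1').score.min()     # smallest score per team

# several aggregations at once; one output column per aggregation
summary = data.groupby('team1').score.agg(['count', 'min', 'max'])

# move the group labels from the index back into a regular column
flat = summary.reset_index()

# sort the aggregated result, highest maximum first
ranked = summary.sort_values(by='max', ascending=False)
```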

5. Pandas Data Types and Missing Values

• data.column_name.dtype
• data.index.dtype
• data.dtypes
• data.team3.astype('float')  # convert the team3 column from int64 to float64
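A minimal sketch of checking and converting a dtype follows. The team3 values are made up, and the column is created as int64 explicitly so the example behaves the same on every platform.

```python
import pandas as pd

# Hypothetical example data, stored explicitly as int64
data = pd.DataFrame({'team3': pd.Series([1, 2, 3], dtype='int64')})

print(data.team3.dtype)   # int64
print(data.dtypes)        # dtype of every column

# astype returns a new Series; the original column keeps its dtype
as_float = data.team3.astype('float')
print(as_float.dtype)     # float64
```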


• data[pd.isnull(data.fence)]  # select all the entries which have null values for fence
• data.fence.fillna('No value')  # fill missing fence values with 'No value'
• data.neighborhood.replace('noridge', 'northridge')
• data.rename(columns={'Neighborhood': 'Locality'})  # change the Neighborhood column name to Locality
• data.rename(index={0: 'firstentry', 1: 'secondentry'})
• data.set_index('id')  # set the id column as row labels
• data.rename_axis('houses', axis='rows').rename_axis('Details', axis='columns')  # label the row axis as houses and the column axis as Details
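The missing-value and renaming calls above can be tried on a few hypothetical rows, as in this sketch. The values are invented, and I use axis='index' for the row axis in rename_axis.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with one missing fence value
data = pd.DataFrame({'id': [101, 102, 103],
                     'fence': ['wood', np.nan, 'iron'],
                     'Neighborhood': ['noridge', 'oldtown', 'noridge']})

missing = data[pd.isnull(data.fence)]     # rows where fence is missing
filled = data.fence.fillna('No value')    # replace missing values
fixed = data.Neighborhood.replace('noridge', 'northridge')

renamed = data.rename(columns={'Neighborhood': 'Locality'})
indexed = data.set_index('id')

# name the row axis and the column axis
labeled = indexed.rename_axis('houses', axis='index').rename_axis('Details', axis='columns')
```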


• Combining:
• concat()  # stack Series or DataFrames along an axis
• pd.concat([s1, s2], ignore_index=True)  # for combining Series
• pd.concat([df1, df2], join='inner', ignore_index=True)
# for combining DataFrames with overlapping columns; join='inner' returns only the columns that are shared
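Here is a small sketch of both concat() calls. The Series and DataFrame contents are made up; df2 has an extra animal column to show what join='inner' drops.

```python
import pandas as pd

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])

# ignore_index=True renumbers the result 0..3 instead of keeping 0, 1, 0, 1
combined_series = pd.concat([s1, s2], ignore_index=True)

df1 = pd.DataFrame({'letter': ['a', 'b'], 'number': [1, 2]})
df2 = pd.DataFrame({'letter': ['c', 'd'], 'number': [3, 4], 'animal': ['cat', 'dog']})

# join='inner' keeps only the columns present in both DataFrames
shared = pd.concat([df1, df2], join='inner', ignore_index=True)
```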


• join()
# join columns with another DataFrame, either on the index or on a key column
# efficiently join multiple DataFrame objects by index at once by passing a list

• df.join(other, lsuffix='_caller', rsuffix='_other')
# join `df` with `other` by index, using suffixes to tell apart the `key` column of each DataFrame
• df.set_index('key').join(other.set_index('key'))
# join along the key column, i.e. key becomes the index of the joined DataFrame

• merge()
• df1.merge(df2, left_on='lkey', right_on='rkey')
# merge df1 and df2 on the lkey and rkey columns
# value columns that share a name get the default suffixes, _x and _y, appended

• df.to_csv('output.csv', index=False)
# save the DataFrame to a file named `output.csv`, without the row index
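As a sketch of merge() and to_csv(), the example below merges two hypothetical DataFrames and writes the result to an in-memory buffer instead of a file, so nothing is created on disk.

```python
import io
import pandas as pd

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'qux'], 'value': [5, 6, 7]})

# inner merge by default: only keys found in both frames are kept
merged = df1.merge(df2, left_on='lkey', right_on='rkey')

# both frames have a `value` column, so they become value_x and value_y
print(list(merged.columns))

# index=False leaves the row index out of the CSV output
buffer = io.StringIO()
merged.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```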

6. Pandas Data Visualization

• import seaborn as sns
• data.plot.bar(figsize=(16, 9))  # Regular Bar Plot
• data.plot.bar(stacked=True, figsize=(16, 9))  # Stacked Bar Plot
• data.plot.barh(figsize=(16, 9))  # Horizontal Bar Plot
• data.plot.barh(figsize=(16, 9), stacked=True)  # Horizontal Stacked Bar Plot

• sns.set_palette('muted')
• data.plot.area(figsize=(16, 9))  # Stacked Area Plot
• data.plot.area(figsize=(16, 9), stacked=False)  # Unstacked Area Plot

• data.diff()  # to plot the difference graph, refer to GitHub
# takes the difference between each row and the row before it (hence the NaN in the first row)

• data['value'].rolling(10).mean().plot(figsize=(16, 9))  # Rolling Mean
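Before plotting, it can help to look at what diff() and rolling().mean() actually compute. This sketch uses five made-up values and a window of 2 instead of 10 so the numbers are easy to follow.

```python
import pandas as pd

# Hypothetical example data
data = pd.DataFrame({'value': [1, 3, 6, 10, 15]})

# each row minus the previous row; the first row has nothing before it, so it is NaN
changes = data.diff()

# mean over a sliding window of 2 rows; the first window is incomplete, so it is NaN
rolling_mean = data['value'].rolling(2).mean()

print(changes['value'].tolist())   # [nan, 2.0, 3.0, 4.0, 5.0]
print(rolling_mean.tolist())       # [nan, 2.0, 4.5, 8.0, 12.5]
```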

• data.plot.pie(subplots=True, figsize=(8, 4))  # Pie Charts

• data.plot(subplots=True, figsize=(20, 10))  # Line Graphs

• data.plot(subplots=True, layout=(2, 2), figsize=(20, 10))  # Line Graphs arranged in a 2x2 grid
