Boost Your Data Analysis Efficiency: Essential NumPy and Pandas Shortcuts You Should Know

Published in

The Algorithmic Minds

4 min read · Jul 15, 2023

This post collects shortcuts for data analysis in Python with NumPy and Pandas. It is a quick reference for readers who already know Python.

1. NumPy

• import numpy as np
• a = np.array([1,2,3,1])
• b = np.array([[1,2,3,4],[5,4,6,5]])

• Useful Attributes:
• a.shape, a.ndim, a.size, a.dtype, a.itemsize

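• Matrix Creation:
• print(np.array([1,2,3,1]).reshape(2,2))
• print(np.matrix([[1,2],[3,4]]))____# np.matrix is no longer recommended; prefer a 2-D np.array
• print(np.eye(3))____# 3x3 identity matrix
• print(np.zeros((4,3)))____# 4x3 matrix of zeros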

• Useful Functions:
• np.sqrt(a), np.add(a,b), np.subtract(a,b), np.multiply(a,b), np.divide(a,b), 
  np.log(a), np.array_equal(a,b), np.roots(coefficients)____# coefficients: array of polynomial coefficients, highest degree first
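As a rough illustration of the NumPy calls listed above, here is a minimal sketch. The sample arrays mirror the ones in this section, and the polynomial passed to `np.roots` is a made-up example.

```python
import numpy as np

# Sample arrays (a 1-D vector and a 2x4 matrix), used only for illustration.
a = np.array([1, 2, 3, 1])
b = np.array([[1, 2, 3, 4], [5, 4, 6, 5]])

print(a.shape, a.ndim, a.size)  # (4,) 1 4
print(b.shape, b.ndim, b.size)  # (2, 4) 2 8

# Element-wise arithmetic: `a` is broadcast across both rows of `b`.
total = np.add(a, b)
diff = np.subtract(b, a)

# Reshape the 4-element vector into a 2x2 matrix.
square = a.reshape(2, 2)

# Roots of x^2 - 3x + 2 (hypothetical polynomial), highest degree first.
roots = np.roots([1, -3, 2])

print(total)
print(np.sort(roots))
```

Broadcasting is what lets `np.add(a, b)` work even though the two arrays have different shapes; it would fail if the trailing dimensions did not match.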

2. Pandas Series and DataFrame

• import pandas as pd
• new_d = {'day':[50,21], 'Night':[121,21]}
• pd.DataFrame(new_d)____________# D and F should be capital
• pd.Series([1,2,3,4,5])_____# S should be capital
• pd.Series([30,35,40], index=['2015 Sales', '2016 Sales', '2017 Sales'],
  name='Product A')______# create a Series from a Python list
  # add row index labels and a Series name
• data = pd.read_csv('data_location')
• data.head(), data.tail(), data.head(10), data.tail(7)____# head() and tail() show 5 rows by default
• data
• data.shape______# check the number of rows and columns in the DataFrame 
• data.columns
• data['Team'][0]__________
  # check the first entry in the "Team" column of the `data` DataFrame
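The constructors above can be tried without a CSV file. The sketch below builds the same small DataFrame and Series in memory; the variable names `df` and `sales` are chosen here for illustration.

```python
import pandas as pd

# Each dictionary key becomes a column, each list the column's values.
new_d = {'day': [50, 21], 'Night': [121, 21]}
df = pd.DataFrame(new_d)

# A Series with custom row labels and a name.
sales = pd.Series([30, 35, 40],
                  index=['2015 Sales', '2016 Sales', '2017 Sales'],
                  name='Product A')

print(df.shape)             # (2, 2)
print(list(df.columns))     # ['day', 'Night']
print(df['Night'][0])       # 121
print(sales['2016 Sales'])  # 35
```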

• Indexing in Pandas:
• iloc: integer-position-based selection; loc: label-based selection____# both use square brackets, not parentheses
• data.iloc[0], data.iloc[:,10], data.iloc[2:4,10]
• data.loc[0,'team1']
• data.set_index('team1')____________# return a copy with the team1 column as the row index
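One difference between `iloc` and `loc` that is easy to miss is how slices end. The sketch below uses a hypothetical four-row table (the column names and values are invented) to show it.

```python
import pandas as pd

# Hypothetical match data; names and values are made up for this example.
data = pd.DataFrame({
    'team1': ['India', 'Australia', 'England', 'India'],
    'score': [250, 180, 45, 30],
})

first_row = data.iloc[0]           # first row, selected by position
first_col = data.iloc[:, 0]        # every row of the first column
by_pos = data.iloc[1:3, 1]         # positions 1 and 2 (end excluded)
by_label = data.loc[1:3, 'score']  # labels 1 through 3 (end included)

# set_index returns a new DataFrame; `data` itself is unchanged.
indexed = data.set_index('team1')
print(indexed.loc['Australia', 'score'])
```

Position slices follow the usual Python convention and exclude the end, while label slices include it, so `by_label` holds one more row than `by_pos`.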

• Conditional Selection:
• data.team1.unique()
• data.team1 == 'India'_________# create a simple boolean filter 
• data.loc[data.team1 == 'India']_______
  # use the boolean filter to extract data using `loc`
• | means or, & means and____# wrap each condition in parentheses
• data.loc[(data.team1 == 'India') | (data.score >= 50)]
• data.loc[(data.team1 == 'India') & (data.total >= 5000)]
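To see how the boolean filters combine, here is a small sketch on invented data. The `score` threshold of 50 comes from the list above; everything else is hypothetical.

```python
import pandas as pd

# Hypothetical match data for illustration.
data = pd.DataFrame({
    'team1': ['India', 'Australia', 'England', 'India'],
    'score': [250, 180, 45, 30],
})

# A boolean Series: True where the condition holds.
is_india = data.team1 == 'India'

# Rows matching either condition (|) or both conditions (&).
either = data.loc[(data.team1 == 'India') | (data.score >= 50)]
both = data.loc[(data.team1 == 'India') & (data.score >= 50)]

print(either)
print(both)
```

The parentheses matter: `|` and `&` bind more tightly than `==` and `>=`, so leaving them out raises an error or gives the wrong mask.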

3. Pandas Functions and Maps

• data.info()
• data.column_name.describe()
• data.column_name.mean()
• data.column_name.unique()
• data.column_name.value_counts()_____
  # get the count of each unique entry of column_name in the `data` DataFrame

• map() function
• data_overallqual_mean = data.OverallQual.mean()
• data.OverallQual.map(lambda curr_val: curr_val - data_overallqual_mean)___
  # map the column so each score is expressed relative to the mean

• apply() function
• # define a function to call on each row
def remean_lotarea(row):
    # compute the mean of the column
    data_overallqual_mean = data.OverallQual.mean()
    # get the re-meaned values
    row.OverallQual = row.OverallQual - data_overallqual_mean
    # return the newly computed row
    return row

# use the function to generate a new DataFrame
data.apply(remean_lotarea, axis='columns')

# check the column 'OverallQual' of the result to see the re-meaned (centered) values
• data.Bldgtype + '-' + data.HouseStyle____# operators work element-wise, here joining two string columns
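The sketch below runs `map()` and `apply()` side by side on a tiny invented table, so the two results can be compared. The column values are hypothetical, and the mean is computed once outside the function, which is a small departure from the version above.

```python
import pandas as pd

# Hypothetical housing data; float columns keep each row's dtype uniform.
data = pd.DataFrame({
    'OverallQual': [5.0, 7.0, 9.0],
    'LotArea': [8000.0, 9500.0, 11000.0],
})

mean_qual = data.OverallQual.mean()  # computed once, outside the function

# map() transforms a single Series, value by value.
via_map = data.OverallQual.map(lambda curr_val: curr_val - mean_qual)

def remean(row):
    # Work on a copy so the original row is never modified.
    row = row.copy()
    row['OverallQual'] = row['OverallQual'] - mean_qual
    return row

# apply() with axis='columns' calls the function once per row.
via_apply = data.apply(remean, axis='columns')

# The plain vectorized form gives the same numbers and is usually faster.
vectorized = data.OverallQual - mean_qual

print(via_apply)
```

Neither call changes `data` in place; both return new objects.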

4. Pandas Grouping and Sorting

• data.groupby('column_name').column_name.count()
• data.groupby('column_name1').column_name2.min()
• data.groupby('column_name').apply(lambda dataframe: dataframe.column_name2.iloc[0])____# first column_name2 value in each group
• data.groupby(['team1','age']).apply(lambda dataframe: dataframe.loc[dataframe.OverallCond.idxmax()])_____
  # find the row with the best OverallCond for each (team1, age) group
 
• data.groupby('column_name1').column_name2.agg([len,min,max])
• grouped = data.groupby(['column_name1','column_name2']).column_name3.agg([len])____# aggregate first: a GroupBy object has no reset_index() or sort_values()
• grouped.reset_index()_____________
  # convert multi-index to regular-index

• grouped.sort_values(by='len')
• grouped.sort_values(by='len', ascending=False)
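Here is a minimal sketch of the group, aggregate, then sort pattern on invented data. It passes the aggregation names as strings (`'count'`, `'min'`, `'max'`) rather than the built-in functions, an assumption made here because recent pandas versions may warn about built-ins.

```python
import pandas as pd

# Hypothetical match data for illustration.
data = pd.DataFrame({
    'team1': ['India', 'India', 'Australia', 'Australia', 'England'],
    'score': [250, 30, 180, 90, 45],
})

counts = data.groupby('team1').score.count()  # rows per team
lowest = data.groupby('team1').score.min()    # lowest score per team

# Several aggregations at once give a DataFrame indexed by team1.
summary = data.groupby('team1').score.agg(['count', 'min', 'max'])

flat = summary.reset_index()  # team1 becomes a regular column again
ranked = summary.sort_values(by='max', ascending=False)

print(ranked)
```

Sorting and `reset_index()` are called on the aggregated result, not on the GroupBy object itself.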

5. Pandas Data Types and Missing Values

• data.column_name.dtype
• data.index.dtype
• data.dtypes
• data.team3.astype('float')__________
  # convert the team3 column from int64 to float64

 
• data[pd.isnull(data.fence)]______________
  # select all the entries which have null values for fence
• data.fence.fillna('No value')_________
  # fill missing fence values with 'No value'
• data.neighborhood.replace('noridge', 'northridge')____# replace every 'noridge' entry with 'northridge'
• data.rename(columns={'Neighborhood' : 'Locality'})___________
  # change the Neighborhood column name to Locality 
• data.rename(index={0: 'firstentry', 1: 'secondentry'})____# relabel rows 0 and 1
• data.set_index('id')_____________# set the id column as row labels
• data.rename_axis('houses', axis='rows').rename_axis('Details', axis='columns')___# name the row axis houses and the column axis Details
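The type conversion and missing-value helpers above can be tried on a three-row table. All values and the exact column spellings in this sketch are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical housing data with one missing Fence entry.
data = pd.DataFrame({
    'Id': [1, 2, 3],
    'Fence': ['MnPrv', np.nan, 'GdWo'],
    'Neighborhood': ['NoRidge', 'CollgCr', 'NoRidge'],
})

as_float = data.Id.astype('float')        # int64 -> float64
missing = data[pd.isnull(data.Fence)]     # rows where Fence is null
filled = data.Fence.fillna('No value')    # replace nulls with a label
fixed = data.Neighborhood.replace('NoRidge', 'NorthRidge')

# Rename a column, then use Id as the row labels.
renamed = data.rename(columns={'Neighborhood': 'Locality'}).set_index('Id')

print(filled.tolist())
print(renamed)
```

Each of these calls returns a new object, so the result has to be assigned back (for example `data['Fence'] = filled`) for the change to persist.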


• Combining:
• concat()___________________
  # stack Series or DataFrames along an axis
• pd.concat([s1, s2], ignore_index=True)_______# for combining Series
• pd.concat([df1, df2], join='inner', ignore_index=True)________
  # for combining DataFrames; join='inner' keeps only the columns shared by both
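The effect of `join='inner'` is easiest to see when the two DataFrames do not have identical columns. In this sketch the frames and their contents are invented for illustration.

```python
import pandas as pd

s1 = pd.Series(['a', 'b'])
s2 = pd.Series(['c', 'd'])
# ignore_index=True renumbers the result 0..n-1 instead of repeating 0, 1.
combined = pd.concat([s1, s2], ignore_index=True)

df1 = pd.DataFrame({'letter': ['a', 'b'], 'number': [1, 2]})
df2 = pd.DataFrame({'letter': ['c', 'd'], 'number': [3, 4],
                    'animal': ['cat', 'dog']})

# Inner join: only the columns present in both frames survive.
shared = pd.concat([df1, df2], join='inner', ignore_index=True)

# Default (outer) join: all columns kept, gaps filled with NaN.
everything = pd.concat([df1, df2], ignore_index=True)

print(shared)
print(everything)
```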


• join()________
  # join columns with another DataFrame, either on the index or on a key column
  # passing a list joins multiple DataFrame objects by index at once

• df.join(other, lsuffix='_caller', rsuffix='_other')____________
  # join `df` with `other` by index; the suffixes disambiguate the `key` column present in both DataFrames
• df.set_index('key').join(other.set_index('key'))_________________
  # join on the key column, so that key becomes the index of the joined DataFrame
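The two `join()` calls above behave quite differently, which the following sketch tries to show. The frames `df` and `other` and their values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
other = pd.DataFrame({'key': ['K0', 'K1'], 'B': ['B0', 'B1']})

# Joins on the row index (0, 1, 2); both frames have a `key` column,
# so the suffixes are needed to tell the two copies apart.
by_position = df.join(other, lsuffix='_caller', rsuffix='_other')

# Joins on the key values, because key is the index on both sides.
by_key = df.set_index('key').join(other.set_index('key'))

print(by_position)
print(by_key)
```

`join()` is a left join by default, so `K2` stays in the result with a missing `B` value.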


• merge()
• df1.merge(df2, left_on='lkey', right_on='rkey')_____________________
  # merge df1 and df2 on the lkey and rkey columns 
  # value columns present in both get the default suffixes, _x and _y, appended
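A small sketch of `merge()` with differently named key columns follows. The frames are invented, with a shared `value` column so the default suffixes show up.

```python
import pandas as pd

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'qux'], 'value': [5, 6, 7]})

# Inner merge by default: only keys found in both frames are kept.
merged = df1.merge(df2, left_on='lkey', right_on='rkey')

print(merged)
```

Only `foo` and `bar` appear in both frames, so `baz` and `qux` drop out; passing `how='left'` or `how='outer'` would keep them.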

• df.to_csv('output.csv', index=False)
  # save the DataFrame to a file named `output.csv`, without the row index
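To check what `index=False` does without touching the disk, this sketch writes the CSV to a string and reads it back. The data is hypothetical, and the in-memory round trip is used here only for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({'team1': ['India', 'Australia'], 'score': [250, 180]})

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)

# Read the text back as if it were a file.
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
```

Without `index=False`, the row labels would be written as an extra unnamed first column and show up again on reading.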

6. Pandas Data Visualization

• import seaborn as sns
• data.plot.bar(figsize=(16,9))_____# Regular Bar Plot
• data.plot.bar(stacked=True, figsize=(16,9))______# Stacked Bar Plot
• data.plot.barh(figsize=(16,9))_______# Horizontal Bar Plot
• data.plot.barh(figsize=(16,9),stacked=True)____# Horizontal Stacked Bar Plot

• sns.set_palette('muted')
• data.plot.area(figsize=(16,9))_____# Stacked Area Plot 
• data.plot.area(figsize=(16,9),stacked=False)_____# Unstacked Area Plot 

• data.diff()______
  # takes the difference between each row and the row before it (hence the NaN in the first row)
  # to plot the differences, refer to the GitHub repository



• data['value'].rolling(10).mean().plot(figsize=(16,9))___# Rolling Mean over a 10-row window
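Before plotting, it can help to look at the numbers that `diff()` and `rolling().mean()` produce. This sketch uses five invented values and a window of 3 instead of 10, and it leaves out the `.plot()` call so it runs without a plotting backend.

```python
import pandas as pd

# Hypothetical series of measurements.
data = pd.DataFrame({'value': [10, 12, 15, 11, 18]})

# Row-to-row change; the first row has no predecessor, so it is NaN.
change = data.diff()

# Mean over a sliding window of 3 rows; the first 2 rows are NaN
# because the window is not yet full.
smooth = data['value'].rolling(3).mean()

print(change)
print(smooth)
```

Appending `.plot(figsize=(16,9))` to either result draws it, assuming matplotlib is installed.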

• data.plot.pie(subplots=True, figsize=(8,4));____# Pie Charts

• data.plot(subplots=True, figsize=(20,10));_____# Line Graphs

• data.plot(subplots=True, layout=(2,2), figsize=(20,10));____# Line Graphs arranged in a 2x2 grid


Written by Dhanush B M