Common Excel formulas in Python

Rohit Raj
Analytics Vidhya
Published in
8 min readMar 31, 2020

I have written a medium article on how to automate common data analysis task in Excel. Next I wrote a medium article on how to make a GUI for data analysis in python. In this article I will make a deep dive into how to implement common Excel functions in python.

We have following data in a file ‘medium.xls’.

It contains list of users across different groups and locations. Columns Revenue and Expense contains revenue and expense to the company from each user.

We have a second file ‘group.xls’ as follows

It contains group name for each group number.

Now I will show how to implement common excel functions in python

1 Reading Files

We can read excel files in pandas library using pandas read_excel function. It reads excel file and return file data as pandas dataframe object.

data = pandas.read_excel(r'path to/medium.xls')
group = pandas.read_excel(r'path to/group.xlsx')

2 Examination of Data

Now we want to examine if the data has been read correctly. We can examine the first five rows of dataframe using head method as shown below:

data.head(5)

We can examine last five rows of dataframe using tail method as shown below:

data.tail(5)

If we want to examine rows 2 to rows 7 of dataframe, we can use iloc method of dataframe object as follows:

data.iloc[2:8]

2. Vlookup

Vlookup is probably the most used function in Excel. It is used to join two tables in excel.

This can be accomplished in pandas library using merge function as follows:

data.merge(group, on = 'Group')

For ‘on’ keyword, name of common column name between two dataframe is to be given.

Using the merge keyword, group names from second group dataframe are added into the data dataframe.

Merge function is capable of doing much more. You can read more about it here

https://www.shanelynn.ie/merge-join-dataframes-python-pandas-index-1/

2 Pivot Table

Pivot tables are used in Excel to perfrom aggregation of data. It is one of the most used functions in Excel.

We can create pivot table in pandas library using pivot_table method of dataframe as follows:

data.pivot_table(values='Revenue', index='Location', columns ='Group').fillna(0)

In Excel you select columns for rows, columns and index using GUI. In pandas library you have to provide them using keywords. Pivot_table function returns output as a dataframe which can be further processed.

3 String Function — concatenate, len, trim, left, mid, right

Excel has several string functions. In pandas library you can apply python string methods column wise by first applying str accessor method. Then you can apply all native string functions in python to it.

In Excel you can use upper function to convet text to upper case. In pandas you can convert all text in columns to uppercase as follows:

data['User'].str.upper()

In excel you can convert text to lowercase using LOWER function. In Pandas, we can convert it to lowercase using lower function as follows:

data['User'].str.lower()

In Excel you can get length of string using LEN function. In pandas we can get length of string using len function as follows:

data['User'].str.len()

In excel you can get first characters of string using LEFT function. In pandas, we can get first 2 characters of string using slicing operation on str method as follows:

data['User'].str[:2]

In excel you can get last characters of of string using RIGHT function. In pandas, we can get last 2 characters of string using slicing operation on str method as follows:

data['User'].str[-2:]

Similarly you can use slicing to implement mid function of Excel in pandas.

In excel you can combine two strings using CONCATENATE function. In pandas strings can just be added together to concatenate them together.

data['User'] + data['Location']

4 Sumifs

In Excel, Sumifs is used to sum cells in range which satisfy certain conditions. In pandas sumifs function can be implemented using sum function.

For example if we wanted to know group wise revenue and expense then we can apply sum function to groupby method.

data.groupby('Group').sum()

This will return group wise revenue and expense.

Groupby method allows you to segregate data groupwise based on unique values in column ‘Group’ which is provided as argument to groupby method. It allows to split the data and combine it by aggregation method of your choice. You can read more about groupby here

https://realpython.com/pandas-groupby/

Like sumifs, you can add more conditions. Suppose you wanted group wise revenue and expense for only groups located in New York. We can achieve this by filtering the data before applying groupby, as follows:

data[data['Location']=="Newyork"].groupby('Group').sum()

5 Countifs , Averageifs

COUNTIFS function is used in excel to count cells in range satisfying one or more conditions. AVERAGEIFS is used in excel to average value of cells satisfying one or more conditions.

Groupby methods support also count and mean aggregation methods. When we apply these methods we can obtain results similar to sumifs and countifs.

Suppose we wanted to number of elements in each group. Then we can do it using groupby along with count as follows:

data.groupby('Group')['Revenue'].count()

Supposed we wanted to know average revenue and expense for each group. Then we can do it as follows:

data.groupby('Group').mean()

6 Sum, Max, Min, Average

Sum, Max, Min and Average are essential Excel functions. They returns sum, maximum, minimum and average of values of selected range.

Pandas dataframe and series object both support sum, max, min and mean functions, which can be used to implement above functionality. I have calculated sum, max, min and average of revenue column as shown below:

data['Revenue'].sum()

data['Revenue'].max()
data['Revenue'].min()
data['Revenue'].mean()

9 If function

IF function is used in excel for selecting among values based on a condition. It can be implemented using where function of Numpy library. First we have to import numpy as follows

import numpy

Numpy library contains a where function which is similar in functionality to IF formula in excel. It returns value from two arrays depending on condition. You can read more about it here.

https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html

We can use numpy where function to implement an if formula where based on a condition we want to return one or zero.

numpy.where(data['Revenue']>400,1,0)

We can also compare two columns of dataframe as follows:

numpy.where(data['Revenue']<data['Expense'],1,2)

10 Index and Match

In Excel , combination of Index and match is used an alternative to Vlookup. Index function returns value of cell in a range which as per specified index. Match returns index of cell in a range of cell which matches the specified condition.

Loc accessor method is used in pandas to access values in dataframe.

If you pass only one argument to loc method. It selects rows of dataframe by index label

data.loc[0]

If we want to access by rows and columns. Then we have to provide two arguments to loc methods

data.loc[0,'User']

If we want to access specific column then we can provide ‘:’ as first argument to access all rows.

data.loc[:,"User"]

You can read more about loc method here

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

Loc method also accept boolean array as an argument. So if we pass condition as one of the arguments then it would be equivalent to Index and Match.

Suppose if wanted know expense when revenue was 500. Then we can implement it as follows:

data.loc[data['Revenue']==500,'Expense']

If we wanted to know expense when revenue is equal to expense then we can implement it as follows:

data.loc[data['Revenue']==data['Expense'],'Expense']

13 Sum Product

In Excel SUMPRODUCT function is used to calculate sum of product of two ranges. In pandas it can be implemented as follows:

sum(data['Revenue']*data['Expense'])

This takes advantage of the fact arithmetic operations can done element wise on pandas dataframe as follows:

data['Revenue']*data['Expense']
data['Revenue']+data['Expense']
data['Revenue']/data['Expense']

14 Saving Files

We can save our analysis in csv or excel format. For saving to csv, to_csv function is used. To_excel function is used for saving to excel format. Here I save data object in csv and excel format as shown below:

data.to_csv('data.csv')
data.to_excel('data.xls')

15 Summary

We might have a excel worksheet with many formulas lying around. And we want to implement the logic of worksheet in Python. In such a situation using these python functions we can easily transition from Excel to Python.

--

--

Rohit Raj
Analytics Vidhya

Studied at IIT Madras and IIM Indore. Love Data Science