Reading and writingExcel files in Python pandas

In data science, you are very likely to mostly work with CSV files. However, knowing how to import and export Excel files is also very useful.

In this post, a Kaggle dataset on 2016 US Elections was used (https://www.kaggle.com/benhamner/d/benhamner/2016-us-election/primary-results-sample-data/output). This dataset has been converted from a CSV file to an Excel file and two sheets have been added with votes for Hilary Clinton (HilaryClinton) and Donald Trump (DonaldTrump). The first sheet (All) contains the original dataset.

Reading Excel files

dframe = pd.read_excel(“file_name.xlsx”)

Reading Excel files is very similar to reading CSV files. By default, the first sheet of the Excel file is read.

I’ve read an Excel file and viewed the first 5 rows

dframe = pd.read_excel(“file_name.xlsx”, sheetname=”Sheet_name”)

Passing the sheetname method allows you to read the sheet of the Excel file that you want. It is very handy if you know its name.

I picked the sheet named “DonaldTrump”

dframe = pd.read_excel(“file_name.xlsx”, sheetname=number)

If you aren’t sure what are the names of your sheets, you can pick them by their order. Please note that the sheets start from 0 (similar to indices in pandas), not from 1.

I read the second sheet of the Excel file

dframe = pd.read_excel(“file_name.xlsx”, header=None)

Sometimes, the top row does not contain the column names. In this case, you pass the argument of header=None.

The first row is not the header — instead, we get the column names as numbers

dframe = pd.read_excel(“file_name.xlsx”, header=n)

Passing the argument of header being equal to a number allows us to pick a specific row as the column names.

I pick the second row (i.e. row index 1 of the original dataset) as my column names.

dframe = pd.read_excel(“file_name.xlsx”, index_col=number)

You can use different columns for the row labels by passing the index_col argument as number.

I now use the county as the index column.

dframe = pd.read_excel(“file_name.xlsx”, skiprows=n)

Sometimes, you don’t want to include all of the rows. If you want to skip the first n rows, just pass the argument of skiprows=n.

Skipping the first two rows (including the header)

Writing an Excel file

dframe.to_excel(‘file_name.xlsx’)

I wrote an Excel file called results.xlsx from my results DataFrame
My exported Excel file

dframe.to_excel(‘file_name.xlsx’, index=False)

If you don’t want to include the index name (for example, here it is a number so it may be meaningless for future use/analysis), you can just pass another argument, setting index as False.

I don’t want index names in my Excel file
Excel file output with no index names

All of the code can be found on my GitHub: https://github.com/kasiarachuta/Blog/blob/master/Reading%20and%20writing%20Excel%20files.ipynb