Reading and writingExcel files in Python pandas
In data science, you are very likely to mostly work with CSV files. However, knowing how to import and export Excel files is also very useful.
In this post, a Kaggle dataset on 2016 US Elections was used (https://www.kaggle.com/benhamner/d/benhamner/2016-us-election/primary-results-sample-data/output). This dataset has been converted from a CSV file to an Excel file and two sheets have been added with votes for Hilary Clinton (HilaryClinton) and Donald Trump (DonaldTrump). The first sheet (All) contains the original dataset.
Reading Excel files
dframe = pd.read_excel(“file_name.xlsx”)
Reading Excel files is very similar to reading CSV files. By default, the first sheet of the Excel file is read.
dframe = pd.read_excel(“file_name.xlsx”, sheetname=”Sheet_name”)
Passing the sheetname method allows you to read the sheet of the Excel file that you want. It is very handy if you know its name.
dframe = pd.read_excel(“file_name.xlsx”, sheetname=number)
If you aren’t sure what are the names of your sheets, you can pick them by their order. Please note that the sheets start from 0 (similar to indices in pandas), not from 1.
dframe = pd.read_excel(“file_name.xlsx”, header=None)
Sometimes, the top row does not contain the column names. In this case, you pass the argument of header=None.
dframe = pd.read_excel(“file_name.xlsx”, header=n)
Passing the argument of header being equal to a number allows us to pick a specific row as the column names.
dframe = pd.read_excel(“file_name.xlsx”, index_col=number)
You can use different columns for the row labels by passing the index_col argument as number.
dframe = pd.read_excel(“file_name.xlsx”, skiprows=n)
Sometimes, you don’t want to include all of the rows. If you want to skip the first n rows, just pass the argument of skiprows=n.
Writing an Excel file
dframe.to_excel(‘file_name.xlsx’)
dframe.to_excel(‘file_name.xlsx’, index=False)
If you don’t want to include the index name (for example, here it is a number so it may be meaningless for future use/analysis), you can just pass another argument, setting index as False.
All of the code can be found on my GitHub: https://github.com/kasiarachuta/Blog/blob/master/Reading%20and%20writing%20Excel%20files.ipynb