read_csv() vs read_excel() in pandas: When to use which and why

Ashwin A. Vardhan
2 min read · Apr 17, 2019


Have you ever wondered whether that Excel file of yours could be read faster, instead of sitting idle for ten minutes while your code scans through it? I’ll try to answer that here with a personal experience of mine.

So, I was performing some operations with pandas on an Excel file with dimensions of 509,579 x 240 and a size of 295 MB. Reading that file took about 528 seconds (averaged over 10 iterations), whereas converting it to CSV and reading it with pandas took just 13 seconds (again, averaged over 10 iterations), an improvement of roughly 40x. The file was simply too large, and read_excel is much slower in this situation. This has been noted on Stack Overflow too.
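Here is a minimal sketch of the kind of timing comparison I ran; the file names are placeholders, so substitute your own paths:

    import time
    import pandas as pd

    # Placeholder paths; point these at your own Excel file and its CSV export.
    EXCEL_PATH = "large_file.xlsx"
    CSV_PATH = "large_file.csv"

    start = time.perf_counter()
    df_xlsx = pd.read_excel(EXCEL_PATH)
    print(f"read_excel: {time.perf_counter() - start:.1f} s")

    start = time.perf_counter()
    df_csv = pd.read_csv(CSV_PATH)
    print(f"read_csv:   {time.perf_counter() - start:.1f} s")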

So, when do you actually need to convert an Excel file to CSV before processing it?
Usually, if your Excel file is small (~100,000 rows, ~50 columns), there won’t be much of a performance issue, unless you need to run that process very frequently, because that small delay may get compounded and bite you!
But if your Excel file is massive, and you need to process it frequently (or not), it’s better to first convert it to CSV, as in the sketch below, and voila! see the magic happen.
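As a rough sketch (again, the paths are placeholders), the conversion is a one-time cost paid with read_excel, after which every later run can take the fast read_csv path:

    import pandas as pd

    # One-time conversion: pay the slow read_excel cost once...
    df = pd.read_excel("large_file.xlsx")
    df.to_csv("large_file.csv", index=False)

    # ...then every subsequent run only needs the fast path.
    df = pd.read_csv("large_file.csv")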

Edit 1: Adding the screenshot with the file names a bit redacted:

Pandas reading time comparison for the same file in different formats
