It doesn’t always have to be this complicated

Using Pandas and Excel Interactively

TKP
Analytics Vidhya
Published in
3 min readJan 4, 2020

--

Summary

This is a short tutorial on how you can get the best of both worlds using the xlwings library to make moving between Python/Pandas and Excel one line of code — without the hassle of exporting files from one to the other every time. (For those who want to see a more detailed example scroll to the bottom for the GIST with a full example using code).

The example below literally took one line of code in Python and a couple minutes to build the PivotTable and PivotChart in Excel.

Left: Pandas output; right: Excel PivotTable using xlwings.

Installation

Installing xlwings is pretty straightforward (https://docs.xlwings.org/en/stable/installation.html) via PIP or Anaconda. That’s it!

Background

I’ve been learning (and documenting) my progress on learning Pandas. Moving from Excel Pandas provides a lot more flexibility when it comes to the available libraries, dealing with big data, etc., however, when comes down to UI, it can be frustrating to use Juypter to look at all the data…especially when you’re data wrangling.

As I mentioned in my Asynchio tutorial (https://medium.com/@tkpca/summary-74789e47d420), I’ve been working on a pet project to pull and analyze NBA data. While I could have pulled some cleaner data from a Kaggle dataset in real-life data is messy, and, learning the end-to-end workflow is important to me, but creates two problems:

1) I’m going to make mistakes wrangling all of this data, especially because there isn’t formal documentation on the NBA API; and

2) Unfortunately, the NBA API isn’t perfect (mainly for old data). For example, I found that for some of the box scores prior 1985, offensive and defensive rebounds were not logged correctly. Look at the data for Larry Smith here, https://stats.nba.com/game/0028400895/.

Getting to the Point

Now you could take your Dataframe, push it as an Excel file, open/analyze/repeat, however, in many cases that’s overkill, and, in my opinion, too many clicks.

Pandas + Excel Filters, Sorting and everything else!

Here are the basic steps

1) Make sure you’ve imported the xlwings library;

Set up an instance of Excel. Generically I use the line “app = xw.App()” to load an instance of Excel (note “xw” is what imported xlwings as).

2) Make sure you have the Dataframe you want to push in memory.

3) Push your Dataframe to a cell in the blank Excel workbook.

xw.Range(‘A1’).value=df.”.

That’s it, you’re done!

Keep in mind if you have an extremely large Dataframe this will be slow (and probably not the best idea), but it’s great when you want iterate over pieces of a larger Dataframe or quickly create Excel charts. The library can do a lot more including replacing VBA with Python, but that’s a another article for another day.

https://github.com/tkpca/Python-Scripts/tree/master/xlwings

--

--

TKP
Analytics Vidhya

Interested in learning ways to use data to tell stories.