My Python journey, Pt. 3: Seaborn, NLTK, and more
This past month has been a challenging one on my Python journey. My mentor and I spent a lot of time cleaning up my Enron emails dataset and loading it correctly into my Postgres database. I’m learning how intuitive Python can be, but also how critical familiarity with its modules is if you want to accomplish anything.
My Python code grew from one file to five over the last couple of weeks. To a developer, this is probably not a big deal, but I’ve never created a project of my own with multiple files before. Each of my ORM classes has its own file, which means I import them in the other files that reference them.
And here is my cleaned up, functional code for reading the Enron email files and writing them to Postgres:
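Each raw Enron file is a plain RFC 822 message, so the parsing half of this step can be sketched with nothing but Python’s standard `email` package. This is a minimal, hypothetical sketch, not the code referenced above: the names `parse_enron_message` and `parse_maildir` and the dictionary keys are illustrative assumptions, and the Postgres write is left out (each dict could be handed to an ORM class or a parameterized `INSERT`).

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime
from pathlib import Path


def parse_enron_message(raw: str) -> dict:
    """Turn one raw RFC 822 message into a flat dict of columns (hypothetical layout)."""
    msg = message_from_string(raw)
    # getaddresses splits comma-separated lists like "a@x.com, b@y.com"
    recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
    date = msg.get("Date")
    return {
        "message_id": msg.get("Message-ID"),
        "sender": msg.get("From"),
        "recipients": recipients,
        # Produces a timezone-aware datetime; trailing "(PDT)" comments are ignored
        "sent_at": parsedate_to_datetime(date) if date else None,
        "subject": msg.get("Subject", ""),
        # Assumes plain-text bodies; multipart messages would need msg.walk()
        "body": "" if msg.is_multipart() else msg.get_payload(),
    }


def parse_maildir(root: Path):
    """Yield a parsed dict for every file under root (e.g. one employee's folder)."""
    for path in root.rglob("*"):
        if path.is_file():
            # latin-1 decodes any byte sequence, so stray non-ASCII bytes won't raise
            yield parse_enron_message(path.read_text(encoding="latin-1"))
```

Using a generator for `parse_maildir` means only one message is held in memory at a time, which matters for a dataset of this size.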
I’m now in the data exploration phase, where I have two primary goals. The first is to familiarize myself with data visualization in Python. The second is to use NLTK to try out natural language processing.
For the data visualization, I selected Seaborn, a Python library built on top of matplotlib that makes it easier to produce pretty graphs. I also used Pandas to read data in from Postgres and store it in a dataframe. It’s still a work in progress, but here is the graph I made of the most frequent contacts of Kenneth Lay, former CEO of Enron:
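A minimal sketch of the Pandas side, assuming a hypothetical `recipients` table with one row per (sender, recipient) pair. It uses an in-memory `sqlite3` database as a stand-in for Postgres so it runs anywhere; against Postgres you would typically pass a SQLAlchemy engine to `pd.read_sql` and use your driver’s placeholder style (psycopg2 uses `%s` rather than `?`). The function name `top_contacts` is also hypothetical.

```python
import sqlite3

import pandas as pd


def top_contacts(conn, sender: str, n: int = 10) -> pd.DataFrame:
    """Return the n addresses `sender` emailed most often as (recipient, emails)."""
    # "?" is sqlite3's placeholder style; psycopg2 would use "%s"
    df = pd.read_sql(
        "SELECT sender, recipient FROM recipients WHERE sender = ?",
        conn,
        params=(sender,),
    )
    # value_counts sorts descending, so head(n) keeps the most frequent contacts
    counts = df["recipient"].value_counts().head(n)
    return counts.rename_axis("recipient").reset_index(name="emails")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE recipients (sender TEXT, recipient TEXT)")
    conn.executemany(
        "INSERT INTO recipients VALUES (?, ?)",
        [("alice", "bob")] * 3 + [("alice", "carol")] * 2,
    )
    print(top_contacts(conn, "alice"))
```

For a large table, pushing the counting into SQL with `GROUP BY recipient ORDER BY COUNT(*) DESC` would avoid pulling every row into memory; counting in Pandas keeps the raw rows around for further exploration.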
Here is the visualization code. I want to do a lot more with this, but I try to take advantage of my time with Jay to get a minimum viable product working, and then switch topics.
With NLTK, I want to start with sentiment analysis, and there’s a cool module called VADER that does a lot of the work for you. Ideally, I’d like to create a visualization of the relationships between the different Enron employees that shows which links were strongest (most emails exchanged) and what the tone of those emails was (positive vs. negative sentiment). For my final product, I’m envisioning a folder of analyses and visualizations that tell the Enron story through the emails.
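One way to sketch that plan, assuming NLTK’s VADER implementation (`nltk.sentiment.vader.SentimentIntensityAnalyzer`, which needs the lexicon downloaded once with `nltk.download("vader_lexicon")`): score each email body, then group emails by the pair of people involved so each link carries both a count (strength) and an average compound score (tone). The `(sender, recipient, body)` input shape and the helper names are hypothetical; the ±0.05 cut-offs are the thresholds suggested in the VADER README.

```python
from collections import defaultdict
from functools import lru_cache
from statistics import mean


@lru_cache(maxsize=1)
def _analyzer():
    # Imported lazily so the aggregation below also runs without NLTK installed
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer()


def vader_compound(text: str) -> float:
    """VADER's compound score for one email body, in the range [-1, 1]."""
    return _analyzer().polarity_scores(text)["compound"]


def tone_label(compound: float) -> str:
    """Bucket a compound score using the thresholds from the VADER README."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def build_edges(emails, score=vader_compound):
    """Aggregate (sender, recipient, body) triples into undirected, weighted links."""
    scores = defaultdict(list)
    for sender, recipient, body in emails:
        pair = tuple(sorted((sender, recipient)))  # A->B and B->A share one link
        scores[pair].append(score(body))
    edges = {}
    for pair, values in scores.items():
        avg = mean(values)
        edges[pair] = {"emails": len(values), "tone": avg, "label": tone_label(avg)}
    return edges
```

Passing the scorer in as a parameter keeps the aggregation testable without NLTK. Treating pairs as undirected merges A→B and B→A into one link, which suits a “strongest relationships” view but hides who wrote the more negative emails; keying on `(sender, recipient)` directly would keep direction.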
As always, thanks to Jay for the time he spends every week teaching me everything from PyCharm debugging tips to what in the world heaps do.