Originally published at http://josiahparry.com.
I recently had a conversation that touched on using R to automate the parsing of Excel documents for administering data science assets. This brings up some very interesting points:
Note that this is no time to 💩 on Excel. It serves very real business purposes and unfortunately not everyone can learn to program 😕. Here’s a fun one for the h8rs: almost every presidential election campaign’s data program is based on the back of Google Sheets.
In this post I set out to explore if and how one can incorporate Excel into productionized code. Please see the GitHub repository for the code used here. …
The path of least resistance for Google auth is to sit back and respond
to some interactive prompts, but this won’t work for something that is
deployed to a headless machine. You have to do some advance planning to
provide your deployed product with a token.
The gargle vignette "Non-interactive auth" is the definitive document for how to do this. The gargle package handles auth for several packages, such as bigrquery, googledrive, gmailr, and googlesheets4. …
This post will go over extracting feature (variable) importance and creating a ggplot object for it. I will draw on the simplicity of Chris Albon’s post. For steps to do the following in Python, I recommend his post.
If you’ve ever created a decision tree, you’ve probably looked at measures of feature importance. In the above flashcard, impurity refers to how many times a feature was used and led to a misclassification. Here, we’re looking at the importance of a feature, so how much it helped in the classification or prediction of an outcome.
This example will draw on the built-in data Sonar from the mlbench package. …
Lately I have been doing more of my spatial analysis work in R with the help of the sf package. One shapefile I was working with had some horrendously named columns, and naturally, I tried to clean them using the clean_names() function from the janitor package. But lo, an egregious error occurred. To this end, I officially filed my complaint as an issue. The solution presented was to simply create a method for sf objects.
Yeah, methods, how tough can those be? Apparently the process isn’t at all difficult. But figuring out the process? That was difficult. This post will explain how I went about the process for converting the clean_names() function into a generic (I’ll explain this in a second), and creating a method for tbl_graph objects. …
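The generic-plus-method idea can be sketched in base R. This is only an illustration of S3 dispatch, not the janitor code: the generic tidy_names(), its cleaning rules, and the "my_graph" class are all hypothetical names made up for this example.

```r
# Hypothetical generic: UseMethod() picks a method based on the class of `x`.
tidy_names <- function(x, ...) {
  UseMethod("tidy_names")
}

# Default method: lower-case the names, collapse runs of non-alphanumeric
# characters into one underscore, and trim underscores at either end.
tidy_names.default <- function(x, ...) {
  new_names <- tolower(names(x))
  new_names <- gsub("[^a-z0-9]+", "_", new_names)
  new_names <- gsub("^_|_$", "", new_names)
  names(x) <- new_names
  x
}

# Method for a hypothetical "my_graph" class that stores its columns in a
# `nodes` data frame. It cleans that data frame and returns the same class.
tidy_names.my_graph <- function(x, ...) {
  x$nodes <- tidy_names(x$nodes)
  x
}

# A data frame with awkward names, and a toy object of the hypothetical class.
df <- data.frame(`Horrid Name!` = 1, `ANOTHER.one` = 2, check.names = FALSE)
g <- structure(list(nodes = df), class = "my_graph")

cleaned_df <- tidy_names(df) # no data.frame method, so the default is used
cleaned_g <- tidy_names(g)   # dispatches to tidy_names.my_graph()
```

Supporting a new class then means writing one more function named generic.class, without touching the generic itself.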
I have been living in the world of academia for nearly five years now. During this time I’ve read countless scholarly journal articles that I’ve struggled to wrap my head around. The academic language is riddled with obfuscating words like “milieux” and “nexus” which are often used to explain relatively simple concepts in a not so simple language. I’ve had to train myself to understand the academic language and translate it to regular people (layperson) speak.
The academic language is often used by the “elitist media” which has recently been blamed for creating a strong divide in American politics — as we’ve seen since the beginning of the 2016 presidential primaries. Many words, phrases, and ideas have been shrouded by this language barrier. I have been trying to break down this barrier for myself for years now. …
My recent package geniusR was created with the idea of a tidytext analysis of song lyrics in mind. I now wish to introduce you to the concepts and application of tidytext analysis through the use of geniusR. If you would like an introduction to geniusR please read my Introduction to geniusR. Additionally, I recommend that you give Text Mining in R: A Tidy Approach by Julia Silge and David Robinson a read.
Initially I wanted to perform an exploratory text analysis of Kendrick Lamar’s recent album DAMN. (2017) and compare it to his older album Section.80 (2011). During my first analysis I could not help but notice that a lot of the most common words are swear words. …
This post was adapted from my original blog post.
geniusR enables quick and easy download of song lyrics. The intent behind the package is to be able to perform text based analyses on songs in a tidy[text] format.
This package was inspired by the release of Kendrick Lamar’s most recent album, DAMN. As most programmers do, I spent way too long simplifying a task, that being accessing song lyrics. Genius (formerly Rap Genius) is the most widely accessible platform for lyrics.
The functions in this package enable easy access of individual song lyrics, album tracklists, and lyrics to whole albums. …
There are situations when one may have many files in a directory that they will want to have merged into one document.
This is a seemingly monotonous task, but the R language can make this pretty easy.
In order to do this, first set your working directory to the directory containing all of the files that need to be merged. Note that the directory should contain only the files you want merged.
Next, store all of the file names in an object. …
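A minimal sketch of these steps in base R follows. It assumes the files are CSVs that share the same columns, which the excerpt does not state. The temporary directory and the part_*.csv files are made up so the example runs anywhere.

```r
# Hypothetical setup: write three small CSV files to a temporary directory.
# In practice this would be the directory holding your own files.
data_dir <- file.path(tempdir(), "files_to_merge")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(
    data.frame(id = i, value = i * 10),
    file.path(data_dir, paste0("part_", i, ".csv")),
    row.names = FALSE
  )
}

# Store all of the file names in an object. full.names = TRUE returns
# complete paths, so the working directory does not have to change.
file_names <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into a data frame, then stack the data frames row-wise.
tables <- lapply(file_names, read.csv)
merged <- do.call(rbind, tables)
```

Using full paths here is a swap for setting the working directory. Calling setwd() on the directory and then list.files() with no path gives the same set of files.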