Originally published at http://josiahparry.com.
I recently had a conversation that touched on automating the parsing of Excel documents for administering data science assets. This brings up some very interesting points:
Note that this is no time to 💩 on Excel. It serves very real business purposes and unfortunately not everyone can learn to program 😕. Here’s a fun one for the h8rs: almost every presidential election campaign’s data program is based on the back of Google Sheets.
Lately I have been developing a deep curiosity about the origins of the R language. I have since read more from the WayBack Machine than a Master’s student probably should. There are four documents that I believe to be extremely foundational and that most clearly outline the original philosophies underpinning both R and its predecessor S. These are Evolution of the S Language (Chambers, 1996), A Brief History of S (Becker), Stages in the Evolution of S (Chambers, 2000), and R: Past and Future History by Ross Ihaka (1998). The readings have elicited many lines of thought and potential inquiry…
The path of least resistance for Google auth is to sit back and respond
to some interactive prompts, but this won’t work for something that is
deployed to a headless machine. You have to do some advance planning to
provide your deployed product with a token.
This post will go over extracting feature (variable) importance and creating a ggplot object for it. I will draw on the simplicity of Chris Albon’s post. For steps to do the following in Python, I recommend his post.
If you’ve ever created a decision tree, you’ve probably looked at measures of feature importance. In the above flashcard, impurity refers to how many times a feature was used and led to a misclassification. Here, we’re looking at the importance of a feature, that is, how much it helped in the classification or prediction of an outcome.
This example will draw on the…
Lately I have been doing more of my spatial analysis work in R with the help of the sf package. One shapefile I was working with had some horrendously named columns, and naturally, I tried to clean them using the clean_names() function from the janitor package. But lo, an egregious error occurred. So I officially filed my complaint as an issue. The solution presented was to simply create a method for sf objects.
Yeah, methods, how tough can those be? Apparently the process isn’t at all difficult. But figuring out the process? That was difficult. This post will…
Before the United States created the Constitution, something called the Articles of Confederation defined what the US Government would look like. It was the first attempt at creating some sort of agreement between the 13 original states to form a central government. In the end, the Articles of Confederation made the new central government too weak to accomplish anything. Then, in 1787 representatives from each state met in Philadelphia to entirely scrap the Articles of Confederation in a meeting that became known as the Constitutional Convention. …
I have been living in the world of academia for nearly five years now. During this time I’ve read countless scholarly journal articles that I’ve struggled to wrap my head around. The academic language is riddled with obfuscating words like “milieux” and “nexus” which are often used to explain relatively simple concepts in a not so simple language. I’ve had to train myself to understand the academic language and translate it to regular people (layperson) speak.
The academic language is often used by the “elitist media” which has recently been blamed for creating a strong divide in American politics —…
My recent package geniusR was created with the idea of a tidytext analysis of song lyrics in mind. I now wish to introduce you to the concepts and application of tidytext analysis through the use of geniusR. If you would like an introduction to geniusR, please read my Introduction to geniusR. Additionally, I recommend that you give Text Mining with R: A Tidy Approach by Julia Silge and David Robinson a read.
This post was adapted from my original blog post.
geniusR enables quick and easy download of song lyrics. The intent behind the package is to be able to perform text-based analyses on songs in a tidy[text] format.
This package was inspired by the release of Kendrick Lamar’s most recent album, DAMN. As most programmers do, I spent way too long simplifying a task, in this case accessing song lyrics. Genius (formerly Rap Genius) is the most widely accessible platform for lyrics.
The functions in this package enable easy access of individual song lyrics, album tracklists, and lyrics to whole…
There are situations when you have many files in a directory that you want merged into one document.
This is a seemingly monotonous task, but the R language can make it pretty easy.
To do this, first set your working directory to the directory containing all of the files that need to be merged. Note that the directory should contain only the files you want merged.
Next store all of the file names into an object. …