or how to incorporate Excel into a production API using plumber or a micro-excel-micro-service

Image for post
Photo by Mika Baumeister on Unsplash

Originally published at http://josiahparry.com.

I recently had a conversation that touched on using to automate the parsing of Excel documents for administering data science assets. This brings up some very interesting points:

  1. Excel is sometimes unavoidable and we need to be okay with that.
  2. How can we incorporate Excel into production?

Note that this is no time to 💩 on Excel. It serves very real business purposes and unfortunately not everyone can learn to program 😕. Here’s a fun one for the h8rs: almost every presidential election campaign’s data program is based on the back of Google Sheets.

In this post I set out to explore if and how one can incorporate Excel into productionized code. Please see the GitHub repository for the code used here. …
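
As a rough illustration of the idea (not the code from the repository), a plumber endpoint could accept an uploaded workbook and hand it to readxl. The route name `/parse-excel`, the helper `read_excel_bytes()`, and the file name `plumber.R` are all hypothetical, and the sketch assumes plumber 1.0 or later, where the `multi` and `octet` parsers pass each uploaded file to the handler as a raw vector.

```r
# plumber.R (hypothetical file name)
# Assumes the plumber (>= 1.0) and readxl packages are installed.

# Write uploaded bytes to a temporary .xlsx file and read the first sheet.
read_excel_bytes <- function(bytes) {
  stopifnot(is.raw(bytes))
  path <- tempfile(fileext = ".xlsx")
  on.exit(unlink(path), add = TRUE)
  writeBin(bytes, path)
  as.data.frame(readxl::read_excel(path))
}

#* Parse an uploaded Excel workbook and return its first sheet
#* @post /parse-excel
#* @parser multi
#* @parser octet
#* @param f:file
function(f) {
  # Assumption: `f` is a list with one raw vector per uploaded file.
  read_excel_bytes(f[[1]])
}
```

Keeping the parsing in a plain function means it can be tested without starting the API.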


Origins and current perspective

Lately I have been developing a deep curiosity about the origins of the R language. I have since read more from the WayBack Machine than a Master’s student probably should. …


and {gargle} in general

This repository contains an example of an R Markdown document that uses
googlesheets4 to read from a private Google Sheet and is deployed to
RStudio Connect.

The path of least resistance for Google auth is to sit back and respond
to some interactive prompts, but this won’t work for something that is
deployed to a headless machine. You have to do some advance planning to
provide your deployed product with a token.

The gargle vignette on non-interactive auth is the definitive document for how to do this. The gargle package handles auth for several packages, such as bigrquery, googledrive, gmailr, and googlesheets4. …
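
One way to provide a token in advance is a service account key file, which googlesheets4 accepts through `gs4_auth(path = ...)`. The sketch below is only an illustration under that assumption and is not necessarily what the repository does: the function `read_private_sheet()` and the environment variable name `GSHEET_SERVICE_ACCOUNT_JSON` are hypothetical, and the Sheet would need to be shared with the service account's email address.

```r
# Read a private Sheet without interactive prompts.
# Assumes googlesheets4 is installed on the deployment target.
read_private_sheet <- function(sheet_id,
                               key_var = "GSHEET_SERVICE_ACCOUNT_JSON") {
  # Hypothetical convention: the env var holds the path to the JSON key file.
  key_path <- Sys.getenv(key_var)
  if (!nzchar(key_path)) {
    stop("Set ", key_var, " to the path of a service account JSON key.")
  }
  googlesheets4::gs4_auth(path = key_path)
  googlesheets4::read_sheet(sheet_id)
}
```

Failing early when the variable is unset makes a misconfigured deployment easier to spot than a stalled auth prompt.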


Extracting and plotting feature importance

This post will go over extracting feature (variable) importance and creating a ggplot object for it. I will draw on the simplicity of Chris Albon’s post. For steps to do the following in Python, I recommend his post.

Image for post

If you’ve ever created a decision tree, you’ve probably looked at measures of feature importance. In the above flashcard, impurity refers to how many times a feature was used and led to a misclassification. Here, we’re looking at the importance of a feature, so how much it helped in the classification or prediction of an outcome.

This example will draw on the built-in Sonar data set from the mlbench package. …
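
As a minimal sketch of the general workflow, the block below swaps in the built-in `iris` data and an rpart decision tree, since the model used in the full post is not shown here. It pulls the importance scores into a data frame and, if ggplot2 is installed, builds a bar chart from them. The object names are illustrative.

```r
library(rpart)

# Fit a small classification tree (stand-in for the post's model).
fit <- rpart(Species ~ ., data = iris)

# rpart stores importance as a named numeric vector.
importance_df <- data.frame(
  feature = names(fit$variable.importance),
  importance = unname(fit$variable.importance),
  stringsAsFactors = FALSE
)
importance_df <- importance_df[order(importance_df$importance, decreasing = TRUE), ]

# Build the plot only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  importance_plot <- ggplot(
    importance_df,
    aes(x = reorder(feature, importance), y = importance)
  ) +
    geom_col() +
    coord_flip() +
    labs(x = "Feature", y = "Importance")
}
```

Reordering the features by importance before flipping the axes puts the most important feature at the top of the chart.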


Making functions with methods in R.

Lately I have been doing more of my spatial analysis work in R with the help of the sf package. One shapefile I was working with had some horrendously named columns, and naturally, I tried to clean them using the clean_names() function from the janitor package. But lo, an egregious error occurred. To this end, I officially filed my complaint as an issue. The solution presented was to simply create a method for sf objects.

Yeah, methods, how tough can those be? Apparently the process isn’t at all difficult. But figuring out the process? That was difficult. This post will explain how I went about the process for converting the clean_names() function into a generic (I’ll explain this in a second), and creating a method for sf and tbl_graph objects. …
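
To show the shape of the pattern, here is a minimal S3 sketch in base R. It is not janitor's implementation: `tidy_names()` is a hypothetical stand-in for `clean_names()`, with a default method for character vectors and a method for data frames. A method for another class, such as sf, would follow the same naming scheme (`tidy_names.sf`).

```r
# The generic only dispatches on the class of `x`.
tidy_names <- function(x, ...) UseMethod("tidy_names")

# Default method: clean a character vector of names.
tidy_names.default <- function(x, ...) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)   # runs of other characters become "_"
  gsub("^_+|_+$", "", x)            # trim leading and trailing underscores
}

# Data frame method: clean the column names and return the data frame.
tidy_names.data.frame <- function(x, ...) {
  names(x) <- tidy_names(names(x))
  x
}

messy <- data.frame(`First Name` = "Ada", `ZIP-Code` = "02139",
                    check.names = FALSE)
names(tidy_names(messy))
# → "first_name" "zip_code"
```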


Before the United States created the Constitution, something called the Articles of Confederation defined what the US Government would look like. …


I have been living in the world of academia for nearly five years now. During this time I’ve read countless scholarly journal articles that I’ve struggled to wrap my head around. The academic language is riddled with obfuscating words like “milieux” and “nexus” which are often used to explain relatively simple concepts in a not so simple language. I’ve had to train myself to understand the academic language and translate it to regular people (layperson) speak.

The academic language is often used by the “elitist media” which has recently been blamed for creating a strong divide in American politics — as we’ve seen since the beginning of the 2016 presidential primaries. Many words, phrases, and ideas have been shrouded by this language barrier. I have been trying to break down this barrier for myself for years now. …


A f’ing fun introduction to tidytext analysis with geniusR

My recent package geniusR was created with the idea of a tidytext analysis of song lyrics in mind. I now wish to introduce you to the concepts and application of tidytext analysis through the use of geniusR. If you would like an introduction to geniusR please read my Introduction to geniusR. Additionally, I recommend that you give Text Mining in R: A Tidy Approach by Julia Silge and David Robinson a read.

Initially I wanted to perform an exploratory text analysis of Kendrick Lamar’s recent album DAMN. (2017) and compare it to his older album Section.80 (2011). During my first analysis I could not help but notice that a lot of the most common words are swear words. …
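
The core idea of a tidy text analysis is one token per row. The sketch below illustrates that idea in base R rather than with tidytext's own tokenizer, and it uses made-up text instead of real lyrics, so every name and value in it is illustrative.

```r
# Made-up lines standing in for song lyrics.
lyrics <- data.frame(
  line = 1:2,
  text = c("Hello hello world", "Goodbye, world!"),
  stringsAsFactors = FALSE
)

# Split each line into lowercase words.
tokens_list <- strsplit(tolower(lyrics$text), "[^a-z']+")

# One row per word, keeping track of the line it came from.
tokens <- data.frame(
  line = rep(lyrics$line, lengths(tokens_list)),
  word = unlist(tokens_list),
  stringsAsFactors = FALSE
)
tokens <- tokens[nzchar(tokens$word), ]

# Count how often each word appears, most common first.
word_counts <- as.data.frame(table(word = tokens$word),
                             responseName = "n",
                             stringsAsFactors = FALSE)
word_counts <- word_counts[order(-word_counts$n), ]
```

Once the text is in this shape, counting, filtering, and joining against a sentiment lexicon are ordinary data frame operations.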


Image for post
Contribution to Sentiment

Introducing geniusR

This post was adapted from my original blog post.

geniusR enables quick and easy download of song lyrics. The intent behind the package is to be able to perform text based analyses on songs in a tidy[text] format.

This package was inspired by the release of Kendrick Lamar’s most recent album, DAMN. As most programmers do, I spent way too long simplifying a task, that being accessing song lyrics. Genius (formerly Rap Genius) is the most widely accessible platform for lyrics.

The functions in this package enable easy access of individual song lyrics, album tracklists, and lyrics to whole albums. …


A really brief tutorial

There are situations when one may have many files in a directory that they will want to have merged into one document.

This is a seemingly monotonous task, but the R language can make this pretty easy.

In order to do this, first set your working directory to the directory containing all of the files that need to be merged. Note that the directory should contain only the files you want merged.

setwd("Users/Directory")

Next store all of the file names into an object. …
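
The remaining steps are not shown here, so the following is only a sketch of one common way to finish the job, assuming the files are CSVs with identical columns. It works in a temporary directory with two small demo files instead of changing the working directory, and all file and object names are illustrative.

```r
# Create a demo directory with two small CSV files.
dir_path <- file.path(tempdir(), "merge-demo")
dir.create(dir_path, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c("a", "b")),
          file.path(dir_path, "part1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c("c", "d")),
          file.path(dir_path, "part2.csv"), row.names = FALSE)

# Store all of the file names in an object.
file_names <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)

# Read every file, then stack the resulting data frames.
tables <- lapply(file_names, read.csv, stringsAsFactors = FALSE)
merged <- do.call(rbind, tables)

# Write the merged data outside the source directory.
write.csv(merged, file.path(tempdir(), "merged.csv"), row.names = FALSE)
```

Writing the output somewhere other than the source directory keeps a second run from picking up the merged file as an input.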

About

Josiah Parry

Social Scientist meets Data Scientist. josiahparry.com
