It’s Only a Munger of Time

Wranglin’ the Night Away

Nicholas Teague


Published in Automunge

3 min read · Aug 31, 2018


In my last post I drew up some functions for wrangling structured datasets. One extension of that method is a function that evaluates each column of a dataframe for the presence of date or time series data, so that an appropriate processing algorithm can be applied. In this notebook we’ll create that function and update the automunge(.) function from the previous post to include this new category of data. That leaves us able to automatically identify and process numerical, binary, categorical, and time series data from structured datasets. In the interest of brevity I won’t repost the processing functions introduced in the prior notebook, although they are included in the companion Colaboratory notebook available [here].

1) Import data pre-processing functions from last notebook

(not shown for brevity)

process_numerical_class(.)
process_binary_class(.)
process_text_class(.)

2) Define process_time_class(.) function

Here we define our function to process a date or time series column once it has been identified. The approach is to split the data into separate fields for year, month, day, hour, minute, and second. My expectation is that this could prove beneficial for cases with distinct cyclical features at different time scales (such as day of week, season of year, business hours, etc). A reasonable extension of this method could also add a column that expresses each timestamp on a single scale of measurement, such as total days, hours, or minutes elapsed over the whole series.
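As a rough sketch of the idea (not the notebook’s actual process_time_class(.)), the split could look something like the following in pandas. The helper names split_time_column and elapsed_days, the output column naming, and the sample timestamps are all assumptions made for illustration.

```python
import pandas as pd


def split_time_column(df, column):
    """Hypothetical helper: replace `column` with one column per time scale."""
    out = df.copy()
    # Entries that cannot be parsed become NaT instead of raising an error
    parsed = pd.to_datetime(out[column], errors="coerce")
    for part in ("year", "month", "day", "hour", "minute", "second"):
        out[f"{column}_{part}"] = getattr(parsed.dt, part)
    return out.drop(columns=[column])


def elapsed_days(df, column):
    """Hypothetical helper for the single-scale extension: days since the earliest entry."""
    parsed = pd.to_datetime(df[column], errors="coerce")
    return (parsed - parsed.min()).dt.total_seconds() / 86400.0


# Illustrative data only
df = pd.DataFrame({"timestamp": ["2018-08-31 14:05:09", "2018-12-25 08:30:00"]})
result = split_time_column(df, "timestamp")
elapsed = elapsed_days(df, "timestamp")
print(result)
print(elapsed)
```

Keeping each scale in its own column lets a downstream model pick up on, say, an hour-of-day pattern without having to untangle it from the year. Any normalization of these fields would be a separate step that this sketch leaves out.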

3) Define evalcategory(.) and automunge(.) functions

Here we update the evalcategory(.) and automunge(.) functions introduced in our last notebook so that they also handle date or time series data.
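To give a feel for what the column evaluation might involve, here is a minimal sketch of one possible heuristic. It is a stand-in, not the notebook’s evalcategory(.): the function name guess_category, the category labels it returns, and the 90% parse threshold are all assumptions.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def guess_category(series, threshold=0.9):
    """Hypothetical heuristic: 'time', 'binary', 'numerical', or 'categorical'."""
    values = series.dropna()
    if values.empty:
        return "categorical"
    # Columns already stored as datetimes are the easy case
    if is_datetime64_any_dtype(values):
        return "time"
    # For non-numeric columns, check whether most entries parse as dates.
    # Numeric columns are skipped here because pandas would read plain
    # numbers as epoch offsets, which is not what we want.
    if not is_numeric_dtype(values):
        parsed = pd.to_datetime(values.astype(str), errors="coerce")
        if parsed.notna().mean() >= threshold:
            return "time"
    if values.nunique() == 2:
        return "binary"
    if is_numeric_dtype(values):
        return "numerical"
    return "categorical"


# Illustrative columns only
time_guess = guess_category(pd.Series(["2018-08-31", "2018-09-01", "2018-09-02"]))
binary_guess = guess_category(pd.Series([0, 1, 1]))
number_guess = guess_category(pd.Series([1.5, 2.5, 3.5]))
text_guess = guess_category(pd.Series(["circle", "square", "triangle"]))
print(time_guess, binary_guess, number_guess, text_guess)
```

The date check runs before the binary check so that a column holding only two distinct dates is still treated as time data. Short strings that happen to look like dates can fool a heuristic like this, so the threshold is worth tuning against real columns.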

4) Test Functions

Here we’ll create some sample Train and Test datasets for demonstration of our functions. Note that this is updated from our last post to now include time series data.
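The datasets from the notebook aren’t reproduced here, but a toy version with one column of each category might be put together along these lines. The column names and values are made up for illustration and are not the ones used in the notebook.

```python
import pandas as pd

# Illustrative values only: one column per category plus a label column
train = pd.DataFrame({
    "number": [1.2, 3.4, 5.6, 7.8],
    "binary": ["yes", "no", "yes", "no"],
    "category": ["cat", "dog", "bird", "cat"],
    "time": pd.to_datetime([
        "2018-08-28 09:00:00",
        "2018-08-29 13:30:00",
        "2018-08-30 17:45:00",
        "2018-08-31 21:15:00",
    ]),
    "label": [0, 1, 0, 1],
})

# The test set has the same feature columns but no labels
test = train.drop(columns=["label"]).iloc[:2].copy()

print(train.dtypes)
print(test)
```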

Our updated train data for testing the functions.

Now let’s apply our automunge(.) and see how we did.

(Outputs shown in the original notebook, in order: train, labels, validation, validationlabels as a numpy array, and test.)

Great, I think I’ll chalk this one up as a success. Until next time.

Books that were referenced here or otherwise inspired this post:

Code Complete — Steve McConnell


*For further readings please check out my Table of Contents, Book Recommendations, and Music Recommendations. For more on AutoMunge:

Hi, I’m a blogger writing for fun. If you enjoyed or got some value from this post feel free to like, comment, or share. I can also be reached on linkedin for professional inquiries or twitter for personal.
