Automunge
In which we keep on wranglin’
In my last post I drew up some functions for wrangling structured datasets. One extension of that method is a function that evaluates each column in a data frame to determine which of the three defined processing functions is most appropriate. That could open the door to automated data preprocessing, which may prove beneficial for big-data-scale training sets. In this notebook we’ll create that function to automate the per-column selection of processing when wrangling a structured dataset. There’s a companion Colaboratory notebook available [here]. Cheers.
1) Import data pre-processing functions from last notebook
Here again are the functions we defined in our last post:
2) Define evalcategory(.) and automunge(.) functions
- evalcategory(.): looks at a column in a dataframe to determine which preprocessing function to apply.
- automunge(.): takes as input the train set and test set dataframes, the id of the label column in the train set, and a validation ratio, then outputs a set of numpy arrays ready for machine learning algorithms in the framework of your choice.
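As a rough illustration of the two functions described above, here's a minimal sketch. It is not the notebook's actual implementation: the category labels ('bnry', 'nmbr', 'text'), the argument names (labels_column, valpercent, seed), and the specific rules (two or fewer distinct values means binary, z-score normalization for numeric columns, one-hot encoding for text) are all assumptions made for the example, and the labels are passed through unprocessed.

```python
import pandas as pd


def evalcategory(df, column):
    """Return an assumed category label for one column: 'bnry', 'nmbr' or 'text'."""
    values = df[column].dropna()
    if values.nunique() <= 2:
        return "bnry"  # two or fewer distinct values: treat as binary
    if pd.api.types.is_numeric_dtype(values):
        return "nmbr"  # numeric with more than two distinct values
    return "text"      # everything else: treat as categorical text


def automunge(df_train, df_test, labels_column, valpercent=0.2, seed=0):
    """Process train/test frames column by column and return numpy arrays."""
    # Shuffle the train rows so the validation split is random.
    df_train = df_train.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    df_test = df_test.reset_index(drop=True)
    labels = df_train[labels_column].to_numpy()  # labels are left unprocessed here
    df_train = df_train.drop(columns=[labels_column])

    train_parts, test_parts = [], []
    for column in df_train.columns:
        category = evalcategory(df_train, column)
        tr, te = df_train[column], df_test[column]
        if category == "nmbr":
            # z-score normalization with statistics taken from the train column
            mean, std = tr.mean(), tr.std()
            train_parts.append(((tr.fillna(mean) - mean) / std).to_frame())
            test_parts.append(((te.fillna(mean) - mean) / std).to_frame())
            continue
        categories = sorted(tr.dropna().unique(), key=str)
        if category == "bnry":
            # single 0/1 column; missing or unseen values map to 0
            positive = categories[-1] if categories else None
            train_parts.append((tr == positive).astype(float).to_frame())
            test_parts.append((te == positive).astype(float).to_frame())
        else:
            # one-hot encoding over the categories seen in the train column
            dtype = pd.CategoricalDtype(categories)
            train_parts.append(pd.get_dummies(tr.astype(dtype), prefix=column, dtype=float))
            test_parts.append(pd.get_dummies(te.astype(dtype), prefix=column, dtype=float))

    train = pd.concat(train_parts, axis=1).to_numpy(dtype=float)
    test = pd.concat(test_parts, axis=1).to_numpy(dtype=float)

    # Carve the validation set off the front of the shuffled train rows.
    n_val = int(len(train) * valpercent)
    validation, validationlabels = train[:n_val], labels[:n_val]
    train, labels = train[n_val:], labels[n_val:]
    return train, labels, validation, validationlabels, test
```

One design note on the sketch: the column category is decided from the train set only, and the same decision (and the same train-derived statistics and categories) is then applied to the test set so that both end up with matching columns.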
Note that this approach assumes that the test data is available at the time of wrangling. A reasonable extension would be to let the function also output the values it derives from the train set, such as those used for normalization, so that test data can be processed later when it is not processed simultaneously. The output of such values could be triggered by an additional True/False argument in the function definition.
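To make that extension a bit more concrete, here's a small sketch for a single numerical column. The function names (fit_numerical, apply_numerical) and the return_params flag are hypothetical, made up for this example, and z-score normalization is assumed as the processing step.

```python
import pandas as pd


def fit_numerical(train_column, return_params=False):
    """Normalize a train column; optionally also return the values used."""
    mean = train_column.mean()
    std = train_column.std()
    if not std > 0:  # covers both a zero and a NaN standard deviation
        std = 1.0
    normalized = (train_column.fillna(mean) - mean) / std
    if return_params:
        return normalized, {"mean": float(mean), "std": float(std)}
    return normalized


def apply_numerical(test_column, params):
    """Process a later test column with the values saved from the train set."""
    return (test_column.fillna(params["mean"]) - params["mean"]) / params["std"]


train_column = pd.Series([1.0, 2.0, 3.0])
normalized, params = fit_numerical(train_column, return_params=True)

# Some time later, when the test data shows up:
test_column = pd.Series([4.0, None])
print(apply_numerical(test_column, params).tolist())
```

The returned dictionary is plain Python floats, so it could be saved alongside a trained model and reloaded whenever test data arrives.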
3) Test Functions
Here we’ll create some sample Train and Test datasets for demonstration of our functions.
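For anyone following along without the notebook, a sample pair of data frames could be put together along these lines. The column names, value ranges, and row counts here are invented for illustration and are not the ones used in the original demonstration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so the sample data is repeatable


def make_sample(n_rows, with_label):
    """Build a small frame with one numerical, one binary and one text column."""
    df = pd.DataFrame({
        "number": rng.normal(50.0, 10.0, n_rows),
        "binary": rng.choice(["yes", "no"], n_rows),
        "text": rng.choice(["cat", "dog", "bird"], n_rows),
    })
    if with_label:
        df["label"] = rng.integers(0, 2, n_rows)  # 0/1 labels, train set only
    return df


df_train = make_sample(20, with_label=True)
df_test = make_sample(8, with_label=False)

print(df_train.head())
print(df_test.head())
```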
Now let’s apply our automunge and see how we did.
Here we’ll view the output numpy arrays:
train
labels
validation
validationlabels
test
Great, well, I think I’ll chalk this one up as a success. Until next time.
Books that were referenced here or otherwise inspired this post:
Seeking Wisdom, by Peter Bevelin
(As an Amazon Associate I earn from qualifying purchases.)
For further readings please check out my Table of Contents, Book Recommendations, and Music Recommendations. For more on AutoMunge:
Hi, I’m a blogger writing for fun. If you enjoyed or got some value from this post feel free to like, comment, or share. I can also be reached on linkedin for professional inquiries or twitter for personal.