Published in Automunge

Automated Data Wrangling with Machine Learning Derived Infill for Missing Values

The shiny new thing

Preservation Hall Jazz Band — Just A Closer Walk With Thee, Part I

1) preprocess and evalcategory functions from prior notebooks

Preservation Hall Jazz Band — Just A Closer Walk With Thee, Part II

2) Define new functions

  • NArows(.): identifies which rows of a column need infill with ML-derived plug data
  • createMLinfillsets(.): returns a series of dataframes used to train a machine learning model that predicts infill values for the points that had missing values in the original sets
  • labelbinarizercorrect(.): ensures that the re-encoding following application of scikit-learn’s LabelBinarizer is consistent with the original array
  • predictinfill(.): returns predicted infills for the missing values in the train and test feature sets, based on a machine learning model trained on the other points in the set
  • insertinfill(.): replaces the missing values in the rows identified by NArows with the values from predictinfill
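
The flow described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the helper names (na_rows, create_ml_infill_sets, predict_infill, insert_infill) are hypothetical stand-ins for the functions listed, the toy "age"/"fare" columns are made up, and an ordinary least squares fit from NumPy is swapped in for whatever model the notebook trains. It handles a single numeric column in a single dataframe and assumes the other feature columns have no missing values.

```python
import numpy as np
import pandas as pd


def na_rows(df, column):
    """Boolean mask marking the rows where `column` is missing."""
    return df[column].isna()


def create_ml_infill_sets(df, column, mask):
    """Split df into training features/labels and the features needing infill."""
    features = df.drop(columns=[column])
    train_x = features[~mask]          # rows where the target is known
    train_y = df.loc[~mask, column]    # known target values
    infill_x = features[mask]          # rows whose target must be predicted
    return train_x, train_y, infill_x


def predict_infill(train_x, train_y, infill_x):
    """Fit least squares (a stand-in model) and predict the missing values."""
    # Append a constant column so the fit includes an intercept.
    a_train = np.column_stack(
        [train_x.to_numpy(dtype=float), np.ones(len(train_x))]
    )
    a_infill = np.column_stack(
        [infill_x.to_numpy(dtype=float), np.ones(len(infill_x))]
    )
    coef, *_ = np.linalg.lstsq(
        a_train, train_y.to_numpy(dtype=float), rcond=None
    )
    return a_infill @ coef


def insert_infill(df, column, mask, infill):
    """Return a copy of df with the masked cells replaced by the predictions."""
    out = df.copy()
    out.loc[mask, column] = infill
    return out


# Hypothetical toy data: fare is exactly 2 * age + 1 wherever it is present.
train = pd.DataFrame({
    "age": [10.0, 20.0, 30.0, 40.0, 50.0],
    "fare": [21.0, 41.0, np.nan, 81.0, np.nan],
})

mask = na_rows(train, "fare")
train_x, train_y, infill_x = create_ml_infill_sets(train, "fare", mask)
infill = predict_infill(train_x, train_y, infill_x)
filled = insert_infill(train, "fare", mask, infill)
print(filled)
```

Keeping the row mask separate from the prediction step means the same mask that selected the rows for prediction is reused to write the values back, so predictions cannot land in the wrong rows. A categorical column would need a classifier and an encoding step instead of the regression shown here.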
Sweet Emma and Her Preservation Hall Jazz Band — Closer Walk With Thee

3) Update automunge(.) function

4) Test Functions

Train array output from the automunge function, with ML infill changes highlighted in red
Train array output from the automunge function, with NA cells filled by standard infill techniques highlighted in red
Titanic train set, first five rows before any processing
Titanic train array output from the automunge function
Bobby McFerrin — Don’t Worry Be Happy

So Long and Thanks For All the Fish — Douglas Adams

Nicholas Teague

Writing for fun and because it helps me organize my thoughts. I also write software to prepare data for machine learning at automunge.com. Consistently unique.