Unlocking your data’s potential with IBM Watson Studio’s AutoAI feature engineering on relational data
If you’ve ever spoken to a data scientist or machine learning practitioner (or are one yourself), you have probably had a conversation that sounded eerily similar to this:
“So, what do you do?”
“I’m a data scientist.”
“That sounds fun…What does that mean?”
“I apply rigorous mathematical and statistical analyses to make predictions learned from past data on new observations!”
“Cool! So, what’s a typical day like?”
“Well… I spend most of my time searching for, cleaning, and processing data files.”
“Oh…”
At IBM Watson® Machine Learning and IBM Research, we wanted to help you change that script and get back to the real work, which is why we are proud to announce the general availability of our new enhancement for AutoAI: feature engineering on relational data. As part of our continuing effort to lower the barriers to adopting state-of-the-art machine learning tools across organizations, this release builds on the success of our award-winning AutoAI offering and lets users combine multiple data files and extract new features from them with a few clicks. AutoAI is already one of Watson Studio’s premier offerings, automating the arduous work of data ingestion and preprocessing, feature engineering, model building, validation, and code generation. Relational data feature engineering extends that automation to data spread across several related tables.
Long gone are the days of siloed data, where disparate parts of an organization each maintained their own information, with no need to share or combine it for analysis. In modern business settings, the full benefit of data science technologies cannot be realized without the ability to combine, process, and extract new information from data housed across an organization. AutoAI’s new feature engineering on relational data directly addresses this use case and allows data scientists to get back to what they really want to be doing: building and analyzing high-accuracy models.
Seeing is believing
If all of this sounds too good to be true, allow us to take a moment to walk through an example that highlights the power of AutoAI’s feature engineering on relational data. Even for those readers who are familiar with AutoAI, we highly recommend reading on to see all the added benefits of this new feature.
Let’s start with some sample data. Imagine you’re a data scientist working at a top outdoor equipment retail chain, “Great Outdoors” (GO), and you want to better predict sell-through data. Naturally, you want to rely not just on the tables that contain sales quantities and dates; you also want to understand which product features, retail locations, and channels most affect your data. This information lives in different locations across your organization, so what are you to do? Well, to really get into the role, read more details about this sample use case, find the sample datasets in the AutoAI experiment gallery, and let’s see where this data journey takes us:
We begin by adding the data assets to our AutoAI experiment. Once multiple files have been added, relational data feature engineering kicks into gear and users are prompted to walk through a friendly user interface that enables seamless data joining.
Taking the place of complex relational database joins and preprocessing, this friendly user interface allows you to easily combine data, with helpful features like suggested join keys.
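To give a sense of the manual work this interface takes off your plate, here is a minimal sketch of a single left join between a sales table and a product lookup table on a shared key. The table contents, the column names (product_id, product_line, quantity), and the left_join helper are hypothetical illustrations, not the GO sample schema and not AutoAI code.

```python
def left_join(left_rows, right_rows, key):
    """Attach matching right-hand columns to every left-hand row.

    Assumes `key` is unique in right_rows, as in a lookup table.
    Left rows without a match are kept unchanged (left-join semantics).
    """
    lookup = {r[key]: r for r in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[key], {})
        # Left-hand values win if both tables share a column name.
        joined.append({**match, **row})
    return joined


# Hypothetical rows standing in for two separate data files.
sales = [
    {"product_id": 1, "retailer_id": 10, "quantity": 3},
    {"product_id": 2, "retailer_id": 10, "quantity": 1},
    {"product_id": 3, "retailer_id": 11, "quantity": 7},
]
products = [
    {"product_id": 1, "product_line": "Camping Equipment"},
    {"product_id": 2, "product_line": "Golf Equipment"},
]

sales_with_products = left_join(sales, products, key="product_id")
for row in sales_with_products:
    print(row)
```

Even this toy version has to decide which key to join on and what to do with unmatched rows, and a real project repeats those decisions for every pair of tables.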
Once our data is ready, we can further customize the experiment. For example, by navigating to the experiment settings, we see an option to designate certain columns as timestamps, and the ability to set a sliding window, which controls how long in the past or future to look when joining two time-dependent datasets.
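To make the sliding window idea concrete, the sketch below joins a table of prediction dates to a table of dated sales and sums only the sales that fall within a fixed number of days before each prediction date. It is an illustration under stated assumptions: the window_sum helper, the column names, and the data are hypothetical, the window here looks only into the past, and none of this is AutoAI’s actual implementation.

```python
from datetime import date, timedelta


def window_sum(main_rows, child_rows, key, window_days):
    """For each main row, sum child 'quantity' values that share the same
    key and whose date lies in the `window_days` days up to and including
    the main row's date. Later (future) child rows are ignored."""
    window = timedelta(days=window_days)
    out = []
    for row in main_rows:
        start = row["date"] - window
        total = sum(
            child["quantity"]
            for child in child_rows
            if child[key] == row[key] and start < child["date"] <= row["date"]
        )
        out.append({**row, f"quantity_sum_{window_days}d": total})
    return out


# Hypothetical rows: one prediction date per product, plus dated sales.
targets = [
    {"product_id": "tent", "date": date(2020, 1, 10)},
    {"product_id": "stove", "date": date(2020, 1, 10)},
]
sales = [
    {"product_id": "tent", "date": date(2020, 1, 2), "quantity": 5},   # too old
    {"product_id": "tent", "date": date(2020, 1, 8), "quantity": 3},
    {"product_id": "tent", "date": date(2020, 1, 10), "quantity": 2},
    {"product_id": "tent", "date": date(2020, 1, 12), "quantity": 9},  # future
    {"product_id": "stove", "date": date(2020, 1, 9), "quantity": 4},
]

joined = window_sum(targets, sales, key="product_id", window_days=7)
for row in joined:
    print(row)
```

Widening or narrowing window_days changes which rows contribute to each aggregate, which is the trade-off the sliding window setting exposes.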
With our settings complete, we hand the experiment over to AutoAI to sprinkle some machine learning magic on top of it.
The first step in the process after the data has been ingested is data joining and “join feature extraction”, which means that AutoAI is not just combining our data assets according to the flow we designated earlier, but it is also creating new features for us as it does so. By analyzing correlations with our prediction variable, looking for redundancies, and searching through a space of possible new features, AutoAI produces a brand new aggregate dataset on which the machine learning algorithms will run.
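The idea of join feature extraction can be illustrated with a deliberately simple sketch: aggregate a child table per join key to create candidate features, then keep candidates that correlate with the prediction target and drop those that are redundant with a feature already kept. Everything here is an assumption made for illustration, including the helper names, the thresholds, the use of Pearson correlation, and the data. AutoAI’s actual search and selection methods are not described in this post.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def aggregate_features(main_rows, child_rows, key, value):
    """Build count/sum/mean columns of `value` per `key`, aligned to main_rows."""
    groups = {}
    for r in child_rows:
        groups.setdefault(r[key], []).append(r[value])
    cols = {f"{value}_count": [], f"{value}_sum": [], f"{value}_mean": []}
    for row in main_rows:
        vals = groups.get(row[key], [])
        cols[f"{value}_count"].append(len(vals))
        cols[f"{value}_sum"].append(sum(vals))
        cols[f"{value}_mean"].append(sum(vals) / len(vals) if vals else 0.0)
    return cols


def select_features(candidates, target, min_target_corr=0.3, max_pair_corr=0.95):
    """Keep features correlated with the target, skipping near-duplicates."""
    ranked = sorted(candidates, key=lambda n: -abs(pearson(candidates[n], target)))
    kept = []
    for name in ranked:
        if abs(pearson(candidates[name], target)) < min_target_corr:
            continue  # weak relationship with the prediction variable
        if any(abs(pearson(candidates[name], candidates[k])) > max_pair_corr
               for k in kept):
            continue  # redundant with a feature we already kept
        kept.append(name)
    return kept


# Hypothetical data: one row per store, many transactions per store.
stores = [{"store": s} for s in "ABCD"]
target = [10, 20, 30, 40]
transactions = [{"store": s, "amount": 5} for s in "ABBCCCDDDD"]

candidates = aggregate_features(stores, transactions, key="store", value="amount")
kept = select_features(candidates, target)
print(kept)
```

In this toy data the transaction count tracks the target, the sum carries the same information as the count, and the mean is constant, so only one of the three candidates survives.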
Once relational data feature engineering is complete, the core AutoAI process takes over with a cutting-edge pipeline generation process that involves optimal data allocation to different algorithms, additional feature engineering, and hyperparameter optimization.
And just like that, with a few clicks through an intuitive user interface, we’ve built a highly accurate model on several datasets.
As part of Watson Studio and Cloud Pak for Data™, the generated AutoAI pipelines have access to the full suite of data science lifecycle solutions offered on these platforms. For example, once the desired pipeline is selected, it can be saved and deployed for scoring.
By automating the difficult steps that often serve as a barrier to machine learning adoption, AutoAI’s feature engineering on relational data further establishes Watson Studio as the premier destination for data scientists and organizations looking to apply the power of AI to their data.
Get started with AutoAI and relational data feature engineering today by visiting Watson Studio.
Happy modeling!