AutoAI for Data Scientists: From Beginner to Expert

Jacques Roy
Oct 11, 2019 · 6 min read

Data science is a required practice for organizations accelerating their journeys to AI. Businesses are keen on hiring the right talent, acquiring the right tools and evolving the discipline. When it comes to data science projects, there are two major problems:

1) There are not enough data scientists.

2) It takes too much time for any data scientist to get to a usable, tuned model.

Solving the shortage of data scientists requires investing time and training in our employees. We can’t expect people to spend a year learning before they become productive. We need to reach a stage where people know enough to start contributing immediately while continuing to improve their skills.

As for the second problem, the time it takes to reach a usable, tuned model, we need tools that make our data scientists more productive. Some relatively mundane tasks could be automated, leaving the more challenging and interesting parts to the data scientist.

Intelligent automation in data science and AI empowers everyone

AIconic Award for AutoAI

Currently, AutoAI addresses classification and regression (numeric prediction) problems. These types of problems are at the core of many data science initiatives. If you are an experienced data scientist, you know how to solve them. With AutoAI in Watson Studio, you can quickly see a leaderboard of the candidate pipelines, which helps accelerate model selection. If you are learning data science, you can see how these techniques are applied.

AutoAI processing and leaderboard

At the highest level, creating a model involves taking some data, passing it through a machine learning algorithm, and getting a resulting model. Well, it’s not always that simple.

Let’s say you have your data as a comma-delimited file (.csv). To start with, all the attributes are character strings. You need to identify all the fields that are numeric and convert them into integer, decimal or floating-point numbers. You also have to deal with missing values and normalization.

The character fields also have to be converted to numeric values. Typically, this means encoding categorical fields such as gender or type of payment.
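
The conversions described above can be sketched in a few lines of plain Python. This is only an illustration of the general idea, not a description of what AutoAI does internally: the sample data, the column names, the choice of mean imputation, min-max scaling and one-hot encoding, and the helper names `is_number` and `preprocess` are all assumptions made for the example.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
RAW_CSV = """age,income,gender,payment_type
34,52000.5,F,credit
,61000,M,cash
45,,F,credit
29,48000,M,debit
"""

def is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def preprocess(csv_text):
    """Turn an all-strings CSV into feature names and a numeric matrix."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    features = []  # output column names
    columns = []   # output column values, parallel to features
    for name in rows[0].keys():
        raw = [row[name].strip() for row in rows]
        present = [v for v in raw if v != ""]
        if present and all(is_number(v) for v in present):
            # Numeric column: convert, fill gaps with the mean, min-max scale.
            nums = [float(v) for v in present]
            mean = sum(nums) / len(nums)
            filled = [float(v) if v != "" else mean for v in raw]
            lo, hi = min(filled), max(filled)
            span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
            features.append(name)
            columns.append([(v - lo) / span for v in filled])
        else:
            # Categorical column: one 0/1 column per distinct value.
            for category in sorted(set(present)):
                features.append(f"{name}={category}")
                columns.append([1.0 if v == category else 0.0 for v in raw])
    # Transpose the columns back into rows.
    matrix = [list(r) for r in zip(*columns)]
    return features, matrix

features, matrix = preprocess(RAW_CSV)
print(features)
for row in matrix:
    print([round(v, 3) for v in row])
```

Every decision in this sketch (how to fill a missing value, how to scale, how to encode a category) is one a junior data scientist could get wrong, which is why automating the step is attractive.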

We must admit that this is not the most exciting part of creating a model. Automating it makes expert data scientists more efficient and helps junior data scientists avoid mistakes in pre-processing the data, even while they are still learning what needs to be done.

See for yourself: you can try the AutoAI tutorial on the IBM Cloud for free.

Which algorithm to use?

Curated models available through AutoAI

We also have to contend with feature engineering and hyper-parameter tuning. Which new features should you create? Based on what? Selecting the right mix takes experience. Hyper-parameter tuning can be tricky too: you could end up with a model that works great on training data but poorly on new data (overfitting), or with a less than optimal model.
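
To make the tuning problem concrete, here is a minimal sketch that tunes a single hyper-parameter by hand. It is not AutoAI's method: the model (a one-dimensional k-nearest-neighbors regressor), the synthetic noisy sine data, the candidate values of `k`, and the simple train/validation split are all assumptions chosen to keep the example small.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def rmse(train, data, k):
    """Root mean squared error of the k-NN model on the given data."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in data]
    return math.sqrt(sum(errors) / len(errors))

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = random.Random(0)
points = [(i / 10, math.sin(i / 10) + rng.gauss(0, 0.3)) for i in range(60)]
train, valid = points[0::2], points[1::2]  # alternate points: train / validation

# Try each candidate value of the hyper-parameter k and score it twice:
# on the data the model has seen, and on data it has not.
candidates = [1, 3, 5, 9, 15]
scores = {k: {"train": rmse(train, train, k), "valid": rmse(train, valid, k)}
          for k in candidates}
best_k = min(candidates, key=lambda k: scores[k]["valid"])

for k in candidates:
    print(f"k={k:2d}  train RMSE={scores[k]['train']:.3f}  "
          f"valid RMSE={scores[k]['valid']:.3f}")
print("selected k:", best_k)
```

With `k = 1` the model simply memorizes the training set, so its training error is exactly zero while its validation error is not. That gap is the "great on training data, not so much on new data" trap, and it is why the selection is made on the validation score rather than the training score.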

AutoAI addresses all those issues and lets you make an educated decision about which model performs best. Your decision is supported by evaluation measures such as Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and others, on both the training and testing data (including cross-validation). You can even see the details of how feature engineering was done and the feature importance. For a beginner, this is a key way to start learning about data science. For an expert data scientist, it is a chance to validate or adjust some of your assumptions.
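
For readers new to these measures, the sketch below computes RMSE and MAE under k-fold cross-validation and ranks two candidates by the result. It is a hand-rolled illustration, not AutoAI's implementation: the two toy "pipelines" (`fit_mean` and `fit_line`), the data, and the fold scheme are hypothetical and exist only to show how such a ranking can be produced.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large misses more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean Absolute Error: the average size of a miss."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx); fold i holds every k-th item starting at i."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test

# Two toy candidates (hypothetical): each fits on data and returns a predictor.
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return lambda x: my + slope * (x - mx)

def cross_validate(fit, xs, ys, k=4):
    """Score a candidate only on points it did not see during fitting."""
    actual, predicted = [], []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        actual += [ys[i] for i in test]
        predicted += [model(xs[i]) for i in test]
    return {"rmse": rmse(actual, predicted), "mae": mae(actual, predicted)}

# Made-up, roughly linear data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

candidates = [("mean baseline", fit_mean), ("linear fit", fit_line)]
leaderboard = sorted(
    ((name, cross_validate(fit, xs, ys)) for name, fit in candidates),
    key=lambda item: item[1]["rmse"],
)
for rank, (name, m) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name:14s} RMSE={m['rmse']:.3f}  MAE={m['mae']:.3f}")
```

Because the data is close to a straight line, the linear candidate ranks first. On real data the ordering is rarely that obvious, which is where a side-by-side view of the measures earns its keep.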

Model evaluation

Once you decide on the model to use, you can save and deploy it into an IBM Watson Machine Learning service so people can score their data through a simple REST API.
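
As a rough idea of what scoring through a REST API looks like from the caller's side, the sketch below builds a JSON POST request using only the Python standard library. Everything specific in it is an assumption: the URL and token are placeholders, the helper `build_scoring_request` is hypothetical, and the payload layout (field names plus rows of values) is a guess at a typical scoring format. Check the documentation of your own deployment for the exact endpoint, authentication and payload.

```python
import json
import urllib.request

# Placeholders, not real values: substitute the scoring URL and access
# token shown for your own deployment.
SCORING_URL = "https://example.com/deployments/my-deployment/predictions"
API_TOKEN = "REPLACE_WITH_YOUR_TOKEN"

def build_scoring_request(fields, rows, url=SCORING_URL, token=API_TOKEN):
    """Build (but do not send) a JSON POST request for a batch of rows."""
    # Assumed payload shape: column names plus one list of values per row.
    payload = {"input_data": [{"fields": fields, "values": rows}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

req = build_scoring_request(["age", "income", "gender"], [[34, 52000.5, "F"]])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Sending it would look like this (left commented out because the URL
# above is a placeholder):
# with urllib.request.urlopen(req) as response:
#     predictions = json.load(response)
```

The point is that the consumer of the model needs no data science tooling at all, only the ability to send JSON over HTTPS.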

Saving an AutoAI model

A Perfect Blend of Open-source and IBM Technology

Is this a proprietary solution? Not at all!

Instead of saving the model to an IBM Watson Machine Learning service, you can save it as a notebook. This way, you can generate the model yourself and decide where to save and deploy it. Since it is a notebook, you can modify it for any reason, whether that is adding a transformation or making it fit datasets with additional attributes. And of course, you can use the notebook in an open-source tool or in Watson Studio.

Generated notebook

One side benefit of generating a notebook is education and training. It is always instructive to see how things are done, and beginner data scientists may see some transformations they did not think about for this or other projects. This becomes learning by example. IBM is committed to leading and empowering the open-source community and data science is, of course, no exception!

Giving you more time to innovate by minimizing mundane or repetitive tasks

Since AutoAI can select the most appropriate model for classification or regression, automate feature engineering and hyper-parameter tuning, and provide measurements of model quality, data scientists can focus on evaluating and selecting the model instead of the mechanics of creating one.

Overall, AutoAI democratizes data science and AI: data preparation, model development and selection, execution and deployment. This helps address the shortage of data scientists and gets to a solution faster. By accelerating the data science lifecycle with AutoAI, businesses can focus more on high value-added work and innovative solutions. This is why we are focused on sharing best practices and playbooks in AI. I predict that the future of work in data science will be more exciting and dynamic.

Ready to learn more about AutoAI?

Jacques Roy is a worldwide Digital Technical Engagement lead on Watson Studio and Watson Machine Learning at IBM, where he has helped build a community of data scientist followers who are brushing up their skills at all levels. He loves to talk about data science, use cases, and best practice tips.

Please reach out to Jacques for any questions or comments!

IBM Watson

AI Platform for the Enterprise
