AutoAI — A journey, not a destination

Mihai Sarto
IBM Data Science in Practice
5 min read · Dec 8, 2020

The journey from data to insight and prescriptive action is complex. It often takes multiple people and steps to understand the data, train and validate the models, and predict future values. In this article, I will show how automated machine learning tools like AutoAI can speed up and improve this process.

When you have data, you probably want a formula that relates the data elements and gives a good prediction for a certain property, and then you want to apply that formula to new data to predict future values. This is the job of machine learning: train a model that learns how your data behaves, validate its accuracy, and then use it to predict future data.

In order to do this, these two key roles are typically involved:

  • Data Scientist (DS): this role does the complicated math and derives the “magic formulas” (a machine learning algorithm) from your data features, typically giving a degree of accuracy for the value you want to predict.
  • Business Analyst (BA): this role works closely with the data scientist to explain the data features and business terms, identify the importance of each property, and give feedback on whether the learning is good enough.

These two work together to create models, lambdas, and formulas using whichever techniques they are most comfortable with, validating the predictions to see whether they are accurate enough to be used going forward. Machine learning tools let users choose what type of models to create, which parameters to tune for each type of algorithm, which data to use or leave out, and how to measure the accuracy of a model.

These are often complex and quite scientific applications. At the end of the day, a model is produced and, if it is accurate enough, it is used for future predictions. The success of the prediction, measured by its accuracy, depends on how well the business analyst identifies feature importance and on the quality of the algorithm the data scientist selects for the problem.

Keep in mind that many hurdles can slow down the process. For example, manually selecting an ML algorithm to test on the data typically requires writing code just to get a baseline performance (especially with complex data sets). The accuracy of the model can be much lower than expected; adding hyper-parameter optimization after the initial run may not improve it; and additional feature engineering tests may only show that the features are not useful. In that case, the data scientist and business analyst have to start all over again with other algorithms and/or other parameters in search of better accuracy.
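To make the manual loop concrete, here is a minimal sketch of what "writing code to get a baseline" can look like. It is not AutoAI code: the synthetic data, the two candidate models, and every function name below are assumptions chosen for illustration, using only the Python standard library.

```python
import random
import statistics

def make_data(n=200, seed=0):
    """Synthetic data (illustrative only): y is linear in x plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys

def fit_mean(xs, ys):
    """Naive baseline: always predict the mean of the training targets."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Ordinary least squares for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on (xs, ys)."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

def compare_candidates(candidates, xs, ys, holdout=0.25):
    """Fit each candidate on the training part, score it on the holdout part."""
    split = int(len(xs) * (1 - holdout))
    train_x, train_y = xs[:split], ys[:split]
    test_x, test_y = xs[split:], ys[split:]
    return {
        name: rmse(fit(train_x, train_y), test_x, test_y)
        for name, fit in candidates.items()
    }

xs, ys = make_data()
scores = compare_candidates({"mean_baseline": fit_mean, "linear": fit_linear}, xs, ys)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: holdout RMSE = {score:.3f}")
```

Even this toy version needs a data split, a metric, and one fitting routine per candidate. Each additional algorithm, parameter setting, or engineered feature adds another entry to the loop, and that is the repetitive part described above.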

Automate model building and training with AutoAI

What if there were a way to walk through these painful steps automatically and get a baseline performance you can easily build upon? What if you could shorten the extensive process of selecting from a wide range of ML models and parameters, and combine features automatically to find the blend that best addresses your problem?

The AutoAI function of the IBM Watson Machine Learning product does just that.

Imagine: you load your raw data into the application, and the work of the data scientist and business analyst begins automatically. AutoAI builds thousands of models of different types, using different algorithms, performing feature engineering, and validating the accuracy of each model in the quest for better results, all with the automation of IBM Cloud.

To test quickly at the beginning, AutoAI analyzes the data and builds simple models. Linear regressions, decision trees, gradient boosting, logistic regressions, boosting trees, and other ML algorithms are all attempted, tested, and validated to determine what fits the data best. Based on this, and with more in-depth analysis, further pipelines of data transformations and estimators evolve over time. New features are derived from existing features without reducing quality. Because the code is generated automatically, you can copy the output into your own notebook and continue there to dive deeper. You can also inspect it or use it as it is.

The journey of discovering better models, however, is not always quick, and it is most definitely complex. The complexity of an AutoAI experiment is driven by the complexity of the data, the number of predictors that need to be analyzed, and the number of distinct classes in the data. These factors produce a high number of possible combinations for AutoAI to attempt, which typically shows up as longer runtimes for discovering models.
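A rough back-of-the-envelope calculation shows why the number of combinations grows so quickly. The counting rule below is an assumption made for illustration (every non-empty subset of predictors, paired with every algorithm and every hyper-parameter setting); it is not how AutoAI enumerates or prunes its search.

```python
def candidate_count(n_algorithms, n_features, n_hyperparameter_settings):
    """Upper bound on candidates under a naive, exhaustive counting rule.

    Assumption: any non-empty subset of features may be combined with any
    algorithm and any hyper-parameter setting.
    """
    feature_subsets = 2 ** n_features - 1  # all non-empty feature subsets
    return n_algorithms * feature_subsets * n_hyperparameter_settings

# Hypothetical experiment sizes: 5 algorithms, 10 settings per algorithm.
for n_features in (5, 10, 20):
    total = candidate_count(5, n_features, 10)
    print(f"{n_features} predictors -> {total:,} candidate combinations")
```

Under this rule each added predictor roughly doubles the count, so trying everything quickly becomes impractical and the search has to be selective.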

Remember the painful manual steps of building and training models above?

AutoAI keeps the user informed about the latest models, the stages of the data engineering, and how accurate the predictions can be. At each stage, the user has the opportunity to make use of the models and their properties. This makes the journey more important than the destination itself, in the sense that there is no single final model at the end of the execution, but rather models discovered along the way. This gives the user assurance that they have not overlooked any models, and it ensures a great start. Finishing the analysis of the data and considering every possible feature combination is no longer the main goal. The model is not produced at the end of the process, but during the process of tuning and evolving.
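The idea of models being available during the run, rather than only at the end, can be sketched as a generator that reports a ranked leaderboard after every evaluation. This is a conceptual illustration only: the pipeline names, the random scoring function, and `discover_models` are hypothetical and are not part of any AutoAI API.

```python
import random

# Fixed seed so the illustration is repeatable; the scores are made up.
_rng = random.Random(42)

def fake_evaluate(pipeline_name):
    """Stand-in for training and validating a pipeline; returns a fake accuracy."""
    return round(_rng.uniform(0.60, 0.95), 3)

def discover_models(pipeline_names, evaluate):
    """Yield the leaderboard after each pipeline is evaluated.

    The caller can stop at any point and still use the best model found so far.
    """
    leaderboard = []
    for name in pipeline_names:
        score = evaluate(name)
        leaderboard.append((score, name))
        leaderboard.sort(reverse=True)  # best score first
        yield name, score, list(leaderboard)

best_so_far = []
pipelines = ["pipeline_1", "pipeline_2", "pipeline_3", "pipeline_4"]
for name, score, board in discover_models(pipelines, fake_evaluate):
    top_score, top_name = board[0]
    best_so_far.append(top_score)
    print(f"evaluated {name} ({score}); current best: {top_name} ({top_score})")
```

Because a usable leaderboard exists after every step, stopping early still leaves you with the best model discovered up to that point.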

AutoAI takes complexity out of data science work for both the data scientist and the business analyst, and lets them share their work to produce more accurate results.

If you are a data scientist, you can use the IBM AutoAI feature to build better models and, in the process, discover which models are most suitable for your data.

If you are a business analyst, the IBM AutoAI feature can help you discover features while building models, and understand the impact of each uncovered feature.

Try out the AutoAI feature on IBM Watson Machine Learning: https://developer.ibm.com/technologies/artificial-intelligence/series/explore-autoai/
