H2O World San Francisco 2019 Notes

This year, the H2O World event was held in the San Francisco Bay Area. The platform's community keeps growing: 1,350 tech people attended this conference, compared with 1,000 attendees in London and 750 at the NYC event in 2018.

Attendees of H2O World SF ’19

Many influencers we already follow spoke at the conference, including Mike Gualtieri, Mara Averick, Megan Risdal and Erin LeDell. We feel lucky to have listened to and met them.

Mara Averick

AI-first companies mostly adopt the “start with data scientist first” approach. H2O modifies this approach a little: they believe that you should “start with Kaggle grandmaster first”. This explains why H2O employs so many Kaggle winners.

It’s clear that data enthusiast communities have adopted Python and R. These scripting languages might not be stable in production because they were designed around research-first principles. We do not have experience of what would happen if we put such systems into production, serving millions of transactions a day. Here, the H2O platform offers Python and R interfaces, but it actually runs on the JVM. You don’t even feel the difference. This separates H2O from its competitors.

Interpretability was the key topic of the event. Even though the platform supports deep learning frameworks such as TensorFlow, it is also strong in more explainable algorithms such as XGBoost or LightGBM. H2O has also developed its own boosting implementations. As Agus Sudjianto from Wells Fargo mentioned, AI must be explainable in the banking world because of the heavy regulations. ML models cannot be black boxes; they have to be transparent. Non-interpretable models cannot be moved to production. On the other hand, the interpretability and the accuracy of a model tend to trade off against each other. Deep learning is extremely powerful, but it hits a wall when the reasoning behind a prediction matters. Therefore, even if you need to use neural networks, you might prefer explainable neural networks (xNN).

ML Interpretability vs Accuracy

Productionizing is another key issue in machine learning. An ML model is successful only if it is deployed, as Mike Gualtieri mentioned in his presentation. ML projects don’t just serve academic purposes. Moreover, deployment alone is not enough: production monitoring is a vital requirement too. The performance of an ML model can decay over time even if it creates maximum business value when it is first deployed. Models might need to be rebuilt or retrained in the production pipeline.
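The monitoring idea above can be sketched in a few lines of Python. This is only a minimal illustration and not part of H2O or any other product: the class name `AccuracyMonitor`, the window size and the tolerance are hypothetical choices, and the sketch assumes that the true labels for production predictions eventually become available.

```python
from collections import deque


class AccuracyMonitor:
    """Hypothetical helper: track accuracy over recent predictions and flag decay."""

    def __init__(self, baseline_accuracy, window_size=100, tolerance=0.05):
        # Accuracy measured when the model was deployed (assumed to be known).
        self.baseline_accuracy = baseline_accuracy
        # How far below the baseline we accept before asking for a retrain.
        self.tolerance = tolerance
        # Only the most recent `window_size` outcomes are kept.
        self.outcomes = deque(maxlen=window_size)

    def record(self, predicted, actual):
        """Store whether a production prediction matched the true label."""
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """True once the window is full and accuracy fell below baseline - tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline_accuracy - self.tolerance


# Usage with made-up predictions and labels.
monitor = AccuracyMonitor(baseline_accuracy=0.90, window_size=4, tolerance=0.05)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)

print(monitor.rolling_accuracy())
print(monitor.needs_retraining())
```

A real pipeline would probably watch more than accuracy, for example shifts in the input data, but the pattern of comparing a recent window against a deployment-time baseline stays the same.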

Beyond the open source H2O platform, the firm delivers a paid, life-saving module called Driverless AI for automatic machine learning. It automates most feature engineering steps for data scientists. We haven’t had a chance to use this module yet, but it makes a good impression. It could reduce the effort of data scientists radically. We also know that Google is interested in automatic machine learning. It seems that automatic model generation might be one of the hottest topics of this decade.

Google AutoML delivers opaque model building

In short, the conference covered the common concerns of machine learning practitioners, such as interpretability, productionization and stability in production. The approaches of these tech giants also let us imagine the ML technology of tomorrow. It seems that machine learning automation will appear in many more products. H2O World SF was a highly enjoyable event. Hope to see you at the following events!

Editor’s note: This post has been prepared with the invaluable help of Sefik Serengil, whose great material can be reached here.