Key Takeaways from ODSC East 2018

Jonathan Kurniawan
Data Analytics @ Hult
10 min read · May 14, 2018

The Hult Data Analytics club attended the 4th annual Open Data Science Conference (ODSC) East 2018 on May 1–4 in Boston, on the theme of “The Future of AI is Here”.

Rafael Cerutti (left) and me (right) at ODSC East 2018

It was an incredible four days of great keynote speakers and technical workshops around the latest in the world of data and machine learning, with over 4500 attendees and 200 speakers.

Our key takeaways from the various speakers are:

(1) AI is moving from ‘research projects’ to ‘infrastructure at scale’. There is an increasing number of tools and middleware making it easier to get started with AI.

(2) Machine learning and AI are already transforming the operations of many industries. Having said that, we should not be afraid of AI, as there is still plenty that it cannot do.

(3) Deep learning is very powerful and is becoming mainstream, but it is not ‘magic’ and has limits. The analogy is of a blacksmith with only a hammer, who looks at all problems like they’re nails — there needs to be a whole suite of tools in the toolbox to solve different problems.

(4) AI governance, ethics, and regulation are current concerns.
- AI is not a silver bullet for systemic issues.

(5) There is an increasing number of open source data-science projects focusing on making the world a better place.

In the article below, I’ll elaborate on each of these takeaways.

(1) AI is moving from ‘research projects’ to ‘infrastructure at scale’ thanks to the different tools that are being developed.

One of the research booths at ODSC East

According to Manoj Saxena’s speech on Winning with AI, AI in its current state is where the web was back in 1999, with 2017 being the first year AI became mainstream. A couple of years ago, most of the progress in AI was in research projects in contained environments, built in some researcher’s bedroom. Now, though, there are a large number of middleware and third-party software providers, especially open-source ones, that help with various levels of machine learning.

List of famous open source data tools.

Now there is a focus on bringing together data science and software development workflows, such as ‘DataOps’, so that AI models can reach the data they need, and vice versa. This allows enterprises to operationalize AI at scale, reducing the skills needed to manage models that work on distributed data.

This is increasingly important: as it currently stands, there is a large gap between business ambition and execution, the time to business value is still long, and standing up a sizable data science team requires a large investment. The trend of open innovation on these tools will help close that gap, allowing in-house data science teams to target high-value use cases and reduce the time to source, train, and scale data and machine learning models.

Currently, most of the work is on making faster and more accurate classifications and regressions. However, this is mostly building components, not assembling a general, workable AI that can solve all our complex needs. By analogy, we are building steering wheels and gearboxes, but not assembling an entire car. This is important to keep in mind so that we do not hype current AI technology beyond its capabilities, as discussed in part (3).

To go into detail on some of the popular tools: libraries like Apache MXNet, or Google’s TensorFlow with its Keras API, are making it easier for beginners to get started with deep learning. There are also countless online tutorials for Plotly and Matplotlib, data visualization libraries that compete with Tableau. Kubernetes, Kafka, MongoDB, and Hadoop are all commonplace names at both high-growth startups and enterprise companies, and the Jupyter notebook has become the de facto standard for sharing documentation with live code and explanatory text in tutorials and deliverables. This trend will only continue to grow, and more open-source tools will power many of the applications we use daily.
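To give a feel for how accessible these open-source tools are, here is a minimal Matplotlib sketch that turns a handful of numbers into a saved chart in a dozen lines. The monthly figures are made up purely for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures, invented for this example
months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [120, 135, 150, 148, 170]

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Sales (units)")
ax.set_title("Monthly sales (illustrative data)")
fig.savefig("sales.png")  # a presentable chart, a few lines of code later
```

Swapping in Plotly, or dropping the same lines into a Jupyter notebook cell, works just as readily, which is exactly why these tools have spread so widely.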

(2) Machine learning and AI are already transforming the operations of many industries.

Most of the news we see on AI touches on futuristic, sci-fi-like feats such as self-driving cars and beating the best human player at Go. What isn’t so obvious is how commonplace data analytics and machine learning already are, used in all sorts of settings, including everyday business operations.

To make these machine learning analytics useful, a company must first have a genuine Big Data problem: Volume (scale of data), Variety (structured and unstructured data), Velocity (making decisions from streaming data within fractions of a second), and Veracity (data uncertainty). Only then can it attain value from the insights of analyzing that data.

Insurance

To take one example of an industry, Aleksandar Lazarevic from Aetna touched on how they are using analytics to power decisions in insurance and healthcare, an industry known to be conservative and complex. They are able to detect fraudulent claims from providers that were excessively performing cosmetic surgeries. The approach is to first form business-driven hypotheses about fraud schemes, and then use supervised models to detect variations of those schemes.

Another example is using social determinant signals to predict readmission, with signals like number of close friends, the neighborhood they live in, economic stability, and education level. Using these signals revealed significant differences in readmission rates among individuals of roughly the same age and physical fitness. Care would be needed to avoid systemic bias, though, as discussed in part (4) below.

Overall, we should be careful that the right problems get solved with the right tools, making sure that there is indeed a Big Data problem, and not use AI as a blanket solution to a hard problem, just because it’s ‘the cool thing to do’.

(3) Deep learning is very powerful and is becoming mainstream, but it is not ‘magic’ and has limits.

We saw in 2016 the high-profile matchup of AlphaGo, from Google’s DeepMind, against world-class Go player Lee Sedol, where AlphaGo won 4–1, finally putting to rest the notion that a computer could not beat a top human player at Go. AlphaGo did this using deep learning: it analyzed thousands of amateur and professional games to learn how to play Go at the top echelon of the game.

Since then, DeepMind has built a new program, AlphaGo Zero, using reinforcement learning (RL): it was not given any existing game data at all, only the rules, and instead had to figure out good play on its own by playing against itself. This iteration went on to beat the winning AlphaGo model within just 40 days.
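AlphaGo Zero’s training pipeline is far beyond a blog snippet, but the core idea of reinforcement learning, improving from reward signals alone with no example games, can be sketched with tabular Q-learning on a toy problem. Everything below is a simplified illustration of that idea, not DeepMind’s actual method:

```python
import random

random.seed(0)

# Toy environment: a 1-D corridor of 5 cells. The agent starts in cell 0 and
# receives a reward of 1 only upon reaching cell 4. It is told nothing else,
# and must discover a good policy purely by trial and error.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # step left, step right

def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(state):
    # break ties randomly so the untrained agent doesn't get stuck
    return max(ACTIONS, key=lambda a: (Q[(state, a)], random.random()))

for _ in range(200):  # 200 self-taught episodes
    state, done = 0, False
    while not done:
        action = random.choice(ACTIONS) if random.random() < epsilon else greedy(state)
        next_state, reward, done = step(state, action)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# After training, the learned policy should be "always step right"
policy = {s: greedy(s) for s in range(GOAL)}
print(policy)
```

The agent starts with zero knowledge, stumbles around, and bootstraps a value estimate backward from the reward. AlphaGo Zero pairs this same reward-driven loop with deep neural networks and tree search, which is what lets it scale from a 5-cell corridor to a 19×19 Go board.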

For those of you who see these developments in deep learning and reinforcement learning and fear robots taking our jobs, fear not: for all the hype, the power of deep learning has its limitations.

An example of an ‘illusion’ output that current deep learning models struggle with.

According to Gary Marcus’ keynote speech, deep learning requires lots and lots of data to begin to be useful, which limits its applications. Deep learning also currently has limited capacity for transfer learning to other domains, something that would be required for a general-purpose AI. It struggles with open-ended questions, and it is a black box in how it calculates its output, which is becoming more of an issue as we begin questioning how models arrive at an answer, especially with regulations like GDPR, which might require a data scientist to explain a model’s decisions. There is also the fact that we can’t fully trust a model’s answer, as shown in the images below.

Adversarial Patch that can ‘confuse’ a deep neural network model into thinking the bottom picture is a “toaster” and not a “banana”.
Attacking machine learning classification algorithms into thinking the image is a ‘safe’ or ‘loud speaker’ instead of a washer.

Having said all this, deep learning is a very powerful tool that is becoming more accessible to the general public. Tools such as Google’s machine learning library TensorFlow and its Keras API abstract away complexities, allowing developers to create deep neural networks in only a couple of lines. Open-source infrastructure and tools, like the ones listed earlier, are helping deep learning become mainstream. Projects such as Quick, Draw! are adding datasets to fuel Google’s data-hungry deep learning models.

Things we can do currently vs aspirational ones that are beyond current capabilities.

To dispel the myths: deep learning is another way of doing statistics, namely classifying things into categories, or doing regression for predictive analysis, and it works well when the training data is not too far from the real-world data the model will be deployed on. It is great as a perceptual classifier, but less so at natural language and ‘common sense’. We will need to keep building other tools to solve the complex real-world problems that current machine learning models are ill-equipped for.
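To make the “another way of doing statistics” point concrete, here is a minimal sketch in plain NumPy of the statistical core that deep networks are built from: a single logistic-regression “neuron” trained by gradient descent on made-up, well-separated data. A deep network stacks many such layers; this toy shows only the one-layer case:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 2-D data: class 0 clustered near (-2, -2), class 1 near (+2, +2)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One "neuron": weights + bias, fit by gradient descent on the log loss
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)           # predicted probability of class 1
    grad_w = X.T @ (p - y) / len(y)  # gradient of the log loss w.r.t. w
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Note that the classifier does well here precisely because the data it is scored on looks like the data it was trained on, which is the same caveat made above: move the test data far from the training distribution and the statistics stop helping.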

For those who still believe general AI is going to replace us in the near future, consider Apple’s billion dollar investment in Siri:

No, I DON’T want McDonalds, silly Siri.

(4) AI governance, ethics, and regulation are current concerns.

With the increasing speed of development in AI and other tech innovations such as blockchain and IoT, society is being forced to adapt to keep up with all these changes. Visibility, explainability, and governance become key concerns for progressing AI without adverse effects on society.

Automation bias is an over-reliance on automated aids and decision support systems. Keynote speaker Cathy O’Neil addressed this head-on with the question of whether algorithms perpetuate human bias, giving examples such as hiring algorithms that look at historically ‘good’ hires, which in organizations such as Fox News were mostly white men. Moreover, if those hires tend to become problematic, as in Fox News’ case of sexual harassment allegations against its founder and some of its TV hosts, would AI propagate those attributes in new hires?

One of the speakers, Stephanie Kim, also touched on how facial recognition systems can be biased by race, especially due to the lack of images of minorities in the data set. Look no further than Google’s problem of labeling African-Americans as gorillas. The objectives of AI algorithms need to be properly analyzed to make sure that we are building tools that have a positive impact.

Governance of rogue AI will be another key challenge. We need to have conversations about how to keep rogue AI from going out into the wild: for instance, what happens if a test bot figures out that its best method of ‘testing resiliency’ in a piece of software is to take down an entire data center? These conversations are starting to crop up and should continue as ethics and compliance become major concerns for AI.

To learn more about this, my team and I did a consulting presentation for the World Economic Forum, as part of our MBA project, on automation bias in the United States justice system. The lesson is not to use AI as a silver bullet against systemic issues in the hope of an ‘objective’ solution because a computer said so, but to combat those issues through dialogue among different stakeholders and to ensure that AI does not aggravate them.

(5) There is an increasing number of open source data-science projects focusing on making the world a better place.

There are reasons to be optimistic, though, as many speakers in the community shared projects currently focused on positive impact. Speakers like Trevor Grant, open source AI evangelist at IBM, and Eric Scheles, a researcher who combats trafficking with machine learning, illustrated that there are worthwhile projects available today where we can all contribute and make the world a better place.

Some of the data scientists using data science for good.

Overall, ODSC was a great event filled with insightful and thought-provoking speakers from all over the data field. To learn more about their upcoming events and conferences, visit odsc.com.

Hult is an international business school focused on the global mindset, with campuses all over the world, in San Francisco, Boston, New York, London, Dubai, and Shanghai.

Hult offers a number of postgraduate business degrees focused on the changing landscape of the job market, including a one-year MBA, a Master’s in Business Analytics, and a Master’s in Disruptive Innovation. To learn more, visit http://www.hult.edu/en/masters-degree/
