Machine Learning and Governance: An Interview with Seth Dobrin

Published in Inside Machine learning, Jun 5, 2017

SEM image of a Peacock wing, Wikimedia Commons

I recently had a chance to visit with Seth Dobrin, the new Vice President and Chief Data Officer for IBM Analytics.

You come to IBM Analytics from Monsanto’s digital strategy team where you earned a reputation for leading exponential change. You’ve been at IBM just a few months so you’re looking at things with fresh eyes. Where do you see the opportunities to change or improve what we offer to private cloud customers?

Well, I’ll first say that we’re already doing a lot of things right, and it’s clear that our customers trust us and we make some of the best products on the market — just ask the analysts. We’ve established a reputation for working closely with customers and getting them to the outcomes they want, rather than urging them to adopt the next new thing. And I came to IBM in part because of this and in part because every time I have engaged with IBM, I have been impressed with how incredibly smart everyone is.

IBM recognizes that most enterprises do not have the luxury of starting with a clean slate. Enterprises have existing data and applications that run their businesses today. For better or worse, these platforms live in data centers that were built over the last several decades. How we show up to help them connect these systems to the benefit of cloud will define our future, and theirs. The journey to the cloud — and it’s a journey you can’t just do in a week, a month, or even a year — is not one-size-fits-all, and enterprises appreciate that IBM understands this perspective.

I’ll just also point out that those existing systems and applications — which some might dismiss as “legacy” — are still at the heart of most of these businesses. These are business-critical systems that run existing enterprises. These are the systems that are paying the bills today. IBM is good at recognizing that a big part of our job is giving those applications and systems more oxygen, not less, and connecting these assets to future strategies whether that’s public cloud, private cloud, or hybrid.

In terms of opportunities, the biggest one we have is around governance. Governance is typically seen as a roadblock or at least a speed bump. We need to help enterprises reframe governance as an enabler or even an accelerant. The way to do this is to build an active and integrated unified governance platform built on an open source metadata backbone, Apache Atlas™. Atlas operates across environments, including business-critical assets, private cloud, and public cloud.

Interesting. You mentioned metadata. What are you thinking of?

Metadata is about maintaining contextual awareness around your data. To do this you need three things: access, provenance, and intent. You need to be able to see at a glance who came into contact with the data, what they did, whether they moved it, where they moved it, and why.
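As an illustration of those three elements, here is a minimal sketch of what a single metadata record might capture. It is not drawn from Apache Atlas or any IBM product; the `DataEvent` class, its field names, the `history` helper, and the sample values are all hypothetical and chosen only to mirror access (who), provenance (what and where), and intent (why).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class DataEvent:
    """One recorded contact with a dataset (hypothetical schema)."""
    dataset: str
    actor: str                      # access: who came into contact with the data
    action: str                     # provenance: what they did
    reason: str                     # intent: why they did it
    moved_to: Optional[str] = None  # provenance: where the data went, if moved
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def history(events: List[DataEvent], dataset: str) -> List[str]:
    """Return one summary line per event for a dataset, oldest first."""
    relevant = sorted((e for e in events if e.dataset == dataset),
                      key=lambda e: e.at)
    lines = []
    for e in relevant:
        where = f" -> {e.moved_to}" if e.moved_to else ""
        lines.append(f"{e.actor} {e.action}{where} ({e.reason})")
    return lines


# Sample events; names and locations are made up for illustration.
events = [
    DataEvent("sales_2017", "ana", "copied", "quarterly forecast",
              moved_to="private-cloud/analytics",
              at=datetime(2017, 6, 2, tzinfo=timezone.utc)),
    DataEvent("sales_2017", "raj", "read", "audit review",
              at=datetime(2017, 6, 1, tzinfo=timezone.utc)),
]

for line in history(events, "sales_2017"):
    print(line)
```

Keeping the reason alongside the actor and the destination is what lets a single record answer the "why" as well as the "who" and "where".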

For years you’ve been adapting big data infrastructure and analytics for specific use with genomics and agriculture. Looking at our private cloud customers, how do we help them balance a common set of tools and practices against the need to craft these kinds of industry-specific solutions?

Almost every modern organization has certain things in common that they need to do around data, and that we can help them address. And when it comes to data and what CDOs and CDO-types care about, it revolves around five key areas: creating enterprise data assets, creating deployable data science assets, unified governance, developing a cloud strategy, and building a talent pipeline. IBM brings some unique capabilities to the table around these five areas, ranging from great traditional hardware and software to cutting-edge ecosystems and software-as-a-service like our leading analytics platforms built on Apache Spark™.

But you’re right that different sectors organize and mine data in very different ways. At Monsanto, we had to create pipelines for genomics data that ran huge cross-comparisons of gene variations and environmental conditions. That’s a different kind of data challenge than you’d see in retail or finance. As more sectors of the economy start to leverage their proprietary data, I think we’ll see more diversity in what they need to build in order to take advantage of that data, but I predict the basic model will be hybrid cloud because they’ll see that they need to be protective of proprietary data, while creating a secure outer edge that faces the world. Think of mining, construction, water management, and food production. We might not associate those sectors with big data but they’re already simulating strategies and spending in silico, because they see big opportunities for automation and optimization.

I think only IBM is going to be able to meet them where they are and help them see how data shapes their futures. And that gets you to machine learning, of course. That gets you to cognitive.

You’ve talked and written about the need to shape not only strategy but the corporate culture that’s needed to support those strategies. What approaches have worked for you?

I’m very much aligned with Rob Thomas, and Ginni, on this. In everything we do, we should look from the customer’s perspective. If it doesn’t improve our customers’ experience, we should think twice about why we are investing in it. Customers don’t want products; they want solutions to problems that are easy to get into and even out of. It is our job to enable this, and our commitment to open source technologies is the framework for this. Analytics with Apache Spark and SystemML. Blockchain with Hyperledger. Metadata with Apache Atlas and Ranger.

I also think we can continue to encourage the team to propose and pilot new business models and new methods of engagement internally and with our customers. So we have to always be asking how we can welcome and then support those proposals.

Maybe a related question: You’ve spent some time volunteering as a youth soccer coach. Any insights about operating in the midst of chaos?

Ha! Well, the thing that strikes me as a coach is actually how eager and devoted the kids are. If you can get them to stop trying to do everything themselves and work as a team, they outperform other teams. This translates well to the workplace, as most problems worth tackling cannot be addressed by a single individual or team. They are generally better addressed by a team or team of teams. If you get individuals to stop trying to be heroes, then winning is easy.

I invite you to learn more about how IBM is helping customers move forward with data governance and machine learning.


Written by Dinesh Nirmal