Love what you do

Madhu Kochar
Inside Machine learning
6 min read · Apr 9, 2019

Madhu Kochar and Jay Limburn

I was chatting recently with a friend who is an air traffic controller. He explained that he has been fascinated by all things aviation since childhood: not just the thrill of flight, but the mechanics behind the incredible machines that make the world such a small place. He loves his job because he spends all day working directly with these machines and using his knowledge of them to shape the path each one takes. That means understanding individual aircraft performance, climb rates, air speeds, weights and suitable airports, as well as having a direct line to the captains. However, the enjoyment his chosen profession brings comes with a degree of tedium.

A key part of the job is ensuring that the data received from adjoining sectors of airspace is accurate and clear. As a flight enters his sector, he has to check and recheck that the data received on that flight is accurate and trustworthy before he can even start to focus on the fun part: how to direct that flight. This demands a huge amount of knowledge about the procedures that need to be followed, and constant recertification on the latest regulations and processes. This part can be tedious and mundane, yet it is essential to ensure the safe passage of aircraft through his airspace. As air traffic increases and the number of available data points grows, so does this task. Increasingly, air traffic controllers will need to turn to intelligent data management techniques to ease the burden, leaving them more time to spend orchestrating their city in the sky.

The same is true of data scientists. The “2017 Data Scientist Report”¹ from CrowdFlower showed that 64% of responding data scientists agree that they are working in the 21st century’s sexiest job. This is validation that data scientists love what they do. They love the key aspects of delivering value from their data but, like my air traffic controller friend, they spend a large proportion of their time on tasks that are not so much fun. If they could minimize or eliminate those tasks altogether, they would have more time for the things they love doing.

Being the Janitor

So, what is it that data scientists love and hate? The data scientists I have interacted with seem to have gotten into data science to do cool things with data and math. They love going deep into data and identifying non-obvious patterns that they can use. They love building and tailoring the best algorithm to deliver the most efficient model to answer questions or identify patterns in images, and ultimately to prove they are at the top of the data science pyramid by delivering something far cooler, and deriving more value, than the peer sitting across the table. According to the same report, the three tasks data scientists enjoy least are collecting data assets (48%), labelling data (51%) and cleaning and organizing data (60%). Yet these tasks are what produce the “trusted data” that data scientists need before they can embark on the exciting work of developing, training and deploying their models. We can refer to them as ‘janitorial tasks’: necessary, but no fun. Data scientists didn’t get into data science to waste time searching across disorganized, random systems to find data. Nor do they want to spend hours arguing with IT, explaining that the data they were given is not the data they requested.

Make Artificial Intelligence your Janitor

With data volumes and variety expected to grow at an accelerated rate for the foreseeable future, the time data scientists spend on those janitorial tasks is only likely to increase. Not only is this frustrating for data scientists, it is inefficient for the organizations that hire them to deliver business value. So how can a data scientist decrease the time spent on the more mundane janitorial tasks? Ironically, the answer might be artificial intelligence.

Augmenting data management with AI is an emerging discipline. It infuses traditional data management techniques with AI to organize your data intelligently, so that your data science community can more easily find, understand and extract it, and get on with the good stuff: using it.

Helping to Accelerate the Journey to AI

Here at IBM we are on a mission to help organizations put in place efficient Data and AI teams, allowing them to reduce the mundane tasks throughout the data and AI lifecycle and focus on delivering value to the business. We refer to this as AI for Governance.

As we explore a typical Data & AI lifecycle, we can call out the areas where data scientists benefit from the use of AI for Governance. This is part of what IBM refers to as the “AI Ladder”, as shown in Figure #1 below.

Figure #1: IBM AI Ladder
  1. AI-powered discovery of data. Intelligent discovery of sources of information: being able to discover and collate enterprise and external data, regardless of type or structure, and transform it into a standardized format. This gives consumers a ‘DNS for their data’ that points them to the best data for their purpose.
  2. AI-powered profiling, quality and bias detection of the data. Utilizing AI to understand the data that you are cataloguing: pre-trained models that can inspect data elements and determine where sensitive data resides and which business entities exist across your data landscape, plus automated detection of data quality issues and of whether the data could introduce bias if used in a certain manner.
  3. Intelligent findability of data. Use the rich metadata in your catalog to train an AI model that makes data easier to find. Unlock dark data sets and allow AI to recommend them to your users based on their previous usage of the system.
  4. AI-powered suggestions on data prep operations. Data preparation is a key part of effective data science, and AI can assist the data scientist with it by suggesting which data needs to be wrangled to provide the best set of training data for a model.
  5. Policy activation on the data. Being able to activate your data protection policies so that data can be masked on the fly, at the point of consumption, depending on who is using it and what their intent is. Simply put, policy activation puts more data in the hands of your knowledge workers than before, whilst helping to ensure your data protection rules are enforced.
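
To make steps 2 and 5 more concrete, here is a minimal Python sketch of profiling columns for sensitive data and then masking them at the point of consumption based on who is asking. It is an illustration only and not how any IBM product works: the regular expressions stand in for the pre-trained models described above, and every name in it (`DETECTORS`, `profile_column`, `Policy`, `mask`, `read_rows`) is hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors: simple regexes standing in for the pre-trained
# classifiers a real catalog would use to find sensitive data.
DETECTORS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def profile_column(values, threshold=0.8):
    """Return the first label whose detector matches at least `threshold`
    of the non-empty values, or None if no detector does."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    for label, pattern in DETECTORS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return label
    return None


@dataclass(frozen=True)
class Policy:
    """Hypothetical rule: mask columns tagged `label` unless the role is allowed."""
    label: str
    allowed_roles: frozenset


def mask(value):
    """Redact everything except the last two characters."""
    return "*" * max(len(value) - 2, 0) + value[-2:]


def read_rows(rows, role, policies):
    """Yield rows with sensitive columns masked according to the caller's role.

    Assumes every row is a dict of strings with the same keys.
    """
    columns = rows[0].keys() if rows else []
    # Step 2: profile each column to find out what kind of data it holds.
    labels = {c: profile_column([r[c] for r in rows]) for c in columns}
    # Step 5: work out which labels this role is not allowed to see.
    blocked = {p.label for p in policies if role not in p.allowed_roles}
    for row in rows:
        yield {c: mask(v) if labels[c] in blocked else v for c, v in row.items()}
```

With a policy such as `Policy("email", frozenset({"steward"}))`, a caller with the role `"analyst"` would see the email column masked while a `"steward"` would see it in full, and both would see every other column untouched. That is the sense in which policy activation can put more data in more hands without relaxing the protection rules.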

Wrapping it all up

Data scientists love what they do and can be the kingmakers in an organization’s future. However, they need to be relieved of the mundane tasks that take them away from delivering value and diminish their enthusiasm for their passion.

Data management capabilities that use AI for governance to provide well-organized data, easily accessible to the data science community, can be a key factor in improving the overall outcome of your AI projects. Examples include the offerings listed below.

  • IBM Watson Knowledge Catalog provides enterprise catalog and governance capabilities designed to drive productivity from data scientists.
  • IBM Watson Studio provides an enterprise-grade data science builder’s environment designed to quickly deliver value through AI.
  • IBM Cloud Private for Data provides an end-to-end data and AI platform with integrated capabilities, making your data simple and accessible to power AI at scale.

These are built to help infuse AI into Governance and support the Data & AI Lifecycle, helping organizations to unshackle their data scientists and turn them from the janitors into the rock stars of their organization.

Register for our upcoming webinar to learn how AI can be used to underpin your Governance initiatives:

Sign up for a demo here and discover how IBM Watson can transform the way you do Data Science and Analytics.

¹ “2017 Data Scientist Report” from CrowdFlower.

Jay Limburn — Director, Offering Management, IBM Data & AI
Madhu Kochar — VP, IBM Data & AI


