“Machine Learning is Mainly Not Data Science!”
Are we creating an amazing new world of AI, or a rat’s nest of problems?
Well, a quote stuck out for me last week from Arun Ghosh at KPMG:
“a large part of machine learning is not data science but data engineering”
and then he followed up with:
“It’s cleaning and collating and integrating information, and then you run the algorithm. What we are finding is that you can compress the data engineering process by adding a trusted layer that is immutable by nature.”
And I realised that some people in the industry are starting to understand the need to rebuild our systems in a trustworthy way. Many in the industry currently promote throwing a whole lot of data at AI and letting the machine sort things out. But how can we trust the veracity of the data, or even know that it was correct in the first place? Just like in finance, we now need audit trails and the ability to trace data back to its source.
Corporations should now be analysing their data and asking serious questions … “Who owns it?”, “Do we have their consent?”, “Where did it come from?”, “Has it been modified?”, “Can we retrain without using these parts of the data set?”, and so on. We need to be able to audit our data sources!
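To make those questions a little more concrete, here is a minimal sketch of what an auditable record for a data source might look like. It is only an illustration under my own assumptions: the names `ProvenanceRecord`, `append_record`, `chain_is_intact` and the rest are hypothetical, and the hash-chained list simply stands in for whatever immutable "trusted layer" an organisation actually adopts.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical audit entry for one chunk of training data."""
    owner: str      # "Who owns it?"
    consent: bool   # "Do we have their consent?"
    source: str     # "Where did it come from?"
    data_hash: str  # SHA-256 of the data, for "Has it been modified?"
    prev_hash: str  # hash of the previous record, chaining the log


def sha256_hex(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def record_hash(record: ProvenanceRecord) -> str:
    # Sorted keys give a stable serialisation, so equal records hash equally.
    return sha256_hex(json.dumps(asdict(record), sort_keys=True).encode("utf-8"))


def head_hash(log: list) -> str:
    """Hash of the latest record; store it somewhere you trust."""
    return record_hash(log[-1]) if log else GENESIS


def append_record(log: list, owner: str, consent: bool, source: str, data: bytes) -> list:
    """Return a new log with one more record chained to the previous head."""
    record = ProvenanceRecord(owner, consent, source, sha256_hex(data), head_hash(log))
    return log + [record]


def chain_is_intact(log: list, expected_head: str) -> bool:
    """Check every link, then compare against the separately stored head."""
    prev = GENESIS
    for record in log:
        if record.prev_hash != prev:
            return False
        prev = record_hash(record)
    return prev == expected_head


def data_unmodified(record: ProvenanceRecord, data: bytes) -> bool:
    """'Has it been modified?' Compare the data with its recorded hash."""
    return sha256_hex(data) == record.data_hash


def consented_records(log: list) -> list:
    """'Can we retrain without these parts?' Keep only consented entries."""
    return [record for record in log if record.consent]
```

Each record stores the hash of the one before it, so quietly editing an old entry (flipping a consent flag, say) breaks the chain when it is re-checked against a head hash kept elsewhere. A real system would also need signatures, access control and durable storage, none of which is shown here.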