Two important skills for an aspiring data scientist
It was just another ordinary day.
I was working on a regression problem for an internet-based company.
I was digging deeper into the literature of the domain, trying to figure out what else could be done to improve performance.
Doing my best.
This was a problem I had been trying to solve for many days.
He contacted me and asked: Hey, how’s your work going? Any progress?
I replied: Yup, I’ve improved the performance. The MAE (mean absolute error) has been reduced to 10.
He: Uhh..? MAE? What’s that, man? Explain.
Me: The mean of the absolute differences between the predicted values and the actual values.
He: Talk to me in human language, man. I don’t understand all of this.
Then, after several attempts, I wrote an intuitive explanation of MAE and sent it to him.
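The explanation itself isn’t reproduced here, but as a rough sketch of what MAE computes, something like the following works. The numbers are made up purely for illustration (they are not from the actual project), and the function name is my own.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, ignoring their direction."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one value")
    # Take each error, drop its sign, then average.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values, for illustration only
actual = [100, 150, 200, 250]
predicted = [110, 140, 215, 245]

mae = mean_absolute_error(actual, predicted)
print(f"On average, a prediction is off by about {mae:g} units.")
```

Here the individual errors are 10, 10, 15, and 5, so the MAE comes out to 10. The last line is the kind of plain-English sentence you can hand to someone who doesn’t care about the formula.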
He got that, but still, there was a problem.
He: Okay, sounds cool! I got it, but we’re not going to explain it like this to customers. And not everyone is going to understand it this way. I want you to report something else, something that anyone can understand in one go.
Me: Okay, I will get back to you soon.
That rang a bell.
There was something to figure out and learn from this incident.
Do you get that?
Yup, it’s about communicating the insights you’ve drawn from the data so that everyone can understand them.
See, not everyone comes from a technical or mathematical background. They don’t give a shit about it. They don’t care about your 15-layer-deep neural network. All that matters is: have you added value or not? And if yes, then explain it to us in plain English.
Nearly every business is about getting more customers and making more money. And that’s what you’re being paid for. So, you need to sit back and figure out the core business problem you’re trying to solve and how solving it will help the business. And then you need to explain your results only in those terms.
One tip from my side.
Train your empathy muscles, jump right into the shoes of another person, picture that persona, and then try to explain your insights.
See, it’s good that you know all of the mathematical concepts and can implement them to solve the problem. But it doesn’t mean a thing if you can’t help the business make decisions or make more money.
This is also where data visualization helps a lot. Data visualization is not about drawing fancy plots; its sole purpose is to communicate your insights using visuals.
Before starting the project
As beginners, we’re mostly used to picking a problem from Kaggle or some random course and then solving it. We know all of the details beforehand: this is a regression problem, this is the dataset, now solve it.
But this is not how things work most of the time. On the job, you’ll be given a business problem, and you’ll have to figure out how to map it to a machine learning problem.
Quick example: Suppose you’re working at Quora and you’ve been told that many questions have lots of duplicate entries, which makes the platform hard to use. Can you do something to sort this out?
That’s the business problem. Now you have to understand it thoroughly, decide whether machine learning can help, and if it can, map it to a machine learning problem.
You can use NLP techniques or deep-learning-based solutions to find out whether two given pieces of text are duplicates. If they are, remove one of them; if not, keep both.
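To make the mapping concrete, here is a minimal sketch of the simplest possible baseline: compare two questions by word overlap (bag-of-words cosine similarity) and call them duplicates above a cutoff. This is only an illustration, not what Quora or anyone else actually runs. The function names and the 0.8 threshold are my own assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared_words = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared_words)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text has no words
    return dot / (norm_a * norm_b)


def is_duplicate(question_a, question_b, threshold=0.8):
    """Flag a pair as duplicate; the threshold is an arbitrary assumption."""
    return cosine_similarity(question_a, question_b) >= threshold


print(is_duplicate("How do I learn Python?", "how do i learn python"))
print(is_duplicate("How do I learn Python?", "What is the capital of France?"))
```

Word overlap will miss paraphrases that use different words, which is exactly where the heavier NLP or deep learning approaches come in. The point is that the business problem has now become a yes/no question about a pair of texts.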
It doesn’t stop here. You also need to figure out the error metric you’re going to evaluate your model against. The choice of error metric will guide your journey of model development.
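As one possible illustration for the duplicate-question example: if flagged duplicates get removed, a wrong flag deletes a genuinely different question, so you might care about precision as well as recall. The sketch below computes both from scratch. The labels are invented for illustration, and picking these two metrics is my assumption, not the only reasonable choice.

```python
def precision_recall(actual, predicted):
    """Precision and recall for boolean labels (True means 'duplicate')."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    # Guard against division by zero when nothing was flagged or nothing was a duplicate.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical labels, for illustration only
actual = [True, True, False, False, True]
predicted = [True, False, False, True, True]

precision, recall = precision_recall(actual, predicted)
print(f"Of the pairs we flagged, {precision:.0%} really were duplicates.")
print(f"Of the real duplicates, we caught {recall:.0%}.")
```

Notice that the two printed sentences are already in plain English, which ties back to the first skill.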
Takeaways:
- Learn to communicate your results effectively using data visualization and plain English.
- Learn to map a business problem to a machine learning problem before even starting a project.
If you found this helpful in any way, help me spread the word and follow me for more articles like this. Peace and Power.