Gaining Insight from Real-World Data Science: 5 Valuable Lessons Learned

Abhishek Pawar
5 min read · Jan 14, 2023


Photo by Alvaro Reyes on Unsplash

About two years ago, I decided to embark on a career in Data Science, which has been incredibly rewarding. In this article, I will share some of the valuable insights I’ve gained while working as a Data Scientist.

I was fortunate to collaborate with cross-functional, customer-facing teams that included Chief Technology Officers (CTOs), Senior Engineering Directors, Product Managers (PMs), and Engineers.

I’m confident that after reading this article, many Data Scientists and Machine Learning Engineers will appreciate the divergence between the real-world approach to Data Science and what is taught in traditional courses.

Lesson #1. No one cares about the model’s accuracy or F1 score

Sounds absurd, huh? It may seem strange that a data scientist spends so much time and energy perfecting a model in pursuit of a higher F1 score or accuracy (or any other metric you care about), yet in the end, no one pays any attention to it.

Regardless, in the real world, stakeholders care about one thing: “Does the model bring value to the business?”

Data Scientists need to map business metrics 🌎 to ML model metrics 📈

An example from a B2C context:
Business Metric: Customer Acquisition Cost (CAC)
Data Science Model Metric: Classification accuracy 📊 (how accurately the model predicts whether a customer will convert)
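To make the mapping concrete, here is a minimal sketch with made-up numbers (the function, spend figures, and precision values are my own illustrative assumptions, not from the article): a conversion classifier's precision directly drives CAC, because it determines what fraction of marketing spend reaches customers who actually convert.

```python
# Sketch: mapping a model metric (precision) to a business metric (CAC).
# All numbers below are hypothetical.

def estimated_cac(spend_per_lead: float, n_targeted: int, precision: float) -> float:
    """Customer Acquisition Cost = total spend / customers actually acquired."""
    conversions = n_targeted * precision  # leads the model ranked correctly
    return (spend_per_lead * n_targeted) / conversions

# Targeting 1,000 leads at $5 each:
baseline = estimated_cac(5.0, 1000, precision=0.20)    # untargeted list
with_model = estimated_cac(5.0, 1000, precision=0.50)  # model-ranked list
print(baseline, with_model)  # 25.0 10.0
```

Framing the model this way ("the model cut CAC from $25 to $10") lands far better with stakeholders than quoting an F1 score.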

Lesson #2. No one cares which ML or DL model you use

If you are tempted by State-of-the-Art (SOTA) models like OpenAI’s GPT-3 or ChatGPT, you are not alone! However, businesses tend to look more closely at the practical results that can be achieved with your model.

Find the solution that makes an impact. Photo by Samantha Lam on Unsplash

It doesn’t matter how much knowledge you have of intricate algorithms in the Data Science field — what counts is how well you can generate (or save) revenue💰 for the company and get work done✔️

P.S. The expense of maintaining the infrastructure necessary for supporting these large, complex models can limit their real-world applications. To give an example, the cost of running ChatGPT has been reported to be around $100,000 per day 🚀

Lesson #3. Learn and follow software engineering practices

Many readers and fellow Data Scientists may not have a formal CS degree 📚 or a Software Engineering background, so writing clean, efficient, and maintainable scripts can be a challenging task for us.

Here are a few advantages of adhering to standard software engineering practices:

  1. Improved Code Quality: Writing modular, reusable, and maintainable code minimizes the need for bug fixes and reduces time spent debugging.
  2. Reproducibility: By organizing the code following a framework and creating a clear development pipeline, code is more easily tested, shared, and reused.
  3. Collaboration: A well-structured codebase reduces confusion and the potential for conflicts between team members.

Your journey could start with something as simple as:

  1. Learning Git to version control your code,
  2. Using an IDE or code editor like VS Code,
  3. Learning to write clean code (the PEP 8 style guide for Python),
  4. Writing more documentation,
  5. Practicing code reviews,
  6. Creating and writing better Pull Requests
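As a small illustration of the "clean code" habit above (my own example, not from the article): type hints, a docstring, and a guard clause turn a throwaway snippet into something a teammate can read, test, and reuse.

```python
# Clean-code sketch: typed, documented, and defensive against edge cases.

def normalize(values: list[float]) -> list[float]:
    """Scale values to the [0, 1] range; return a copy unchanged if constant."""
    lo, hi = min(values), max(values)
    if hi == lo:  # guard clause: avoid division by zero on constant input
        return list(values)
    return [(v - lo) / (hi - lo) for v in values]

print(normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```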

Lesson #4. Do not reinvent the wheel!

Instead of reinventing the wheel and building everything from scratch, consider taking advantage of open source. Check whether there is an easier route, or whether someone already has a better solution you can use or repurpose.

Familiarize yourself with as many tools as possible. Photo by Cesar Carlevarino Aragon on Unsplash

Try identifying repetitive Data Science chores and automating them. For example, I use Sweetviz to kick off EDA with just a few lines of code!

You could also experiment using automated data quality reports with Evidently and Great Expectations. This way, you don’t have to write code from scratch and can save valuable time.

Lesson #5. Avoid using technical jargon in client meetings

Businesses care about simplicity. Simple language that shows why your approach works is crucial to making our hours and weeks of work count.

Now according to Wikipedia, “Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of relevant instances that were retrieved”.

Yeah, I bet your non-technical stakeholders 👔 loved this definition 😑. Photo by Sander Sammy on Unsplash

Now, what if we could rephrase Precision and Recall as follows?

Imagine we are fishing 🐟 with a net. We use a wide net and catch 80 of the 100 fish in the lake. That is 80% recall. But we also got 80 rocks in the net. That means 50% precision, as half of the net’s contents is junk.

Now, we decide to use a smaller fishing net and target an area with lots of fish and no rocks. We get 20 fish and 0 rocks.
That is 20% recall and 100% precision 👏
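The fishing analogy maps directly onto the standard formulas, with fish caught as true positives and rocks as false positives. A quick sketch:

```python
# Precision = fish / everything in the net; recall = fish caught / fish in lake.

def precision_recall(fish_caught: int, rocks_caught: int, fish_in_lake: int):
    precision = fish_caught / (fish_caught + rocks_caught)
    recall = fish_caught / fish_in_lake
    return precision, recall

print(precision_recall(80, 80, 100))  # wide net  -> (0.5, 0.8)
print(precision_recall(20, 0, 100))   # small net -> (1.0, 0.2)
```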

Learn how to explain results in clear, plain language to a non-technical audience. A person who can teach AI in simple terms to business people can create a lot of value 🙌

Closing Thoughts 💡

The application of data science varies greatly amongst businesses, and each data scientist has a unique perspective on the subject. I hope this blog gave you new perspectives to help in your career.

Consider sharing this post with your network and leave a comment to share what lessons you’ve gained over the years.

Please feel free to follow or connect on LinkedIn and book 1:1 on Topmate. Till then, see you in the next post :) Thanks for reading. Take care.


Abhishek Pawar, Senior Data Scientist @ Precisely