5 Things You Won’t Learn ONLINE about Data Science

Yash Gupta
Data Science Simplified
6 min read · Oct 6, 2022

Like a million others out there, my journey in Data Science started with online learning. I am proud to be self-taught and I wear it on my sleeve, but there’s no denying that there are a few things you will never learn online about Data Science.

When you work close to data regularly and handle it end-to-end, i.e., right from getting the data to answering a question or automating a model, there’s a lot you learn on the job. These things cannot be taught online because they are not theory you can memorize; they are things you need to notice and account for when working with data in practice.

Without further ado, let’s dive right into 5 things that I think you won’t learn online about Data Science.

  1. Cost Reduction
  2. Reproducibility
  3. Technical Communication & Interpretability
  4. The 4-timer approach
  5. Data-Driven Decision making

Cost Reduction:

What no one mentions in online courses is that Data Science is expensive. It takes a lot of investment to store data and maintain your databases, and once that is in place, analyzing the data is a completely different story.

Anything you do should be done in a way that keeps costs as low as possible. If you can use less data by working with a sample instead of the entire dataset, or if you can work with tools that use compressed datasets, you should go for them.

For example, if I had to store 100,000 rows of data, I wouldn’t get myself a server for it; I would just keep it in Excel. Why waste your resources or your organization’s resources?
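To make the sampling point concrete, here is a minimal sketch in Python using only the standard library. The `order_values` data is randomly generated for illustration, and `estimate_mean` is a hypothetical helper name, not something from a real project. It compares an average computed on all 100,000 rows with one computed on a random sample of 1,000.

```python
import random
import statistics

# Hypothetical data: 100,000 order values generated for illustration only.
rng = random.Random(42)
order_values = [rng.uniform(10, 500) for _ in range(100_000)]


def estimate_mean(values, sample_size, seed=0):
    """Estimate the mean from a simple random sample instead of every row."""
    sampler = random.Random(seed)
    sample = sampler.sample(values, sample_size)
    return statistics.fmean(sample)


full_mean = statistics.fmean(order_values)
sample_mean = estimate_mean(order_values, sample_size=1_000)

# Relative gap between the sample estimate and the full-data answer.
relative_error = abs(sample_mean - full_mean) / full_mean
print(f"full: {full_mean:.2f}  sample: {sample_mean:.2f}  gap: {relative_error:.2%}")
```

Whether a gap of that size is acceptable depends on the question being asked; the point is only that a sample is often worth trying before paying to process everything.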

At the end of the day, it comes down to what is the most ‘effective’ and ‘efficient’ way of going about your business in data.

Reproducibility:

Reproducibility is another thing that online courses fail to talk about. Have you ever looked at a dataset and wondered whether you could reuse an older chunk of code that you wrote for a completely different analysis? That’s reproducibility for you.

In companies, you often reach a point where you have to repeat a kind of analysis over time, and this can be for a lot of reasons: updated data, different ways of measuring a column, etc.

That being said, using variables instead of hard-coded values and writing reusable chunks in your analysis is one of the most important things to do. Ideally, when you change the source data, your entire analysis should update and not go haywire.

P.S. Imagine manually changing the number of weeks an analysis runs on, everywhere it appears, simply because you did not use a variable for it. (You know the pain if you have used Python or R before.)
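Here is a rough sketch of what that looks like in Python. The names `N_WEEKS`, `AS_OF`, and `weekly_totals` are made up for this example, and the records are dummy data (one entry of 10.0 per day). The idea is that the number of weeks lives in exactly one place.

```python
from datetime import date, timedelta

# Hypothetical parameters: change them here and rerun, nothing else to edit.
N_WEEKS = 4
AS_OF = date(2022, 10, 2)


def weekly_totals(records, n_weeks, as_of):
    """Sum amounts per week for the last n_weeks ending on as_of.

    records is an iterable of (date, amount) pairs.
    Returns a list of totals, oldest week first.
    """
    window_start = as_of - timedelta(weeks=n_weeks)
    totals = [0.0] * n_weeks
    for day, amount in records:
        if window_start < day <= as_of:
            # Index 0 is the oldest week inside the window.
            index = ((day - window_start).days - 1) // 7
            totals[index] += amount
    return totals


# Dummy data: one record of 10.0 for each of the last 60 days.
records = [(AS_OF - timedelta(days=i), 10.0) for i in range(60)]

print(weekly_totals(records, N_WEEKS, AS_OF))
```

Switching the analysis from 4 weeks to 8 is then a one-line change to `N_WEEKS` rather than a hunt through the script.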

Technical Communication & Interpretability:

This is often missed in technical workplaces too, especially when you start in a new company or team, or with a new client. It is important to balance technical communication and interpretability so that the specifics of your work are communicated AND understood by the other person.

Often, in a bid to explain a lot of our work, we dwell so much on the details that we forget that a simple, plain-language description of what exactly the project did is much better than confusing the other person.

This is what data storytelling addresses. Knowing how to weave your scenario in a way that is simple to decipher and easy to remember, for a larger audience or just the stakeholders, is all it takes to ensure that data-driven decisions are made and that time is not wasted on technicalities.

For example, I may not know what a logistic regression algorithm is doing, but I can understand that it predicts with 80% accuracy whether a customer is going to churn, and that is what matters more to me (if I trust you as the genius behind the model, of course).
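As a small illustration of the churn example above, the sketch below turns a raw metric into one sentence a stakeholder can read. The labels are invented for the example, and `stakeholder_summary` is a hypothetical helper; a real project would take the predictions from the actual model.

```python
def accuracy(actual, predicted):
    """Share of predictions that match the actual outcome."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def stakeholder_summary(actual, predicted):
    """Turn the metric into one sentence a non-technical reader can follow."""
    share = accuracy(actual, predicted)
    out_of_ten = round(share * 10)
    return (
        "The model correctly predicts whether a customer will churn "
        f"about {out_of_ten} times out of 10 ({share:.0%} accuracy)."
    )


# Hypothetical labels for ten customers: 1 = churned, 0 = stayed.
actual = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 0, 1, 0, 1, 1, 0]

print(stakeholder_summary(actual, predicted))
```

The algorithm never appears in the sentence, which is the point: the reader gets what the model can do for them, and the details stay available for anyone who asks.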

The 4-timer approach:

Now, this is something I have made up on my own. You won’t find this exact approach if you google it. So here it goes.

Just like there are 4 directions, there are 4 different approaches you can take to reach a conclusion in a particular analysis, if it is broad enough. One kind of approach is to analyze a different but related variable. For example, if you had to measure how physically fit someone is, you could check their BMI or just the amount of fat in their body (above a large enough percentage, you could say that they are not physically fit).
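The fitness example can be sketched as two small checks that answer the same question from different variables. BMI is weight in kilograms divided by height in metres squared, and 25 is the commonly used adult cutoff for overweight. The body-fat cutoff of 30% used below is only a placeholder for illustration, not a medical threshold, and the function names are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def flag_by_bmi(weight_kg, height_m, cutoff=25.0):
    """Approach 1: flag a person whose BMI is at or above the cutoff."""
    return bmi(weight_kg, height_m) >= cutoff


def flag_by_body_fat(body_fat_pct, cutoff_pct):
    """Approach 2: flag a person whose body-fat percentage is at or above a chosen cutoff."""
    return body_fat_pct >= cutoff_pct


# Hypothetical person: 70 kg, 1.75 m, 35% body fat.
print(round(bmi(70, 1.75), 1))
print(flag_by_bmi(70, 1.75))
print(flag_by_body_fat(35, cutoff_pct=30))  # 30 is a placeholder cutoff
```

For this made-up person the two approaches disagree, which is exactly why it is worth looking at a question from more than one direction before settling on a conclusion.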

In any case, the 4-timer approach works for me as follows (and you can skip it if you think it won’t work out for you, no pressure):

Step 1: Going the straightforward way and completing your analysis
Step 2: Trying to explore the features involved and get a more specific output
Step 3: Tweaking the analysis and making sure that it does not just ‘answer’ something but also ‘addresses’ the problem
Step 4: Redoing the analysis in a way that is reproducible, interpretable, and cost-effective

Data-Driven Decision making:

This is by far the most important thing that online courses miss all the time: driving decision-making with data. It’s great that your model works well and that it gets around 90% of the predictions right. You now know who will churn out of your company or move to a new brand in the following year based on historical data… but does your job as the data scientist end there?

Absolutely not.

You must be an active part of the decision-making process and give insights to the executives who drive the decisions that grow the company in the right direction. As your stakeholders, they will only see a situation from the one side you present to them.

Knowing the data as a whole is your job, and what impacts your analysis and what should improve to change the situation is something only you can answer. This is where many companies go wrong: someone who has not been in touch with the data for long enough goes on to make the decisions, which is no way to maximize growth.

Let me know in the comments below if you think there are other concepts that online courses miss.

Leave a clap and follow to stay in touch with any new articles and to support the blog!

For more such articles, stay tuned as we chart out paths to understanding data and coding and demystify other concepts related to Data Science. Please leave a review down in the comments.

Contact me on LinkedIn (Yash Gupta) if you want to discuss this further!

Yash Gupta
Data Science Simplified

Lead Analyst at Lognormal Analytics and self-taught Data Scientist! Connect with me at - https://www.linkedin.com/in/yash-gupta-dss