Want to Learn Data Science?
Learn to Think Like a Data Scientist
People interested in data science often ask me how they can use it more effectively.
My simple answer?
Learn to think like a data scientist.
Learning any field requires that we learn how to think like professionals in that field. And data science is no different.
To be effective in data science, we need to learn how to think like a data scientist.
But it doesn’t just stop there.
Learning how to think like a data scientist means learning how to turn any problem into a data science problem.
But we can’t learn how to turn problems into data science problems without first knowing how to think like a data scientist.
So here’s the secret sauce. Here’s how data scientists think about the world around them.
At the most basic level, data scientists think about the world in terms of probabilities. In other words, the world is not a set of if-then rules but rather a more complex blend of probabilities that can be combined to produce even better-informed probabilities.
Another way to think about probabilities is in terms of uncertainty. Data science takes advantage of our uncertainty about the world by identifying relationships that form patterns. When combined, those patterns can give us much more precise expectations about what will happen in the world…probably…
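To make "combining probabilities" concrete, here is a minimal sketch using Bayes' rule. Everything in it is an assumption made up for illustration: the rain scenario, the numbers, and the function name `update`. It also assumes the two pieces of evidence are independent of each other given the weather, which real data often violates.

```python
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: revise the probability of a hypothesis after one piece of evidence."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator


# Hypothetical numbers, chosen only for illustration.
p_rain = 0.20  # starting belief that it will rain today

# Evidence 1: dark clouds (assumed seen on 90% of rainy days, 30% of dry days).
after_clouds = update(p_rain, 0.90, 0.30)

# Evidence 2: falling barometer (assumed seen on 80% of rainy days, 20% of dry days).
after_barometer = update(after_clouds, 0.80, 0.20)

print(f"start: {p_rain:.2f}")
print(f"after clouds: {after_clouds:.2f}")
print(f"after clouds and barometer: {after_barometer:.2f}")
```

Each piece of evidence on its own is weak, but stacking them moves the estimate a long way from where it started. That is the "blend of probabilities" idea in miniature.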
Okay, that’s step 1.
Step 2 to being more effective at turning real problems into data science problems is learning how to frame those problems in terms of different classes of probabilities.
Probabilities are only as good as the expectations they give us about what comes next in the world around us. In other words, probabilities are simply predictions, so every data science model is, at its core, a prediction. But we can use more specific model types to derive different types of predictions. And understanding those different types gives us tools for framing real problems as data science problems.
What are the types of probabilities in data science?
Something like 95% of all real problems that have a potential data science solution can be reframed as one of the following types of probabilistic problems.
The simplest and most common types include:
Classification: Does this “observation” (e.g. a data point) belong in bucket (i.e. class) A or B?
Regression: How much of some numeric value should I assign to this “observation” (e.g. a data point)?
Forecasting: What can I expect the future value to be based on past values?
Clustering: What natural groups exist in this data?
Recommendation: What item should be recommended next for this observation?
Association: Which items (e.g. data points) go together most often?
Reinforcement: Which actions in an environment are associated with the best outcome?
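As a rough illustration of how the same data can be framed in more than one way, the sketch below treats a tiny made-up dataset first as a regression problem and then as a classification problem. The data, the pass mark, and the function names are all hypothetical, and the two methods shown (a least-squares line and a nearest-centroid rule) are just simple stand-ins, not recommended choices.

```python
# Hypothetical toy data: hours studied and exam score for six students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [52.0, 55.0, 61.0, 68.0, 71.0, 77.0]


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Regression framing: how much of a numeric value (the score) should we expect?
slope, intercept = fit_line(hours, scores)


def predict_score(h):
    return slope * h + intercept


# Classification framing: which bucket ("pass" or "fail") does a student belong in?
PASS_MARK = 60.0  # assumed threshold, used only to label the toy data
labels = ["pass" if s >= PASS_MARK else "fail" for s in scores]

# Average hours studied within each bucket (the bucket's centroid).
centroids = {}
for label in set(labels):
    members = [h for h, lab in zip(hours, labels) if lab == label]
    centroids[label] = sum(members) / len(members)


def classify(h):
    """Assign the bucket whose average hours is closest to h."""
    return min(centroids, key=lambda label: abs(h - centroids[label]))


print(f"expected score after 7 hours: {predict_score(7.0):.1f}")
print(f"bucket for 2.5 hours: {classify(2.5)}")
print(f"bucket for 3.5 hours: {classify(3.5)}")
```

The data never changed. Only the question did, and the question is what decides which type of problem you have.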
If you can reframe a real problem as one of the above types, then you know what algorithms may apply to build a possible solution to that problem. Effective data science starts with turning this practice into a habit. So as you look at the world, try to turn the problems or barriers you see in your own life into one of the types of probabilistic problems described above. It will help you develop a more effective data science mindset for business in the future.
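One way to practice the habit is to keep a small lookup from problem type to a few candidate algorithms. The sketch below is only a starting point: the algorithm names are common textbook examples that I have filled in as an assumption, not a list from this article, and the names `CANDIDATE_ALGORITHMS` and `candidate_algorithms` are hypothetical.

```python
# Illustrative mapping only. The algorithm names are well-known examples
# for each problem type, not an exhaustive or authoritative list.
CANDIDATE_ALGORITHMS = {
    "classification": ["logistic regression", "decision tree"],
    "regression": ["linear regression", "gradient-boosted trees"],
    "forecasting": ["exponential smoothing", "ARIMA"],
    "clustering": ["k-means", "hierarchical clustering"],
    "recommendation": ["collaborative filtering", "matrix factorization"],
    "association": ["Apriori", "FP-Growth"],
    "reinforcement": ["Q-learning", "policy gradient"],
}


def candidate_algorithms(problem_type):
    """Return example algorithms for a problem type, ignoring case and outer spaces."""
    key = problem_type.strip().lower()
    if key not in CANDIDATE_ALGORITHMS:
        raise ValueError(f"Unknown problem type: {problem_type!r}")
    return CANDIDATE_ALGORITHMS[key]


# Example: "Which customers behave alike?" is a clustering question.
print(candidate_algorithms("Clustering"))
```

The lookup is not the point. Naming the problem type first is what narrows the search for a solution.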