I’m just officially beginning my journey through the Data Science jungle. It looks exciting. Okay, fine. It looks challenging. To be honest, I am quite intimidated. Don’t get me wrong. I love solving problems and tackling difficult challenges, but Data Science seems like those stars that you can always aim for, but never quite catch.
So, to ease my nerves, I have decided to write to my 9-year-old self about what Data Science really is. Since I wouldn’t want to scare her (I know she has had her fair share of scary stuff), I shall explain it simply, and in a (hopefully) fun way.
What is Data Science?
Imagine you were about to join a new school full of new kids, leaving behind your friends from your old school. Finding new friends among all those kids would be quite a tiring task. How would you know which of the new kids were most likely to become your friends?
A clever girl (yes, that’s you) would write down all the characteristics she knew about her current friends. For example:
You can see that all your friends like cartoons, even if only a little. All of them except one have blue as their favorite color, and most of them love Math. So now we can use that information to find out which of the kids in your new school are most likely to be your friends.
If their favorite color is blue and they like cartoons, they are extremely likely to be your friend. Still, any kid who loves cartoons is very likely to be your friend, even if their favorite color is burgundy. Their favorite subject doesn’t matter much, but someone whose favorite color is blue, whose favorite subject is Math, and who is a cartoon maniac is probably going to make a very good friend.
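The rule of thumb above can be turned into a small points system. This is a minimal Python sketch, assuming made-up weights (2 points for liking cartoons, 1 for liking blue, 0.5 for liking Math) and hypothetical new classmates; none of these numbers or kids come from the original friends table.

```python
# Hypothetical weights that mirror the rule of thumb in the text:
# cartoons matter most, blue matters next, and the subject matters only a little.
def friend_score(favorite_color, likes_cartoons, favorite_subject):
    score = 0.0
    if likes_cartoons:
        score += 2.0
    if favorite_color == "blue":
        score += 1.0
    if favorite_subject == "Math":
        score += 0.5
    return score

# Hypothetical new classmates: (name, favorite color, likes cartoons?, favorite subject)
new_kids = [
    ("Kid A", "burgundy", True, "Art"),
    ("Kid B", "blue", False, "Math"),
    ("Kid C", "blue", True, "Math"),
]

# Rank the new kids from most to least likely to become a friend.
ranked = sorted(
    new_kids,
    key=lambda kid: friend_score(kid[1], kid[2], kid[3]),
    reverse=True,
)
for name, color, cartoons, subject in ranked:
    print(name, friend_score(color, cartoons, subject))
```

Here Kid A, the burgundy-loving cartoon fan, ranks above Kid B, who likes blue but not cartoons, matching the idea that cartoons count for more than color.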
Now, that’s Data Science right there. It’s getting data, arranging it so that it can be easily understood, and making decisions based on it. We did several things when trying to find new friends for my 9-year-old self:
- Defining the problem: We identified the problem: finding new friends among a large number of kids would be tiring.
- Collecting the data: In this case, we already had the data and only had to write it down. If we didn’t know the favorite colors of all the friends, we would have to go and ask them. All this is part of data collection.
- Processing the data: After getting the data, we need to look at it and see whether any of it is unnecessary. If so, we remove that data. This is called cleaning the data. For example, we have the names of friends as part of the data. Names are not useful when deciding who could be a good friend, since every name is different and tells us nothing about what a person likes. Therefore, we can safely clean out the names. Sometimes you won’t have all the data you need. For example, if we didn’t have Pauline’s favorite color, we could look at all the other favorite colors and make a reasonable guess that it is blue, since almost everyone else chose blue. Filling in missing data like this is also part of cleaning.
- Exploring and analyzing the data: After cleaning the data, we need to look for patterns and see if there are any trends. We noticed, for example, that all the friends liked cartoons. This was easy because there were only 5 friends. If we had 1 million friends to look at, it would take us forever to find a trend or pattern. That is where computers help us. Computers follow special step-by-step instructions, called algorithms, to find patterns in such huge amounts of data.
- Showing the results of the analysis: After finding a pattern, we need to present it in a way that can be easily understood. Common ways of showing results include tables, graphs, and pie charts.
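The cleaning and exploring steps above can be sketched in a few lines of Python. The records below are hypothetical stand-ins for the friends table (only Pauline's name comes from the text), and filling a missing value with the most common known value is just one simple way to handle gaps.

```python
from collections import Counter

# Hypothetical records standing in for the friends table; values are illustrative only.
friends = [
    {"name": "Pauline", "color": None, "cartoons": True, "subject": "Math"},
    {"name": "Friend 2", "color": "blue", "cartoons": True, "subject": "Math"},
    {"name": "Friend 3", "color": "blue", "cartoons": True, "subject": "English"},
    {"name": "Friend 4", "color": "green", "cartoons": True, "subject": "Math"},
    {"name": "Friend 5", "color": "blue", "cartoons": True, "subject": "Math"},
]

def clean(records):
    """Drop the 'name' field, which cannot help predict friendship."""
    return [{k: v for k, v in r.items() if k != "name"} for r in records]

def fill_missing_color(records):
    """Replace a missing color with the most common known color."""
    known = [r["color"] for r in records if r["color"] is not None]
    most_common = Counter(known).most_common(1)[0][0]
    return [
        dict(r, color=r["color"] if r["color"] is not None else most_common)
        for r in records
    ]

def summarize(records, field):
    """Explore: count how often each value of a field appears."""
    return Counter(r[field] for r in records)

data = fill_missing_color(clean(friends))

# Show the results as a tiny text table.
for field in ("color", "cartoons", "subject"):
    print(field, dict(summarize(data, field)))
```

With this made-up data, Pauline's missing color is filled in as blue, and the counts show that every friend likes cartoons.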
I hope all this has helped shed some light on what data science is all about.
But where and how is it used?
Data Science is used by Google to help us find what we are searching for on the internet. For example, data science helps Google know that a hot dog is a type of food, and not a dog with an unusually high temperature.
To fight malaria epidemics in Africa
The Malaria No More project uses Data Science to help reduce the spread of malaria. Data about the medications of people living in malaria-prone areas is collected through their mobile phones. This lets scientists track how malaria spreads and how it is being treated.
There is a lot that Data Science can do for you, and for the world. We only need to know how to use it.