Data Science 101

The concepts you need to know before entering the Data Science world

Piu Mallick
DataSeries
7 min read · Jul 31, 2019


I was playing around with data and then I found the Science. Yes, my introduction to the world of Data Science came through research that I did purely for my own pleasure. I used to work as a Database Developer in a reputed IT firm when my daughter was born, and I decided to take a hiatus from work to make her my priority. However, whenever I was at leisure, I was surfing the web and poking around in my friend circle to keep myself updated on the latest technologies and happenings in the tech world. Not surprisingly, I was introduced to quite a few amazing concepts. Among them, the newer inventions around Data Science captivated me, and the thirst for more kept me going.

And if, like me, you are starting out in Data Science and looking for resources that can give you a jump start or at least a better understanding of it, or you have just heard or read the term and want to know what it is, you can of course find a gazillion materials about it. This, however, is how I started and got familiar with the basic concepts. :)

What do you mean by the term ‘Data Science’?

Data Science provides meaningful information based on large amounts of complex data, or big data. Data Science, or Data-Driven Science if you prefer, combines different fields of work in statistics and computation to interpret data for decision-making purposes.

Understanding Data Science

How do we collect data? Data is drawn from different sectors, channels and platforms, including cell phones, social media, e-commerce sites, healthcare surveys, internet searches and many more. The surge in the amount of data available and collected over time has opened the doors to a new field of study based on big data: the massive data sets that contribute towards the creation of better operational tools in all sectors.

The continuous, never-ending access to data has been made possible by advancements in technology and in collection techniques. Numerous data patterns and behaviors can be monitored, and predictions can be made based on the information gathered.

In technical terms, the process described above is known as Machine Learning; in layman’s terms, it may be called Data Astrology: predictions based on data.

Nevertheless, much of this ever-increasing data is unstructured and needs constant parsing before it can support effective decisions. This process is complex and time-consuming for organizations, hence the emergence of Data Science.

A Brief History / Background of Data Science

The term ‘Data Science’ has been in existence since the 1960s, when it was originally used as a substitute for ‘Computer Science’. Approximately 15–20 years later, the term was used to describe the survey of data processing methods used in different applications. In 2001, Data Science was introduced to the world as an independent discipline.

In fact, in 2012 the Harvard Business Review published an article describing the role of a Data Scientist as the “Sexiest Job of the 21st Century”, and Data Science has been the buzzword ever since.

Disciplinary Areas of Data Science

Data Science incorporates tools from multiple disciplines to gather a data-set, process it, derive insights from it and interpret those insights for decision-making purposes. Some of the noteworthy areas that make up the Data Science field include Data Mining, Statistics, Machine Learning, Analytics and, of course, Programming, and the list goes on. To keep it simple, we will briefly discuss only these topics, as the concept of Data Science mainly revolves around them.

Data Mining applies algorithms to complex data-sets to reveal patterns, which are then used to extract useful and relevant data from the set.
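To make the idea of "revealing patterns" a little more concrete, here is a minimal sketch in Python. It counts which items tend to show up together in the same shopping basket, which is one simple flavor of pattern mining. The baskets, the `frequent_pairs` function and the `min_support` threshold are all made up for illustration; real data mining tools use far more sophisticated algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for items in transactions:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(baskets, min_support=2)
print(pairs)
```

With these sample baskets, only the pairs that occur in two or more baskets survive the filter, and those pairs are the "pattern" a business might act on.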

Statistics, or Predictive Analysis, uses this extracted data to gauge events that are likely to happen in the future, based on what the data shows happened in the past.
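As a rough illustration of predicting the future from the past, the sketch below fits a straight line to a handful of past values and extends it by one step. This is plain least-squares regression, just one of many predictive techniques, and the sales figures and the `fit_line` helper are invented for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales figures, invented for illustration only.
months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # projected sales for month 6
print(f"Forecast for month 6: {forecast:.1f}")
```

Real data is rarely this tidy, so in practice a forecast like this would come with an error estimate rather than a single number.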

Machine Learning can best be described as an Artificial Intelligence tool that processes quantities of data far larger than a human could work through in a lifetime. It refines the decision model produced by predictive analytics by comparing the predicted likelihood of an event with what actually happened in the past.
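One simple way to picture this comparison of predicted likelihoods with what actually happened is a scoring rule. The sketch below uses the Brier score (the mean squared gap between a predicted probability and the 0-or-1 outcome) to decide which of two models forecast the past better. The rain data and both sets of model outputs are hypothetical, and a real Machine Learning system would use such a score inside a training loop rather than as a one-off check.

```python
def brier_score(predicted, happened):
    """Mean squared gap between predicted probabilities and outcomes (0 or 1).

    Lower is better; 0.0 would mean every prediction was exactly right.
    """
    gaps = [(p - o) ** 2 for p, o in zip(predicted, happened)]
    return sum(gaps) / len(gaps)


# Hypothetical data: did it rain (1) or not (0) on four past days?
happened = [1, 0, 1, 1]
model_a = [0.9, 0.2, 0.7, 0.4]  # probabilities forecast by one model
model_b = [0.6, 0.5, 0.5, 0.5]  # probabilities forecast by a vaguer model

scores = {
    "model_a": brier_score(model_a, happened),
    "model_b": brier_score(model_b, happened),
}
best = min(scores, key=scores.get)  # the model with the lowest score wins
print(scores, "best:", best)
```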

The process of Analytics involves collecting and processing the structured data from the Machine Learning stage using various algorithms. The data analyst interprets, converts and summarizes the data into cohesive language that the decision-making team can understand.

Programming forms the backbone of software development. Data Science is a combination of several fields, including Computer Science, and it uses scientific processes and methods to analyze data and draw conclusions from it. Several programming languages are widely used for this work, such as Python, R, SQL, Scala, SAS and Julia, while tools and platforms such as Tableau, Qlik and Azure also deserve a special mention. All of the processes mentioned above need one programming language or another to produce the desired result.

Data Science Across Different Industries

Data Science — in Today’s World

Organizations are constantly trying to apply big data and data science to almost all everyday activities to bring value to consumers.

Banking institutions are capitalizing on big data to enhance their fraud detection techniques.

Asset management firms are using big data to predict the likelihood of a security’s price moving up or down at a given time.

Entertainment and mass media companies like Netflix incorporate data mining to determine what products to deliver to their customers. Beyond this, Netflix uses specialized algorithms to create personalized recommendations for viewers based on their watch history.
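To give a feel for how a recommendation based on watch history can work, here is a toy sketch. It is emphatically not Netflix's actual algorithm, which is far more elaborate. It simply finds the viewer whose history overlaps most with yours (using Jaccard similarity) and suggests what they watched that you have not. The viewer names, titles and both helper functions are made up for this example.

```python
def jaccard(a, b):
    """Share of titles two viewers have in common, between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(viewer, histories):
    """Suggest titles watched by the most similar other viewer but not by this one."""
    seen = histories[viewer]
    others = [name for name in histories if name != viewer]
    closest = max(others, key=lambda name: jaccard(seen, histories[name]))
    return sorted(histories[closest] - seen)


# Hypothetical watch histories, invented for illustration only.
histories = {
    "asha": {"Drama A", "Thriller B", "Comedy C"},
    "ben": {"Drama A", "Thriller B", "Documentary D"},
    "cara": {"Cartoon E", "Comedy C"},
}

suggestions = recommend("asha", histories)
print(suggestions)
```

Here "ben" shares two titles with "asha" while "cara" shares only one, so the suggestion comes from ben's history.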

These are some of the noteworthy application areas of data science, and the list goes on. Data science is evolving rapidly, and its applications have already changed lives and will continue to do so.

Job of a Data Scientist

Put simply, the job of a Data Scientist involves multi-tasking: he or she collects, analyzes and interprets massive amounts of structured and unstructured data, in most cases to improve an organization’s operations. Data Science professionals develop statistical models that analyze data and detect patterns, trends and relationships in data sets. This vital information can be used to predict consumer behavior or to identify business and operational risks. Hence, the job of a Data Scientist can be described as storytelling: presenting data insights to decision-makers in a way that is understandable and applicable to problem-solving. The role of a Data Scientist is becoming increasingly important as businesses rely more heavily on data analytics to drive decision-making and lean on automation and machine learning as core components of their IT strategies.

Well, it might appear to others (especially non-technical people) that the job of a Data Scientist is no different from other jobs, and yet they take home a lucrative compensation package. But believe me, being a Data Scientist is not an easy task: they are the fortune tellers of their organizations, and they have to get their analysis right. After all, as mentioned earlier, the job of a Data Scientist was declared the sexiest job of the 21st century by the Harvard Business Review not long ago.

Key Takeaways

  1. Advancements in technology, the internet and social media, and their growing use, have all contributed to increased access to big data.
  2. Data Science uses techniques such as Machine Learning (ML) and Artificial Intelligence (AI) to extract meaningful information and predict future patterns and behaviors.
  3. The field of Data Science is growing steadily as technology advances and big data collection and analytical techniques become more sophisticated.

Present & Future of Data Science

Data Science has become the real thing now, and there are potentially hundreds and thousands of people running around with that job title. We have also started to see these Data Scientists make large contributions to their organizations. There are certainly challenges to overcome, but the value of data science from a business point of view is pretty clear at this point.

As the saying goes, “Necessity is the mother of invention”; likewise, future needs open the doors for more research work in the field of data science. For example, you can see in the video below the concept of ‘Democratizing AI’ (the power of mixed reality), which overcomes the barriers of communication (language and transport), and it is absolutely a game changer.

Now, thinking about the future, certain questions definitely arise —

→ “How will the practice of data science be changing over the next five years? What will be the new research areas of data science?”

→ “Will we still be using that title? Or will we all be AI monkeys, or something else?” We have to wait and see what the future holds for us. ;)

→ “Will the fundamental skills remain the same?”

These are certainly debatable questions, but one thing is for sure: inventions have happened and will continue to happen whenever demand arises or the future can be made better. And the world will keep benefiting from data science through its upcoming innovations.

Please clap if you liked reading the article. It definitely encourages me to write more. I will be sharing more on Data Science concepts in my upcoming articles and will try to make them as simple as possible, so that even a non-technical person can get a taste of them.
