Data Science: An Introduction

Hrishikesh Kumar
Mar 9, 2022

What is Data Science?

Data science is the domain of study that deals with vast volumes of data using modern tools and techniques to find unseen patterns, derive meaningful information from that data, and make business decisions. Data science uses complex machine learning algorithms to build predictive models.

It unifies statistics, data analytics, machine learning, and related fields such as natural language processing (NLP) to create insight from a set of data. Data science encompasses preparing data for analysis, including cleansing, aggregating, and manipulating the data so that advanced analysis can be performed on it. Analytic applications and data scientists can then review the results to uncover patterns and help business leaders make informed decisions.
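To make "cleansing" and "aggregating" a little more concrete, here is a minimal sketch in plain Python. The records, the region names, and the helper functions `cleanse` and `aggregate` are all made up for illustration; real projects typically use a library such as pandas for this kind of work.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (region, sales) pairs with messy values.
raw_records = [
    ("north", "120.5"),
    ("North ", "99.5"),   # inconsistent casing and trailing space
    ("south", None),      # missing value
    ("south", "80"),
    ("south", "abc"),     # unparseable value
]

def cleanse(records):
    """Drop rows whose sales value is missing or non-numeric, and normalize region names."""
    cleaned = []
    for region, sales in records:
        try:
            value = float(sales)
        except (TypeError, ValueError):
            continue  # skip rows that cannot be parsed as a number
        cleaned.append((region.strip().lower(), value))
    return cleaned

def aggregate(records):
    """Group sales by region and compute the average per region."""
    groups = defaultdict(list)
    for region, value in records:
        groups[region].append(value)
    return {region: mean(values) for region, values in groups.items()}

cleaned = cleanse(raw_records)
summary = aggregate(cleaned)
print(summary)
```

Only after the messy rows are filtered out and the region names are normalized does the per-region average become meaningful, which is why preparation comes before analysis.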

But why is Data Science so important?

Because companies are sitting on a treasure trove of data. As modern technology has enabled the creation and storage of ever more information, data volumes have exploded. One widely cited estimate holds that 90 percent of the data in the world was created in the last two years. For example, Facebook users reportedly upload 10 million photos every hour.

But this data is often just sitting in databases and data lakes, mostly untouched.

The wealth of data being collected and stored by these technologies can bring transformative benefits to organizations and societies around the world — but only if we can interpret it. That’s where data science comes in.

Data science reveals trends and produces insights that businesses can use to make better decisions and create more innovative products and services. Perhaps most importantly, it enables machine learning (ML) models to learn from the vast amounts of data being fed to them, rather than mainly relying upon business analysts to see what they can discover from the data.

Data is the bedrock of innovation, but its value comes from the information data scientists can glean from it, and then act upon.

Okay, got it. So what is the difference between AI, data science, machine learning, and deep learning again?

AI means getting a computer to mimic human behavior in some way.

Data science overlaps heavily with AI without sitting strictly inside it. It refers more to the overlapping areas of statistics, scientific methods, and data analysis, all of which are used to extract meaning and insights from data.

Machine learning is a subset of AI, and it consists of the techniques that enable computers to learn patterns from data and deliver AI applications.

Deep learning is a subset of machine learning that uses multi-layer neural networks, which enables computers to solve more complex problems such as image and speech recognition.
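As a rough illustration of what it means for a computer to "figure things out from the data", the sketch below fits a straight line to a handful of points using ordinary least squares, one of the simplest machine learning techniques. The data (hours studied vs. exam score) and the function names `fit_line` and `predict` are invented for this example.

```python
# Hypothetical training data, made up for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

def fit_line(xs, ys):
    """Learn slope and intercept from the data with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Apply the learned line to an input the model has not seen."""
    return slope * x + intercept

slope, intercept = fit_line(hours, scores)
print(f"learned rule: score = {slope:.1f} * hours + {intercept:.1f}")
print(f"predicted score for 6 hours: {predict(6.0, slope, intercept):.1f}")
```

Nobody typed the rule in by hand; the slope and intercept come out of the data. Deep learning follows the same idea but replaces the single line with many layers of learned parameters.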

The Data Science Lifecycle

Data science’s lifecycle consists of five distinct stages, each with its own tasks:

  1. Capture: Data Acquisition, Data Entry, Signal Reception, Data Extraction. This stage involves gathering raw structured and unstructured data.
  2. Maintain: Data Warehousing, Data Cleansing, Data Staging, Data Processing, Data Architecture. This stage covers taking the raw data and putting it into a form that can be used.
  3. Process: Data Mining, Clustering/Classification, Data Modeling, Data Summarization. Data scientists take the prepared data and examine its patterns, ranges, and biases to determine how useful it will be in predictive analysis.
  4. Analyze: Exploratory/Confirmatory, Predictive Analysis, Regression, Text Mining, Qualitative Analysis. Here is the real meat of the lifecycle. This stage involves performing various analyses of the data.
  5. Communicate: Data Reporting, Data Visualization, Business Intelligence, Decision Making. In this final step, analysts prepare the analyses in easily readable forms such as charts, graphs, and reports.
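The five stages above can be sketched as a tiny pipeline, one function per stage. This is a toy under stated assumptions, not a real workflow: the hard-coded values stand in for an actual data source, and the function names simply mirror the stage names.

```python
from statistics import mean, median

def capture():
    """Capture: gather raw data (a hard-coded list stands in for a real source)."""
    return ["23", "19", "", "31", "n/a", "27"]

def maintain(raw):
    """Maintain: cleanse the raw strings into usable integers, dropping bad entries."""
    return [int(value) for value in raw if value.isdigit()]

def process(values):
    """Process: summarize the prepared data to check its size and range."""
    return {"count": len(values), "min": min(values), "max": max(values)}

def analyze(values):
    """Analyze: compute simple descriptive statistics."""
    return {"mean": mean(values), "median": median(values)}

def communicate(summary, stats):
    """Communicate: format the results as a short, readable report."""
    combined = {**summary, **stats}
    return "\n".join(f"{name}: {value}" for name, value in combined.items())

raw = capture()
clean = maintain(raw)
report = communicate(process(clean), analyze(clean))
print(report)
```

Each function takes the output of the one before it, which is the main point of the lifecycle: later stages are only as good as the data handed to them by earlier ones.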

Cool. So what do we need to become a data scientist? We will discuss that in our next post. Stay tuned.


Hrishikesh Kumar

Data Science Enthusiast || Full Stack Developer || NLP || Machine Learning