Comparing DS, ML, DL and AI

dilip.rajani
8 min read · Feb 29, 2020


CONTENTS

Preface

Introduction

What is Data Science?

What is Machine Learning?

What is Deep Learning?

What is Artificial Intelligence?

Comparing Data Science (DS), Machine Learning (ML), Deep Learning (DL) and Artificial Intelligence (AI)

Summary

Preface

The main idea behind this blog is to compare and relate Data Science, Machine Learning, Deep Learning and Artificial Intelligence based on their definitions.

To set things clear, let us briefly define each of these big, fancy terms and break down what each component of the definitions means.

Introduction

In this era, terms like data science, machine learning and artificial intelligence are used interchangeably. Even organizations offering new technologies mix up the operations and processes involved in these methods, and may talk about their high-end data science techniques without having much knowledge of them. Let us define and relate these terms.

What is Data Science?

Data Science is a systematic and scientific approach that defines the processes involved in extracting knowledge and insights from structured or unstructured data.

Structured and Unstructured data

In this digital era, the complexity of big data has presented both opportunities and challenges to organizations. It is overwhelming for most organizations to process and analyze data, which is being generated at an exponential rate. Statistically, unstructured data accounts for about 80% of all data and is produced at around 45 exabytes per year, whereas structured data constitutes the remaining 20% and is created at around 25 exabytes per year.

Growth of Structured vs Unstructured Data

Structured data is highly organized and formatted, so it can be easily analyzed and processed through relational databases. Structured data is most often categorized as quantitative data. Examples of structured data include gender, addresses, credit card numbers, stock information, geolocation, and more.

Unstructured data has no pre-defined format, making it much more difficult to collect, process, and analyze using conventional tools and methods. It is most often categorized as qualitative data and is typically stored in data lakes or NoSQL databases. Examples of unstructured data include text, video, audio, social media, satellite imagery, and the list goes on.
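To make the distinction concrete, here is a minimal Python sketch (the records and the review text are invented for illustration): structured rows drop straight into tabular analysis, while free text must first be parsed.

```python
import csv
import io
import re

# Structured data: fixed schema, ready for tabular/relational processing.
structured = io.StringIO("name,age,city\nAda,36,London\nAlan,41,Manchester\n")
rows = list(csv.DictReader(structured))
average_age = sum(int(r["age"]) for r in rows) / len(rows)

# Unstructured data: free text with no pre-defined format; it must be
# parsed or mined before any numerical analysis is possible.
review = "Loved the camera, battery life could be better. 4/5 stars!"
words = re.findall(r"[a-z']+", review.lower())

print(average_age)      # straightforward aggregate over structured rows
print(len(words))       # even "how many words" needs a parsing step first
```

The structured rows support direct aggregation; the review only becomes analyzable after tokenization, which is why unstructured data demands extra processing machinery.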

Systematic and Scientific approach

A systematic and scientific approach is a process in which theories and methodologies are applied to complex problems to explain observations and predict outcomes.

Theories are sets of assumptions, principles and relationships about a dataset that explain a certain phenomenon, whereas methods include observations, interviews and surveys, research, and experiments. This model suggests that the outcome from testing the hypothesis or theory helps in deciding the appropriate actions to be taken.

Systematic and scientific approach steps

Extract Knowledge and Insights

As raw data is processed using systematic and scientific methodologies, it yields information. When enough information is accumulated through study and research, it becomes knowledge, and the knowledge gained grows with the quantity of data processed.

On the flip side, insights are acquired when knowledge is aligned with, and observed against, the problem statement in action. Moreover, iteratively, the insights gained further evolve our knowledge.

This churning of data into insights is what organizations desire and practice to help them make effective decisions. These insights are deduced by applying descriptive and inferential techniques to the data.

Briefly, descriptive statistics uses the data to provide descriptions of the population, either through numerical calculations, tabulations, or simple visualization techniques, whereas inferential statistics makes inferences and predictions about a population based on a sample of data drawn from it.
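As a small illustration of the two branches, the sketch below (with invented measurements) computes descriptive summaries of a sample, then a rough inferential estimate: an approximate 95% confidence interval for the population mean under a normal approximation.

```python
import math
import statistics

# A small sample of daily page-load times in ms (invented for illustration).
sample = [120, 135, 128, 140, 122, 131, 138, 125, 129, 133]

# Descriptive statistics: summarize the data we actually have.
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)          # sample standard deviation

# Inferential statistics: estimate the unknown population mean from the
# sample via an approximate 95% confidence interval.
se = stdev / math.sqrt(len(sample))       # standard error of the mean
ci = (mean - 1.96 * se, mean + 1.96 * se)

print(mean, median, round(stdev, 2))
print(ci)  # a range we believe contains the population mean
```

The first block only describes the ten observations; the second makes a claim about the whole population those observations came from, which is the essential difference between the two branches.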

What is Machine Learning?

Machine Learning is a process that starts with learning from data, then finds patterns using algorithmic and statistical techniques that help take actions with minimal human intervention.

Learning from Data

Learning (from data) is the process of gaining new knowledge, or modifying existing knowledge, through research, study or experience. It is an iterative cycle of feeling/experiencing -> watching -> thinking -> doing.

Applying this basic learning technique to data can help extract a variety of insights that assist us in measuring data quality, giving scope and context to a dataset, analyzing and modelling data, hypothesizing a problem statement, and reporting success metrics.

It is important to note that David Kolb’s learning model could be an ideal model for applications of advanced machine learning, including Artificial Intelligence (AI).

Finding Patterns

Finding patterns is one of the essential steps in Machine Learning. Unless definite patterns are observed in the data, it is very difficult to frame a model for prediction. Pattern finding is the ability to identify the characteristics of data that yield information about a given dataset.

Patterns occur at regular intervals and repeat in a predictable manner. Once they are detected in a dataset, it becomes easier to classify and segregate the data for analysis.
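As a toy illustration of this, the sketch below (the signal is invented) detects the period of a repeating sequence; once the period is known, the data can be segregated by position in the cycle.

```python
# Detect the period of a repeating sequence: the smallest shift p such that
# every element matches the element p steps earlier.
def find_period(seq):
    for p in range(1, len(seq)):
        if all(seq[i] == seq[i - p] for i in range(p, len(seq))):
            return p
    return len(seq)  # no repetition found: the whole sequence is the "period"

signal = [3, 1, 4, 3, 1, 4, 3, 1, 4, 3, 1, 4]
print(find_period(signal))  # 3: the pattern (3, 1, 4) repeats predictably
```

Real pattern mining uses far more robust tools (autocorrelation, clustering, frequent-itemset mining), but the principle is the same: quantify a regularity, then exploit it.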

Steps involved in finding patterns

Algorithm and Statistic Techniques

An algorithm is a finite sequence of well-defined instructions to solve a problem or perform a computation, and an algorithmic technique is a general approach for implementing a process or computation. One of the most essential aspects of an algorithm is its performance, which assists in optimizing a process according to the available resources.
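To illustrate how the choice of algorithm affects performance, here is a sketch comparing linear search, O(n), with binary search, O(log n), over the same sorted list (the dataset is invented; `bisect_left` is Python's standard-library binary-search helper).

```python
from bisect import bisect_left

def linear_search(items, target):
    """O(n): check every element in turn until the target is found."""
    for i, x in enumerate(items):
        if x == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """O(log n): repeatedly halve the search interval (requires sorted input)."""
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

data = list(range(0, 1_000_000, 2))   # sorted even numbers
print(linear_search(data, 999_998))   # same answer after ~500,000 comparisons
print(binary_search(data, 999_998))   # same answer after ~20 comparisons
```

Both return the identical index; the difference is resource usage, which is exactly the performance aspect described above.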

Statistics is the discipline that concerns collecting, organizing, analyzing, interpreting and presenting data. Statistical techniques are helpful in providing insights about data. For example, extreme values, means, medians and standard deviations are useful in exploring, summarizing, and visualizing data.

Statistics is mainly branched into two categories, descriptive and inferential.

Categories of Statistics

Machine Learning is also referred to as Applied Statistics or Statistical Learning. The major difference between machine learning and statistics is their purpose: machine learning models are designed to make the most accurate predictions possible, whereas statistical models are designed for inference about the relationships between variables.
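The two purposes can be seen side by side in a tiny least-squares fit (a sketch on invented data): the ML framing cares about the prediction at a new point, while the statistical framing cares about interpreting the fitted coefficient.

```python
# Fit y = a*x + b by ordinary least squares on a small invented dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - a * mean_x

# ML framing: use the fitted model to predict an unseen input.
prediction = a * 6.0 + b

# Statistical framing: interpret the coefficient itself -- y grows by
# roughly `a` units for each unit increase in x.
print(round(a, 3), round(b, 3))
print(round(prediction, 2))
```

The same fitted line serves both goals; which number you report (the prediction or the slope) reflects which discipline's question you are answering.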

What is Deep Learning?

Deep learning is a machine learning technique that teaches computers to learn and improve by example. It is based on the concept of our brain cells, called neurons, which process and transmit information within a complex circuit until the brain directs an action. Carrying this analogy over to deep learning, neurons are termed nodes, the complex circuits are called Artificial Neural Networks (ANNs), and the action is the predicted outcome.

Nodes

A node, or neuron, is a computational unit that has one or more weighted input connections, a transfer function that combines the inputs, and an output connection. Nodes are organized into layers to form a network, which is called an Artificial Neural Network.

X1 and X2 are numerical inputs; w1 and w2 are their weights; a constant input of 1 with weight b serves as the bias
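The node in the figure can be sketched in a couple of lines (the input values and weights below are hypothetical, chosen only for illustration):

```python
def node_output(x1, x2, w1, w2, b):
    """Weighted sum of the inputs plus the bias (weight b on a constant input of 1)."""
    return w1 * x1 + w2 * x2 + b * 1

# Hypothetical inputs and weights.
print(node_output(x1=0.5, x2=-1.0, w1=0.8, w2=0.3, b=0.1))
# 0.8*0.5 + 0.3*(-1.0) + 0.1 = roughly 0.2
```

This weighted-sum-plus-bias value is what later gets passed through an activation function to decide the node's output.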

Artificial Neural Network

An artificial neural network (ANN) consists of layers of nodes, with the nodes in each layer connected to the adjacent layers. The more layers a network has, the deeper it is said to be. In an artificial neural network, data travels between nodes and is assigned corresponding weights. The final layer compiles the weighted inputs to produce an output.
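A minimal sketch of such a layered forward pass, assuming a hypothetical 2-input network with one 2-node hidden layer and invented weights:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(inputs, weights, biases):
    """Each node takes a weighted sum of all inputs plus its bias, then activates."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Data travels layer by layer; each layer's output is the next layer's input.
x = [0.5, -1.0]
hidden = layer(x, weights=[[0.8, 0.3], [-0.4, 0.6]], biases=[0.1, 0.0])
output = layer(hidden, weights=[[1.2, -0.7]], biases=[0.05])
print(output)  # a single value in (0, 1) compiled from the weighted inputs
```

A "deeper" network would simply chain more `layer` calls between the input and the final output.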

One of the main cruxes of an ANN is the activation function. An activation function decides whether a neuron should be activated by calculating the weighted sum of its inputs and adding a bias to it. Its purpose is to introduce non-linearity into the output of a node, since non-linear functions can capture the complex relationships that neural networks model.
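Two common activation functions can be sketched as follows (the weighted sum below reuses hypothetical weights purely for illustration):

```python
import math

def sigmoid(z):
    """Squashes any real input into (0, 1); a classic non-linear activation."""
    return 1 / (1 + math.exp(-z))

def relu(z):
    """Passes positive inputs through and zeroes out negatives."""
    return max(0.0, z)

# Weighted sum plus bias for a node; the activation then decides how
# strongly the node "fires".
z = 0.8 * 0.5 + 0.3 * (-1.0) + 0.1
print(round(sigmoid(z), 3))  # about 0.55: weakly activated
print(relu(z))               # positive, so passed through unchanged
```

Without such non-linear functions, stacking layers would collapse into a single linear transformation, which is why the activation function is described as the crux of the network.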

Artificial Neural Network

The ability to process large numbers of inputs (features) makes deep learning very powerful when dealing with unstructured data. However, deep learning algorithms can be overkill for less complex problems, because they require access to vast amounts of data to be effective.

Outcome

Each node in a layer generates an output proportional to its weights, which in turn becomes the input for the next layer, and so on, until the final outcome is produced. The actual outcome is then compared with the expected outcome, and the network’s performance is evaluated. If the difference between the actual and expected outcomes is large, an ANN technique called backpropagation is applied, adjusting the weights of the nodes to minimize this difference.
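A minimal sketch of this weight-adjustment loop on a single sigmoid neuron with one invented training pair; real backpropagation applies the same chain-rule update layer by layer across the whole network.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# One neuron, one input; the training pair is invented for illustration.
x, expected = 1.5, 1.0
w, b, lr = 0.2, 0.0, 0.5                    # initial weight, bias, learning rate

for _ in range(200):
    actual = sigmoid(w * x + b)             # forward pass
    error = actual - expected               # difference from the expected outcome
    grad = error * actual * (1 - actual)    # chain rule: d(squared error)/d(weighted sum)
    w -= lr * grad * x                      # backpropagate: adjust the weight...
    b -= lr * grad                          # ...and the bias, to shrink the difference

print(round(sigmoid(w * x + b), 3))  # much closer to the expected 1.0 than the initial ~0.57
```

Each pass repeats exactly the cycle described above: produce an outcome, compare it with the expected one, and nudge the weights to reduce the gap.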

What is Artificial Intelligence?

Artificial intelligence (AI) is the ability of a machine or a computer program to think, act and learn like humans. AI is accomplished by studying how the human brain thinks, acts and learns while solving a problem, and then using the outcomes of this study to develop intelligent software and systems. Some of the areas where AI is prominent are depicted in the figure below.

Areas of AI

Comparing Data Science (DS), Machine Learning (ML), Deep Learning (DL) and Artificial Intelligence (AI)

After this brief introduction to each of these pillars, let us understand the commonalities among them. AI, ML and DL are not parts or subsets of DS, although certain tasks involved in DS intersect with AI, ML and DL. DS is a data-driven technique, and each of these fields has processes that relate to data or big data in its own context.

Linking of DS, ML, DL, AI

DS, ML, DL and AI all need loads of data to begin with. Each processes the data with its own context and techniques and provides an outcome, which is then examined against human interests. A point to note is that DS, ML, DL and AI are iterative techniques: if the actual outcome varies widely from the expected result, the respective processes are repeated.

Comparison overview — DS, ML, DL, AI

If observed carefully, the processes reveal the similarities between DS, ML, DL and AI. Linking the two figures above, the following inferences can be made.

Point of reference: DS

  • DS intersects with ML on the grounds of cleaning and modelling the data
  • The cleaning and munging steps of DS help in feature extraction and the grouping of information in DL
  • The inference and decision-making phases of AI are advanced versions of the describing and modelling steps of DS

Point of reference: ML

  • Cleansing and preparing data in ML is the precursor to extracting features in DL; also, the evaluation performed in ML acts as a prototype for evaluating the performance of a neural network in DL
  • Simply put, AI is applied on the basis of ML, or ML can be called a subset of AI

Point of reference: DL

  • The evaluation and self-improving stages of DL assist in developing superior AI models

Summary

DS provides modelling techniques to harness data through statistics and algorithms, extract features, and yield information, while ML, DL and AI develop functional solutions in their own domains using these techniques. But the main difference is the fact that DS covers the entire spectrum of data processing, not just the algorithmic or statistical aspects of it. In particular, DS also covers

  • data integration
  • distributed architecture
  • automating machine learning
  • data visualization
  • dashboards and BI
  • data engineering
  • deployment in production mode
  • automated, data-driven decisions
