The Explanations of Data Science, Machine Learning, Artificial Intelligence and Deep Learning.

Betül Elif ÖZMEN
9 min read · Nov 16, 2022


First of all, what are the differences between Machine Learning, Data Science, Deep Learning and Artificial Intelligence? Here are simple definitions of each:

Data Science (DS): Data Science brings together AI, ML and DL to extract insights from data and make predictions from large datasets. *Note that the distinctions between these terms aren’t clear-cut.

DS, AI, ML, DL

Artificial Intelligence (AI): Programs that can sense, reason, act and adapt, with the ability to learn and reason like humans.

Machine Learning (ML): Algorithms whose performance improves as they are exposed to more data over time.

Deep Learning (DL): A subset of ML in which multilayered neural networks learn from vast amounts of data.

Chart of AI, ML and DL

TYPES OF ARTIFICIAL INTELLIGENCE (AI)

Artificial Intelligence can be classified by capabilities and by functionalities.

Based on capabilities, there are three types of Artificial Intelligence.

  • Narrow AI
  • General AI
  • Super AI

Under functionalities, we have four types of Artificial Intelligence.

  • Reactive Machines
  • Limited Memory
  • Theory of Mind
  • Self-awareness

TYPES OF MACHINE LEARNING (ML)

Machine Learning is often categorized by how an algorithm learns to become more accurate in its predictions. There are four basic approaches:

  1. SUPERVISED LEARNING: Supervised learning is defined by its use of labelled datasets. Using labelled inputs and outputs, the model can measure its accuracy and learn over time. There are two main tasks:
  • Classification (Binary and Multi-class): Binary classification divides data into two categories, while multi-class classification chooses between more than two possible answers.
  • Regression: Regression uses an algorithm to understand the relationship between dependent and independent variables. Regression models are helpful for predicting numerical values based on different data points, such as sales revenue projections for a given business.

Examples of Supervised Learning Algorithms:

  • Linear Regression
  • Logistic Regression
  • Nearest Neighbor
  • Gaussian Naive Bayes
  • Decision Trees
  • Support Vector Machine (SVM)
  • Random Forest
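To make the two supervised tasks concrete, here is a minimal sketch in plain Python, assuming tiny made-up datasets. The function names `fit_line` and `nearest_neighbor` are my own and are not taken from any library; in a real project you would more likely reach for a library such as scikit-learn.

```python
# Supervised learning in miniature: both models learn from labelled (input, output) pairs.

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nearest_neighbor(train_points, train_labels, query):
    """Classify `query` with the label of the closest training point."""
    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    best = min(range(len(train_points)),
               key=lambda i: squared_distance(train_points[i], query))
    return train_labels[best]


# Regression: made-up advertising spend (x) versus sales revenue (y).
spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
slope, intercept = fit_line(spend, revenue)
predicted_revenue = slope * 5.0 + intercept

# Binary classification: made-up 2-D points labelled "small" or "large".
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
labels = ["small", "small", "large", "large"]
predicted_label = nearest_neighbor(points, labels, (7.0, 9.0))
```

The regression part predicts a number for an unseen input, while the classification part picks one of the known labels; both rely entirely on the labelled examples they were given.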

2. UNSUPERVISED LEARNING: Unsupervised learning uses machine learning algorithms to analyze and cluster unlabeled data sets. These algorithms discover hidden patterns in data without the need for human intervention (hence, they are “unsupervised”). There are four main tasks:

  • Clustering: Splitting the dataset into groups based on similarity.
  • Association: Identifying sets of items in a data set that frequently occur together.
  • Dimensionality Reduction: Reducing the number of variables in a data set.
  • Anomaly Detection: Identifying unusual data points in a data set.
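As an illustration of the clustering task, the following is a minimal sketch of k-means in plain Python. The data points are made up, the starting centroids are picked by hand so that the run is repeatable, and the function name `k_means` is my own; treat it as a teaching sketch rather than a production implementation.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its closest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Unlabelled, made-up 2-D points that visibly form two groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
```

No labels are passed in: the grouping comes only from the distances between the points.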

3. SEMI-SUPERVISED LEARNING: This approach to machine learning involves a mix of the two preceding types: the model is trained on a small amount of labelled data together with a larger amount of unlabelled data. Semi-supervised learning is well suited to medical images, where a small amount of labelled training data can lead to a significant improvement in accuracy. Example applications include Machine Translation, Fraud Detection, and Labelling Data.

4. REINFORCEMENT LEARNING: Data scientists typically use reinforcement learning to teach a machine to complete a multi-step process for which there are clearly defined rules. Reinforcement learning is often used in areas such as Robotics, Video gameplay, and Resource Management.
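A multi-step process with clearly defined rules can be sketched with tabular Q-learning, one common reinforcement learning algorithm. The example below is a minimal sketch under my own assumptions: a made-up five-cell corridor where the agent starts in cell 0 and earns a reward only on reaching cell 4, with illustrative values for the learning rate and discount factor.

```python
import random

N_STATES = 5                 # corridor cells 0..4; cell 4 is the goal
ACTIONS = ("left", "right")
ALPHA, GAMMA = 0.5, 0.9      # learning rate and discount factor (illustrative values)


def step(state, action):
    """Environment rules: move one cell; reward 1.0 only when the goal is reached."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


rng = random.Random(0)       # fixed seed so the run is repeatable
q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]

for _ in range(200):         # 200 training episodes
    state = 0
    while state != N_STATES - 1:
        action = rng.choice(ACTIONS)  # explore with purely random moves
        next_state, reward = step(state, action)
        # Q-learning update: nudge the estimate toward
        # reward + discounted value of the best next action.
        target = reward + GAMMA * max(q[next_state].values())
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state

# Greedy policy read off the learned table for every non-goal cell.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

After training, the greedy policy should choose "right" in every cell, because moving toward the goal has the higher learned value even though the agent only ever moved at random while learning.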

TYPES OF DEEP LEARNING (DL)

Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.

Here is a list of ten popular deep learning algorithms:

  1. Convolutional Neural Networks (CNNs)
  2. Long Short-Term Memory Networks (LSTMs)
  3. Recurrent Neural Networks (RNNs)
  4. Generative Adversarial Networks (GANs)
  5. Radial Basis Function Networks (RBFNs)
  6. Multilayer Perceptrons (MLPs)
  7. Self-Organizing Maps (SOMs)
  8. Deep Belief Networks (DBNs)
  9. Restricted Boltzmann Machines (RBMs)
  10. Autoencoders
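To show what "multilayered" means in practice, here is a minimal sketch of the forward pass of a tiny multilayer perceptron in plain Python. The weights are chosen by hand (not learned) so that the network computes XOR, and the helper names `relu`, `dense` and `xor_network` are my own. Real projects would use a library such as TensorFlow, Keras or PyTorch and learn the weights from data.

```python
def relu(x):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights . inputs + bias) per unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def xor_network(x1, x2):
    # Hidden layer with two ReLU units (weights set by hand for illustration).
    hidden = dense([x1, x2], weights=[[1.0, 1.0], [1.0, 1.0]],
                   biases=[0.0, -1.0], activation=relu)
    # Linear output layer combines the two hidden features.
    (output,) = dense(hidden, weights=[[1.0, -2.0]], biases=[0.0],
                      activation=lambda v: v)
    return output


outputs = {(a, b): xor_network(a, b) for a in (0.0, 1.0) for b in (0.0, 1.0)}
```

A single linear layer cannot represent XOR, which is why the hidden layer matters here: stacking layers lets the network build intermediate features.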

Overview of An End-To-End Data Science Project (Steps, Libraries, IDE, Programming Languages, Datasets, etc.)

An end-to-end data science project includes several steps. The commonly used steps are shown in the chart below.

Data Science Lifecycle

Popular Python IDEs for data science and machine learning projects include Spyder, JupyterLab, PyCharm, Visual Studio Code, Thonny, and Atom.

For data science projects, several different programming languages are used, such as Python, R, SQL, Java, Julia, Scala, C/C++, JavaScript, Swift, Go, MATLAB and SAS (these are listed among the top data science programming languages for 2022). Some of them are used in computer programmes to implement algorithms. *This article focuses on the libraries and tools that are generally used with Python.

In the data collection step, there are multiple ways of gathering data, for instance downloading from online data sources, pulling data through an API, or accessing data in databases. One popular way is to use online data sources, which anyone can access and download for free for data science projects. Another way is to send requests to a website and build an automated data pipeline between the website and the requester, targeting a specific part of the website’s content. An API (Application Programming Interface) helps with this: data can be pulled on an automated schedule or manually on demand.
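Pulling data through an API usually comes down to building a request URL, downloading the response, and parsing it. The sketch below uses only the Python standard library. The endpoint `https://api.example.com/v1/measurements`, its `start` and `end` parameters, and the shape of the JSON response are all hypothetical; a real API will document its own URL, parameters and format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.example.com/v1/measurements"  # hypothetical endpoint


def build_url(base_url, **params):
    """Attach query parameters (e.g. a date range) to the endpoint URL."""
    return f"{base_url}?{urlencode(params)}"


def parse_records(payload):
    """Turn a JSON response body into a list of (date, value) tuples."""
    data = json.loads(payload)
    return [(item["date"], float(item["value"])) for item in data["results"]]


def fetch_records(url, timeout=10):
    """Download and parse one response; call this on demand or from a scheduler."""
    with urlopen(url, timeout=timeout) as response:
        return parse_records(response.read().decode("utf-8"))


# Offline demonstration with a made-up response body (no network call is made here).
url = build_url(BASE_URL, start="2022-01-01", end="2022-01-31")
sample_payload = '{"results": [{"date": "2022-01-01", "value": "3.5"}]}'
records = parse_records(sample_payload)
```

Keeping the parsing separate from the download makes it easy to test the pipeline on saved responses before pointing it at a live service.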

API

Here is a list of a few sources for datasets: Data Is Plural, BuzzFeed, NASA, AWS Public Data Sets, Wikipedia, Quandl, Data.World, World Bank Open Data, Kaggle, Google Finance, UNICEF publications, Our World in Data, Google Public Data Explorer, FiveThirtyEight, Socrata, GitHub, UCI Machine Learning Repository, Data.gov, Academic Torrents, Nasdaq Data Link, Twitter, YouTube, wunderground, global health organization, Pew Research Center, National Climatic Data Center.

The best known is Kaggle, a popular online platform with over 50,000 public datasets on a wide range of topics, where both data and code are easy to find.

As another method, SQL (Structured Query Language) is a special-purpose programming language for managing data and accessing data in databases. With SQL, you can easily manage and seamlessly analyze large amounts of raw data. *Note that Python is a general-purpose programming language, whereas SQL is a query language.

The main difference between SQL and Python is that developers use SQL to access and extract data from a database, while they use Python to analyze and manipulate data by running regression analyses, time series analyses, and other data manipulation calculations.
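The division of labour between the two can be sketched with Python's built-in `sqlite3` module and an in-memory database. The `sales` table, its columns and its values are made up for this example.

```python
import sqlite3
from statistics import mean

# Build a small throwaway database in memory.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sales (region TEXT, month INTEGER, revenue REAL)")
connection.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", 1, 100.0), ("north", 2, 120.0), ("south", 1, 80.0), ("south", 2, 90.0)],
)

# SQL does the extraction: filter rows and pick columns inside the database.
rows = connection.execute(
    "SELECT month, revenue FROM sales WHERE region = ? ORDER BY month", ("north",)
).fetchall()
connection.close()

# Python does the analysis on the extracted rows.
revenues = [revenue for _, revenue in rows]
average_revenue = mean(revenues)
growth = (revenues[-1] - revenues[0]) / revenues[0]  # relative change, first to last month
```

The query decides which data leaves the database; everything after `fetchall()` is ordinary Python working on a list of tuples.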

For Machine Learning and Deep Learning projects, some libraries are commonly used. Which one to choose depends on the purpose.

  • TensorFlow
  • NumPy
  • SciPy
  • Pandas
  • Matplotlib
  • Keras
  • scikit-learn
  • PyTorch
  • Scrapy
  • BeautifulSoup

For data visualization, Matplotlib, Seaborn, ggplot, Plotly, and Bokeh libraries can be used in Python.

Power BI and Tableau are useful tools primarily used by data scientists and business analysts to extract valuable information from raw datasets and use it for business. They are collections of Business Intelligence and data analytics tools that allow the user to collect data from varied sources, in both structured and unstructured formats, and convert that data into visualizations and other insights.

These tools save Data Scientists a lot of time by generating appealing visualizations quickly and without coding. Exploratory Data Analysis (EDA) is an important part of the Data Science process: a Data Scientist needs to be able to quickly visualize the data they’re dealing with before creating the model, and Tableau helps with that. A disadvantage of these tools for data science, however, is that their visualizations cannot be integrated into the platform.
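For readers who prefer code, a very quick first look at a single column can also be sketched in plain Python. The `ages` values below are made up, and the text histogram is only a stand-in for the charts that Matplotlib, Seaborn or a BI tool would draw.

```python
from collections import Counter
from statistics import mean, median, stdev

ages = [23, 25, 25, 31, 35, 35, 35, 42, 58]  # made-up column of values

# Summary statistics give a first feel for the range and centre of the data.
summary = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "mean": mean(ages),
    "median": median(ages),
    "stdev": stdev(ages),
}

# Text histogram: bucket values by decade and draw one '#' per observation.
buckets = Counter((age // 10) * 10 for age in ages)
histogram = [f"{decade}s | {'#' * buckets[decade]}" for decade in sorted(buckets)]

for line in histogram:
    print(line)
```

Even this rough view shows where most values sit and that one value (58) lies well apart from the rest, which is the kind of observation EDA is meant to surface before modelling.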

Data Cloud Platforms: As data scientists deal with solving complex business problems through building models and deploying algorithms, the right kind of tools becomes essential to effectively manage different aspects of a project pipeline. Taking your data science projects to the cloud comes with advantages like the ability to scale, access to all the latest tools, and less maintenance on the user side. Some of the most common cloud-based platforms for data science projects include Amazon Web Services, Google Cloud Platform, IBM Watson and Microsoft Azure.

An operating system (OS) is system software that manages computer hardware and software resources and provides common services for computer programs. Examples of operating systems are Microsoft Windows, macOS, GNU/Linux, BeOS, Android and iOS. Data science work can be demanding, so choosing a suitable operating system is important.

With so much data being generated every day, it’s becoming increasingly difficult to manage using traditional methods. This has led to the development of multiple frameworks and technologies to help with the management and processing of big data. There are many different technologies that you can use to build a modern data infrastructure. Three of the most popular big data frameworks from the Apache Software Foundation are Apache Hadoop, Apache Spark, and Apache Kafka.

I tried to explain the definitions and sub-categories of Data Science, Machine Learning and Artificial Intelligence with the help of the sources listed in the references section. I also tried to cover the applications and tools that can be encountered in a data science project or in this sector. In my next article, I will try to give examples from the industry. I wish everyone a good day.

REFERENCES

https://en.wikipedia.org/wiki/Deep_learning

https://www.datacamp.com/blog/top-programming-languages-for-data-scientists-in-2022

https://analyticsindiamag.com/how-to-use-cloud-platforms-for-your-data-science-projects/
