# The Complete Reference to Data Science (ML/AI) and Related Concepts

## Handpicked Medium Articles on Data Science and its Nearest-Neighbours

Data is the new oil and AI is the new electricity, and the data science (ML/AI) field is evolving rapidly. The field is so vast that it is nearly impossible to capture every topic in a single book or course.

In this post, I am handpicking quality articles on data science to create a kind of curriculum that can serve as a reference guide for newcomers as well as experienced professionals in the field.

I will keep adding new topics to this post as the field evolves. So let's get started:

## Table of Contents

- Introduction
- Programming
- Mathematics
- Data Analysis and Visualization
- Machine Learning Basics
- Machine Learning Advanced
- Natural Language Processing
- Deep Learning
- Reinforcement Learning
- Data Systems and Big Data
- Cloud Computing
- Advanced Topics

# Introduction

*What, Why, and How of Data Science|Data Science Ecosystem|Roles in Data Science*

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and to apply that knowledge across a broad range of application domains.

## What is Data Science?

## Data Science and its Nearest-Neighbours

## Why Does Data Science Matter?

## How to do Data Science?

## Data Science Ecosystem

## Roles in Data Science

# Programming

*SQL|Python|TensorFlow|Keras*

Computer programming is the process of designing and building an executable computer program to accomplish a specific computing result or to perform a specific task.

## SQL

## Python

## TensorFlow

## Keras

# Mathematics

*Linear Algebra|Multivariate Calculus|Statistics and Probability*

Mathematics includes the study of such topics as quantity (number theory), structure (algebra), space (geometry), and change (analysis). It has no generally accepted definition.

## Linear Algebra

## Multivariate Calculus

## Statistics and Probability

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied.
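To make the idea of a statistical population concrete, here is a minimal sketch using only Python's standard library. The population of simulated heights, the seed, and the sample size are all assumptions I made up for illustration; the point is only that a sample statistic estimates the corresponding population value.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heights in centimetres.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Draw a simple random sample without replacement.
sample = rng.sample(population, 200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
sample_stdev = statistics.stdev(sample)  # sample standard deviation (n - 1)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"sample stdev:    {sample_stdev:.2f}")
```

The sample mean will usually land close to the population mean, and a larger sample tends to bring it closer.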

Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, 0 indicates the impossibility of the event and 1 indicates certainty.
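As a small worked example of the 0-to-1 scale, the sketch below enumerates every outcome of rolling two fair dice and computes exact probabilities as fractions. The dice setup and the helper name `probability` are my own illustrative choices.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def probability(event):
    """Exact probability of an event: favourable outcomes / all outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

p_seven = probability(lambda o: sum(o) == 7)      # 6 of 36 outcomes
p_thirteen = probability(lambda o: sum(o) == 13)  # impossible event
p_any = probability(lambda o: 2 <= sum(o) <= 12)  # certain event

print(p_seven, p_thirteen, p_any)  # prints: 1/6 0 1
```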

**Statistics**

**Probability**

# Data Analysis and Visualization

*Data|Exploratory Data Analysis|Data Visualization*

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
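The inspect, cleanse, transform, and summarize steps can be sketched in a few lines of plain Python. The records below are hypothetical, and a per-city mean stands in for the "modeling" step; in practice you would more likely reach for a library such as pandas.

```python
import statistics

# Hypothetical raw records: (city, temperature reading stored as text).
raw = [("Pune", "31.5"), ("Delhi", "n/a"), ("Pune", "29.0"),
       ("Delhi", "38.2"), ("Delhi", ""), ("Pune", "30.1")]

def parse(value):
    """Return the reading as a float, or None if it is not a number."""
    try:
        return float(value)
    except ValueError:
        return None

# Inspect: how many readings are missing or malformed?
parsed = [(city, parse(value)) for city, value in raw]
missing = sum(1 for _, value in parsed if value is None)

# Cleanse: drop the rows without a valid reading.
clean = [(city, value) for city, value in parsed if value is not None]

# Transform: group the readings by city.
by_city = {}
for city, value in clean:
    by_city.setdefault(city, []).append(value)

# Summarize: a per-city mean as a very simple "model" of the data.
mean_by_city = {city: statistics.mean(values) for city, values in by_city.items()}

print(f"dropped {missing} invalid rows")
print(mean_by_city)
```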

Data visualization is an interdisciplinary field that deals with the graphic representation of data. It is a particularly efficient way of communicating when there is a lot of data, as in a time series.

## Exploratory Data Analysis

## Data Visualization

# Machine Learning Basics

*Introduction|Linear Regression|Logistic Regression|Clustering|PCA|SVM*

Machine learning is the study of computer algorithms that improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence.
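One way to see "improving through experience" is to fit a straight line with gradient descent and watch the error shrink as training continues. This is a minimal sketch, not a production approach: the toy data (generated from y = 2x + 1), the learning rate, and the function names are all assumptions for illustration.

```python
# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

def mse(w, b):
    """Mean squared error of the line y = w*x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(steps, lr=0.05):
    """Run `steps` iterations of batch gradient descent from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

for steps in (0, 10, 100, 2000):
    w, b = train(steps)
    print(f"steps={steps:5d}  w={w:.3f}  b={b:.3f}  mse={mse(w, b):.6f}")
```

With more steps the error drops and the learned `w` and `b` approach the values used to generate the data.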

## Introduction

## Linear Regression

## Logistic Regression

# Machine Learning Advanced

*Model Selection|Advanced Regression|Decision Trees|Random Forest|Bagging and Boosting|Neural Networks|Time Series*

# Natural Language Processing

*Introduction|Text Processing|Lexical Processing|Syntax and Semantics*

Natural language processing is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.
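As a first taste of processing text programmatically, the sketch below lowercases a sentence, splits it into word tokens with a regular expression, and counts word frequencies. The example sentence and the simple token pattern are my own assumptions; real pipelines typically use a dedicated tokenizer.

```python
import re
from collections import Counter

text = "Data science is fun. Data is everywhere, and science is for everyone!"

# Lowercase the text, then keep runs of letters (and apostrophes) as tokens.
tokens = re.findall(r"[a-z']+", text.lower())

# Count how often each token occurs.
counts = Counter(tokens)

print(tokens)
print(counts.most_common(3))
```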

# Deep Learning

*Introduction|FFNN|CNN|RNN/LSTM|Encoder-Decoder|Autoencoders|GANs*

Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.
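To give a feel for what a neural network computes, here is a minimal sketch of a forward pass through a tiny feed-forward network that reproduces XOR. The weights are hand-picked for illustration rather than learned, and the helper names are hypothetical; a real network would learn its weights from data using a framework such as TensorFlow or Keras.

```python
def relu(z):
    """Rectified linear unit: pass positive values, clamp negatives to 0."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: weighted sum plus bias per output unit."""
    outputs = []
    for weight_row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(weight_row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs

# Hand-picked weights (not learned): two hidden units, one output unit.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUT_W = [[1.0, -2.0]]
OUT_B = [0.0]

def xor_net(x1, x2):
    """Forward pass: input -> hidden layer (ReLU) -> linear output."""
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, activation=relu)
    return dense(hidden, OUT_W, OUT_B)[0]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))
```

The hidden layer turns the raw inputs into an intermediate representation that a single linear output can separate, which is the core idea behind representation learning.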

# Reinforcement Learning

*Introduction|Markov Decision Process|Optimal Policy Search|Monte-Carlo Learning|Temporal-Difference Learning|TD(λ) and Q-learning*

Reinforcement learning is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
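"Cumulative reward" is commonly formalized as a discounted return, where later rewards count for less than earlier ones. The sketch below computes that quantity for a short list of rewards; the reward values, the discount factor, and the function name are assumptions chosen for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (return after it).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical episode: rewards received at steps 0, 1 and 2.
episode = [1, 0, 2]

# 1 + 0.5 * 0 + 0.25 * 2 = 1.5
print(discounted_return(episode, gamma=0.5))
```

An agent that maximizes this quantity has to trade off immediate rewards against rewards that arrive later.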

# Data Systems and Big Data

*Data Systems|Evolution of Data Systems|Big Data|Hadoop|Spark*

Data system is a term used to refer to an organized collection of symbols and processes that may be used to operate on such symbols. Any organized collection of symbols and symbol-manipulating operations can be considered a data system.

Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.

# Cloud Computing

*Introduction|IaaS, PaaS and SaaS|Public, Private and Hybrid Cloud|AWS, Azure and GCP*

Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. The term is generally used to describe data centers available to many users over the Internet.

# Advanced Topics

*MLOps|Explainable AI|Ethics in AI|Data-Driven Business*

## Machine Learning in Production (MLOps)

MLOps is the process of taking an experimental machine learning model into a production system. The word is a compound of “Machine Learning” and the continuous development practice of DevOps in the software field. Machine learning models are usually developed and tested in isolated experimental systems, so moving them into production takes deliberate engineering effort.

## ML Explainability (XAI)

Explainable AI is artificial intelligence in which the results of the solution can be understood by humans. It contrasts with the concept of the “black box” in machine learning where even its designers cannot explain why an AI arrived at a specific decision.

## Ethics in AI

Artificial Intelligence ethics, or AI ethics, comprises a set of values, principles, and techniques that employ widely accepted standards of right and wrong to guide moral conduct in the development and deployment of Artificial Intelligence technologies.

## Data-Driven Business

The data-driven business puts data and analytics front and center in its business strategy and throughout all echelons. It differentiates itself from the competition by making data-driven optimization part of daily operations.

Creating a data culture is one of the keys to building a data-driven organization. The right technology, data literacy, and disrupting the status quo are ways to start.

Ankit Rathi *is a Principal Data Scientist, published author & well-known speaker. His interest lies primarily in building end-to-end AI applications/products following best practices of Data Engineering and Architecture.*