Published in Geek Culture

The Complete Reference to Data Science (ML/AI) and Related Concepts

Handpicked Medium Articles on Data Science and its Nearest-Neighbours

Data is the new oil and AI is the new electricity, and the data science (ML/AI) field is evolving rapidly. The field is so vast that it is nearly impossible to cover every topic in a single book or course.

In this post, I handpick quality articles on data science to build a kind of curriculum that can serve as a reference guide for newcomers and experienced data science professionals alike.

I will keep adding new topics to this post as the field evolves. So let's get started:

Table of Contents

  • Introduction
  • Programming
  • Mathematics
  • Data Analysis and Visualization
  • Machine Learning Basics
  • Machine Learning Advanced
  • Natural Language Processing
  • Deep Learning
  • Reinforcement Learning
  • Data Systems and Big Data
  • Cloud Computing
  • Advanced Topics


Introduction

What, Why, and How of Data Science|Data Science Ecosystem|Roles in Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains.

What is Data Science?

Data Science and its Nearest-Neighbours

Why Does Data Science Matter?

How to do Data Science?

Data Science Ecosystem

Roles in Data Science


Programming

SQL|Python|TensorFlow|Keras

Computer programming is the process of designing and building an executable computer program to accomplish a specific computing result or to perform a specific task.
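As a small illustration of how SQL and Python, the first two items above, are often combined, here is a minimal sketch using Python's built-in sqlite3 module: SQL performs the aggregation and Python consumes the resulting rows. The sales table and its values are made-up example data.

```python
import sqlite3

# In-memory database; the "sales" table and its rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 40.0)],
)

# SQL does the grouping and summing; Python turns the result rows into a dict.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
totals = dict(rows)
conn.close()

print(totals)  # {'north': 160.0, 'south': 80.0}
```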

Mathematics

Linear Algebra|Multivariate Calculus|Statistics and Probability

Mathematics includes the study of such topics as quantity (number theory), structure (algebra), space (geometry), and change (analysis). It has no generally accepted definition.

Linear Algebra

Multivariate Calculus

Statistics and Probability

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied.
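A minimal sketch of the descriptive side of this, assuming the numbers below are a small sample drawn from some larger population we want to describe. The values are invented for illustration, and only Python's standard statistics and math modules are used.

```python
import math
import statistics

# Invented sample (e.g. response times in ms), treated as drawn from a larger population.
sample = [12.0, 15.0, 11.0, 14.0, 18.0]

mean = statistics.mean(sample)      # estimate of the population mean
median = statistics.median(sample)  # middle value, less sensitive to outliers
stdev = statistics.stdev(sample)    # sample standard deviation (n - 1 in the denominator)
# Standard error: roughly how precisely the sample mean estimates the population mean.
sem = stdev / math.sqrt(len(sample))

print(mean, median, round(stdev, 3), round(sem, 3))  # 14.0 14.0 2.739 1.225
```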

Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, 0 indicates the impossibility of the event and 1 indicates certainty.
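To make the 0-to-1 scale concrete, here is a sketch that computes exact probabilities for two fair dice by counting equally likely outcomes. This is the classical definition, which assumes every outcome is equally likely: an impossible event gets 0, a certain one gets 1, and a sum of 7 gets 6/36 = 1/6.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def probability(event):
    """Share of equally likely outcomes for which `event` is true, as an exact fraction."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

p_seven = probability(lambda o: o[0] + o[1] == 7)        # 6 of 36 outcomes
p_thirteen = probability(lambda o: o[0] + o[1] == 13)    # impossible event
p_at_most_12 = probability(lambda o: o[0] + o[1] <= 12)  # certain event

print(p_seven, p_thirteen, p_at_most_12)  # 1/6 0 1
```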



Data Analysis and Visualization

Data|Exploratory Data Analysis|Data Visualization

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
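The four steps named above can be sketched on a tiny, invented list of temperature readings. This uses plain Python rather than a data-analysis library, and the modeling step is deliberately trivial (a mean), so treat it as an outline of the workflow rather than a realistic analysis.

```python
import statistics

# Invented raw records: Celsius readings as strings, some missing or malformed.
raw = ["21.5", "", "22.0", "n/a", "23.5", "22.5"]

def to_float(value):
    """Parse a reading, returning None when it is not a valid number."""
    try:
        return float(value)
    except ValueError:
        return None

# Inspect: parse everything and count how many records are unusable.
parsed = [to_float(v) for v in raw]
missing = parsed.count(None)

# Cleanse: drop the records that could not be parsed.
clean = [v for v in parsed if v is not None]

# Transform: convert Celsius to Fahrenheit.
fahrenheit = [c * 9 / 5 + 32 for c in clean]

# Model (here just a summary statistic used to draw a conclusion).
average_c = statistics.mean(clean)

print(missing, clean, average_c)  # 2 [21.5, 22.0, 23.5, 22.5] 22.375
```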

Data visualization is an interdisciplinary field that deals with the graphic representation of data. It is a particularly efficient way of communicating when the data are numerous, as in a time series.

Exploratory Data Analysis

Data Visualization

Machine Learning Basics

Introduction|Linear Regression|Logistic Regression|Clustering|PCA|SVM

Machine learning is the study of computer algorithms that improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence.
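As a concrete example of an algorithm whose behaviour is determined by data, here is a minimal sketch of simple linear regression (one of the basics listed above) fitted with the closed-form ordinary least squares formulas. The training data are invented and noise-free, so the fit recovers the generating line exactly.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the n terms cancel, so plain sums suffice.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented training data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict for an unseen input x = 5

print(slope, intercept, prediction)  # 2.0 1.0 11.0
```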


Linear Regression

Logistic Regression

Machine Learning Advanced

Model Selection|Advanced Regression|Decision Trees|Random Forest|Bagging and Boosting|Neural Networks|Time Series

Natural Language Processing

Introduction|Text Processing|Lexical Processing|Syntax and Semantics

Natural language processing is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.
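A minimal sketch of the text and lexical processing steps mentioned above: lowercase and tokenize two invented sentences with a simple regular expression, then turn each into a bag-of-words count vector over a shared vocabulary. Real pipelines usually rely on a dedicated tokenizer; the regular expression here is a simplifying assumption that keeps only ASCII letters.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

docs = [
    "The cat sat on the mat.",
    "The dog sat on the log!",
]

tokens = [tokenize(d) for d in docs]
# Bag of words: each document becomes a count of its tokens, ignoring word order.
bags = [Counter(t) for t in tokens]
# Shared, sorted vocabulary so every vector uses the same column order.
vocabulary = sorted(set().union(*tokens))
vectors = [[bag[word] for word in vocabulary] for bag in bags]

print(vocabulary)  # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
print(vectors)     # [[1, 0, 0, 1, 1, 1, 2], [0, 1, 1, 0, 1, 1, 2]]
```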

Deep Learning


Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.
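To show why hidden layers matter, here is a sketch of a two-layer network of threshold units that computes XOR, a function no single linear threshold unit can represent because its two classes are not linearly separable. The hidden units re-represent the input as "OR" and "AND" features, after which the output is a simple linear rule. The weights are set by hand for readability; in actual deep learning they would be learned from data, typically with gradient-based training and differentiable activations.

```python
def step(z):
    """Threshold activation: 1 if the weighted input is positive, else 0."""
    return 1 if z > 0 else 0

def xor_network(x1, x2):
    # Hidden layer: two units computing intermediate features
    # (weights chosen by hand here; training would normally find them).
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)   # fires if at least one input is 1
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)  # fires only if both inputs are 1
    # Output layer: "OR but not AND" is exactly XOR.
    return step(1.0 * h_or - 1.0 * h_and - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))
```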

Reinforcement Learning

Introduction|Markov Decision Process|Optimal Policy Search|Monte-Carlo Learning|Temporal-Difference Learning|TD(λ) and Q-learning

Reinforcement learning is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
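A minimal sketch of tabular Q-learning (one of the methods listed above) on an invented five-state corridor: the agent starts at the left end and receives a reward of 1 only on reaching the right end. The environment, discount factor, and learning rate are assumptions chosen for illustration. The agent explores uniformly at random, which works here because Q-learning is off-policy: it learns the value of acting greedily while behaving differently.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # discount factor (assumed)
ALPHA = 0.5       # learning rate (assumed)

def env_step(state, action):
    """Deterministic corridor: moving left at state 0 stays put; reward 1 only at the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=1000, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Behaviour policy: uniform random exploration.
            action = rng.choice(ACTIONS)
            next_state, reward, done = env_step(state, action)
            # Q-learning target: reward plus discounted value of the best next action.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q = q_learning()
# Greedy policy read off the learned table for every non-terminal state.
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

With these settings the greedy policy should be "move right" (action 1) in every non-terminal state, matching the shortest path to the reward.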

Data Systems and Big Data

Data Systems|Evolution of Data Systems|Big Data|Hadoop|Spark

“Data system” is a term used to refer to an organized collection of symbols and processes that may be used to operate on such symbols. Any organized collection of symbols and symbol-manipulating operations can be considered a data system.

Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.
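Hadoop's MapReduce model, one of the tools listed above, splits work into a map phase that emits key-value pairs, a shuffle that groups those pairs by key, and a reduce phase that combines each group. The sketch below imitates those three phases in a single Python process for a word count; it is not Hadoop's or Spark's API, and on a real cluster each phase would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would across machines."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)

# Invented input; in practice these lines would be blocks of a large distributed file.
lines = ["big data big ideas", "data beats ideas"]

mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())

print(counts)  # {'big': 2, 'data': 2, 'ideas': 2, 'beats': 1}
```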

Cloud Computing

Introduction|IaaS, PaaS and SaaS|Public, Private and Hybrid Cloud|AWS, Azure and GCP

Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. The term is generally used to describe data centers available to many users over the Internet.

Advanced Topics

MLOps|Explainable AI|Ethics in AI|Data-Driven Business

Machine Learning in Production (MLOps)

MLOps is the process of taking an experimental Machine Learning model into a production system. The word is a compound of “Machine Learning” and the continuous development practice of DevOps in the software field. Machine Learning models are typically developed and tested in isolated experimental systems, and MLOps covers the work of deploying and maintaining them in production.
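One small, concrete piece of moving a model out of an experiment is packaging it as a versioned artifact that a separate serving process can load. The sketch below assumes a trivial linear model, a JSON file, and a hand-chosen version string; these are illustrative choices only, and production setups usually rely on dedicated model registries and serving tools instead.

```python
import json
import os
import tempfile

def save_model(path, params, version):
    """Write model parameters plus version metadata as a JSON artifact."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"version": version, "params": params}, f)

def load_model(path):
    """Read a previously saved artifact back into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def predict(model, x):
    """Serving-side prediction for the assumed linear model y = w * x + b."""
    p = model["params"]
    return p["w"] * x + p["b"]

# Experiment side: write the trained parameters (invented values) to an artifact.
artifact = os.path.join(tempfile.mkdtemp(), "model-v1.json")
save_model(artifact, {"w": 2.0, "b": 1.0}, version="1.0.0")

# Production side: load exactly that versioned artifact and serve predictions.
served = load_model(artifact)
print(served["version"], predict(served, 5.0))  # 1.0.0 11.0
```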

ML Explainability (XAI)

Explainable AI is artificial intelligence in which the results of the solution can be understood by humans. It contrasts with the concept of the “black box” in machine learning where even its designers cannot explain why an AI arrived at a specific decision.
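For a linear model, a prediction can be explained exactly: it equals the prediction at the average input plus one contribution per feature, w_i * (x_i - mean_i). This is what additive attribution methods such as SHAP reduce to for a linear model when features are treated as independent. The sketch below uses an invented house-price model and invented background data.

```python
# Invented linear model: price = 3.0 * rooms + 0.5 * area + 10.0
weights = {"rooms": 3.0, "area": 0.5}
bias = 10.0

def predict(x):
    """Prediction of the linear model for one input dict."""
    return bias + sum(weights[f] * x[f] for f in weights)

# Invented background data, used to define the "average input" reference point.
background = [
    {"rooms": 2, "area": 50},
    {"rooms": 3, "area": 70},
    {"rooms": 4, "area": 90},
]
means = {f: sum(row[f] for row in background) / len(background) for f in weights}

def explain(x):
    """Per-feature contribution relative to the average input: w_i * (x_i - mean_i)."""
    return {f: weights[f] * (x[f] - means[f]) for f in weights}

x = {"rooms": 4, "area": 60}
base_value = predict(means)  # prediction for the average input
contributions = explain(x)

print(base_value, contributions, predict(x))  # 54.0 {'rooms': 3.0, 'area': -5.0} 52.0
```

The contributions add up to the gap between this prediction and the baseline, so the explanation is complete for this model class; for non-linear models, attribution methods only approximate this property in various ways.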

Ethics in AI

Artificial Intelligence ethics, or AI ethics, comprise a set of values, principles, and techniques which employ widely accepted standards of right and wrong to guide moral conduct in the development and deployment of Artificial Intelligence technologies.
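One way to turn a fairness standard into something checkable is demographic parity: compare the rate of positive decisions across groups. The sketch below computes that gap for invented loan decisions. It is only one of several fairness criteria, and which criterion is appropriate depends on the application.

```python
def selection_rate(decisions):
    """Fraction of positive (1) decisions."""
    return sum(decisions) / len(decisions)

def demographic_parity_difference(decisions, groups):
    """Largest gap in positive-decision rate between any two groups, plus per-group rates."""
    by_group = {}
    for decision, group in zip(decisions, groups):
        by_group.setdefault(group, []).append(decision)
    rates = {g: selection_rate(ds) for g, ds in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates

# Invented loan decisions (1 = approved) for applicants in groups "A" and "B".
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

gap, rates = demographic_parity_difference(decisions, groups)
print(rates, gap)  # {'A': 0.75, 'B': 0.25} 0.5
```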

Data-Driven Business

The data-driven business puts data and analytics front and center in its business strategy and throughout all echelons. It differentiates itself from the competition by making data-driven optimization part of daily operations.

Creating a data culture is one of the keys to building a data-driven organization. The right technology, data literacy, and disrupting the status quo are ways to start.

Ankit Rathi is a Principal Data Scientist, published author & well-known speaker. His interest lies primarily in building end-to-end AI applications/products following best practices of Data Engineering and Architecture.


