
The Data Science Publication
Data, Simplified

Machine Learning

In The Data Science Publication. More on Medium.

By Mr. Data Science

Photo by Carlos Muza on Unsplash

Throughout this article, we will look at techniques like cleaning and reshaping data, plotting violin plots, and analyzing text data in the context of job markets. Specifically, we’ll take a look at some differences between the three main “data” intensive jobs: data scientist, data engineer, and data analyst.

Brief Background On Data Jobs:

In 2012, an article [1] was published in the Harvard Business Review: “Data Scientist: The Sexiest Job of the 21st Century.” …

By Mr. Data Science

Photo by AJ Colores on Unsplash

In law enforcement, different types of policing exist. There is active policing, such as crowd control and traffic control, and there is preventative policing, where the police make themselves visible to deter crime. Finally, there is reactive policing, where a crime occurs and the police respond, investigate, and apprehend criminals. In this article, I’ll demonstrate how Data Science can use pattern detection to predict where and when crimes might happen. This capability could enable reactive policing to become more proactive and preventative. Police forces have been plotting crimes on maps to look for crime patterns for…

By Mr. Data Science

Let’s say we are data scientists working for a retail company and our boss wants to create a targeted marketing campaign. In order to focus the campaign, we have to divide the set of customers into smaller subsets based on the features in our customer dataset. Features are just the columns in the dataset and each row represents a unique customer. So as the data science team, our job is to somehow find those groups.

This task is different from many other machine learning tasks in that we don’t have any labelled data so we can’t…

By Mr. Data Science

Photo by Hush Naidoo on Unsplash

Data Science has had a huge impact on the field of medical science. Some of the areas where it is making a difference include:

  • Medical image analysis
  • Genetics and Genomics research
  • Creating new drugs/Drug Discovery with Data Science
  • Predictive Analytics in Healthcare
  • Data Analysis of healthcare data

Topics like the discovery of new drugs are a little beyond the scope of this article but we can still take a look at some examples of predictive and exploratory data analysis of healthcare data. In example 1 we’ll look at data on cancer and how we could approach…

By Mr. Data Science

Photo by Greg Rakozy on Unsplash

An excellent way to learn data science is to do data science: get some data and start analyzing it. The techniques used in this article can be applied to any data, and some of the issues we will encounter are typical of the challenges real-world data analysis throws up.

This article will investigate some data on asteroids to determine whether there is a threat of collision. Example 3 will use machine learning to classify asteroids as potential threats. …

By Mr. Data Science

Photo by Atul Pandey on Unsplash

A Brief Overview:

Throughout this article, we will explore migration data to gain a better understanding of migration drivers. Since migration remains a contentious political issue, we will refrain from giving opinions and focus on the data instead. To investigate migration drivers we will use a couple of datasets (all of them CSV files):

  • The country dataset was downloaded from Kaggle
  • The happiness reports (5 files) were also downloaded from Kaggle

The goals for this article are to:

  1. demonstrate some useful data science techniques such as combining datasets, generating correlation heat maps, and applying k-means to a dataset
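
As a rough illustration of the first two techniques named in this goal, the sketch below merges two small tables on a shared column and computes a correlation matrix, which is the table a heat map would display. The column names and values are invented for the example and are not taken from the Kaggle files.

```python
import pandas as pd

# Hypothetical stand-ins for the country and happiness tables; all values are made up.
countries = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "net_migration": [1.0, 2.0, 3.0, 4.0],
})
happiness = pd.DataFrame({
    "country": ["A", "B", "C", "E"],
    "happiness_score": [2.0, 4.0, 6.0, 5.0],
})

# An inner join keeps only the countries present in both tables.
merged = countries.merge(happiness, on="country", how="inner")

# Pairwise Pearson correlation of the numeric columns.
corr = merged[["net_migration", "happiness_score"]].corr()
print(corr)
```

With real data, the inner join silently drops countries whose names differ between files, so it is worth comparing row counts before and after the merge.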

By Mr. Data Science

Throughout this article, we will describe how you can use decision trees and random forest classifiers to predict the cause of wildfires. First, we will use SQLite to import the data into a pandas DataFrame. Next, we will do some preprocessing and data exploration to better understand the dataset. Finally, we will apply a random forest classifier to the complete dataset, as well as a subset (California wildfires). The concepts described in this article are applicable to a wide range of problems. If you have any feedback, we look forward to hearing from you.
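
The import step described above might look roughly like the sketch below. It uses an in-memory SQLite database so that it runs on its own; the table name `fires` and its columns are assumptions for illustration, not the schema of the actual wildfire dataset.

```python
import sqlite3

import pandas as pd

# In-memory database standing in for the wildfire SQLite file.
# The table name and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fires (state TEXT, fire_size REAL, cause TEXT)")
conn.executemany(
    "INSERT INTO fires VALUES (?, ?, ?)",
    [("CA", 10.5, "Lightning"), ("CA", 0.3, "Arson"), ("TX", 2.0, "Campfire")],
)
conn.commit()

# read_sql_query runs the SQL statement and returns the result as a DataFrame.
fires = pd.read_sql_query("SELECT state, fire_size, cause FROM fires", conn)
conn.close()

# A subset comparable to the California-only analysis mentioned above.
california = fires[fires["state"] == "CA"]
print(fires.shape, california.shape)
```

Selecting only the needed columns in the SQL query, rather than loading the whole table, keeps memory use down when the database is large.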

Background on Random Forests:

Fundamentally, a…

By Mr. Data Science

Photo by Jr Korpa on Unsplash

A Brief Overview and Some Assumptions:

Throughout this article, I will:

  • Describe the Naive Bayes Classifier
  • Demonstrate how to vectorize text input data and train a Naive Bayes classifier
  • Create a model that predicts if movie reviews are positive or negative
  • Show how you can obtain different machine learning metrics

I’ll assume:

  • You have some basic knowledge of machine learning, especially regarding the idea of classification
  • You have Python 3 installed and can install the libraries below if required
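
To make the vectorize-and-train steps listed above concrete, here is a minimal sketch that assumes scikit-learn is the library in use (the excerpt does not name one). The four toy reviews and their labels are invented for illustration, not drawn from the article's movie review dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented training set: 1 = positive review, 0 = negative review.
reviews = [
    "great movie loved it",
    "wonderful acting great story",
    "terrible plot boring",
    "awful boring waste",
]
labels = [1, 1, 0, 0]

# Turn each review into a vector of word counts (bag of words).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)

# Multinomial Naive Bayes is a common choice for word-count features.
model = MultinomialNB()
model.fit(X, labels)

# New text must be transformed with the same fitted vectorizer.
pred = model.predict(vectorizer.transform(["great story", "boring plot"]))
print(pred)
```

In practice the data would be split into training and test sets before computing metrics, since scores measured on the training reviews overstate how well the model generalizes.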

Background on Bayes Theorem applied to classification problems:

Bayes theorem provides a way of calculating a conditional probability — it allows us to calculate the probability of event A given the…

By Mr. Data Science

Throughout this article, I will:

  • Show you how to import individual and multi-stock datasets using the yfinance library
  • Describe several techniques you can use to visualize stock data

I’ll assume:

  • Python is installed on your machine and you can install necessary libraries on your own
  • You are familiar with the Python programming language and its syntax

Setting Up Your Environment:

Before we get started, you will need to install the following Python libraries.

  • numpy: a Python library for scientific computing
  • pandas: a Python library for data manipulation and analysis
  • matplotlib: a Python library for creating static…

By Mr. Data Science


Throughout this article, I will:

  • Show you how to import the Fashion MNIST dataset using TensorFlow
  • Demonstrate how to plot the Fashion MNIST images


I’ll assume you:

  • have Python installed on your machine and can install necessary libraries on your own
  • are familiar with the Python programming language and its syntax


Before getting started, I thought you might want a brief background on the Fashion MNIST dataset. The Fashion MNIST dataset consists of 70,000 28x28 grayscale images (a 60,000-image training set and a 10,000-image test set), each belonging to one of 10 different clothing classes. The…
