Want to publish your story on The Data Science Publication?

Just leave a comment on this article expressing your interest and we will review your previous articles. If you meet…

By Mr. Data Science

Throughout this article, we will describe how you can use decision trees and random forest classifiers to predict the cause of wildfires. First, we will use SQLite to import the data into a pandas DataFrame. Next, we will do some preprocessing and data exploration to better understand the dataset. Finally, we will apply a random forest classifier to the complete dataset, as well as to a subset (California wildfires). The concepts described in this article are applicable to a wide range of problems. If you have any feedback, we look forward to hearing from you.

Fundamentally, a…
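As a rough illustration of the import step mentioned in this teaser, here is a minimal sketch of reading a SQLite table into a pandas DataFrame and selecting a California subset. The in-memory database, the `fires` table, and its column names are all assumptions made up for the example; they are not the article's actual dataset or schema.

```python
import sqlite3

import pandas as pd

# Stand-in for the real database file: an in-memory SQLite database with a
# hypothetical "fires" table (table and column names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fires (state TEXT, fire_size REAL, cause TEXT)")
conn.executemany(
    "INSERT INTO fires VALUES (?, ?, ?)",
    [
        ("CA", 12.5, "Lightning"),
        ("CA", 0.3, "Campfire"),
        ("OR", 140.0, "Lightning"),
        ("TX", 2.1, "Arson"),
    ],
)
conn.commit()

# Run a SQL query and load the result straight into a DataFrame.
df = pd.read_sql_query("SELECT state, fire_size, cause FROM fires", conn)
conn.close()

# Boolean indexing gives the California-only subset.
california = df[df["state"] == "CA"]

print(df.shape)
print(california)
```

With a real database you would pass a file path to `sqlite3.connect` instead of `":memory:"` and skip the table-creation step.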

By Mr. Data Science

Throughout this article, I will:

- Describe the Kernel Density Estimate in a practical non-mathematical way
- Demonstrate how to use KDE and data visualization for exploratory data analysis

I’ll assume:

- You have Python 3 installed and can install necessary libraries
- You have some experience with Python and, ideally, with Python data libraries such as pandas

Often, your data will not conform to a common distribution function. For example, it might have a multimodal distribution (one with multiple peaks). Instead of trying to fit the data to a common distribution, an algorithm is used to…
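To make the idea concrete, here is a minimal sketch of a Gaussian kernel density estimate in plain Python. It assumes a Gaussian kernel and a hand-picked bandwidth of 0.4, and the sample values are made up; in practice you would more likely use a library routine than this hand-rolled version.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian "bump" per sample, centred on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up bimodal data: one cluster near 1, another near 5.
samples = [0.8, 1.0, 1.1, 1.3, 4.7, 5.0, 5.2, 5.4]
density = gaussian_kde(samples, bandwidth=0.4)

for x in (1.0, 3.0, 5.0):
    print(f"density at {x}: {density(x):.3f}")
```

The estimate is high near both clusters and low in the gap between them, which is the multimodal shape a single common distribution would miss. A larger bandwidth smooths the two peaks together; a smaller one makes the curve spikier.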

By Mr. Data Science

Throughout this article, we will analyze some data on UFO sightings. Recent press releases from the Pentagon have sparked new interest in the topic of UFOs/UAPs, so it is a trendy and interesting way to introduce some data science and data analytics concepts. However, we need to be realistic about what we can discover from publicly available datasets on this topic. These datasets usually consist of eyewitness accounts; therefore, the data should be considered low quality from a scientific perspective. Science does not put as much faith in eyewitness accounts as the legal system does; reference…

By Mr. Data Science

Data science has had a huge impact on the field of medical science. Some of the areas where it is making a difference include:

- Medical image analysis
- Genetics and genomics research
- Drug discovery
- Predictive analytics in healthcare
- Analysis of healthcare data

Topics like the discovery of new drugs are a little beyond the scope of this article, but we can still take a look at some examples of predictive and exploratory data analysis of healthcare data. In Example 1, we’ll look at data on cancer and how we could approach…

By Mr. Data Science

An excellent way to learn data science is to do data science: get some data and start analyzing it. The techniques used in this article can be applied to any data, and some of the issues we will encounter are typical of the challenges real-world data analysis throws up.

This article will investigate some data on asteroids to determine whether any of them pose a collision threat. Example 3 will use machine learning to classify asteroids as potential threats. …

By Mr. Data Science

Throughout this article, we will explore migration data to gain a better understanding of migration drivers. Since migration remains a contentious political issue, we will refrain from giving opinions and focus on the data instead. To investigate migration drivers, we will use a couple of datasets (all of them CSV files):

- The country dataset was downloaded from Kaggle
- The happiness reports (5 files) were also downloaded from Kaggle

The goals for this article are to:

- demonstrate some useful data science techniques such as combining datasets, generating correlation heat maps, and applying k-means to a dataset
- …
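As a rough sketch of the k-means technique named in the goals above, the following plain-Python version alternates between assigning points to their nearest centroid and recomputing each centroid. The data points (pairs of a happiness score and a net migration rate) and the starting centroids are invented for illustration and do not come from the Kaggle files.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then update centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (unchanged if empty).
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members
            else centroid
            for members, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical (happiness score, net migration rate) pairs.
points = [
    (7.5, 4.0), (7.2, 3.5), (7.0, 5.0),
    (4.0, -2.0), (4.3, -1.5), (3.8, -3.0),
]

# Start from two of the points so the run is deterministic.
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Real data usually needs its features scaled before clustering, because k-means relies on raw distances and a column with a large range would otherwise dominate.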

By Mr. Data Science

Throughout this article, I will:

- Show you how to import individual and multi-stock datasets using the yfinance library
- Describe several techniques you can use to visualize stock data

I’ll assume:

- Python is installed on your machine and you can install necessary libraries on your own
- You are familiar with the Python programming language and its syntax

Before we get started, you will need to install the following Python libraries.

- numpy: a Python library for scientific computing
- pandas: a Python library for data manipulation and analysis
- matplotlib: a Python library for creating static…

By Mr. Data Science

Throughout this article, I will:

- Describe the Naive Bayes Classifier
- Demonstrate how to vectorize text input data and train a Naive Bayes classifier
- Create a model that predicts if movie reviews are positive or negative
- Show how you can obtain different machine learning metrics

I’ll assume:

- You have some basic knowledge of machine learning, especially the idea of classification
- You have Python 3 installed and can install the libraries below if required

Bayes' theorem provides a way of calculating a conditional probability: it allows us to calculate the probability of event A given the…
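As a small worked example of the theorem, P(A|B) = P(B|A) · P(A) / P(B), here is a sketch in plain Python. The event definitions and all three input probabilities are made-up numbers chosen only to show the arithmetic.

```python
def bayes_posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B), with P(B) expanded by the law of total probability."""
    evidence = (
        likelihood_b_given_a * prior_a
        + likelihood_b_given_not_a * (1.0 - prior_a)
    )
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical events: A = "the review is positive",
# B = "the review contains the word 'great'".
posterior = bayes_posterior(
    prior_a=0.5,                    # assumed share of positive reviews
    likelihood_b_given_a=0.3,       # assumed P('great' | positive)
    likelihood_b_given_not_a=0.05,  # assumed P('great' | negative)
)

print(round(posterior, 3))  # 0.857
```

Seeing the word raises the probability of a positive review from the prior of 0.5 to about 0.857. A Naive Bayes classifier applies the same update across many words, treating them as independent given the class.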

By Mr. Data Science

Throughout this article, I will:

- Show you how to import the Fashion MNIST dataset using TensorFlow
- Demonstrate how to plot the Fashion MNIST images

I’ll assume you:

- have Python installed on your machine and can install necessary libraries on your own
- are familiar with the Python programming language and its syntax

Before getting started, I thought you might want a brief background on the Fashion MNIST dataset. The Fashion MNIST dataset consists of 70,000 28x28 grayscale images (a 60,000-sample training set and a 10,000-sample test set), each belonging to one of 10 different clothing article classes. The…

Data, Simplified