
Artificial Intelligence in Plain English
New AI, ML and Data Science articles every day.

Autonomous aircraft exist today, and we’re not simply referring to drones. In recent years, automated artificial intelligence (AI) flight systems have taken on more and more tasks from pilots. Today, these systems act as an additional (not replacement) co-pilot. The intended benefits? Less stress for flight crews and a reduced risk of pilot error.

In an industry that is projected to face a shortage of pilots, AI may be what prevents your next flight cancellation. And why not? …

By Zachary Galante — Senior Data Science Student at Bryant University

Photo by Héctor J. Rivas on Unsplash

What is KNN?

KNN (k-nearest neighbors) is a basic machine learning algorithm that uses surrounding data to make predictions on new data. In the image below, the question mark represents the new data point (the test case) that the algorithm has to classify. The algorithm takes into account the classes of its neighbors and their distances from it to make a prediction for the test data.
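The idea can be sketched in a few lines of plain Python. This is only a minimal illustration, not the article's own code: the function name `knn_predict`, the toy points, and the choice of Euclidean distance with a simple majority vote are all assumptions made for the example.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two small groups of labelled points
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

# The "question mark": a new point to classify
prediction = knn_predict(points, labels, query=(2.0, 2.0), k=3)
print(prediction)
```

Here the three nearest neighbors of the query all belong to the "red" group, so the vote is unanimous. With a larger `k` or overlapping groups, the vote can be closer, which is why `k` is usually tuned.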

[Image: a new data point, marked with a question mark, surrounded by labelled neighbors]

Clustering data using the k-means algorithm


  1. Unsupervised learning is a paradigm in machine learning where we build models without relying on labelled training data.
  2. Dealing with data that is labelled in some way means that learning algorithms can look at the data and learn to categorize them based on labels.
  3. In the world of unsupervised learning, we don’t have this opportunity! These algorithms are used when we want to find subgroups within datasets using a similarity metric.
  4. In unsupervised learning, information from the database is automatically extracted. All this takes place without prior knowledge of the content to be analyzed.
  5. In unsupervised learning, there is no…
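As a rough sketch of how k-means finds subgroups using a similarity metric, the loop below alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its points. The helper names, the toy data, and the deterministic farthest-first starting centroids are assumptions made for this illustration; library implementations typically use other initialisation schemes.

```python
import numpy as np

def farthest_first_init(data, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from every centroid chosen so far (a simple deterministic choice)."""
    centroids = [data[0]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        # Distance from every point to its nearest already-chosen centroid
        dists = np.linalg.norm(data[:, None, :] - chosen[None, :, :], axis=2).min(axis=1)
        centroids.append(data[np.argmax(dists)])
    return np.array(centroids)

def kmeans(data, k, n_iters=100):
    """Cluster `data` into k groups using Euclidean distance."""
    centroids = farthest_first_init(data, k)
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid
        distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Unlabelled toy data (hypothetical): two visibly separate groups
data = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                 [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])
labels, centroids = kmeans(data, k=2)
print(labels)
print(centroids)
```

Note that no labels are supplied at any point: the grouping comes only from distances between the points.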

A combination of different approaches leads to better results

A combination of different approaches leads to better results: this statement holds in many aspects of life, and it also applies to algorithms based on machine learning.

Stacking is the process of combining various machine learning algorithms. This technique is due to David H. Wolpert, an American mathematician, physicist, and computer scientist.

We will learn how to implement a stacking method.

Getting ready

  1. We start by importing the libraries:
from heamy.dataset import Dataset
from heamy.estimator import Regressor
from heamy.pipeline import ModelsPipeline
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; requires an older release
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
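
The imports above rely on heamy. As an alternative sketch of the same stacking idea, the example below uses scikit-learn's own `StackingRegressor` instead. This is a swapped-in technique, not the article's heamy pipeline, and the synthetic dataset and model settings are assumptions chosen only to keep the example self-contained.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption; not the article's dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Level 0: base models whose cross-validated predictions become new features
base_models = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("lr", LinearRegression()),
]

# Level 1: a simple model that learns how to combine the base predictions
stacked = StackingRegressor(
    estimators=base_models, final_estimator=LinearRegression(), cv=5
)
stacked.fit(X_train, y_train)

predictions = stacked.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
print(f"Stacked MAE: {mae:.2f}")
```

The second-level model sees only the base models' predictions, which is what distinguishes stacking from simply averaging the outputs.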


An interesting application of SVMs is to predict traffic, based on related data.

Getting ready

Download the data file ‘traffic_data.txt’. This is a dataset that counts the number of cars passing by during baseball games at the Los Angeles Dodgers home stadium. Each line in this file contains comma-separated strings formatted in the following manner:

  1. Day;
  2. Time;
  3. The opponent team;
  4. Whether or not a baseball game is going on;
  5. The number of cars passing by.
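
To make the format concrete, here is a small sketch that splits one such line into named fields. The sample line and the field names are hypothetical, invented for illustration; the actual values in the file may be written differently.

```python
def parse_traffic_line(line):
    """Split one comma-separated record into its five fields."""
    day, time, opponent, game_on, cars = [
        field.strip() for field in line.split(",")
    ]
    return {
        "day": day,
        "time": time,
        "opponent": opponent,
        "game_on": game_on,   # kept as text; the file's exact encoding is not assumed
        "cars": int(cars),    # the target value to predict
    }

# Hypothetical sample line in the described format
sample_line = "Tuesday,00:00,San Francisco,no,3"
record = parse_traffic_line(sample_line)
print(record)
```

The first four fields are non-numeric, so they need to be converted to numbers before an SVM regressor can use them.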

How to do it

Let’s see how to estimate the traffic.

  1. Load the data and relevant libraries:
# SVM regressor to estimate traffic
import numpy as np
from sklearn import preprocessing
from sklearn.svm import SVR
input_file = 'traffic_data.txt' …

We will build an SVM to predict the number of people going in and out of a building.

Download the data files building_event_binary.txt and building_event_multiclass.txt from

Getting ready

Let’s understand the data format before we start building the model. Each line in building_event_binary.txt consists of six comma-separated strings. The ordering of these six strings is as follows:

  1. Day;
  2. Date;
  3. Time;
  4. The number of people going out of the building;
  5. The number of people coming into the building;
  6. The output indicating whether or not it’s an event.

The first five strings form the input data, and our task is to predict whether or not an event is going on in the building.

Each line in building_event_multiclass.txt consists of six comma-separated strings…


Hyperparameters are important for determining the performance of a classifier.

Getting ready

  1. In machine learning algorithms, various parameters are obtained during the learning process.
  2. In contrast, hyperparameters are set before the learning process begins.
  3. Given these hyperparameters, the training algorithm learns the parameters from the data.

We will extract hyperparameters for a model based on an SVM algorithm using the grid search method.
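
As a minimal sketch of what a grid search does, the example below tries every combination in a small grid of SVM hyperparameters and keeps the one with the best cross-validated accuracy. The synthetic data and the specific grid values are assumptions for illustration, not the settings used with ‘data_multivar.txt’.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-class data (an assumption; not the article's data file)
X, y = make_blobs(
    n_samples=200, centers=[[-3, -3], [3, 3]], cluster_std=1.0, random_state=0
)

# Hyperparameters are fixed before training; each dict is one sub-grid
param_grid = [
    {"kernel": ["linear"], "C": [1, 10, 100]},
    {"kernel": ["rbf"], "C": [1, 10, 100], "gamma": [0.01, 0.1]},
]

# Each candidate is scored with 5-fold cross-validation
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
```

The cost grows with the number of combinations (here 3 linear plus 6 RBF candidates), so grids are usually kept coarse at first and refined around the best region.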

How to do it

Let’s see how to find optimal hyperparameters:

Data file: download ‘data_multivar.txt’ from here:

  1. We start by importing the libraries:
from sklearn import svm
from sklearn import model_selection
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
import pandas as pd
import utilities

  2. Then, we load the data:

input_file = 'data_multivar.txt'…

The idea of creating a virtual human that can converse seamlessly with a user seems daunting to most people who are just getting into artificial intelligence and seeing how complex existing commercial systems are. And their fears aren’t misplaced: larger systems that combine a plethora of data samples with an intricate network architecture, and that are responsible for providing the highest-quality home assistant experience, are very difficult to replicate. But creating virtual assistants on a smaller scale has already been simplified enough to allow virtually anyone to make their own conversational persona.

Over the past decade, the University…

It would be nice to know the confidence with which we classify unknown data. When a new data point is classified into a known category, we can train the SVM to compute the confidence level of that output as well. A confidence level refers to the probability that the value of a parameter falls within a specified range of values.

Getting ready

We will use an SVM classifier to find the best separating boundary between a dataset of points. In addition, we will also measure the confidence level of the results obtained.
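
One way this can look in scikit-learn is sketched below, under stated assumptions: the data is synthetic rather than ‘data_multivar.txt’, and two different confidence measures are shown. `decision_function` returns a signed score relative to the boundary, while `predict_proba` (available when `probability=True`) returns estimated class probabilities.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data (an assumption; not the article's data file)
X, y = make_blobs(
    n_samples=200, centers=[[-2, -2], [2, 2]], cluster_std=1.0, random_state=0
)

# probability=True enables predict_proba at the cost of extra training time
clf = SVC(kernel="rbf", probability=True, random_state=0)
clf.fit(X, y)

# New, unlabelled points: two deep inside a class, one near the boundary
new_points = np.array([[-2.0, -2.0], [2.0, 2.0], [0.0, 0.3]])

distances = clf.decision_function(new_points)   # signed score per point
probabilities = clf.predict_proba(new_points)   # one column per class
predictions = clf.predict(new_points)

for point, dist, proba, label in zip(new_points, distances, probabilities, predictions):
    print(point, "-> class", label, "| score", round(float(dist), 3), "| proba", proba.round(3))
```

Points far from the boundary get scores with a large magnitude, while points near it get scores close to zero. The probability estimates come from a separate calibration step, so they can occasionally disagree with `predict` for borderline points.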


Download the file ‘data_multivar.txt’ from

How to do it


Tackling class imbalance

So far, we have dealt with problems where we had a similar number of data points in all our classes. In the real world, we might not be able to get data in such an orderly fashion. Sometimes, the number of data points in one class is far larger than the number in the other classes. When this happens, the classifier tends to become biased toward the larger class. The boundary won’t reflect the true nature of your data, simply because there is a big difference in the number of data points between the two classes. …
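
A common way to counter this bias, sketched below under assumptions, is to weight each class inversely to its frequency. The synthetic 200-versus-20 dataset is invented for illustration, and `class_weight="balanced"` is one option scikit-learn offers; it is not necessarily the approach the article goes on to use.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Synthetic imbalanced data (an assumption): 200 points in class 0, 20 in class 1
rng = np.random.default_rng(0)
X_major = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_minor = rng.normal(loc=2.0, scale=1.0, size=(20, 2))
X = np.vstack([X_major, X_minor])
y = np.array([0] * 200 + [1] * 20)

# "balanced" weights are n_samples / (n_classes * class_count)
classes = np.unique(y)
weights = compute_class_weight("balanced", classes=classes, y=y)
print(dict(zip(classes.tolist(), weights.tolist())))

# The same weighting applied inside the classifier: mistakes on the rare
# class are penalised more heavily than mistakes on the common class
clf = SVC(kernel="linear", class_weight="balanced")
clf.fit(X, y)
predicted = clf.predict(X)
print("Predicted class counts:", np.bincount(predicted, minlength=2))
```

With 220 samples and two classes, the majority class gets weight 220 / (2 × 200) = 0.55 and the minority class gets 220 / (2 × 20) = 5.5, so each minority point counts ten times as much during training.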
