Object-oriented programming made simple

Object-oriented programming is a programming paradigm in which programmers define not only the data type of a data structure but also the kinds of operations that can be applied to that data structure.

Let’s first look at the basics of object-oriented programming. A class is a blueprint for an object. An object is simply a collection of attributes (variables) and methods (functions). Attributes are the characteristics of an object. Methods are the tasks performed on the object.

Let’s understand this with an example. ‘Shirts’ can be considered a class. ‘Shirt1’ and ‘Shirt2’ are different objects of the class ‘Shirts’. ‘Color’, ‘Size’, and ‘Price’…
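As a minimal sketch of this idea in Python (the class definition, attribute values and method here are illustrative, not taken from the article):

# Illustrative sketch of the Shirt example
class Shirt:
    def __init__(self, color, size, price):
        # Attributes: characteristics of the object
        self.color = color
        self.size = size
        self.price = price

    def apply_discount(self, percent):
        # Method: a task performed on the object
        self.price = self.price * (1 - percent / 100)

# 'shirt1' and 'shirt2' are different objects of the class 'Shirt'
shirt1 = Shirt('red', 'M', 20)
shirt2 = Shirt('blue', 'L', 25)
shirt1.apply_discount(10)
print(shirt1.price)  # 18.0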


Yes! K-Means clustering can be used for image classification of the MNIST dataset. Here’s how.

Image by Gerd Altmann from Pixabay

K-means clustering is an unsupervised learning algorithm that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest centroid. The algorithm minimizes the squared Euclidean distances between each observation and the centroid of the cluster to which it belongs.

K-Means clustering is not limited to consumer data and population studies. It can be used for image analysis as well. Here we use K-Means clustering to classify images of the MNIST dataset.

Getting to know the data

The MNIST dataset is loaded from keras.

# Importing the dataset from keras
from keras.datasets import mnist
(x_train…
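As a minimal sketch of the overall approach, assuming the images are flattened into vectors and scikit-learn’s KMeans is used (the exact preprocessing may differ):

# Sketch: cluster the flattened MNIST images into 10 groups (illustrative)
from keras.datasets import mnist
from sklearn.cluster import KMeans

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x = x_train.reshape(len(x_train), -1) / 255.0  # flatten each 28x28 image and scale to [0, 1]

kmeans = KMeans(n_clusters=10, random_state=0)  # fitting all 60,000 images can take a while
clusters = kmeans.fit_predict(x)  # each image is assigned to the cluster with the nearest centroid

To use the clusters for classification, each cluster is typically mapped to the digit label that occurs most often among its members.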

Compression and Visualization of Iris data using Principal Component Analysis

Image by Manfred Richter from Pixabay

We would be working on the famous Iris flower data set.

The necessary packages are imported.

# Importing the necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

The Iris data set is imported from sklearn.datasets.

# Importing the data set
from sklearn import datasets
iris_data = datasets.load_iris()

Let’s get to know ‘iris_data’.

iris_data.keys()

Output is dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']).

‘data’ :- As the name suggests, it’s the data.
‘target’ :- Integer encoding of the output labels. 0 denotes ‘setosa’, 1 denotes ‘versicolor’ and 2 denotes ‘virginica’.
‘target_names’ :- The output labels…
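As a hedged sketch of where this is headed, PCA from scikit-learn can compress the four iris features into two principal components for plotting (the exact steps may differ):

# Sketch: project the 4 iris features onto 2 principal components (illustrative)
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = iris_data['data']    # shape (150, 4)
y = iris_data['target']  # integer-encoded species

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # compress 4 features down to 2 components

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel('First principal component')
plt.ylabel('Second principal component')
plt.show()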


A gentle introduction to sentiment analysis

Image by tookapic from Pixabay

Sentiment Analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral.

This is a simple project of classifying movie reviews as either positive or negative. We will be working on the ‘movie_reviews’ dataset in the nltk.corpus package.

The necessary packages are imported.

import nltk
from nltk.corpus import movie_reviews

Let us explore the package ‘movie_reviews’.

# A list of all the words in 'movie_reviews'
movie_reviews.words()

The output is [‘plot’, ‘:’, ‘two’, ‘teen’, ‘couples’, ‘go’, ‘to’…
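A rough sketch of how such a classifier is commonly built on the ‘movie_reviews’ corpus, using bag-of-words features and NLTK’s Naive Bayes classifier (this is the standard NLTK recipe, not necessarily the exact code of the project):

# Sketch: bag-of-words features + Naive Bayes on 'movie_reviews' (illustrative)
import random
from nltk import NaiveBayesClassifier, classify
from nltk.corpus import movie_reviews

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

def word_features(words):
    # Each word in the review becomes a boolean feature
    return {word: True for word in words}

featuresets = [(word_features(words), category) for words, category in documents]
train_set, test_set = featuresets[:1600], featuresets[1600:]

classifier = NaiveBayesClassifier.train(train_set)
print(classify.accuracy(classifier, test_set))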


A gentle introduction to NLTK library of Python with simple examples

Image by StartupStockPhotos from Pixabay

The necessary packages are imported.

# Importing the necessary packages
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk import PunktSentenceTokenizer
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

Let us understand the function of each package.

from nltk.tokenize import word_tokenize, sent_tokenize

‘word_tokenize’ returns a list of the words in the input sentence.

# Example usage of 'word_tokenize'
a = 'Spending today complaining about yesterday will not make tomorrow any better'
word_tokenize(a)

The output is [‘Spending’, ‘today’, ‘complaining’, ‘about’, ‘yesterday’, ‘will’, ‘not’, ‘make’, ‘tomorrow’, ‘any’, ‘better’].
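The other imported helpers work along the same lines; here is a brief illustrative sketch (the example text is made up, and the ‘punkt’ and ‘stopwords’ resources are assumed to have been downloaded with nltk.download()):

# Sketch: sentence tokenization, stopword removal and stemming (illustrative)
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

text = 'NLTK makes text processing simple. It is widely used.'
print(sent_tokenize(text))  # splits the text into a list of sentences

stop_words = set(stopwords.words('english'))
words = [w for w in word_tokenize(text) if w.lower() not in stop_words]  # drop common stopwords

stemmer = PorterStemmer()
print([stemmer.stem(w) for w in words])  # reduces each remaining word to its stem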

Given below is the snippet of code that…


Classification of genes & Performance comparison of common classifiers

Image by Arek Socha from Pixabay

Bioinformatics is a field of study that uses computation to extract knowledge from biological data. It includes the collection, storage, retrieval, manipulation and modeling of data for analysis, visualization or prediction. Here we use machine learning to classify genes of the E. coli bacterium.

Let us understand the basics of genetics. DNA, or deoxyribonucleic acid, is the hereditary material in humans and almost all other organisms. DNA is made up of four chemical bases: Adenine (A), Guanine (G), Cytosine (C), and Thymine (T). Adenine pairs with Thymine and Guanine pairs with Cytosine. Each base is also attached to a sugar molecule…


Hyperparameter Tuning with GridSearchCV

This machine learning project is about diabetes prediction. We will be working on the Kaggle Pima Indians Diabetes dataset.

The necessary packages are imported.

# Importing the necessary packages
import pandas as pd
import numpy as np
import keras

The dataset is read into ‘df’ dataframe.

# Reading the file
df = pd.read_csv('/kaggle/input/pima-indians-diabetes-database/diabetes.csv')

Let us understand the dataframe ‘df’.

df.shape # Shape of ‘df’

The shape is (768, 9), which means there are 768 cases and 9 columns.

df.columns # Prints columns of ‘df’

The columns are [‘Pregnancies’, ‘Glucose’, ‘BloodPressure’, ‘SkinThickness’, ‘Insulin’, ‘BMI’, ‘DiabetesPedigreeFunction’, ‘Age’, ‘Outcome’]

df.describe() # Displays properties of each…
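As a hedged sketch of the general GridSearchCV pattern on this data (the estimator and parameter grid below are illustrative, not necessarily the ones used for this project):

# Sketch: grid search over the hyperparameters of an illustrative classifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X = df.drop('Outcome', axis=1)  # all features except the target column
y = df['Outcome']               # the target column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

param_grid = {'n_neighbors': [3, 5, 7, 9]}  # illustrative grid
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X_train, y_train)  # tries every combination in the grid with 5-fold cross-validation

print(grid.best_params_, grid.best_score_)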

This machine learning project is about clustering similar companies with K-means clustering algorithm. The similarity is based on daily stock movements.

The necessary packages are imported.

from pandas_datareader import data 
import matplotlib.pyplot as plt
import pandas as pd
import datetime
import numpy as np
import plotly.graph_objects as go

A dictionary ‘companies_dict’ is defined where the key is the company’s name and the value is the company’s stock ticker. 28 companies are considered.

companies_dict = {
'Amazon':'AMZN',
'Apple':'AAPL',
'Walgreen':'WBA',
'Northrop Grumman':'NOC',
'Boeing':'BA',
'Lockheed Martin':'LMT'…
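As a hedged sketch of the clustering step that follows, assuming the daily movement of each stock is its close price minus its open price and each company’s series is normalized before KMeans (the exact preprocessing may differ):

# Sketch: cluster companies by daily stock movements (illustrative)
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# 'movements' is assumed to have shape (n_companies, n_days),
# e.g. daily close minus open price for each company; random placeholder data here
movements = np.random.randn(28, 250)

pipeline = make_pipeline(Normalizer(), KMeans(n_clusters=10, random_state=0))
labels = pipeline.fit_predict(movements)  # cluster assignment for each company
print(labels)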

Anomaly detection with Local outlier factor and Isolation forest algorithms

This machine learning project is about detecting fraudulent credit card transactions. The data set ‘creditcard.csv’ can be downloaded from the Kaggle Credit Card Fraud Detection page.

The necessary packages are imported.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

READING THE DATA

The data is loaded into ‘df’ dataframe.

df = pd.read_csv('creditcard.csv')

VISUALIZING AND UNDERSTANDING THE DATA

Understanding the data is important because it gives us an intuitive feeling for the data, which helps to identify the necessary preprocessing steps.

df.shape # Prints the shape of ‘df’

The ‘df’ data…
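As a minimal sketch of how the two imported algorithms are typically applied to this problem (the label column name and contamination value below are assumptions):

# Sketch: flag anomalous transactions with Isolation Forest and Local Outlier Factor
X = df.drop('Class', axis=1)         # 'Class' is assumed to be the fraud label column
fraud_fraction = df['Class'].mean()  # fraction of transactions labeled as fraud

iso = IsolationForest(contamination=fraud_fraction, random_state=0)
iso_pred = iso.fit_predict(X)        # -1 marks predicted outliers, 1 marks inliers

lof = LocalOutlierFactor(n_neighbors=20, contamination=fraud_fraction)
lof_pred = lof.fit_predict(X)        # same -1 / 1 convention

print((iso_pred == -1).sum(), (lof_pred == -1).sum())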


Photo by Christopher Paul High on Unsplash

This machine learning project is about predicting the rating of board games. The data set ‘games.csv’ can be downloaded here.

The necessary packages are imported.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import sklearn
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn import model_selection
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

READING THE DATA

The data is loaded into ‘df’ dataframe.

df = pd.read_csv('games.csv')

UNDERSTANDING THE DATA

Understanding the data is important because it gives us an intuitive feeling for the data and lets us check for missing values, incorrect data, and incorrect relationships between…
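As a hedged sketch of the prediction step with the imported regressors (the target column ‘average_rating’ and the numeric-only feature selection are assumptions about ‘games.csv’):

# Sketch: fit the two imported regressors and compare their test error (illustrative)
df = df.dropna()                   # drop rows with missing values
target = 'average_rating'          # assumed name of the rating column
X = df.drop(columns=[target]).select_dtypes(include=[np.number])
y = df[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

lr = LinearRegression().fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

print(mean_squared_error(y_test, lr.predict(X_test)))  # linear regression error
print(mean_squared_error(y_test, rf.predict(X_test)))  # random forest error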

S Joel Franklin

Data Scientist | Fitness enthusiast | Avid traveller | Happy Learning
