Machine Learning, Statistics

High-level understanding of Time Series, stationarity, seasonality, forecasting, and modeling with SARIMAX


Time series modeling is the statistical study of sequential data (finite or infinite) that depends on time. The "time" here may be only a logical identifier; a time series need not carry any physical time information. In this article, we will discuss, at a high level, how to model a stock price change forecasting problem as a time series, along with some of the underlying concepts.
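Stationarity and seasonality are usually approached through differencing, which is what the non-seasonal and seasonal differencing orders of a SARIMAX model control. The following is only a minimal sketch on a made-up series, not the approach from the article: the `difference` helper, the `prices` values, and the period of 4 are all assumptions chosen for illustration.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical toy series: a linear trend plus a pattern repeating every 4 steps.
season = [0.0, 2.0, -1.0, 1.0]
prices = [10.0 + 0.5 * t + season[t % 4] for t in range(16)]

# Lag-1 differencing removes the linear trend (the seasonal pattern remains).
detrended = difference(prices, lag=1)

# Lag-4 differencing cancels the period-4 pattern and leaves a constant
# (4 steps of trend at 0.5 per step).
deseasoned = difference(prices, lag=4)
```

In this toy case the seasonally differenced series is flat, which is the kind of behavior one looks for before fitting the autoregressive and moving-average parts of a model.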

We will use the Dow Jones Index dataset from the UCI Machine Learning Repository. It contains stock price information over two quarters. Let’s explore the dataset first:

Machine Learning

Understanding model export mechanisms, lightweight integration, offline & online model hosting techniques

Photo by Markus Spiske on Unsplash

We often see techniques for solving problems with ML discussed here and there. But when it comes to putting them into production, we don’t see as much traction, and people still have to rely on public cloud providers or open-source tools. In this article, we will discuss how ML models can be used in production and the system architectures that support them. We will see how to do that without any public cloud provider.

Model export

Almost all ML models are either mathematical expressions, equations, or data structures (trees or graphs). Mathematical expressions have coefficients, variables, constants, and parameters of probability distributions (distribution-specific parameters, standard deviations, or means). …
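To make the idea concrete, here is a minimal sketch of exporting a model that is just a mathematical expression: a linear model saved as its coefficients and intercept. The JSON layout, the function names, and the coefficient values are assumptions made for illustration, not a format described in the article.

```python
import json
import os
import tempfile


def export_linear_model(coefficients, intercept, path):
    """Write the model parameters to a small JSON file (hypothetical format)."""
    payload = {"type": "linear", "coefficients": coefficients, "intercept": intercept}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_linear_model(path):
    """Read the parameters back and return a plain prediction function."""
    with open(path, encoding="utf-8") as f:
        params = json.load(f)
    weights = params["coefficients"]
    bias = params["intercept"]

    def predict(features):
        if len(features) != len(weights):
            raise ValueError("feature count does not match the exported model")
        return bias + sum(w * x for w, x in zip(weights, features))

    return predict


# Round trip with made-up parameters.
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
export_linear_model([2.0, -1.0], 0.5, model_path)
predict = load_linear_model(model_path)
score = predict([3.0, 1.0])  # 0.5 + 2.0 * 3.0 - 1.0 * 1.0
```

Because the exported file holds only numbers, the serving side needs no ML library at all, which is one way to read "lightweight integration".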

Deep Learning, Machine Learning, Python

Classifying Flower images using Convolutional Deep Neural Network with PyTorch library

Photo by Krystina rogers on Unsplash

Classifying image data is one of the most popular uses of Deep Learning techniques. In this article, we will discuss the identification of flower images using a deep convolutional neural network.

For this, we will be using the PyTorch, TorchVision, and PIL libraries for Python.

Data Exploration

The required dataset for this problem can be found on Kaggle. It contains a folder structure with flower images inside it. There are 5 different types of flowers. The folder structure is shown below.

Machine Learning

Comparative study of different vector space models & text classification techniques like XGBoost and others


In this article, we will discuss different text classification techniques to solve the BBC news article categorization problem. We will also discuss different vector space models to represent text data.

We will be using Python, scikit-learn, Gensim, and the XGBoost library to solve this problem.
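As a rough illustration of what a vector space model does, the sketch below turns a few made-up sentences into TF-IDF vectors using only the standard library. It uses the plain textbook formula (term frequency times log of inverse document frequency), which is an assumption here; library implementations typically apply smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using tf * log(N / df) with whitespace tokens."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(token for doc in tokenized for token in set(doc))

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc)
        row = []
        for term in vocabulary:
            tf = counts[term] / total if total else 0.0
            idf = math.log(n_docs / doc_freq[term])
            row.append(tf * idf)
        vectors.append(row)
    return vocabulary, vectors


# Hypothetical mini-corpus, not taken from the BBC dataset.
docs = ["the market fell", "the team won", "the film opened"]
vocabulary, vectors = tfidf_vectors(docs)
```

A word that occurs in every document ("the" in this toy corpus) gets a weight of zero, while words specific to one document get positive weights, which is what makes the vectors useful for classification.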

Getting the data

Data for this problem can be found on Kaggle. This dataset contains BBC news text and its category in a two-column CSV format. Let’s see what’s there

Machine Learning

Designing a multi-label text classification model which helps to tag questions with different topics


Every day, users post many technical questions, and all of them get tagged with different topics. In this article, we will discuss a classification model that can automatically tell which tags can be attached to an unanswered question.

Obviously, multiple tags can be associated with a question. So, ultimately, this problem becomes ‘classifying a question and attaching class labels to it’. In Machine Learning terms, it is a ‘Multi-Label classification’ problem.
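One common way to set up such a problem is to encode each question's tags as a binary indicator vector, with one position per known tag. The sketch below is an assumption about the encoding step, with made-up tags and a hypothetical `encode_tags` helper; it is not the article's code.

```python
def encode_tags(tag_sets):
    """Map each set of tags to a 0/1 row over the sorted list of all tags."""
    classes = sorted(set().union(*tag_sets))
    index = {tag: i for i, tag in enumerate(classes)}
    rows = []
    for tags in tag_sets:
        row = [0] * len(classes)
        for tag in tags:
            row[index[tag]] = 1
        rows.append(row)
    return classes, rows


# Hypothetical tags for three questions.
question_tags = [{"python", "pandas"}, {"java"}, {"python", "spark", "java"}]
classes, matrix = encode_tags(question_tags)
```

Each column of `matrix` can then be treated as its own yes/no target, which is what lets a question carry several labels at once.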

We already discussed the theoretical techniques and accuracy metrics required for multi-label models in the article below.

That article is a prerequisite for the current discussion. Readers are requested to go through it before reading this one. …

Machine Learning

The theory behind the multi-label/multi-tagging model, different umbrella classification schemes and accuracy metric analysis


Classification techniques are probably the most fundamental in Machine Learning. The majority of online ML/AI courses and curricula start with them.

In normal classification, we define a model that classifies or tags a data instance with only one class label. The class set can (and usually will) contain multiple class labels, but the classifier chooses only one (the best) among them.

Now, the question is: can a data instance be classified/tagged with multiple possible class labels from the set? How should the model be designed, and how can we calculate accuracy for that model? …
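On the accuracy question, two widely used multi-label measures are Hamming loss (the fraction of individual label decisions that are wrong) and subset accuracy (the fraction of instances whose whole label set is predicted exactly). The sketch below assumes these two metrics and uses made-up predictions; the article itself may analyze different ones.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label positions that disagree."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def subset_accuracy(y_true, y_pred):
    """Fraction of instances whose full label vector matches exactly."""
    exact = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return exact / len(y_true)


# Hypothetical labels for 3 instances and 3 possible class labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [0, 1, 0]]

loss = hamming_loss(y_true, y_pred)       # 2 wrong positions out of 9
exact = subset_accuracy(y_true, y_pred)   # 1 fully correct instance out of 3
```

The two numbers answer different questions: subset accuracy punishes any single mistake in an instance, while Hamming loss gives partial credit for the labels that were right.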

Regression using Principal Components & ElasticNet | Towards AI

Predicting the relative location of CT slices on the axial axis of the human body using regression techniques on very high-dimensional data


Regression is one of the most fundamental techniques in Machine Learning. In simple terms, it means ‘predicting a continuous variable from other independent categorical/continuous variables’. The challenge comes when we have high dimensionality, i.e., too many independent variables. In this article, we will discuss a technique for regression modeling with high-dimensional data using Principal Components and ElasticNet. We will also see how to save that model for future use.

We will use Python 3.x as the programming language and ‘scikit-learn’ and ‘seaborn’ as libraries for this article.

The data used here can be found at the UCI Machine Learning Repository. The dataset name is “Relative location of CT slices on axial axis Data Set”. It contains extracted features of medical CT scan images for various patients (male & female). The features are numerical in nature. As per UCI, the goal is ‘predicting the relative location of a CT slice on the axial axis of the human body’. …
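For reference, combining Principal Components with ElasticNet can be written as one objective. This is only a sketch in the parameterization used by scikit-learn's ElasticNet (overall strength and L1 ratio); the symbols $X_c$, $W_k$, $Z$, $\alpha$, and $\rho$ are notation introduced here, not taken from the article, and the intercept is omitted for brevity.

```latex
Z = X_c W_k, \qquad
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2n} \lVert y - Z\beta \rVert_2^2
  + \alpha \rho \lVert \beta \rVert_1
  + \frac{\alpha (1 - \rho)}{2} \lVert \beta \rVert_2^2
```

Here $X_c$ is the centered $n \times p$ feature matrix, $W_k$ holds the first $k$ principal directions as columns, $\alpha \ge 0$ sets the overall penalty strength, and $\rho \in [0, 1]$ balances the L1 and L2 penalties. Reducing $p$ features to $k$ components shrinks the problem before the penalized regression is fitted.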

Machine Learning, Programming, Python

How to use Apache Spark MLlib with PySpark for NLP problems and how to simulate Doc2Vec in Spark MLlib

Image Source: News — Pixabay

Apache Spark is nowadays quite popular for scaling up data processing applications. For Machine Learning, it provides a library called ‘MLlib’, which offers a distributed programming approach to solving ML problems. In this article, we will see how to use MLlib with PySpark and how to apply Doc2Vec with PySpark to solve text classification problems.

Before going ahead, we need to know what ‘Doc2Vec’ is. It is an NLP model for describing a text or document. It converts a text into a vector of numerical features that can be used in any ML algorithm. Basically, it is a feature engineering technique. It tries to understand the context of documents by randomly sampling words and training a neural network with them. The hidden layer vectors of the neural network become the document vectors, a.k.a. ‘Doc2Vec’. There is another technique called ‘Word2Vec’, which works on similar principles. But instead of documents/texts, it works on a word corpus and provides vectors for words. …

A comparison of different classifiers’ accuracy & performance for high-dimensional data

Photo Credit : Pixabay

In Machine Learning, classification problems with high-dimensional data are really challenging. Sometimes, very simple problems become extremely complex due to this ‘curse of dimensionality’ problem.

In this article, we will see how accuracy and performance vary across different classifiers. We will also see how, when we don’t have the freedom to choose a classifier independently, we can do feature engineering to make a poor classifier perform well.
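One symptom of the ‘curse of dimensionality’ is that distances between random points become nearly indistinguishable as the number of dimensions grows, which hurts distance-based classifiers in particular. The sketch below illustrates this on uniformly random points; the `relative_contrast` helper, the point count, and the dimensions are assumptions for illustration and have nothing to do with the EEG data.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


low_dim_contrast = relative_contrast(dim=2)
high_dim_contrast = relative_contrast(dim=1000)
```

In two dimensions the farthest point is many times farther away than the nearest one, while in a thousand dimensions the two distances differ by only a small fraction, so "nearest" carries much less information.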

Understanding the ‘datasource’ & problem formulation

For this article, we will use the “EEG Brainwave Dataset” from Kaggle. This dataset contains electronic brainwave signals from an EEG headset and is in temporal format. …


Avishek Nag

Machine Learning practitioner & author with work experience in Python, Spark-ML, Java & Big Data
