Stock Price Change Forecasting with Time Series: SARIMAX

High-level understanding of Time Series, stationarity, seasonality, forecasting, and modeling with SARIMAX

Time series modeling is the statistical study of sequential data (finite or infinite) that depends on time. Although we say ‘time’, it may simply be a logical identifier: a time series does not need to carry any physical time information. In this article, we will discuss how to model a stock price change forecasting problem with time series, along with some of the underlying concepts at a high level.
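To make the idea of a logical time index concrete, here is a minimal sketch that is not taken from the dataset itself. The observations are ordered only by their position in a list, and the price change is computed between consecutive positions. The prices and the helper name `percent_changes` are made up for illustration.

```python
def percent_changes(prices):
    """Return the percentage change between consecutive observations.

    The list index plays the role of "time": it only encodes order,
    not a physical timestamp.
    """
    changes = []
    for previous, current in zip(prices, prices[1:]):
        changes.append((current - previous) / previous * 100.0)
    return changes


# Hypothetical closing prices, ordered by position only.
closing_prices = [100.0, 110.0, 99.0, 99.0]
price_changes = percent_changes(closing_prices)
print(price_changes)
```

A series of n prices yields n - 1 changes, because the first observation has no predecessor to compare against.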

We will use the Dow Jones Index dataset from the UCI Machine Learning Repository. It contains stock price information over two quarters. Let’s explore the dataset first:

Deploying ML Models in Production: Model Export & System Architecture

Understanding model export mechanisms, lightweight integration, offline & online model hosting techniques

We often see techniques for solving problems with ML discussed in many places. But when it comes to putting them into production, we don’t see as much traction, and people still have to rely on public cloud providers or open-source tools. In this article, we will discuss how ML models can be used in production and the system architectures that support them. We will see how we can do that without any public cloud provider.

Model export

Almost all ML models are either mathematical expressions, equations or data structures (trees or graphs). Mathematical expressions have coefficients, variables, constants, and parameters of probability distributions (distribution-specific parameters such as means or standard deviations). …
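Because a model of this kind is fully described by its coefficients, exporting it can be as simple as serializing those numbers. The sketch below assumes a linear model with made-up coefficients and JSON as the export format. The function names are hypothetical, and this is not necessarily the export mechanism the article goes on to describe.

```python
import json


def export_linear_model(coefficients, intercept):
    """Serialize the parameters of a linear model to a JSON string."""
    return json.dumps({"coefficients": coefficients, "intercept": intercept})


def load_and_predict(exported, features):
    """Rebuild the model from its JSON form and score one feature vector."""
    params = json.loads(exported)
    weighted_sum = sum(c * x for c, x in zip(params["coefficients"], features))
    return weighted_sum + params["intercept"]


# Illustrative parameters, not learned from any real data.
exported_model = export_linear_model([2.0, -1.0], 0.5)
prediction = load_and_predict(exported_model, [3.0, 4.0])
print(prediction)
```

The exported string contains no library-specific objects, so any service that can parse JSON and compute a weighted sum could host the model.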

Image Classification using Deep Learning & PyTorch: A Case Study with Flower Image Data

Classifying flower images using a deep convolutional neural network with the PyTorch library

Classifying image data is one of the most popular uses of Deep Learning techniques. In this article, we will discuss the identification of flower images using a deep convolutional neural network.

For this, we will use the PyTorch, TorchVision and PIL libraries for Python.

Data Exploration

The required dataset for this problem can be found on Kaggle. It contains a folder structure with flower images inside it. There are 5 different types of flowers. The folder structure is shown below:

Text Classification by XGBoost & Others: A Case Study Using BBC News Articles

Comparative study of different vector space models & text classification techniques like XGBoost and others

In this article, we will discuss different text classification techniques to solve the BBC news article categorization problem. We will also discuss different vector space models to represent text data.

We will use Python, scikit-learn, Gensim and the XGBoost library to solve this problem.
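Before reaching for those libraries, it may help to see what a vector space model does in its simplest form. The sketch below builds bag-of-words count vectors for a few made-up headlines using only the standard library. It is a toy illustration, not the article's pipeline, and the function names are hypothetical.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a fixed column index."""
    tokens = sorted({token for doc in documents for token in doc.lower().split()})
    return {token: index for index, token in enumerate(tokens)}


def to_count_vector(document, vocabulary):
    """Represent one document as a vector of token counts."""
    counts = Counter(document.lower().split())
    # The vocabulary dict was built in index order, so iterating it
    # yields tokens in column order.
    return [counts.get(token, 0) for token in vocabulary]


# Made-up headlines standing in for news articles.
docs = [
    "markets rally as shares rise",
    "team wins cup final",
    "shares fall as markets slide",
]
vocab = build_vocabulary(docs)
vectors = [to_count_vector(doc, vocab) for doc in docs]
```

Every document becomes a vector of the same length, which is what allows a classifier to consume text as numeric features.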

Getting the data

Data for this problem can be found on Kaggle. This dataset contains BBC news text and its category in a two-column CSV format. Let’s see what’s there:

Multi-Label Text Classification Using Scikit-multilearn: a Case Study with StackOverflow Questions

Designing a multi-label text classification model which helps to tag stackoverflow.com questions with different topics

Every day, users of stackoverflow.com post many technical questions, and all of them get tagged with different topics. In this article, we will discuss a classification model that can automatically tell which tags can be attached to an unanswered question.

Obviously, multiple tags can be associated with a question. So, ultimately, this problem becomes ‘classifying a question and attaching class labels to it’. In Machine Learning terms, it is a ‘Multi-Label classification’ problem.
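One common way to represent such a target is a 0/1 indicator vector with one position per possible tag. The following is a minimal sketch under that assumption. The tag names and the helper `to_indicator_vector` are made up for illustration.

```python
def to_indicator_vector(tags, label_set):
    """Encode a set of tags as a 0/1 vector aligned with label_set."""
    return [1 if label in tags else 0 for label in label_set]


# Hypothetical label set and the tags attached to one question.
all_labels = ["python", "pandas", "java", "sql"]
question_tags = {"python", "pandas"}
encoded = to_indicator_vector(question_tags, all_labels)
print(encoded)
```

Unlike a single-label target, the vector may contain any number of ones, including none at all.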

We have already discussed the theoretical techniques and accuracy metrics required for multi-label models in the article below.

That article is a prerequisite for the current discussion. Readers are requested to go through it before this one. …

Understanding Multi-Label classification model and accuracy metrics

The theory behind the multi-label/multi-tagging model, different umbrella classification schemes and accuracy metric analysis

Classification techniques are probably the most fundamental in Machine Learning. The majority of online ML/AI courses and curricula start with them.

In normal classification, we define a model that classifies or tags a data instance with only one class label. Of course, the class set can (and will) contain multiple class labels, but the classifier will choose only one (the best) among them.

Now, the question is: Can a data instance be classified/tagged with multiple possible class labels from the set? How should the model be designed, and how can we calculate accuracy for that model? …
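As a rough preview of why accuracy needs care here, the sketch below computes two widely used multi-label measures on made-up 0/1 label vectors: Hamming loss (the fraction of individual label slots predicted wrongly) and the exact match ratio (the fraction of instances whose whole label set is predicted correctly). The data is invented, and these two metrics are chosen for illustration; they may not be the only ones the article covers.

```python
def hamming_loss(true_rows, predicted_rows):
    """Fraction of individual label slots that were predicted wrongly."""
    wrong = 0
    total = 0
    for true_row, predicted_row in zip(true_rows, predicted_rows):
        for true_value, predicted_value in zip(true_row, predicted_row):
            wrong += int(true_value != predicted_value)
            total += 1
    return wrong / total


def exact_match_ratio(true_rows, predicted_rows):
    """Fraction of instances whose entire label vector is correct."""
    matches = sum(
        int(true_row == predicted_row)
        for true_row, predicted_row in zip(true_rows, predicted_rows)
    )
    return matches / len(true_rows)


# Two instances, four possible labels each (made-up data).
y_true = [[1, 1, 0, 0], [0, 0, 1, 0]]
y_pred = [[1, 0, 0, 0], [0, 0, 1, 0]]
loss = hamming_loss(y_true, y_pred)
exact = exact_match_ratio(y_true, y_pred)
```

Here a single wrong slot out of eight gives a low Hamming loss, yet the exact match ratio drops to one half, which shows how differently the two measures treat a partially correct prediction.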

Prediction of Relative Locations of CT Slices in CT Images

Predicting the relative location of CT slices on the axial axis of the human body using regression techniques on very high-dimensional data

Regression is one of the most fundamental techniques in Machine Learning. In simple terms, it means ‘predicting a continuous variable from other independent categorical/continuous variables’. The challenge comes when we have high dimensionality, i.e. too many independent variables. In this article, we will discuss a technique for regression modeling with high-dimensional data using Principal Components and ElasticNet. We will also see how to save that model for future use.

We will use Python 3.x as the programming language and ‘scikit-learn’ and ‘seaborn’ as libraries for this article.
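The combination described above (principal components feeding an ElasticNet regressor, followed by saving the fitted model) can be sketched with scikit-learn as follows. This is a minimal sketch on synthetic data, not the CT slice dataset. The number of components, the `alpha` and `l1_ratio` values, and the use of `pickle` for saving are all assumptions made for illustration.

```python
import pickle

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic high-dimensional data: 200 rows, 50 features,
# with a target that depends on only two of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Scale, reduce to 10 principal components, then fit ElasticNet.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    ElasticNet(alpha=0.1, l1_ratio=0.5),
)
model.fit(X, y)

# Save the whole fitted pipeline and restore it for later use.
saved_bytes = pickle.dumps(model)
restored = pickle.loads(saved_bytes)
predictions = restored.predict(X[:5])
```

Saving the pipeline as a whole, rather than the regressor alone, keeps the scaling and projection steps together with the coefficients they were fitted for.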

The data used here can be found at the UCI Machine Learning Repository. The dataset name is “Relative location of CT slices on axial axis Data Set”. It contains extracted features of medical CT scan images for various patients (male & female). The features are numerical in nature. As per UCI, the goal is ‘predicting the relative location of a CT slice on the axial axis of the human body’. …

Multi-Class Text Classification Using PySpark, MLlib & Doc2Vec

How to use Apache Spark MLlib with PySpark for NLP problems and how to simulate Doc2Vec in Spark MLlib

Apache Spark is quite popular nowadays for scaling up data processing applications. For Machine Learning, it also provides a library called ‘MLlib’, which takes a distributed programming approach to solving ML problems. In this article, we will see how to use MLlib with PySpark and how to use Doc2Vec with PySpark for solving text classification problems.

Before going ahead, we need to know what ‘Doc2Vec’ is. It is an NLP model to describe a text or document. It converts a text into a vector of numerical features to be used in any ML algorithm. Basically, it is a feature engineering technique. It tries to understand the context of documents by randomly sampling words and training a neural network with them. The hidden layer vectors of the neural network become document vectors, a.k.a. ‘Doc2Vec’. There is another technique called ‘Word2Vec’ which works on similar principles. But instead of documents/texts, it works on a corpus of words and provides vectors for words. …
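When only word vectors are available, one simple stand-in for a document vector is the average of the vectors of the words in the document. The sketch below shows that averaging step in plain Python with hand-written toy vectors. It is not Doc2Vec itself, no neural network is trained, and it may differ from the approach the article takes. The names `average_word_vectors` and `toy_word_vectors` are hypothetical.

```python
def average_word_vectors(tokens, word_vectors):
    """Average the vectors of known tokens into one document vector.

    Tokens missing from word_vectors are skipped. An empty list is
    returned when no token is known.
    """
    known = [word_vectors[token] for token in tokens if token in word_vectors]
    if not known:
        return []
    dimension = len(known[0])
    return [sum(vector[i] for vector in known) / len(known) for i in range(dimension)]


# Toy two-dimensional word vectors, invented for illustration.
toy_word_vectors = {
    "spark": [1.0, 0.0],
    "scales": [0.0, 1.0],
    "well": [1.0, 1.0],
}
document_vector = average_word_vectors(["spark", "scales", "unknown"], toy_word_vectors)
print(document_vector)
```

The result has the same dimension as the word vectors regardless of document length, so it can be passed to a classifier as a fixed-size feature vector.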

Unsupervised outlier detection in text corpus using Deep Learning

Auto-encoder based approach to finding the most unique movie plot in the Wikipedia movie database.

Multi-Class classification with scikit-learn & XGBoost: A case study using Brainwave data

A comparison of different classifiers’ accuracy & performance for high-dimensional data

In Machine Learning, classification problems with high-dimensional data are really challenging. Sometimes, very simple problems become extremely complex due to this ‘curse of dimensionality’ problem.

In this article, we will see how accuracy and performance vary across different classifiers. We will also see how, when we don’t have the freedom to choose a classifier independently, we can do feature engineering to make a poor classifier perform well.

Understanding the ‘datasource’ & problem formulation

For this article, we will use the “EEG Brainwave Dataset” from Kaggle. This dataset contains electronic brainwave signals from an EEG headset and is in temporal format. …