Archive of stories published by Analyzing the Amazon Product Data Set using SparkMLlib LogisticRegression Classification Model

Taiwo Adetiloye in Analyzing the Amazon Product Data Set using SparkMLlib LogisticRegression Classification Model

Dec 27, 2017

A tutorial (Part 1)

In this tutorial, we set out to analyze the Amazon product dataset using Spark MLlib. The training dataset includes the ASIN, Brand Name, Category Name, Product Title, and Image URL. For a detailed description of the Amazon product data, the reader can refer to Julian McAuley's webpage.

Taiwo Adetiloye in Analyzing the Amazon Product Data Set using SparkMLlib LogisticRegression Classification Model

Dec 27, 2017

Model transformation (Part 2)

In the previous lesson, we performed the basic steps of exploratory data analysis. That lesson and the current one constitute important preprocessing steps before a model can be trained.

Taiwo Adetiloye in Analyzing the Amazon Product Data Set using SparkMLlib LogisticRegression Classification Model

Dec 27, 2017

Creating the model pipeline (Part 3)

In the previous lesson, we implemented the StringIndexer() and SQLTransformer() transformers from our Transformation TODO list. In this lesson, we will implement the VectorAssembler() and Normalizer() transformers and proceed to build our pipeline. Basically, a machine…

Taiwo Adetiloye in Analyzing the Amazon Product Data Set using SparkMLlib LogisticRegression Classification Model

Dec 27, 2017

Training and testing the model (Part 4)

In the previous lesson, we created the pipeline. It is noteworthy that our ML pipeline contains vital workflow components, comprising the transformers and the logistic regression estimator. We need to fit our processed training dataset onto this pipeline…

Taiwo Adetiloye in Analyzing the Amazon Product Data Set using SparkMLlib LogisticRegression Classification Model

Dec 27, 2017

Model Evaluation (Part 5)

In the previous lesson, we presented the training and testing of our model, which built on the ML pipeline we had created. The following figure illustrates the workflow.

About

Analyzing the Amazon Product Data Set using SparkMLlib LogisticRegression Classification Model

In this tutorial, we set out to analyze the Amazon product dataset using Spark MLlib. The training dataset includes the ASIN, Brand Name, Category Name, Product Title, and Image URL. Our objective is to use the Scala programming language to write a classifier that utilizes key product features.
