In the previous lesson, we performed the basic steps of exploratory data analysis. Together, that lesson and this one cover important preprocessing steps that must be completed before a model can be trained.
In this tutorial, we set out to analyze the Amazon product dataset using Spark MLlib. The training dataset includes the following fields: ASIN, brand name, category name, product title, and image URL. For a detailed description of the Amazon product data, the reader can refer to Julian McAuley's webpage.
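As a rough, framework-independent sketch of the record layout described above, the following Python snippet models one product as a typed record and reads records from CSV text. The field names, the CSV format with a header row, and the helper `parse_products` are all hypothetical choices made for illustration. They are not the actual file format of the McAuley dataset. In a Spark pipeline, the same layout would typically be expressed as a DataFrame schema instead.

```python
import csv
import io
from dataclasses import dataclass


# Hypothetical record type mirroring the five fields listed above.
# Field names and the CSV column headers are assumptions for illustration.
@dataclass(frozen=True)
class ProductRecord:
    asin: str           # Amazon Standard Identification Number
    brand_name: str
    category_name: str
    product_title: str
    image_url: str


# Column headers the hypothetical CSV input is expected to contain.
FIELDS = ["asin", "brand_name", "category_name", "product_title", "image_url"]


def parse_products(text: str) -> list[ProductRecord]:
    """Parse CSV text with a header row into ProductRecord objects.

    Raises ValueError if any expected column is missing from the header.
    Extra columns are ignored.
    """
    reader = csv.DictReader(io.StringIO(text))
    # Accessing fieldnames reads the header row.
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [ProductRecord(**{f: row[f] for f in FIELDS}) for row in reader]
```

Checking the header up front makes a mismatch between the expected fields and the input fail loudly, rather than surfacing later as a missing value during feature extraction.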