Training and testing the model (Part 4)

In the previous lesson, we created the pipeline. Recall that our ML pipeline contains the vital workflow components: the transformers and the logistic regression estimator. We now need to fit this pipeline to our processed training dataset. So, we have:

val model = pipeline.fit(trainProcessed.mainDataProcessor)

Recall the testProcessed dataset discussed in Part 2. To make our predictions, we pass it to the model's transform function as follows:

val predictions = model.transform(testProcessed.mainDataProcessor)

Voila! We have successfully trained and tested our model. Now for the moment we have been waiting for: showing the predicted category for the test data. We do this as follows:

predictions.select("prediction", "predictedLabel").show(20)

Notice that each numeric prediction has a categorical predictedLabel associated with it. It is the predictedLabel that is of main interest, and we are able to obtain it thanks to the IndexToString() stage of the pipeline.
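To make the mapping concrete, here is a minimal sketch of how such an IndexToString stage is typically wired up in Spark MLlib. The column and variable names (category, labelIndexer, labelConverter) are illustrative assumptions, not necessarily the ones used earlier in this series:

```scala
import org.apache.spark.ml.feature.{IndexToString, StringIndexer}

// StringIndexer maps the string category column to numeric label indices
// (fitted on the training data so the index-to-string mapping is learned).
val labelIndexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("label")
  .fit(trainProcessed.mainDataProcessor)

// IndexToString reverses that mapping: it turns the numeric "prediction"
// column back into the original string category as "predictedLabel".
val labelConverter = new IndexToString()
  .setInputCol("prediction")
  .setOutputCol("predictedLabel")
  .setLabels(labelIndexer.labels)
```

Both stages would sit inside the same Pipeline, so the conversion happens automatically when model.transform is called.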

Also, we purposely limit our display to 20 samples. However, to inspect the whole classification, which runs to over 1,000 samples, it is better to write the results out to a filesystem. Let me know how you go about doing this.
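One possible approach, sketched below: Spark's DataFrameWriter can persist the prediction columns as CSV. The output path is illustrative, and since prediction is a double and predictedLabel a string, both serialize cleanly to CSV:

```scala
// Write the full set of predictions out instead of show(20).
predictions
  .select("prediction", "predictedLabel")
  .write
  .mode("overwrite")          // replace any output from a previous run
  .option("header", "true")   // include column names in the CSV files
  .csv("output/amazon-predictions")
```

Note that Spark writes a directory of part files rather than a single file; adding .coalesce(1) before .write would produce one file, at the cost of funnelling the data through a single task.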

In our final lesson, we will discuss the evaluation of the model. Thanks for reading and see you there!

Next: Model evaluation

--

Taiwo Adetiloye
Analyzing the Amazon Product Data Set using SparkMLlib LogisticRegression Classification Model

Taiwo O. Adetiloye is very interested in large-scale data processing and analytics using AI and ML frameworks like Spark, Keras, and TensorFlow.