SMILE: A Machine Learning Library for Scala Lovers.

Knoldus Inc.
Knoldus - Technical Insights
3 min read · Jun 12, 2017

For the past few days, I have been getting interested in Machine Learning. We have all heard that machine learning is one of the most hyped concepts right now. So, what is Machine Learning? How can we describe it? Machine Learning has many definitions, and no single one is universally agreed upon. According to Arthur Samuel, Machine Learning is a field of study that gives computers the ability to learn without being explicitly programmed. Some people say machine learning produces a static model built from historical data, which is then used to make predictions on future data. Others consider that it produces a dynamic model that changes as more data arrives over time. I would say it can provide both.

While going through different aspects of machine learning and getting to grips with terms such as:

Supervised Learning

Unsupervised Learning

Features

Classification

Regression

And much more,

I tried to take a sneak peek at how it is implemented. While looking through libraries such as Hadoop, Spark and Flink, which are used when Big Data is involved, I also came across SMILE, which is used in situations where Big Data is not involved.
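To make a few of the terms above concrete, here is a minimal sketch in plain Scala. It is not taken from SMILE or any other library: the object name, the helper functions and the toy data are all made up for illustration, and the 1-nearest-neighbour rule is only a stand-in for a real learning algorithm. It shows features as numeric vectors, supervised learning as training on pairs of features and known answers, classification as predicting a discrete label, and regression as predicting a continuous value.

```scala
// Plain Scala only; no ML library is used. All names here are illustrative.
object TerminologySketch {
  // Features: the numeric measurements that describe one sample.
  type Features = Array[Double]

  // Squared Euclidean distance between two feature vectors.
  def dist2(a: Features, b: Features): Double =
    a.zip(b).map { case (p, q) => (p - q) * (p - q) }.sum

  // Supervised learning: every training sample comes with a known answer.
  // Classification: the answer is a discrete label (an Int here).
  def classify(train: Seq[(Features, Int)], query: Features): Int =
    train.minBy { case (x, _) => dist2(x, query) }._2

  // Regression: the answer is a continuous value (a Double here).
  def regress(train: Seq[(Features, Double)], query: Features): Double =
    train.minBy { case (x, _) => dist2(x, query) }._2

  def main(args: Array[String]): Unit = {
    val labelled = Seq(
      (Array(1.0, 1.0), 0),
      (Array(1.2, 0.8), 0),
      (Array(5.0, 5.0), 1),
      (Array(5.5, 4.5), 1)
    )
    // The query is closest to the first sample, so it receives that sample's label.
    println(classify(labelled, Array(0.9, 1.1)))

    val priced = Seq((Array(1.0), 10.0), (Array(2.0), 20.0), (Array(3.0), 30.0))
    // The query is closest to the sample at 2.0, so it receives that sample's value.
    println(regress(priced, Array(2.2)))
  }
}
```

Unsupervised learning would differ in that the training data carries no answers at all, only the feature vectors.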

SMILE: Statistical Machine Intelligence and Learning Engine

This library is developed in Java and offers an API for Scala too. It has a variety of algorithms for Classification, Regression, Clustering, Feature Selection and Association Rule Mining. Since I am keen to implement and observe the different possibilities of Machine Learning, I will be working through SMILE, as it provides a good start for my first experience with Machine Learning. Here’s an example:

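[code language="scala"]
// Assumes the SMILE Scala shell, as in the quick start, where read and randomForest are in scope.
// Load the Iris data; column 4 (0-based) holds the response variable.
val data = read.arff("data/weka/iris.arff", 4)
// Split into feature vectors x and integer class labels y.
val (x, y) = data.unzipInt
val rf = randomForest(x, y)
println(s"OOB error = ${rf.error}")
// Predict the class of the first sample.
rf.predict(x(0))
[/code]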

In this example, we use the famous Iris data from R.A. Fisher. The data is in Weka’s ARFF format. The second parameter of read.arff is the column index of the response variable; with SMILE’s parsers, column indices start at 0. The function read.arff returns an AttributeDataset object which, besides the data itself, also carries a lot of metadata. We then use the helper function unzipInt to get the training data and labels. Finally, we train a random forest with default parameters and print its OOB (out-of-bag) error. We can apply the model to new data samples with the predict method. To understand more about this example, you can go through the quick start in the reference link shared at the end of this post.
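For readers wondering what an out-of-bag error actually measures, the idea can be sketched in plain Scala. This is not SMILE's implementation and does not call its API: the object and function names are hypothetical, the toy data is made up, and a 1-nearest-neighbour rule stands in for the decision trees a random forest would use. Each learner is trained on a bootstrap sample (rows drawn with replacement), and every row is then scored only by the learners that never saw it.

```scala
import scala.util.Random

// A plain-Scala sketch of out-of-bag (OOB) error; it does not mirror SMILE's code.
object OobSketch {
  type Features = Array[Double]

  // Squared Euclidean distance between two feature vectors.
  def dist2(a: Features, b: Features): Double =
    a.zip(b).map { case (p, q) => (p - q) * (p - q) }.sum

  // Stand-in base learner (1-nearest neighbour) that only sees the rows listed in `bag`.
  def predict(x: Array[Features], y: Array[Int], bag: Seq[Int], query: Features): Int =
    y(bag.minBy(i => dist2(x(i), query)))

  def oobError(x: Array[Features], y: Array[Int], learners: Int, seed: Long): Double = {
    val rng = new Random(seed)
    val n = x.length
    // Each learner gets a bootstrap sample: n row indices drawn with replacement.
    val bags = Seq.fill(learners)(Seq.fill(n)(rng.nextInt(n)))
    // For every row, collect votes only from learners whose bag does not contain it.
    val outcomes = (0 until n).flatMap { i =>
      val votes = bags.filterNot(_.contains(i)).map(bag => predict(x, y, bag, x(i)))
      if (votes.isEmpty) None // the row was in every bag, so it has no OOB estimate
      else Some(votes.groupBy(identity).maxBy(_._2.size)._1 != y(i)) // true = misclassified
    }
    // Fraction of rows whose majority OOB vote disagrees with the true label.
    if (outcomes.isEmpty) Double.NaN else outcomes.count(identity).toDouble / outcomes.size
  }

  def main(args: Array[String]): Unit = {
    val x = Array(
      Array(0.0, 0.0), Array(0.1, 0.2), Array(0.2, 0.1),
      Array(5.0, 5.0), Array(5.1, 4.9), Array(4.9, 5.2)
    )
    val y = Array(0, 0, 0, 1, 1, 1)
    println(f"OOB error = ${oobError(x, y, learners = 25, seed = 42L)}%.3f")
  }
}
```

Because every row is judged only by learners that did not train on it, the OOB error gives an estimate of how the model behaves on unseen data without setting aside a separate test set.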

Before signing off, a quick announcement: I will not be going through this alone. More blogs are on their way, with more information about the SMILE library and how to use it in code. So keep reading our blogs, because here one doesn’t learn alone; we learn together. #mlforscalalovers

References: https://haifengl.github.io/smile/
