MachineX: An Introduction to KSAI, a machine learning library

Sugandha Arora
Knoldus - Technical Insights
Sep 3, 2018

Spend a couple of minutes on LinkedIn or any other media platform and you'll find that the hot topic in the technology section these days is machine learning and artificial intelligence. Why? Needless to say, they are transforming the world. Businesses are doing better by predicting different aspects of their operations, doctors are improving medical treatments, farmers are improving their farming, and everyday tasks are getting easier too. A lot of this is happening because machine intelligence is now available, and that intelligence comes from the data around us. People have long believed that machines would someday have this kind of intelligence, and with the rise of AI and machine learning we can see it happening. At first it seemed that people were only using it for business analytics, but it is quite clear now that it goes well beyond that, as the evolution of digital assistants and autonomous cars shows.

If you only want to read about KSAI, skip to the Introduction; otherwise, continue with our story. There are many reasons why we started KSAI, but the most important one was that we didn't want to learn Python, even though, as you already know, Python is a very good friend of every data scientist. Learning Python first in order to learn machine learning was a no-no for us; we believe it is more important to understand the algorithms than the language they are built on. So we decided to learn the theory and the maths of as many machine learning algorithms as possible and then implement them in our favorite language, Scala. Yes, of course, Spark MLlib is there, but we didn't want to dig into Spark either. The irony is that now that we know the algorithms, their theory, and their maths, we are also OK with learning Python or Spark MLlib. However, we will probably still focus on KSAI to take it to the next level. You might find the story behind creating a machine learning library a bit silly, but if you think about it carefully, nobody is losing anything and everyone is gaining.

Introduction

KSAI is an open source library built on Scala. It contains a lot of machine learning algorithms, and many more are on their way. Follow the repository link at the end of this post to use it or contribute to it. Until the documentation is ready, the best way to explore the library is to go through its test cases. The main algorithms can be found in the package core, and within core they are further organized into packages by type of algorithm. Take the Neural Network as an example: its implementation for classification is inside the package core.classification, and the one for regression is in core.regression. The test cases for classification and regression serve as examples of how to use them.

As mentioned earlier, the library is built on Scala and uses Scala's USP features. Once trained, it produces the ML model as a case class. In many places it also uses Future and Akka actors to do the computing; basically, wherever possible we have tried to use asynchronous or parallel programming. Along with Scala's own functionality, the library depends on another library called Breeze, a numerical processing library for Scala. For example, the Neural Network implementation is built on matrices, so all of its matrix operations are done with Breeze.
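To make the "case class as a model" and Future ideas concrete, here is a minimal sketch that uses only the Scala standard library. It is not KSAI's actual API: the names LineModel, LineTrainer, and TrainDemo are hypothetical, and a simple least-squares line fit stands in for a real algorithm such as the Neural Network.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical model type (not from KSAI): an immutable case class
// that holds the learned parameters.
final case class LineModel(slope: Double, intercept: Double) {
  def predict(x: Double): Double = slope * x + intercept
}

object LineTrainer {
  // Hypothetical trainer: fits y = slope * x + intercept by ordinary
  // least squares, running the computation asynchronously in a Future.
  def train(xs: Array[Double], ys: Array[Double]): Future[LineModel] = Future {
    require(xs.length == ys.length && xs.length > 1, "need at least two paired samples")
    val n = xs.length
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    var sxy = 0.0
    var sxx = 0.0
    var i = 0
    while (i < n) {
      val dx = xs(i) - meanX
      sxy += dx * (ys(i) - meanY)
      sxx += dx * dx
      i += 1
    }
    require(sxx > 0.0, "x values must not all be identical")
    val slope = sxy / sxx
    LineModel(slope, meanY - slope * meanX)
  }
}

object TrainDemo {
  def main(args: Array[String]): Unit = {
    // Points on the line y = 2x + 1.
    val futureModel = LineTrainer.train(Array(1.0, 2.0, 3.0), Array(3.0, 5.0, 7.0))
    // Blocking is acceptable in a demo; real code would compose the Future instead.
    val model = Await.result(futureModel, 5.seconds)
    println(model)
    println(model.predict(4.0))
  }
}
```

Because the trained model is a plain case class, it can be pattern matched, copied, compared, and passed around like any other immutable value, which is the appeal of this design.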

Below are some of the algorithms that we have implemented so far:

  • Neural Network
  • Association Rule
  • Decision Tree
  • Random Forest
  • KNN
  • K-Means
  • Naive Bayes
  • Logistic Regression
  • PCA
  • LDA
  • Signal Noise Ratio
  • Sum Square Ratio

Experience

Every committer would admit that while building it we learned not only the machine learning algorithms but also how to write Scala that actually performs better. The worst experience was when Scala's built-in functions started performing badly. We not only had to go into the Scala source code to understand what might be wrong but also faced some weird situations for which we still do not have an answer. Probably nobody knows better than us now why not to use Scala's List for such low-level programming, or why one must never use zipWithIndex-style functions at any level of programming. Scala's fold and foreach use while internally, but somehow when we used while directly it was much faster. Of course, there are reasons for that, but let's not debate them here, as these things are compensated for by Scala's other cool features.
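As an illustration of the kind of rewrite described above, the sketch below computes the same index-weighted sum twice: once with zipWithIndex and foldLeft on a List, and once with a plain while loop over an Array. The object and method names are hypothetical and the code is not taken from KSAI; it only shows the two styles side by side, without claiming any particular speedup.

```scala
object IndexedSum {
  // Idiomatic version: on a List, zipWithIndex first builds a new list of
  // (value, index) tuples, and only then does foldLeft walk it.
  def withZipWithIndex(values: List[Double]): Double =
    values.zipWithIndex.foldLeft(0.0) { case (acc, (v, i)) => acc + v * i }

  // Low-level version: a while loop over an Array, with no intermediate
  // collection and no tuple allocation.
  def withWhile(values: Array[Double]): Double = {
    var acc = 0.0
    var i = 0
    while (i < values.length) {
      acc += values(i) * i
      i += 1
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    val data = Array.tabulate(1000)(_.toDouble)
    // Both styles compute the same sum; only the amount of allocation differs.
    println(withZipWithIndex(data.toList))
    println(withWhile(data))
  }
}
```

How much the second form helps depends on the data size, the JVM, and how hot the loop is, so it is worth measuring with a benchmarking harness before rewriting readable code this way.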

You can find the library at https://github.com/KnoldusLabs/KSAI. Below are the committers of the project for now.

Originally published at blog.knoldus.com on September 3, 2018.
