ML tools (Java)

jayan chathuranga
Published in Techco
Nov 2, 2016
  1. Weka has a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
  2. RapidMiner was originally developed at the Technical University of Dortmund, Germany. It provides a GUI and a Java API for developing your own applications, and covers data handling, visualization and modeling with machine learning algorithms.
  3. Environment for Developing KDD-Applications Supported by Index-Structures (ELKI) is an open-source (AGPLv3) data mining framework written in Java. ELKI focuses on algorithm research, with an emphasis on unsupervised methods for cluster analysis and outlier detection.
  4. Massive Online Analysis (MOA) is a popular open-source framework for data stream mining, with a very active and growing community. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation. MOA is related to the Weka project and, like Weka, is written in Java, but it is designed to scale to more demanding problems.
  5. Apache SAMOA is a machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. It enables development of new ML algorithms without directly dealing with the complexity of the underlying distributed stream processing engines (DSPEs), such as Apache Storm, Apache S4 and Apache Samza. Its users can develop distributed streaming ML algorithms once and execute them on multiple DSPEs.
  6. JSAT is a library for quickly getting started with machine learning problems. It is developed as a spare-time project and is made available under the GPL 3. Part of the library is meant for self-education, so all code is self-contained: JSAT has no external dependencies and is pure Java.
  7. Java-ML is a Java API with a collection of machine learning algorithms implemented in Java. It provides a common interface for each type of algorithm.
  8. MLlib (Spark) is Apache Spark’s scalable machine learning library. Spark itself is written in Scala, and the library and platform provide Java, Scala and Python bindings. The library is relatively new, yet its list of algorithms is already long.
  9. H2O is a machine learning API for smarter applications. It scales statistics, machine learning and math over big data. H2O is extensible: users can build their own blocks from the simple math "legos" in its core.
  10. RankLib is a library of learning-to-rank algorithms. At the time of writing, eight popular algorithms had been implemented.
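Stream mining frameworks such as MOA evaluate learners differently from batch tools like Weka: each instance is typically used first to test the model and then to train it ("test-then-train", or prequential evaluation), so the model never stores the stream. The sketch below illustrates that loop under simple assumptions. The `IncrementalClassifier` interface, the `MajorityClass` baseline and the `Prequential` helper are hypothetical names chosen for illustration; they are not MOA's actual API, which has its own classifier and task classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal interface for an incremental (stream) classifier. */
interface IncrementalClassifier {
    /** Predict a class label for one feature vector. */
    int predict(double[] features);

    /** Update the model with one labelled instance; the instance is not stored. */
    void train(double[] features, int label);
}

/** Baseline learner: always predicts the most frequent label seen so far. */
class MajorityClass implements IncrementalClassifier {
    private final Map<Integer, Integer> counts = new HashMap<>();
    private int best = 0;      // label returned before any training has happened
    private int bestCount = 0;

    @Override
    public int predict(double[] features) {
        return best;
    }

    @Override
    public void train(double[] features, int label) {
        int c = counts.merge(label, 1, Integer::sum); // increment this label's count
        if (c > bestCount) {
            best = label;
            bestCount = c;
        }
    }
}

class Prequential {
    /** Test-then-train: each instance is first used for testing, then for training. */
    static double accuracy(IncrementalClassifier model, List<double[]> xs, List<Integer> ys) {
        int correct = 0;
        for (int i = 0; i < xs.size(); i++) {
            if (model.predict(xs.get(i)) == ys.get(i)) {
                correct++;
            }
            model.train(xs.get(i), ys.get(i));
        }
        return xs.isEmpty() ? 0.0 : (double) correct / xs.size();
    }
}
```

Any learner that implements the two methods can be dropped into the same loop, which is the main reason stream frameworks separate the learner interface from the evaluation procedure.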

Apache SAMOA

Apache SAMOA is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. It enables development of new ML algorithms without directly dealing with the complexity of the underlying distributed stream processing engines (DSPEs, such as Apache Storm, Apache S4 and Apache Samza). Apache SAMOA users can develop distributed streaming ML algorithms once and execute them on multiple DSPEs. SAMOA includes distributed machine learning for data streams, with an interface for plugging in different stream processing platforms.

SAMOA can be used in two different scenarios: running data mining and machine learning on data streams, or implementing new algorithms and running them in production. The other aspect of SAMOA is the stream processing platform abstraction: developers can also add new platforms by using the available API. With this separation of roles, the SAMOA project is divided into SAMOA-API and SAMOA-Platform. The SAMOA-API lets developers write code for SAMOA without worrying about which distributed stream processing engine (SPE) will be used. When a new SPE is released, or there is interest in integrating another platform, a new SAMOA-Platform module can be added. The first release of SAMOA supported two state-of-the-art SPEs: Apache S4 and Twitter Storm.
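The "write once, run on any engine" idea behind the SAMOA-API / SAMOA-Platform split can be sketched in plain Java. In this minimal sketch, an algorithm is written only against a `Processor` interface, and two toy "platforms" (one sequential, one backed by a thread pool) run the same processor unchanged. All names here (`Processor`, `StreamPlatform`, `LocalPlatform`, `ThreadPoolPlatform`) are hypothetical; SAMOA's real API is built around topologies, streams and events and is considerably richer. The sketch also assumes a stateless processor, since the thread-pool platform may call it concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical platform-independent processing step (the "API" side). */
interface Processor<I, O> {
    O process(I event);
}

/** Hypothetical engine adapter (the "Platform" side). */
interface StreamPlatform {
    <I, O> List<O> run(Processor<I, O> processor, List<I> events);
}

/** Runs every event in the calling thread, in order. */
class LocalPlatform implements StreamPlatform {
    @Override
    public <I, O> List<O> run(Processor<I, O> processor, List<I> events) {
        List<O> out = new ArrayList<>();
        for (I e : events) {
            out.add(processor.process(e));
        }
        return out;
    }
}

/** Fans events out to a thread pool; results are collected in input order. */
class ThreadPoolPlatform implements StreamPlatform {
    private final int threads;

    ThreadPoolPlatform(int threads) {
        this.threads = threads;
    }

    @Override
    public <I, O> List<O> run(Processor<I, O> processor, List<I> events) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I e : events) {
                futures.add(pool.submit(() -> processor.process(e)));
            }
            List<O> out = new ArrayList<>();
            for (Future<O> f : futures) {
                out.add(f.get()); // waits for each result, preserving input order
            }
            return out;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("processor failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Supporting another engine in this sketch means adding one more `StreamPlatform` implementation; the processor code does not change, which mirrors the role of a new SAMOA-Platform module.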

SAMOA’s main goal is to help developers easily create machine learning algorithms on top of any distributed stream processing engine.

Algorithms implemented in SAMOA

  • Classification: Vertical Hoeffding Tree classifier, Naive Bayes classifier
  • Clustering
  • Regression
  • Meta-algorithms: Boosting, Bagging
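Hoeffding trees, including the Vertical Hoeffding Tree listed above, decide when a leaf has seen enough examples to split by using the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the split criterion (log2 of the number of classes for information gain), δ is the allowed probability of choosing the wrong attribute, and n is the number of examples seen at the leaf. The standalone sketch below shows that test in the style of the original VFDT algorithm, including a tie threshold τ. The class and method names are hypothetical and not SAMOA's own classes; in a real tree the two gains would come from the per-attribute statistics kept at the leaf, and the values of δ and τ used here are only examples.

```java
/** Hypothetical standalone version of the Hoeffding-tree split test. */
class HoeffdingSplit {
    /**
     * Hoeffding bound: with probability 1 - delta, the true mean of a random
     * variable with range `range` lies within epsilon of its mean over n samples.
     */
    static double epsilon(double range, double delta, long n) {
        return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
    }

    /**
     * Range of information gain in bits for numClasses classes, i.e. log2(numClasses).
     * Clamped to at least 2 classes so the range is never 0.
     */
    static double infoGainRange(int numClasses) {
        return Math.log(Math.max(numClasses, 2)) / Math.log(2);
    }

    /**
     * Split when the best attribute beats the runner-up by more than epsilon,
     * or when epsilon has shrunk below the tie threshold tau, meaning the two
     * candidates are too close for further examples to matter.
     */
    static boolean shouldSplit(double bestGain, double secondGain, int numClasses,
                               double delta, double tau, long n) {
        double eps = epsilon(infoGainRange(numClasses), delta, n);
        return (bestGain - secondGain > eps) || (eps < tau);
    }
}
```

For a two-class problem with δ = 1e-7, ε is about 0.2 after 200 examples, so a gain difference of 0.05 is not yet enough to split; ε keeps shrinking as n grows, so the same leaf will eventually split or fall back on the tie threshold.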
