Apache Spark 2.0 support!

Today we are announcing support for Apache® Spark™ 2.0 in the IBM Data Science Experience. This new version comes with plenty of new features, such as support for Machine Learning algorithms in MLlib in Python and R, huge performance improvements, and simplified and unified APIs.

Apache® Spark™ is an open-source cluster computing framework whose in-memory processing can run analytic applications up to 100 times faster than other technologies on the market today. It is the core of the platform that powers the IBM Data Science Experience.

Apache Spark is known for its ease of use in creating algorithms that harness insight from complex data. Spark was elevated to a top-level Apache Project in 2014 and continues to expand today.

IBM is all-in on its commitment to Apache Spark, with investments in design-led innovation and broad-scale education programs to promote open source innovation and accelerate intelligence into every application.

In a relatively short period of time, the IBM Spark Technology Center (STC) has made notable contributions to the greater Apache Spark ecosystem and is energetic and passionate about moving Spark forward. By our latest count, Apache Spark™ 2.0 includes 2,590 JIRAs (new features and bug fixes) from 309 contributors worldwide.

The Spark Technology Center focuses its efforts on expanding Spark’s core technology to make it enterprise and cloud ready, with the aim of accelerating the business value of Spark and driving intelligence into business applications. With our growing pool of contributors (50 team members worldwide, including two committers), we’ve contributed over 422 commits to Spark 2.0 in the areas of Spark Core, SparkR, SQL, MLlib, Streaming, PySpark, and more.

All this amounts to over 18,600 lines of new code in the 2.0 release. Our largest contribution is in the area of Spark SQL, with over 10,200 lines of new code, followed by Machine Learning (Spark ML and PySpark), with over 6,900 lines of new code. We are the company with the most contributions to Apache Spark MLlib.