Latest Release of DSX Local Is a Gift Basket for Data Scientists

Vikram Murali
IBM Data Science in Practice
3 min read · Sep 19, 2017

There’s an interesting pattern with technology. You see it time and again as new ideas and new capabilities come on the scene: While the futurists, journalists, and bloggers are busy touting the potential and debating the downsides, there’s always a group of actual users who simply sit down and get to work.

Data science is no different. The hype goes on, but in the meantime the work is underway. Across every industry, dedicated data scientists are hammering through data to gain the insights that will help their organizations thrive. Along the way, those data scientists have developed strong processes to get things done — workflows, infrastructure, and ways to ingest, store, and manage the data they use.

And as they try to improve their work and their efficiency, they’re not typically looking to reinvent the processes they’ve already put in place. Instead, what they need are tools to make each step easier and ways to fit the steps together. They need tools that integrate seamlessly with the warhorses of their daily work, whether GitHub, Kafka™, or Spark™. At the same time, they want to stay current on the tools they know will make them more productive and better collaborators — like Apache Zeppelin™, Jupyter, and RStudio. They’re also looking for ways to automate tasks and to be alerted when those tasks need attention.

For all those reasons, more and more data scientists are flocking to IBM’s Data Science Experience (DSX) Local. First released in April, the offering was just enhanced with its third update. The rate of updates illustrates just how fast the data science space is moving, and also shows IBM’s commitment to keeping its users on pace with it.

With this release, we’ve focused on expanding support for external tools and doubling down on ease of use.

[Figure: The new Model Management dashboard]

For example, we’ve added support for Zeppelin notebooks — and the ability to switch easily between Jupyter, RStudio, and Zeppelin at any time. The release also includes a new Model Management dashboard that lets DSX Local users schedule and monitor evaluations of models, and import machine learning models. Users can now also connect projects to GitHub repositories, export/import projects, pull changes, and commit project changes back to the repository.

More highlights with this release:

  • Manage all assets, including RStudio and data sets, from the Assets tab.
  • Reserve Apache Spark™ resources from the new Runtimes tab in your project.
  • Connect DSX Local projects to relational databases using Data sources and Remote data sets instead of Connections. (Data sources allow you to securely store information about your database and credentials.)
  • Use DSX Local notebooks to retrieve data from relational databases using APIs from third-party modules.
  • Store objects in the file system instead of an object store.
  • Configure alert thresholds, dashboard refresh, and log and metric rotations.
  • Use REST APIs to manage files and folders.
  • Submit an Apache Kafka™ streaming application that connects to a Kafka broker over SSL.
  • And more. Check out the full list of features.
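As a rough illustration of the notebook bullet above, here is a minimal sketch of pulling rows from a relational database inside a Python notebook. It is not taken from the DSX Local documentation: the helper name `fetch_rows`, the `sales` table, and its columns are hypothetical, and an in-memory SQLite database from the standard library stands in for whatever database and third-party driver you actually use.

```python
import sqlite3


def fetch_rows(conn, query, params=()):
    """Run a parameterized query and return the rows as a list of dicts.

    `conn` is any DB-API 2.0 style connection; sqlite3 is used here only
    as a stand-in for a real database driver (an assumption of this sketch).
    """
    cur = conn.cursor()
    try:
        cur.execute(query, params)
        # cursor.description holds one tuple per column; item 0 is the name
        columns = [d[0] for d in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


# Hypothetical sample data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("west", 80.5), ("east", 30.0)],
)
conn.commit()

# Parameters are passed separately rather than formatted into the SQL string
east_rows = fetch_rows(
    conn,
    "SELECT region, amount FROM sales WHERE region = ? ORDER BY amount",
    ("east",),
)
print(east_rows)
```

With a different driver, the connection call and the placeholder style (`?` here) would likely change, while the cursor pattern should stay much the same.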

In every way, the release demonstrates a dedication to real-world data scientists and an urge to help them get things done. If you haven’t yet seen what it can do, check out this short video series.

Vikram Murali

Director of Engineering for Netezza/PureData System for Analytics, Integrated Analytics and ICP4D Systems within IBM Data and AI.