Welcome to the Data Science Experience

Armand Ruiz
Jun 9, 2016

Introduction

Data science is emerging as the next frontier of analytics. While it has existed in some form for many years, we are in the early stages of mass awareness and adoption: many enterprises are just now starting to experiment, and very few are actually in production.

Traditionally, data scientists have been trained to use commercial tools and have a strong background in social sciences, economics, and mathematics. The new generation is self-trained; they mainly use open source technologies and are not scared of programming and using APIs. However, because existing tools require different levels of expertise, collaboration across tools is difficult.

Today we are excited to announce the IBM Data Science Experience, an environment that has everything a data scientist needs to be successful. IBM Data Science Experience is an interactive, collaborative, cloud-based environment where data scientists can use multiple tools to activate their insights. Data scientists can use the best of open source, tap into IBM’s unique features, grow their capabilities, and share their successes.

Learn: View sample notebooks and watch tutorials while you code

Use the built-in learning to get started or go the distance. Join a vibrant community of data scientists across industries, functions, and organization types. Take advantage of shared data sets, notebooks, and tutorials. Share your work with your team and your peers. Start a course, start from a sample, or start from scratch.

Create: Build your data science projects with your favorite tools

Use Python, R, or Scala in Jupyter Notebooks already connected to Spark. Notebooks are a popular environment to create and share documents that contain live code, equations, visualizations, and explanatory text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and much more.

We teamed up with RStudio to deliver the most widely used open-source R statistical computing environment. The IBM Data Science Experience offers the RStudio flagship product, a popular integrated development environment (IDE) that makes it easy for anyone to analyze data with R.

Some of the pain in using data science tools lies in installing, setting up, and maintaining them — we’ve done all that for you.

On top of the open source capabilities, we are adding new features and APIs. These are some that we are providing today (and many more are coming!):

  • Sparkling.Data: Cleaning and preparing data for analysis are the tasks that data scientists typically spend the majority of their time on. We created a library that helps you discover the different file types and returns a data frame loaded with data (by default) from the file type that occurs the most. You can use it to infer the schema, discover data types, profile data sets, view range and distribution, reveal and fix bad data, and much more.
  • Prescriptive Analytics: The Decision Optimization CPLEX Modeling library (DOcplex) contains modeling packages such as Mathematical Programming and Constraint Programming.
  • Shiny: Data scientists typically create visualizations to share their analysis with others. We include Shiny in the IBM Data Science Experience so that you can create interactive analytic web applications using only R, without coding any HTML, CSS, or JavaScript. See the Shiny gallery for useful examples.
  • Data Connections: From the Notebook interface, you can set up data connections to Bluemix data services like Cloudant or dashDB or to on-premises or external services.
  • Schedule Jobs: From the Notebook interface, you can schedule jobs to run periodically.
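
To make the Sparkling.Data description above more concrete, here is a minimal sketch of the two ideas it mentions: finding the file type that occurs most often in a set of files, and inferring a simple schema from the data. This is not the Sparkling.Data API, which is not shown in this post. The function names (dominant_file_type, infer_type, infer_schema) are hypothetical, and the sketch uses only the Python standard library rather than Spark.

```python
import csv
import io
from collections import Counter
from pathlib import PurePosixPath


def dominant_file_type(paths):
    """Return the most common file extension (lowercase, no dot), or None."""
    counts = Counter(
        PurePosixPath(p).suffix.lower().lstrip(".") for p in paths
    )
    counts.pop("", None)  # ignore files that have no extension
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def infer_type(values):
    """Guess a column type from its string values: 'int', 'float', or 'str'."""
    non_empty = [v for v in values if v]  # skip empty and missing cells
    if not non_empty:
        return "str"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "str"


def infer_schema(csv_text):
    """Map each column of a CSV string (with a header row) to a guessed type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {col: infer_type([row[col] for row in rows]) for col in rows[0]}
```

Calling dominant_file_type on a list of paths with a clear majority of .csv files returns "csv", and infer_schema on a small CSV string returns a dictionary of column names to guessed types. A real library would go much further (profiling, distributions, bad-data repair); this only illustrates the general idea.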

Collaborate: Leverage the work of your peers to accelerate your own

Collaborate with your peers on projects to find better solutions together. Share your knowledge and your code — and help fuel the advancement of data science for all.

Data scientists can also easily share Jupyter notebooks with each other in their workspace. We added the ability to export notebooks to HTML so that you can publish them publicly.

Powered by Apache Spark

IBM Data Science Experience is built for enterprise-scale deployment. Manage your data, your analytical assets, and your projects in a secured cloud environment.

When you create an account in the IBM Data Science Experience, we deploy a Spark as a Service instance to power your analysis and provide 5 GB of IBM Object Storage to store your data.

Roadmap

The service is now available in limited preview, and we are working hard to open it to all users. We have a vision for the IBM Data Science Experience, but the real owner of the roadmap will be the community. We’ve been learning from data scientists over the last few years, and we’d love to hear about your experience using the service.

We can’t wait to see what you build with the IBM Data Science Experience.

Learn more at datascience.ibm.com


Armand Ruiz

Lead Product Manager IBM Watson Studio and Machine Learning