Upload data and create Data Frames in Jupyter Notebooks

When you create an account in IBM Data Science Experience, we provision a free Apache Spark cluster and 5 GB of free IBM Object Storage. Some of our users have told us they have trouble loading data into notebooks because they are new to working with Cloud Data Services; most of them are used to working with data stored on their own laptops. We want to give you the easiest possible way to work with your data, using the tools and libraries you already know.

We are excited to announce a new feature in IBM Data Science Experience that lets you create data frames with one click from the Jupyter Notebooks interface.

Upload data to Object Storage

Uploading data to Object Storage is very simple. Just drag and drop your file into the notebook and, like magic ✨, the file is uploaded and appears in the Notebook palette. A progress bar shows how far along the upload is; how long it takes depends on the file size.

Create data frames to start your analysis

A data frame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table.
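As a minimal illustration of that definition, here is a small pandas DataFrame built by hand. The column names and values are made up for the example. Each column holds values of one type, the types differ from column to column, and both rows and columns carry labels, much like the row and column headers of a spreadsheet.

```python
import pandas as pd

# Made-up example data: three labeled columns, each with a different type.
df = pd.DataFrame(
    {
        "product": ["pen", "notebook", "stapler"],  # strings
        "price": [1.25, 2.5, 7.0],                  # floats
        "in_stock": [True, False, True],            # booleans
    },
    index=["a", "b", "c"],  # row labels
)

print(df)         # tabular view, like a small spreadsheet
print(df.dtypes)  # one dtype per column

# Label-based lookup on both axes, similar to addressing a spreadsheet cell.
print(df.loc["b", "price"])
```

Selecting by row label and column name with `.loc` is what makes the structure "labeled" rather than just a two-dimensional array.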

Once your file is uploaded and appears in the Notebook palette, click Insert Code. A drop-down menu opens with options for loading your data into different types of data frames, depending on your preference and the notebook language:

Python Notebook

  • Pandas DataFrame
  • Spark SQL DataFrame
  • Spark RDD
  • Insert Credentials

R Notebook

  • R base DataFrame
  • Spark SQL DataFrame
  • Insert Credentials

Note that this feature currently supports only CSV files, but if you like it, we will quickly extend it to other file formats!

Choosing an option inserts a new cell into the notebook that performs four steps:

  1. Install and import the libraries needed to load the data. This happens only the first time you use the feature in a notebook, because the libraries need to be loaded only once.
  2. Connect to your file in Object Storage, inserting the credentials for you automatically.
  3. Load the data frame.
  4. Display a preview of the data frame.
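The exact code the cell generates depends on the option you choose and is not reproduced here. As a rough sketch of the same four steps for a pandas DataFrame, the example below assumes a hypothetical `fetch_object` function standing in for the authenticated Object Storage download. It simply returns the bytes of a small in-memory CSV, so no real credentials or network access are involved, and the `credentials` dictionary keys are placeholders, not the actual format the feature inserts.

```python
import io

import pandas as pd  # Step 1: import the library needed to load the data.


def fetch_object(credentials, object_name):
    """Hypothetical stand-in for downloading an object from Object Storage.

    A real notebook would use the inserted credentials to authenticate and
    download `object_name`; this sketch ignores both arguments and returns
    a small CSV held in memory.
    """
    sample_csv = "id,name,score\n1,alice,9.5\n2,bob,7.0\n3,carol,8.25\n"
    return sample_csv.encode("utf-8")


# Step 2: "connect" using credentials (hypothetical placeholder values).
credentials = {"container": "my-container", "filename": "scores.csv"}
raw_bytes = fetch_object(credentials, credentials["filename"])

# Step 3: load the CSV bytes into a pandas DataFrame.
df = pd.read_csv(io.BytesIO(raw_bytes))

# Step 4: display a preview of the first rows.
print(df.head())
```

Only the download step would change when swapping in a real Object Storage client: `pd.read_csv` accepts any file-like object, which is why the raw bytes are wrapped in `io.BytesIO`.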

See a demo in action here: