Adding a custom dataset to Databricks Community Edition

Roshini Johri
2 min read · Sep 19, 2018

This is a reference for anyone reading my Spark tutorials, or for anyone who wants to do what the title says :D. This article assumes that you already know how to use Databricks Community Edition to run Spark. If you have never used it before, this is a brief “how to get started” on that.

I am going to go with the wine dataset because that is what I am using for my tutorial.

It’s really simple to upload your own dataset to Databricks. Just follow these steps:

Step 1:

Click on the Data icon in the side panel and then click on Add Data.

Click on “Add Data” on the right hand corner

Step 2:

Drag and drop the wineQualityReds.csv file in the given box.

Drag and drop the file in the greyed out area where it says drop files to upload

Step 3:

You have two choices now. If you choose Create Table with UI, Databricks will create the table for you in the default database, and you can read it with a simple Spark read command:

spark.read.table("winequalityreds_csv")
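If the read fails, it usually helps to check what name the table was actually registered under. Here is a minimal sketch of that check, assuming the `spark` session that Databricks notebooks provide. The helper names `table_exists` and `load_table` are my own for illustration and are not part of any Databricks API.

```python
def table_exists(spark, table_name, database="default"):
    """Return True if `table_name` is registered in `database`."""
    # spark.catalog.listTables returns objects that carry a `.name` attribute.
    names = [t.name for t in spark.catalog.listTables(database)]
    return table_name.lower() in (n.lower() for n in names)


def load_table(spark, table_name="winequalityreds_csv"):
    """Read the UI-created table into a DataFrame, failing early if it is missing."""
    if not table_exists(spark, table_name):
        raise ValueError("Table not found in default database: " + table_name)
    return spark.read.table(table_name)


# In a Databricks notebook `spark` already exists, so usage would look like:
# wine_df = load_table(spark)
# wine_df.printSchema()
# wine_df.show(5)
```

The table name here is the one from this article. If you uploaded a different file, pass your own name instead.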

This is what the interface looks like when you select Create Table with UI:

Note: I selected “First row is header” and “Infer schema”, but both are optional. The preview sometimes isn’t generated until you click in the table preview window.
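The same two choices can also be made in code when you read the uploaded file directly. This is a rough sketch, not the code Databricks generates. It assumes the upload landed at /FileStore/tables/wineQualityReds.csv (check the path shown after your own upload), and `read_uploaded_csv` is a hypothetical helper name.

```python
def read_uploaded_csv(spark,
                      path="/FileStore/tables/wineQualityReds.csv",  # assumed upload location
                      header=True,
                      infer_schema=True):
    """Read an uploaded CSV file into a DataFrame."""
    return (
        spark.read.format("csv")
        # Treat the first row as column names instead of data.
        .option("header", str(header).lower())
        # Let Spark guess column types; otherwise every column is a string.
        .option("inferSchema", str(infer_schema).lower())
        .load(path)
    )


# In a Databricks notebook `spark` already exists, so usage would look like:
# wine_df = read_uploaded_csv(spark)
# wine_df.printSchema()
```

With `infer_schema=False` every column comes back as a string, which is faster to load but means you have to cast the numeric columns yourself.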

If you choose the Create Table in Notebook option, Databricks will create a temp table for you and generate the code in a notebook. Because a temp table is only visible in that automatically generated notebook, you can write it to the default database if you want it to be available in other notebooks. The default language for this notebook is Python, so if you prefer Scala, SQL or R, you can either write the table to the default database and read it in a notebook of your chosen language or, if you want to stay in the same notebook, use the %{LanguageName} syntax to code in your preferred language. Remember that this has to be done for every code block.
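To make the difference between the temp table and the default database concrete, here is a small sketch of both steps. It is written by hand and is not the exact code Databricks generates. The helper names and the table name `wine_quality_reds` are hypothetical, and `df` is assumed to be the DataFrame you loaded from the CSV.

```python
def register_temp_view(df, view_name="winequalityreds_csv"):
    """Register `df` as a temp view, visible only in the current Spark session."""
    df.createOrReplaceTempView(view_name)
    return view_name


def save_to_default_database(df, table_name="wine_quality_reds", mode="error"):
    """Persist `df` as a table so that other notebooks can read it.

    mode="error" refuses to touch an existing table; pass "overwrite" to replace it.
    """
    df.write.format("parquet").mode(mode).saveAsTable(table_name)
    return table_name


# In the generated notebook, usage would look like:
# register_temp_view(df)         # query it here with SQL
# save_to_default_database(df)   # make it available elsewhere
```

Once the table is saved, any other notebook can load it with spark.read.table("wine_quality_reds"), whatever its default language is. The temp view, on the other hand, can be queried from a %sql cell in the same notebook.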

Hope this is helpful! Now let’s get started with the Spark tutorial part 2.
