Databricks has become a favorite platform for many data engineers, data scientists, and ML experts. It combines data, analytics, and AI in one place. It's multi-cloud, and you can now also use it on Google Cloud (GCP).
This article will walk you through the main steps to become efficient with Databricks on Google Cloud.
1. Get the Foundation Right — From Subscription to User Creation
To get started, let me link to a step-by-step video tutorial that covers everything: creating a subscription, meeting the prerequisites, creating a Databricks workspace, adding users to the workspace, and running your first job.
Even if you are like me, i.e. someone who does not read the instructions for IKEA furniture, take the time to get this right. It will save you trouble later, for example if you set quotas correctly from the very beginning.
Also, check the official documentation.
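For readers who would rather script the last step of that setup (running a first job) than click through the UI, here is a minimal sketch of what a request body for the Databricks Jobs API (`POST /api/2.1/jobs/create`) could look like. The helper name `build_notebook_job` is made up for illustration, and the notebook path, runtime version, and node type below are placeholder assumptions, not values from the tutorial; check the official documentation for the values that are valid in your workspace.

```python
import json


def build_notebook_job(name, notebook_path, spark_version, node_type_id, num_workers=1):
    """Build a request body for creating a single-task notebook job.

    Hypothetical helper for illustration; it only assembles a dict
    and does not call any API.
    """
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "main",
                # The notebook to run, addressed by its workspace path.
                "notebook_task": {"notebook_path": notebook_path},
                # A job cluster created for this run and removed afterwards.
                "new_cluster": {
                    "spark_version": spark_version,
                    "node_type_id": node_type_id,
                    "num_workers": num_workers,
                },
            }
        ],
    }


if __name__ == "__main__":
    # All values below are placeholders (assumptions), adjust to your workspace.
    payload = build_notebook_job(
        name="my-first-job",
        notebook_path="/Users/someone@example.com/quickstart",
        spark_version="13.3.x-scala2.12",
        node_type_id="n1-standard-4",
    )
    print(json.dumps(payload, indent=2))
```

The node type and worker count you pick here are exactly where the quotas mentioned above come into play: a cluster that exceeds your project's quota will not start, so it pays to keep the first job small.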
2. The Persona View
All of your Databricks assets are accessed using the sidebar. The sidebar’s contents depend on the selected persona, e.g. Data Science & Engineering or Machine Learning.
By default, the sidebar appears in a collapsed state and only the icons are visible. Move your cursor over the sidebar to expand to the full view.
3. Explore the Quickstart Notebook
Ok, you passed all the setup steps swimmingly, but you are not a seasoned programmer and don’t know how to write code in a notebook? No worries, not everyone is a data engineer or data scientist.
From every Databricks workspace on GCP, you can start exploring a quick-start notebook. Quick-start notebooks are a great way to explore and run short snippets of easy-to-understand code. For aspiring data scientists, they are a great way to learn how to implement core functionality.
4. Notebook Gallery
The Databricks notebook gallery showcases some of what is possible with notebooks, and every notebook in it can easily be imported into your own Databricks environment.
5. Solution Accelerators
Solution accelerators are Databricks notebooks that tackle common, high-impact use cases. They are designed to help Databricks customers go from idea to PoC in less than 2 weeks. Check them out and discuss them with your solution architecture team or watch the quick YouTube introduction.
6. Technical Resources You Should Know
There are many more technical articles that help you to get up to speed with Databricks on Google Cloud:
Big thanks to Silviu Tofan for supporting this article and Databricks on GCP.
Shoutout to Jon Tyson on Unsplash for the photo used in this article. Great shot!