Read SAP HANA data from Azure Databricks

Anant Arora
Jul 3, 2023


It has been a couple of weeks since my last post, but I’m now back on track with other projects and happy to continue the SAP on Azure series! In the past few episodes I focused on integrating the SAP system with Azure data platform services. Today I continue that topic and show you how to consume data stored in SAP HANA from Azure Databricks. I know a few people were interested in this subject, so I’m happy I finally found time to write this short tutorial.

The SAP HANA database can be accessed using JDBC. Azure Databricks supports attaching external libraries to a cluster, so the entire process is very straightforward! The JDBC driver for SAP HANA is part of the database client libraries and can be downloaded from the SAP Support Launchpad or the SAP Development Tools. We are interested in a small Java archive, ngdbc.jar, which we need to download and then upload to Azure Databricks.

Once the file is downloaded, we can publish it as an Azure Databricks library. Open the target workspace folder (you can choose your own or the Shared one) and create a new library.

Drag and drop the previously downloaded file to the grey area of the screen. The library name will populate automatically and you can confirm library creation by clicking the Create button.

On the next screen select the cluster on which you wish to install the library.

The installation doesn’t take much time. In my case, it took around 1–2 minutes and afterward the status changed to Installed.

With the library installed, we can test the connectivity to the database and read sample data. My Databricks cluster is deployed to the same virtual network as the SAP HANA database, so I don’t need to set up peering between VNets. The script below, written in Scala (thanks to the good Databricks documentation!), reads and displays data stored in the SFLIGHT table:

%scala
import java.util.Properties

//Set connection parameters
val jdbcHostname = "<hostname>"
val jdbcPort = "<port>"
val jdbcDB = "<SID>"
val jdbcUser = "<Username>"
val jdbcPassword = "<Password>"
val driverClass = "com.sap.db.jdbc.Driver"

//On a multi-tenant system the databaseName option routes the connection to the
//right tenant database; on a single-container system the host and port alone
//are enough and the option can be dropped
val jdbcUrl = s"jdbc:sap://${jdbcHostname}:${jdbcPort}/?databaseName=${jdbcDB}"

//Check availability of the JDBC library to access SAP HANA
Class.forName(driverClass)

//Set connection properties ("driver" tells Spark which JDBC driver class to load)
val connectionProperties = new Properties()
connectionProperties.setProperty("user", jdbcUser)
connectionProperties.setProperty("password", jdbcPassword)
connectionProperties.setProperty("driver", driverClass)

//Read and display data
val sflight = spark.read.jdbc(jdbcUrl, "<SCHEMA.TABLE>", connectionProperties)
sflight.show()
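
If you only need a slice of a table, you can push the filtering down to SAP HANA by passing a parenthesized subquery in place of the table name; Spark sends the inner query to the database, so only the matching rows travel over the network. A minimal sketch, where the SFLIGHT.SFLIGHT name, the column list and the CARRID filter are just illustrative placeholders:

//Push the filter down to SAP HANA: the aliased subquery runs on the database side
val pushdownQuery = "(SELECT CARRID, CONNID, FLDATE, PRICE FROM SFLIGHT.SFLIGHT WHERE CARRID = 'LH') AS flights"
val lhFlights = spark.read.jdbc(jdbcUrl, pushdownQuery, connectionProperties)
lhFlights.show()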

It is also possible to fetch data through an SAP HANA calculation view. I installed the SAP HANA SHINE demo content, which contains a set of tables and data models to practice and learn with. In the example below I’m using the Purchase Order Worklist dashboard, which underneath uses the PURCHASE_ORDER_WORKLIST calculation view.

I changed the command used in the previous example to fetch data from the calculation view:

//Read and display data from the calculation view; the escaped, quoted
//identifiers are required because the view name contains dots and a slash
val po_worklist = spark.read.jdbc(jdbcUrl, "\"_SYS_BIC\".\"sap.hana.democontent.epm.models/PURCHASE_ORDER_WORKLIST\"", connectionProperties)
po_worklist.show()
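
Calculation views that expose input parameters can be queried the same way by wrapping SAP HANA’s PLACEHOLDER syntax in a subquery. The SOME_VIEW name and the P_CURRENCY parameter below are assumptions for illustration, not part of the SHINE model:

//Hypothetical example: pass a value to an input parameter of a calculation view
val viewQuery = "(SELECT * FROM \"_SYS_BIC\".\"sap.hana.democontent.epm.models/SOME_VIEW\" ('PLACEHOLDER' = ('$$P_CURRENCY$$', 'EUR'))) AS v"
val po_filtered = spark.read.jdbc(jdbcUrl, viewQuery, connectionProperties)
po_filtered.show()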

Databricks automatically detects the data schema and renders the result as a table that can be displayed through the UI (and used in other queries).
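
Because the result comes back as a regular Spark DataFrame, you can go further than just displaying it: register it as a temporary view for Spark SQL, or persist a snapshot for downstream jobs. A short sketch; the po_worklist_snapshot table name is an illustrative assumption:

//Register the DataFrame as a temporary view and query it with Spark SQL
po_worklist.createOrReplaceTempView("po_worklist")
spark.sql("SELECT COUNT(*) AS po_count FROM po_worklist").show()

//Optionally persist a snapshot as a managed table for downstream consumers
po_worklist.write.mode("overwrite").saveAsTable("po_worklist_snapshot")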

That’s everything for today! I hope you liked it and found it helpful. I still want to write one more post about using the SAP HANA Spark Controller with Azure Databricks, and then we’ll jump to another topic.
