SAML 2.0 based Federation between Databricks (AWS) and ForgeRock OpenAM SSO

Jatinder Singh
Securing Digital Identity
5 min read · Feb 4, 2018

Databricks is a cloud-based big data processing platform for working with Apache Spark. It provides automated cluster management (on AWS or Microsoft Azure) and Jupyter-style notebooks for creating and sharing documents that contain live code, equations, visualizations, and narrative text.

OpenAM, part of the ForgeRock stack, provides Access Management: it manages access to resources available over the network, such as a web page, an application, or a web service. When you need to federate identities across different domains and different organizations, you need interoperable federation technologies. OpenAM integrates well in federated access management scenarios, and this article demonstrates the steps to enable SAMLv2 federation between Databricks and ForgeRock OpenAM to allow Single Sign-On (SSO).

Before getting started, I would like to share a bit about myself. I provide consulting on ForgeRock Identity & Access Management and on Data Engineering with Databricks and the Spark cluster-computing framework. I am in the process of obtaining the AWS Certified Solutions Architect — Associate certificate (February 2018) and AWS Certified Big Data — Specialty. If you or your client needs an IAM or Data Engineering consultant, please feel free to reach out to me. I work across North America, including the East Coast, Mid-West, and West Coast.

Below is a list of prerequisites and assumptions for this article:

  • You have a premium Databricks subscription with “Operational Security” add-on enabled. Please note, you won’t be able to enable SSO without this add-on;
  • OpenAM expects an SP metadata URL or metadata file in order to configure a remote service provider. We were unable to obtain the SP metadata from any of the URLs provided by Databricks and had to reach out to them for support. We suggest you contact Databricks if you are unable to access the SP metadata. The metadata file they provided was minimal, as shown in the snippet below. For this article, we assume you have the SP metadata file from Databricks.
<?xml version="1.0"?>
<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
                     validUntil="2017-01-02T19:47:32Z"
                     cacheDuration="PT604800S"
                     entityID="https://YOUR-DATABRICKS-DEPLOYMENT.com">
  <md:SPSSODescriptor AuthnRequestsSigned="false" WantAssertionsSigned="false"
                      protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:NameIDFormat>urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified</md:NameIDFormat>
    <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
                                 Location="https://YOUR-DATABRICKS-DEPLOYMENT.com/saml/consume"
                                 index="1"/>
  </md:SPSSODescriptor>
</md:EntityDescriptor>
  • We used the latest OpenAM binary v5.5.1;
  • Your OpenAM IDP is configured and running;
  • Your OpenAM IDP has outbound connection to the internet.
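The SP metadata file is small enough to sanity-check programmatically before configuring OpenAM. Below is a minimal sketch using Python's standard library; the embedded metadata string mirrors the snippet above and is illustrative only — substitute the actual file Databricks provides:

```python
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"

def parse_sp_metadata(xml_text):
    """Extract the fields relevant to the OpenAM remote SP configuration."""
    root = ET.fromstring(xml_text)
    sp = root.find(f"{{{MD_NS}}}SPSSODescriptor")
    acs = sp.find(f"{{{MD_NS}}}AssertionConsumerService")
    return {
        "entity_id": root.get("entityID"),
        "name_id_format": sp.findtext(f"{{{MD_NS}}}NameIDFormat"),
        "acs_binding": acs.get("Binding"),
        "acs_url": acs.get("Location"),
    }

# Illustrative sample mirroring the Databricks SP metadata shown above
sample = """<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    entityID="https://YOUR-DATABRICKS-DEPLOYMENT.com">
  <md:SPSSODescriptor AuthnRequestsSigned="false" WantAssertionsSigned="false"
      protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:NameIDFormat>urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified</md:NameIDFormat>
    <md:AssertionConsumerService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
        Location="https://YOUR-DATABRICKS-DEPLOYMENT.com/saml/consume" index="1"/>
  </md:SPSSODescriptor>
</md:EntityDescriptor>"""

info = parse_sp_metadata(sample)
print(info["acs_url"])  # https://YOUR-DATABRICKS-DEPLOYMENT.com/saml/consume
```

Note the NameIDFormat of unspecified here — it drives the NameID Value Map step later in the IDP configuration.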

To demonstrate the federation steps on AWS, we will use an EC2 t2.micro instance to spin up an OpenAM instance running on a Tomcat 8 server. We will first configure our IDP and then enable SAMLv2 federation in the Databricks SP. Below is a 10,000-foot architecture of what we are trying to achieve:

10,000-foot Architecture

Identity Provider (IDP) SAMLv2 Configuration

  • As a best practice, we will configure SAMLv2 federation in a new realm to keep federation separate from other realms. We will call the new realm “databricks” and have the DNS aliases set to the domain name of our Databricks deployment.
Create New Realm
  • In the “databricks” realm, click on Common Tasks > Configure SAMLv2 Provider > Create Hosted Identity Provider.
Create Hosted Identity Provider
  • To configure your OpenAM instance as a SAMLv2 Identity Provider, ensure the fields in the image are configured correctly. For demonstration, we are using the "test" signing key, but you should use a valid certificate configured in your OpenAM keystore. We named the Circle Of Trust (COT) AMDatabricksCOTv2.
Hosted Identity Provider Configuration
  • The SAMLv2 metadata provided by Databricks sets the NameIDFormat to unspecified, which means it is up to the identity provider to determine which name identifier format to use. Navigate to Realms > databricks > Applications > SAML > the IDP entity provider you created > Assertion Content tab > NameID Format > NameID Value Map, and set urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified=cn

Note: You will get a "SAML2Exception: Unable to generate NameID value" exception if you miss this step, and your SAMLv2 federation will fail.

Configure NameID Format
  • With the IDP configured, the next step is to configure a remote service provider. Assuming you are already inside the databricks realm, click on Configure SAMLv2 Provider (under Common Tasks) > Configure Remote Service Provider. On this page, upload your Databricks SAMLv2 metadata file by selecting the File radio button. To add this remote SP to your existing COT (AMDatabricksCOTv2, which you created above), leave the COT field empty. Once completed, click the Configure button to create your SAMLv2 Remote Service Provider.
Configure Remote Service Provider
  • Your Federation tab should look like the below screenshot. If you do not see both your IDP and SP entities under the configured COT, make sure to add the missing entity.
Federation Tab
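Conceptually, the NameID Value Map configured above tells the IDP which user profile attribute to emit as the NameID when the SP's requested format is unspecified. The sketch below is a hypothetical illustration of that lookup, not OpenAM internals; the attribute names are examples only:

```python
UNSPECIFIED = "urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified"

# Mirrors the "unspecified=cn" entry set in the NameID Value Map above
NAMEID_VALUE_MAP = {UNSPECIFIED: "cn"}

def resolve_nameid(nameid_format, user_attributes):
    """Pick the NameID value for an assertion; raise when no mapping exists,
    analogous to OpenAM's "Unable to generate NameID value" failure."""
    attr = NAMEID_VALUE_MAP.get(nameid_format)
    if attr is None or attr not in user_attributes:
        raise ValueError("Unable to generate NameID value")
    return user_attributes[attr]

user = {"cn": "jsingh", "mail": "jsingh@example.com"}
print(resolve_nameid(UNSPECIFIED, user))  # jsingh
```

Without the mapping, the lookup fails — which is exactly why skipping the NameID Value Map step breaks the federation.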

IDP SAMLv2 Metadata

You will need to provide Databricks with the SSO URL of your IDP and its X.509 certificate. Both values are available in your IDP SAMLv2 metadata file. You can access the IDP metadata in OpenAM using the below URL:

IDP SAMLv2 Metadata

http://IDP_HOSTNAME:8080/openam/saml2/jsp/exportmetadata.jsp?realm=/databricks&entityid=IDP_ENTITY_ID_YOU_CONFIGURED

SP SAMLv2 Metadata (in case you want to verify)

http://IDP_HOSTNAME:8080/openam/saml2/jsp/exportmetadata.jsp?realm=/databricks&entityid=SP_ENTITY_ID_YOU_CONFIGURED

Once you have access to the IDP metadata file, you will use the elements:

  • ds:X509Certificate — to copy your public key value to Databricks;
  • SingleSignOnService > urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST — the Location of this binding is the SSO URL to configure in Databricks.
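Both values can be pulled out of the exported IDP metadata with a short script. Below is a sketch using Python's standard library; the embedded sample is a hypothetical, abbreviated IDP metadata document (the certificate body and entity IDs are placeholders), so substitute the file exported from your own OpenAM instance:

```python
import xml.etree.ElementTree as ET

MD = "urn:oasis:names:tc:SAML:2.0:metadata"
DS = "http://www.w3.org/2000/09/xmldsig#"
HTTP_POST = "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"

def extract_sso_details(xml_text):
    """Return the HTTP-POST SSO URL and the signing certificate body
    from exported IDP SAMLv2 metadata."""
    root = ET.fromstring(xml_text)
    idp = root.find(f"{{{MD}}}IDPSSODescriptor")
    sso_url = next(
        s.get("Location")
        for s in idp.findall(f"{{{MD}}}SingleSignOnService")
        if s.get("Binding") == HTTP_POST
    )
    cert = idp.find(f".//{{{DS}}}X509Certificate").text.strip()
    return sso_url, cert

# Hypothetical, abbreviated IDP metadata for illustration
sample = """<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    xmlns:ds="http://www.w3.org/2000/09/xmldsig#" entityID="https://idp.example.com">
  <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:KeyDescriptor use="signing">
      <ds:KeyInfo><ds:X509Data>
        <ds:X509Certificate>MIIBplaceholderbase64</ds:X509Certificate>
      </ds:X509Data></ds:KeyInfo>
    </md:KeyDescriptor>
    <md:SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
        Location="https://idp.example.com:8080/openam/SSOPOST/metaAlias/databricks/idp"/>
  </md:IDPSSODescriptor>
</md:EntityDescriptor>"""

url, cert = extract_sso_details(sample)
print(url)   # https://idp.example.com:8080/openam/SSOPOST/metaAlias/databricks/idp
```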

Databricks (SP) SAMLv2 Configuration

Configuring Databricks as a Service Provider (SP) is fairly quick and easy. Click on Admin Console > Single Sign-On tab and follow the steps below:

  • Copy the SingleSignOnService value from above into the Single Sign-On URL field;
  • Copy your Databricks deployment URL into the Identity Provider Issuer URL field;
  • Copy the ds:X509Certificate value from above into the x.509 Certificate field and click the Enable SSO button.
Databricks SSO Configuration
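The ds:X509Certificate element in the metadata holds only the bare base64 certificate body. If the field you are pasting into expects a PEM-formatted certificate (an assumption — check what your Databricks deployment accepts), wrapping the value in PEM armor is a one-liner; the certificate body below is a placeholder:

```python
import textwrap

def to_pem(cert_b64):
    """Wrap a bare base64 certificate body in PEM armor, 64 chars per line."""
    body = "".join(cert_b64.split())  # drop whitespace/newlines from the metadata
    lines = textwrap.wrap(body, 64)
    return ("-----BEGIN CERTIFICATE-----\n"
            + "\n".join(lines)
            + "\n-----END CERTIFICATE-----\n")

# Placeholder base64 body for illustration
print(to_pem("MIIBplaceholder" * 10))
```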

With your Databricks SP configured, you can now test your SAML federation SSO. If everything is configured correctly, you will be redirected to the IDP for user authentication and back to Databricks on successful authentication.

Thank you for reading this article. For any questions on OpenAM SAML federation with Databricks, please feel free to leave a comment or IM me directly. I would be more than happy to help.


Identity & Access Management Expert on ForgeRock platform. Certified AWS Solutions Architect.