Apache Zeppelin: OAuth integration using Apache Knox

Saravanan Elumalai
Jul 10, 2018 · 3 min read

Apache Zeppelin is a web-based notebook platform for interactive data analytics, data visualization, and notebook sharing.

Zeppelin natively supports LDAP- and PAM-based authentication and user role mapping through Apache Shiro. It has no native OAuth integration, but recent versions add KnoxSSO support. Through KnoxSSO, Zeppelin can be integrated with any OAuth provider.

Apache Knox

Apache Knox is an application gateway for interacting with the REST APIs and UIs of Apache Hadoop deployments. Knox supports OAuth authentication for Hadoop applications through its KnoxSSO service, an integration service that provides a normalized SSO token representing the authenticated user.

The Knox user guide is the best resource for installing and configuring Knox. If you plan to use Google OAuth, you may have to build Knox from source because of a cookie size issue.

Generate OAuth Credentials

Create a client ID and client secret for a web application with your OAuth provider, using the following configuration:

Authorized JavaScript origins: https://KNOX_DOMAIN:KNOX_PORT
Authorized redirect URIs: https://KNOX_DOMAIN:KNOX_PORT/gateway/knoxsso/api/v1/websso?pac4jCallback=true&client_name=OidcClient
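
The redirect URI is the KnoxSSO websso endpoint followed by two query parameters, and the OAuth provider usually rejects the login if the registered value differs from it even slightly. As a minimal sketch, the value can be assembled from its parts in shell. The function name knox_redirect_uri and the host knox.example.com with port 8443 are placeholders chosen for illustration, not values from a real deployment.

```shell
# Sketch: build the redirect URI to register with the OAuth provider.
# $1 = Knox host, $2 = Knox port, $3 = pac4j client name (OidcClient in this article)
knox_redirect_uri() {
  printf 'https://%s:%s/gateway/knoxsso/api/v1/websso?pac4jCallback=true&client_name=%s\n' "$1" "$2" "$3"
}

# Hypothetical host and port, for illustration only.
knox_redirect_uri knox.example.com 8443 OidcClient
```

The client name passed here should be the same one configured as clientName in the KnoxSSO topology.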

KnoxSSO Configuration

Create a new topology for KnoxSSO in KNOX_PATH/conf/topologies/knoxsso.xml.

For OAuth support, pac4j can be used as the federation provider.

<?xml version="1.0" encoding="utf-8"?>
<topology>
<gateway>
<provider>
<role>federation</role>
<name>pac4j</name>
<enabled>true</enabled>
<param>
<name>pac4j.callbackUrl</name>
<value>https://KNOX_DOMAIN:KNOX_PORT/gateway/knoxsso/api/v1/websso</value>
</param>
<param>
<name>clientName</name>
<value>OidcClient</value>
</param>
<param>
<name>oidc.id</name>
<value>CLIENT_ID</value>
</param>
<param>
<name>oidc.secret</name>
<value>CLIENT_SECRET</value>
</param>
<param>
<name>oidc.preferredJwsAlgorithm</name>
<value>RS256</value>
</param>
<param>
<name>pac4j.id_attribute</name>
<value>email</value>
</param>
<param>
<name>oidc.discoveryUri</name>
<value>OPENID_CONFIGURATION_URL</value>
</param>
<param>
<name>pac4j.cookie.domain.suffix</name>
<value>KNOX_DOMAIN_SUFFIX</value>
</param>
</provider>
</gateway>
</topology>

Identity Assertion

The identity assertion provider communicates the identity principal to be used. In the configuration above, pac4j.id_attribute is set to email, so the email address is passed to Zeppelin as the username. In some cases this attribute has to be transformed to match the format used internally, for example firstname.lastname@example.com to firstname_lastname. The identity assertion provider can perform such transformations: it supports static principal mapping, concat, switch case, and regular expressions. To convert an email address to the name format, use a regular expression.

<provider>
<role>identity-assertion</role>
<name>Regex</name>
<enabled>true</enabled>
<param>
<name>input</name>
<value>(.*)\.(.*)@example\.com</value>
</param>
<param>
<name>output</name>
<value>{1}_{2}</value>
</param>
</provider>
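
To see what this input/output pair does to a principal before deploying it, the same substitution can be tried in shell. This is only a sketch with sed, not Knox's own implementation: sed -E uses POSIX extended regular expressions rather than Java's, \1 and \2 stand in for {1} and {2}, and the ^ and $ anchors and the function name to_principal are assumptions added for the example.

```shell
# Sketch: approximate the Regex identity assertion mapping with sed.
# $1 = principal as delivered by pac4j (the email attribute)
to_principal() {
  printf '%s\n' "$1" | sed -E 's/^(.*)\.(.*)@example\.com$/\1_\2/'
}

to_principal 'firstname.lastname@example.com'   # prints firstname_lastname
```

In this sketch, input that does not match the pattern passes through unchanged. How Knox treats non-matching principals should be confirmed against the Knox user guide.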

Group Mapping

To support role-based access in Zeppelin, such as admin-only access to the Interpreter settings, users must be mapped to groups. Identity assertion provides HadoopGroupProvider, which supports several lookup methods such as Unix groups and LDAP. If the instance is already configured with Active Directory, ShellBasedUnixGroupsMapping can be used.

<provider>
<role>identity-assertion</role>
<name>HadoopGroupProvider</name>
<enabled>true</enabled>
<param>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</param>
</provider>

KnoxSSO Service

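Add the KnoxSSO service to the topology, inside the topology element after the closing gateway tag. The token TTL is given in milliseconds, so 3600000 corresponds to one hour.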

<service>
<role>KNOXSSO</role>
<param>
<name>knoxsso.cookie.secure.only</name>
<value>true</value>
</param>
<param>
<name>knoxsso.token.ttl</name>
<value>3600000</value>
</param>
<param>
<name>knoxsso.redirect.whitelist.regex</name>
<value>^/.*$;^https?://KNOX_DOMAIN:\d{0,9}/.*$</value>
</param>
</service>
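
The redirect whitelist decides which originalUrl values KnoxSSO will send the browser back to after login. The following sketch checks candidate URLs against such a list in shell. It assumes the value is a semicolon-separated list of regular expressions and that a URL is accepted if any one of them matches. Because POSIX grep has no \d, the port pattern is written as [0-9]. The function name is_whitelisted and the host names are hypothetical.

```shell
# Sketch: test a URL against a semicolon-separated list of regexes.
# $1 = URL to check, $2 = whitelist value
is_whitelisted() {
  # grep accepts several patterns separated by newlines, so turn ';' into newlines.
  pattern_list=$(printf '%s' "$2" | tr ';' '\n')
  printf '%s\n' "$1" | grep -Eq -e "$pattern_list"
}

# Whitelist from the topology, with a hypothetical host and \d rewritten as [0-9].
WHITELIST='^/.*$;^https?://knox\.example\.com:[0-9]{0,9}/.*$'

is_whitelisted 'https://knox.example.com:8080/#/' "$WHITELIST" && echo allowed
is_whitelisted 'https://other.example.org/' "$WHITELIST" || echo rejected
```

If the Zeppelin URL is rejected by a check like this, the whitelist regex in the topology most likely needs to be adjusted.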

KnoxSSOUT Topology

Create a second topology, knoxssout, in KNOX_PATH/conf/topologies/knoxssout.xml with the following content. It is used for logout.

<topology>
<gateway>
<provider>
<role>webappsec</role>
<name>WebAppSec</name>
<enabled>true</enabled>
<param>
<name>cors.enabled</name>
<value>true</value>
</param>
</provider>
</gateway>
<service>
<role>KNOXSSOUT</role>
</service>
</topology>

Zeppelin needs the public key of the Knox server to verify the signature of the SSO token. Export the certificate and copy it to the conf folder.

KNOX_PATH/bin/knoxcli.sh export-cert
cp KNOX_PATH/data/security/keystores/gateway-identity.pem /etc/knox/conf/knoxsso.pem

Zeppelin Configuration

Configure KnoxSSO authentication in shiro.ini

[main]
knoxJwtRealm = org.apache.zeppelin.realm.jwt.KnoxJwtRealm
knoxJwtRealm.providerUrl = https://KNOX_DOMAIN:KNOX_PORT/
knoxJwtRealm.login = gateway/knoxsso/api/v1/websso
knoxJwtRealm.logout = gateway/knoxssout/api/v1/webssout
knoxJwtRealm.logoutAPI = true
knoxJwtRealm.redirectParam = originalUrl
knoxJwtRealm.cookieName = hadoop-jwt
knoxJwtRealm.publicKeyPath = /etc/knox/conf/knoxsso.pem
knoxJwtRealm.groupPrincipalMapping = group.principal.mapping
knoxJwtRealm.principalMapping = principal.mapping
authc = org.apache.zeppelin.realm.jwt.KnoxAuthenticationFilter
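
The providerUrl, login, and redirectParam settings together describe the URL an unauthenticated user is sent to. The sketch below shows how they appear to fit together. It assumes Zeppelin simply concatenates providerUrl and login and appends redirectParam with the page to return to. That is an inference from the shape of the settings (providerUrl ends with a slash, login has no leading slash), not confirmed Zeppelin behavior. The function name knox_login_url and the host names are hypothetical, and the sketch does not URL-encode the return address.

```shell
# Sketch: compose the KnoxSSO login URL from the shiro.ini values.
# $1 = providerUrl, $2 = login, $3 = redirectParam, $4 = URL to return to after login
knox_login_url() {
  printf '%s%s?%s=%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical hosts, for illustration only.
knox_login_url 'https://knox.example.com:8443/' \
  'gateway/knoxsso/api/v1/websso' \
  'originalUrl' \
  'https://knox.example.com:8080/'
```

Opening the resulting URL in a browser is a quick way to check that the KnoxSSO topology redirects to the OAuth provider as expected.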

Thank you for reading!

Curated stories on big data systems

Data Collective