Apache Zeppelin: OAuth integration using Apache Knox
Apache Zeppelin is a web-based notebook platform that enables interactive data analytics, data visualization, and notebook sharing.
Zeppelin natively supports LDAP- and PAM-based authentication and user role mapping through Apache Shiro. OAuth integration is not natively available, but the latest version adds KnoxSSO support. Using KnoxSSO, we can integrate Zeppelin with any OAuth provider.
Apache Knox
Apache Knox is an application gateway for interacting with the REST APIs and UIs of Apache Hadoop deployments. Knox supports OAuth authentication for Hadoop applications through the KnoxSSO service, an integration service that provides a normalized SSO token representing the authenticated user.
The Knox user guide is the best resource on how to install and configure it. If you are planning to use Google OAuth, you may have to build Knox from source because of the cookie size issue.
Generate OAuth Credentials
Create a client ID and client secret for a web application with your OAuth provider, using the following configuration:
Authorized JavaScript origins: https://KNOX_DOMAIN:KNOX_PORT
Authorized redirect URIs: https://KNOX_DOMAIN:KNOX_PORT/gateway/knoxsso/api/v1/websso?pac4jCallback=true&client_name=OidcClient
KnoxSSO Configuration
Create a new topology for KnoxSSO in KNOX_PATH/conf/topologies/knoxsso.xml.
For OAuth support, Pac4j can be used as the federation provider.
<?xml version="1.0" encoding="utf-8"?>
<topology>
<gateway>
<provider>
<role>federation</role>
<name>pac4j</name>
<enabled>true</enabled>
<param>
<name>pac4j.callbackUrl</name>
<value>https://KNOX_DOMAIN:KNOX_PORT/gateway/knoxsso/api/v1/websso</value>
</param>
<param>
<name>clientName</name>
<value>OidcClient</value>
</param>
<param>
<name>oidc.id</name>
<value>CLIENT_ID</value>
</param>
<param>
<name>oidc.secret</name>
<value>CLIENT_SECRET</value>
</param>
<param>
<name>oidc.preferredJwsAlgorithm</name>
<value>RS256</value>
</param>
<param>
<name>pac4j.id_attribute</name>
<value>email</value>
</param>
<param>
<name>oidc.discoveryUri</name>
<value>OPENID_CONFIGURATION_URL</value>
</param>
<param>
<name>pac4j.cookie.domain.suffix</name>
<value>KNOX_DOMAIN_SUFFIX</value>
</param>
</provider>
</gateway>
</topology>
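Because a single malformed tag is likely to keep Knox from loading the topology, it can help to check the file before deploying it. The following is a minimal sketch using only the Python standard library; the helper name provider_params and the inline SAMPLE topology are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def provider_params(topology_xml: str) -> dict:
    """Return {param name: value} for every <param> in a topology document.

    ET.fromstring raises ET.ParseError if the XML is not well-formed,
    for example when a closing tag is missing its '<'.
    """
    root = ET.fromstring(topology_xml)
    params = {}
    for param in root.iter("param"):
        params[param.findtext("name")] = param.findtext("value")
    return params


# Hypothetical, shortened topology used only to exercise the helper.
SAMPLE = """<topology>
  <gateway>
    <provider>
      <role>federation</role>
      <name>pac4j</name>
      <enabled>true</enabled>
      <param>
        <name>oidc.id</name>
        <value>CLIENT_ID</value>
      </param>
      <param>
        <name>pac4j.id_attribute</name>
        <value>email</value>
      </param>
    </provider>
  </gateway>
</topology>"""

for name, value in provider_params(SAMPLE).items():
    print(f"{name} = {value}")
```

To check a real file, read KNOX_PATH/conf/topologies/knoxsso.xml and pass its contents to the same function. This only checks that the XML is well-formed and lists the parameters; it does not validate parameter names against what Knox accepts.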
Identity Assertion
The identity assertion provider plays the critical role of communicating the identity principal to be used. In the configuration above, pac4j.id_attribute is set to email, so the email address is passed to Zeppelin as the username. In some cases this attribute must be transformed to match the format used internally, for example from firstname.lastname@example.com to firstname_lastname. The identity assertion provider can be used for these transformations: it supports static principal mapping, concat, switch case, and regular expressions. To convert an email address to the name format, we can use a regular expression.
<provider>
<role>identity-assertion</role>
<name>Regex</name>
<enabled>true</enabled>
<param>
<name>input</name>
<value>(.*)\.(.*)@example\.com</value>
</param>
<param>
<name>output</name>
<value>{1}_{2}</value>
</param>
</provider>
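To preview what this input/output pair produces before deploying it, the mapping can be emulated outside Knox. The sketch below is an approximation in Python, not Knox's implementation: it assumes the input expression must match the whole principal and that principals which do not match pass through unchanged. The function name map_principal is hypothetical.

```python
import re


def map_principal(principal: str, input_regex: str, output_template: str) -> str:
    """Emulate a regex principal mapping with {n} group placeholders.

    Assumptions (not confirmed against Knox's source): the regex must match
    the entire principal, and non-matching principals are returned as-is.
    """
    match = re.fullmatch(input_regex, principal)
    if match is None:
        return principal
    # Replace each {n} in the template with capture group n.
    return re.sub(
        r"\{(\d+)\}",
        lambda placeholder: match.group(int(placeholder.group(1))),
        output_template,
    )


INPUT = r"(.*)\.(.*)@example\.com"
OUTPUT = "{1}_{2}"

print(map_principal("firstname.lastname@example.com", INPUT, OUTPUT))
```

Because the first group is greedy, an address with more than one dot in the local part, such as a.b.c@example.com, splits at the last dot before the @ sign in this emulation. If that is not the desired result, a stricter pattern such as ([^.]*)\.(.*)@example\.com may be a better fit.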
Group Mapping
To support role-based access in Zeppelin, such as admin-only access to the Interpreter settings, we need to map users to groups. Identity assertion provides HadoopGroupProvider, which supports multiple lookup methods such as Unix groups and LDAP. If the instance is already configured with Active Directory, ShellBasedUnixGroupsMapping can be used.
<provider>
<role>identity-assertion</role>
<name>HadoopGroupProvider</name>
<enabled>true</enabled>
<param>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</param>
</provider>
KnoxSSO Service
Add the KnoxSSO service to the topology
<service>
<role>KNOXSSO</role>
<param>
<name>knoxsso.cookie.secure.only</name>
<value>true</value>
</param>
<param>
<name>knoxsso.token.ttl</name>
<value>3600000</value>
</param>
<param>
<name>knoxsso.redirect.whitelist.regex</name>
<value>^/.*$;^https?://KNOX_DOMAIN:\d{0,9}/.*$</value>
</param>
</service>
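The redirect whitelist is a semicolon-separated list of regular expressions, and the originalUrl passed to KnoxSSO has to match one of them. A quick way to sanity-check a whitelist is to run candidate URLs through it. The sketch below approximates that check in Python; the host knox.example.com stands in for KNOX_DOMAIN and is purely hypothetical, as is the function name is_allowed. Knox evaluates the expressions with Java's regex engine, so unusual patterns may behave slightly differently.

```python
import re

# Hypothetical whitelist: knox.example.com stands in for KNOX_DOMAIN.
# Dots in the host are escaped so they match only a literal dot.
WHITELIST = r"^/.*$;^https?://knox\.example\.com:\d{0,9}/.*$"


def is_allowed(original_url: str, whitelist: str = WHITELIST) -> bool:
    """Return True if the URL matches at least one whitelist expression."""
    patterns = whitelist.split(";")
    return any(re.match(pattern, original_url) for pattern in patterns)


for url in (
    "/zeppelin/",
    "https://knox.example.com:8443/gateway/zeppelin/",
    "https://evil.example.org/",
):
    print(url, is_allowed(url))
```

When substituting a real domain for KNOX_DOMAIN, escaping the dots keeps the expression from matching look-alike hosts.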
Knoxssout topology
Create a new topology, knoxssout, in KNOX_PATH/conf/topologies/knoxssout.xml with the following content. This topology is used for logout.
<topology>
<gateway>
<provider>
<role>webappsec</role>
<name>WebAppSec</name>
<enabled>true</enabled>
<param>
<name>cors.enabled</name>
<value>true</value>
</param>
</provider>
</gateway>
<service>
<role>KNOXSSOUT</role>
</service>
</topology>
We need the public key of the Knox server for signature verification. Export the certificate and copy it to the conf folder.
KNOX_PATH/bin/knoxcli.sh export-cert
cp KNOX_PATH/data/security/keystores/gateway-identity.pem /etc/knox/conf/knoxsso.pem
Zeppelin Configuration
Configure KnoxSSO authentication in shiro.ini
[main]
knoxJwtRealm = org.apache.zeppelin.realm.jwt.KnoxJwtRealm
knoxJwtRealm.providerUrl = https://KNOX_DOMAIN:KNOX_PORT/
knoxJwtRealm.login = gateway/knoxsso/api/v1/websso
knoxJwtRealm.logout = gateway/knoxssout/api/v1/webssout
knoxJwtRealm.logoutAPI = true
knoxJwtRealm.redirectParam = originalUrl
knoxJwtRealm.cookieName = hadoop-jwt
knoxJwtRealm.publicKeyPath = /etc/knox/conf/knoxsso.pem
knoxJwtRealm.groupPrincipalMapping = group.principal.mapping
knoxJwtRealm.principalMapping = principal.mapping
authc = org.apache.zeppelin.realm.jwt.KnoxAuthenticationFilter
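When login does not behave as expected, it can be useful to look at what the hadoop-jwt cookie actually carries. The following is a minimal debugging sketch that decodes the payload of a JWT using only the Python standard library. It does not verify the signature, so it must not be used for authentication. The sample token is built locally for illustration, and the assumption that the mapped username appears in the standard sub claim should be checked against a real token.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload (second segment) of a JWT WITHOUT verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def _encode_segment(obj: dict) -> str:
    """Helper for building the hypothetical sample token below."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Hypothetical token; a real one would be copied from the hadoop-jwt cookie.
sample = ".".join(
    [
        _encode_segment({"alg": "RS256"}),
        _encode_segment({"sub": "firstname_lastname", "exp": 1700000000}),
        "signature-not-checked",
    ]
)

claims = decode_jwt_payload(sample)
print(claims["sub"], claims["exp"])
```

Comparing the decoded username with the one Zeppelin shows is a quick way to confirm whether the identity assertion mapping was applied.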
Thank you for reading!