Apache Zeppelin: Hive interpreter with user impersonation
Apache Zeppelin is a web-based notebook platform for interactive data analytics, with built-in data visualization and notebook sharing.
Hive can be integrated through the JDBC interpreter. The Hive configuration and paths below come from a Cloudera setup, but the same steps should apply to other Hadoop distributions.
Interpreter Configuration
Go to the Interpreter screen and click +Create to create a new interpreter
Interpreter Name: hive
Interpreter group: jdbc
Change the following properties
default.driver org.apache.hive.jdbc.HiveDriver
default.url jdbc:hive2://hive_host:10000/
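Before wiring the URL into Zeppelin, it can help to confirm that HiveServer2 is reachable from the Zeppelin node. The following is a minimal sketch, assuming beeline is installed on that node; hive_host and the port are the placeholders from the properties above, and the query is only an illustrative smoke test.

```shell
#!/bin/sh
# Placeholder values taken from the interpreter properties above.
HIVE_HOST="hive_host"
HIVE_PORT="10000"
JDBC_URL="jdbc:hive2://${HIVE_HOST}:${HIVE_PORT}/"

# Run a trivial query only if beeline is available on this node.
if command -v beeline >/dev/null 2>&1; then
  beeline -u "$JDBC_URL" -e "show databases;"
else
  echo "beeline not found; would connect to $JDBC_URL"
fi
```

If this fails outside Zeppelin, the problem is with the host, port, or network rather than with the interpreter settings.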
Artifacts
org.apache.hive:hive-jdbc:0.14.0
org.apache.hadoop:hadoop-common:2.6.0
Alternatively, copy the jars from a cluster node:
cp /opt/cloudera/parcels/CDH/lib/hadoop/client/*.jar /zeppelin/path/interpreter/jdbc/
cp /opt/cloudera/parcels/CDH/jars/hive-jdbc-* /zeppelin/path/interpreter/jdbc/
Configure HADOOP_HOME and HADOOP_CONF_DIR in zeppelin-env.sh
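As a rough sketch, the two variables could be set in conf/zeppelin-env.sh as shown below. Both paths are assumptions based on a typical CDH parcel layout (the HADOOP_HOME value matches the parcel path used in the cp command above); adjust them for your installation.

```shell
# conf/zeppelin-env.sh (fragment)
# Assumed CDH parcel location of the Hadoop libraries; adjust as needed.
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
# Assumed location of the client configuration (core-site.xml, hdfs-site.xml).
export HADOOP_CONF_DIR=/etc/hadoop/conf
```

Restart Zeppelin after editing the file so the interpreter processes pick up the new environment.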
Kerberos Configuration
If the cluster is Kerberized, add the following properties to the hive interpreter
default.url jdbc:hive2://hive_host:10000/;principal=hive/_HOST@DOMAIN
zeppelin.jdbc.auth.type KERBEROS
zeppelin.jdbc.keytab.location /path/to/keytab
zeppelin.jdbc.principal principal_name
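A quick way to check that the keytab and principal are usable is to request a ticket by hand on the Zeppelin node. This is only a sketch, assuming the MIT Kerberos client tools (kinit, klist) are installed; /path/to/keytab and principal_name are the same placeholders as in the properties above.

```shell
#!/bin/sh
# Placeholders from the interpreter properties above.
KEYTAB="/path/to/keytab"
PRINCIPAL="principal_name"

if command -v kinit >/dev/null 2>&1 && [ -r "$KEYTAB" ]; then
  klist -kt "$KEYTAB"               # list the principals stored in the keytab
  kinit -kt "$KEYTAB" "$PRINCIPAL"  # obtain a ticket without a password prompt
  klist                             # show the resulting ticket cache
else
  echo "skipping check: kinit or $KEYTAB is not available"
fi
```

Run it as the OS user that starts Zeppelin, since that user must be able to read the keytab.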
User Impersonation
User impersonation is required so that queries are authorized against the Sentry rules of the logged-in user. To enable it, add the user who is allowed to impersonate other users to hadoop.proxyuser.hive.groups. On a Cloudera setup, the Hue user can be used.
Add the following properties to the hive interpreter
zeppelin.jdbc.auth.kerberos.proxy.enable true
default.proxy.user.property hive.server2.proxy.user
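To see what impersonation looks like outside Zeppelin, the same session variable can be appended to the JDBC URL by hand. The sketch below assumes beeline is installed and a valid Kerberos ticket for the proxying principal is already in the cache; alice is a hypothetical end user, and hive_host and DOMAIN are the placeholders used earlier.

```shell
#!/bin/sh
# Base URL from the Kerberos section above (placeholders).
HIVE_URL="jdbc:hive2://hive_host:10000/;principal=hive/_HOST@DOMAIN"
END_USER="alice"  # hypothetical end user to impersonate

# hive.server2.proxy.user is passed as a session variable in the URL.
PROXY_URL="${HIVE_URL};hive.server2.proxy.user=${END_USER}"

if command -v beeline >/dev/null 2>&1; then
  # The databases listed should reflect the end user's Sentry privileges.
  beeline -u "$PROXY_URL" -e "show databases;"
else
  echo "beeline not found; would connect to $PROXY_URL"
fi
```

If the connection is rejected, the proxy user settings on the cluster side are the first thing to check.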
Interpolation
In recent Zeppelin versions, ZeppelinContext (z) variables can be used in a SQL query, for example
select * from patents_list where priority_country = '{country_code}'
To enable it, add the following property
zeppelin.jdbc.interpolation true
Thank you for reading!
For any questions contact me @limesaltsoda