Zeppelin on a Spark cluster
Jun 8, 2020
To create your own Spark/YARN cluster, see the previous articles.
Install Software
Download the Zeppelin distribution
wget https://downloads.apache.org/zeppelin/zeppelin-0.9.0-preview1/zeppelin-0.9.0-preview1-bin-all.tgz
Extract it to any location.
My location is /home/shehan/zeepelin.
Install Python 3 (do this on all nodes; it is needed for PySpark)
yum install -y python3
Update the environment properties
nano .bash_profile
export PYSPARK_PYTHON=python3
source .bash_profile
Configure Zeppelin
Open zeppelin-env.sh and add the following properties
nano zeepelin/conf/zeppelin-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/
export HADOOP_CONF_DIR=/home/shehan/hadoop/etc/hadoop
export SPARK_HOME=/usr/local/spark/
export ZEPPELIN_PORT=9010
export MASTER=spark://master.hadoop.smf:7077
export ZEPPELIN_INTERPRETER_OUTPUT_LIMIT=1048576
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
Open zeppelin-site.xml and add the following property
<property>
  <name>zeppelin.server.addr</name>
  <value>master.hadoop.smf</value>
  <description>Server binding address</description>
</property>
OK, we are done.
Start the Spark cluster and Zeppelin (make sure the HDFS daemons are running)
Start Spark cluster
/usr/local/spark/sbin/start-master.sh
/usr/local/spark/sbin/start-slaves.sh
Start the Zeppelin daemon
./zeepelin/bin/zeppelin-daemon.sh start
Now you should be able to see the Zeppelin UI at http://master.hadoop.smf:9010.
Configure Spark interpreter
Go to the Spark interpreter settings and update the configuration with the master URL. Once updated, it will ask you to restart the interpreter.
Once the interpreter starts, it creates a new application in Spark, which keeps running for as long as the interpreter is active. So let’s start the interpreter.
Type sc.version and then execute it.
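As a quick check, a paragraph like the one below should print the Spark version. This is a minimal sketch, assuming the default %spark Scala interpreter is bound to the note.
%spark
// Print the Spark version reported by the running SparkContext
sc.version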
Let’s create a new note and write some Scala code.
Here I have some files in the Hadoop cluster.
u.item - this file contains information about the movies. The columns are separated by the pipe character ('|').
Print the first 2000 rows
final case class Movie(name: String, year: String)

val movieFile = sc.textFile("/user/shehan/ml-100k/u.item").map(line => {
  val attr = line.split('|')
  Movie(attr(1), attr(2))
})

movieFile.toDF().show(2000, false)
Each action invoked on the RDD creates Spark jobs, and we can see these jobs by clicking the SPARK JOB icon.
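To make a few more jobs appear, we can run additional actions on the same data. The sketch below assumes the movieFile RDD from the previous paragraph is still defined; each action triggers at least one new job that shows up under SPARK JOB.
%spark
// Count the movies - an action, so it runs as a Spark job
movieFile.count()

// Group the derived DataFrame by its "year" column and show the counts - triggers further jobs
movieFile.toDF().groupBy("year").count().show(50, false)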
Run Zeppelin on the YARN cluster
Stop the Spark cluster
/usr/local/spark/sbin/stop-slaves.sh
/usr/local/spark/sbin/stop-master.sh
Start the YARN daemons
start-yarn.sh
Add/update the following properties in the Zeppelin Spark interpreter
master yarn
spark.submit.deployMode client
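After restarting the interpreter, a quick way to confirm the switch is to check the master URL from a paragraph (a small sketch, assuming the same %spark interpreter). With the settings above it should report yarn instead of the standalone spark:// URL, and the application should also be visible in the YARN ResourceManager UI.
%spark
// Should now print "yarn" because the interpreter submits to YARN in client mode
sc.master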