Spark Thrift Server Deep Dive

somanath sankaran
Published in Analytics Vidhya
3 min read · Dec 14, 2019

This is one of my stories in the Spark deep dive series.

Photo by Jez Timms on Unsplash

One of the underrated and interesting Spark services is the Spark Thrift Server. Let us see the uses of the Thrift Server in detail.

  1. Spark Thrift Server
  2. Uses of Spark Thrift Server
  3. Starting the Thrift Server and how it works
  4. Connecting Thrift Server with PyHive and pandas
  5. Exploring the Thrift Server UI

Spark Thrift Server:

It is the service that provides a server-client (JDBC/ODBC) facility with Spark.

A server-client facility means we don’t need Spark to be installed on our machine. Instead, we act as a client: we are given a server URL to which

we can connect and use the data from our application. For example, in our use case we will use the PyHive client to connect to a Spark ecosystem started on some server machine.

Uses of Spark Thrift Server

  1. Connect with BI tools like Tableau, Superset, etc.
  2. Connect to Spark tables and run queries from apps written in Java, Python, etc. without starting a Spark application

Starting the Thrift Server

We can start the Thrift Server with the start-thriftserver.sh script under $SPARK_HOME/sbin.

On starting, the Thrift Server prints the path of the file to which it is logging.

On inspecting that file, I found that it internally calls a Spark class. So the advantage is that we can specify Spark properties along with start-thriftserver.sh using --conf.

How it works

It internally calls the Hive Thrift Server and by default exposes port localhost:10000, to which we can send SQL queries and fetch results. The Spark Thrift Server will use the executor options specified to run the queries.

So we have to increase the number of executors with the --num-executors parameter (or --conf spark.executor.instances=N) to get improved latency.
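Putting the above together, a start command with tuning flags might look like this (the master, memory, and executor values are illustrative assumptions, not recommendations):

```shell
# Start the Spark Thrift Server from the Spark distribution.
# start-thriftserver.sh accepts the same options as spark-submit,
# plus --hiveconf for Hive server settings such as the listen port.
$SPARK_HOME/sbin/start-thriftserver.sh \
  --master yarn \
  --num-executors 4 \
  --conf spark.executor.memory=4g \
  --hiveconf hive.server2.thrift.port=10000
```

The script prints the path of its log file on startup; tailing that file is the quickest way to confirm the server came up and bound to the expected port.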

We can verify that the Thrift Server has started from the Spark web UI as well, where Spark will show that it is running as a Thrift server.

Connecting Thrift Server with pandas

We will use PyHive to connect to Spark and execute Spark SQL queries.

We have to install the following packages:

pip install pyhive

pip install thrift

pip install thrift_sasl

Creating a hive connection object with pyhive

Passing the connection object and hive query to pandas.read_sql

Selecting a hive table
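The three steps above can be sketched as follows. The host, port, username, and table name are assumptions for illustration (the port matches the Thrift Server default of 10000):

```python
# Sketch: connect to a running Spark Thrift Server with PyHive and
# read the result of a Spark SQL query into a pandas DataFrame.

def limit_query(table: str, n: int = 10) -> str:
    """Build a simple SELECT that samples n rows from a table."""
    return f"SELECT * FROM {table} LIMIT {n}"

def read_spark_table(table: str, host: str = "localhost", port: int = 10000):
    # Third-party imports kept local to the function:
    # pip install pyhive thrift thrift_sasl pandas
    import pandas as pd
    from pyhive import hive

    # Step 1: create a Hive connection object pointed at the Thrift Server
    conn = hive.Connection(host=host, port=port, username="spark")

    # Steps 2-3: pass the connection object and the query to pandas.read_sql,
    # which returns the selected rows as a DataFrame
    return pd.read_sql(limit_query(table), conn)

if __name__ == "__main__":
    # "default.my_table" is a hypothetical table name
    df = read_spark_table("default.my_table")
    print(df.head())
```

Because the Thrift Server speaks the HiveServer2 protocol, the same connection object also works with other HiveServer2-compatible clients.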

Exploring the Thrift Server UI

In the Spark UI, under the Thrift Server tab, we can see the queries executed, the IP from which each query came, and more details.

That’s all for the day !! :)

Github Link: https://github.com/SomanathSankaran/spark_medium/tree/master/spark_csv

Please suggest topics in Spark that I should cover, and send me suggestions for improving my writing :)

Learn and let others Learn!!
