How to integrate Apache Spark, IntelliJ IDEA and Scala

Jupyter Notebook is really useful when you want to present some code, let someone reproduce your results, or just learn how to use new tools and libraries. I use Jupyter almost every day and, like many others, I started learning Spark and developed my first data analysis pipelines using interactive notebooks and the Python API. Then I realized that I wanted more and that running notebooks locally was not enough for me, so I signed up for a Databricks Community Edition subscription. Databricks lets you forget about the problems of setting up and maintaining the environment.

Everyone who learns and uses Spark eventually realizes that the Python API is not as powerful and flexible as Scala, the core language of the framework. Scala gives you access to the full power of Spark, including its analytics, streaming and graph processing tools. However, Spark is still just one more framework for large-scale data analytics. Yes, it is convenient and powerful, but it ships with a limited number of algorithms, and sometimes you need to implement your own. That is the moment when you need an IDE.

You can find an example project in my Git repository. It lets you get started with Spark development in Scala using IntelliJ IDEA. Alternatively, you can follow the step-by-step instructions on my blog and create the project from scratch yourself.
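To give a feel for what such a project contains, here is a minimal sketch of a Spark application in Scala that you could run directly from the IDE. It is not taken from the repository: the object name `WordCountApp`, the helper `countWords` and the sample sentences are all made up for illustration. It assumes the `spark-sql` artifact (`org.apache.spark`) is on the classpath, for example as an sbt dependency, and it runs Spark in local mode so no cluster is needed.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the name is an assumption, not part of the example project.
object WordCountApp {

  // Counts how many times each lower-cased word occurs in the given lines.
  def countWords(spark: SparkSession, lines: Seq[String]): Map[String, Int] =
    spark.sparkContext
      .parallelize(lines)                    // distribute the lines as an RDD
      .flatMap(_.toLowerCase.split("\\s+"))  // split each line into words
      .filter(_.nonEmpty)                    // drop empty tokens
      .map(word => (word, 1))                // pair each word with a count of 1
      .reduceByKey(_ + _)                    // sum the counts per word
      .collect()                             // bring the result back to the driver
      .toMap

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM using all available cores.
    val spark = SparkSession.builder()
      .appName("WordCountApp")
      .master("local[*]")
      .getOrCreate()

    try {
      val counts = countWords(spark, Seq("spark and scala", "scala in the IDE"))
      counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
    } finally {
      spark.stop() // always release Spark resources
    }
  }
}
```

Keeping the logic in a function that takes a `SparkSession` as a parameter, rather than inside `main`, makes it easy to call from a unit test or from the Scala console in the IDE.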