How to kick-start Spark development on IntelliJ IDEA in 4 steps
I’m going to walk you through setting up your environment to develop an Apache Spark application, using Scala, in IntelliJ IDEA 14.
- Download and install IntelliJ IDEA 14 and make sure the Scala plugin is enabled.
- Get the Scala Spark skeleton and move it into a directory of your choice; I’ll assume ~/sandbox/sparkgrep/ as the base directory in the following. cd into the base, create the directory src/main/scala/spark/example/ (under Linux/MacOS, mkdir -p creates the intermediate directories in one go) and move SparkGrep.scala into the directory you just created. Sketches of a minimal build file and of the SparkGrep.scala source appear below, after the steps.
- Now head back to IntelliJ and import the base directory (File → Import Project). Wait a bit until everything is imported, then right-click SparkGrep.scala and choose ‘Run SparkGrep.scala’. This will fail with the message Usage: SparkGrep <host> <input_file> <match_term> because we haven’t supplied any program arguments yet.
- To fix this, open Run → Edit Configurations, make sure the SparkGrep application is selected, and insert the following into the Program arguments input field (found on the Configuration tab):
local[*] src/main/scala/spark/example/SparkGrep.scala val
These arguments tell the app to run locally, to use src/main/scala/spark/example/SparkGrep.scala itself as the input file, and to count the occurrences of val. This should yield something like the following as the final output line:
5 lines in src/main/scala/spark/example/SparkGrep.scala contain val
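By the way, if the skeleton you grabbed doesn’t already ship with a build file for IntelliJ to import, a minimal build.sbt along the following lines should do. Note that the project name and the Scala and Spark versions are my assumptions here; use whatever matches your setup:

name := "sparkgrep"

version := "0.1.0"

scalaVersion := "2.10.4"

// Spark core is all that SparkGrep needs
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"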
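And for reference, here’s a minimal sketch of what SparkGrep.scala boils down to. This is my reconstruction from the usage string and the output line above, not necessarily the skeleton’s exact source, so expect the real file to differ in detail:

package spark.example

import org.apache.spark.{SparkConf, SparkContext}

object SparkGrep {
  def main(args: Array[String]): Unit = {
    if (args.length < 3) {
      System.err.println("Usage: SparkGrep <host> <input_file> <match_term>")
      System.exit(1)
    }
    // args(0) is the Spark master, e.g. local[*] to use all local cores
    val conf = new SparkConf().setAppName("SparkGrep").setMaster(args(0))
    val sc = new SparkContext(conf)
    // Load the input file and count the lines that contain the match term
    val matches = sc.textFile(args(1)).filter(_.contains(args(2))).count()
    println(s"$matches lines in ${args(1)} contain ${args(2)}")
    sc.stop()
  }
}

Running it from IntelliJ with the arguments above is essentially a grep over the file, just executed through Spark’s RDD API.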
Well done! You deserve your first Spark sticker. Peel one off and stick it on your laptop ;)
Where to go from here? Well, first I’d say you check out:
And once you’ve toyed around with Spark a bit and gathered enough experience deploying Spark in a cluster, mixing SQL with machine learning code, and tuning streaming code, I strongly recommend:
Have fun, and I’m looking forward to your comments!