GitHub Actions to deploy your Spark JAR Files

Gunj Desai
Published in Doubtnut
Jul 17, 2021

A guide on how to build, push & deploy your Spark JARs using GitHub Actions

A lot of data engineers I know (myself included) struggle to automate their Spark pipeline builds, and end up building locally and then uploading their JARs through a UI.
This may not be an issue initially, but it definitely becomes one when multiple team members work on the same project or when changes need to be integrated constantly.

Today we are going to use GitHub Actions to automate integration and deployment, so that you can focus on what you like best: writing code.

What are GitHub Actions?

GitHub Actions are built into GitHub. They let you automate, customise, and chain together workflows for your code repository.

Some example workflows:

  • Lint checking before creating a PR
  • Run tests before merging with main
  • Build Docker Image post merge
  • Run a cron script at a specific time
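
As a rough sketch, the triggers behind examples like these live in the `on:` block of a workflow file. The branch name `main` and the cron time below are assumptions for illustration, not values from our setup.

```yaml
# Trigger section of a workflow file (illustrative values only)
on:
  pull_request:            # lint checks / tests run on pull requests
    branches: [main]       # assumption: PRs target main
  push:
    branches: [main]       # runs once a merge lands on main
  schedule:
    - cron: "30 2 * * *"   # every day at 02:30 UTC (hypothetical time)
```

A single workflow usually needs only one of these triggers; they are shown together here just to map each example to its event.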

For this example, I am going to build a Spark job written in Scala that uses Gradle as its build tool.

GitHub Actions looks for YAML (.yml) files in the .github/workflows folder of your repository and runs them as workflows.
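
To make the file layout concrete, here is a minimal workflow file. The file name `hello.yml`, the job name, and the step are hypothetical; only the `.github/workflows` location is required by GitHub.

```yaml
# .github/workflows/hello.yml  (hypothetical file name)
name: Hello workflow

on: push                     # run on every push to the repository

jobs:
  hello:                     # hypothetical job id
    runs-on: ubuntu-latest   # GitHub-hosted runner
    steps:
      - name: Say hello
        # GITHUB_EVENT_NAME and GITHUB_REF are default environment variables
        run: echo "Triggered by $GITHUB_EVENT_NAME on $GITHUB_REF"
```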

Our workflow file will have the following steps:

  • Checkout our Code Repo
  • Configure AWS Credentials (since they will be needed in code, more on this ahead)
  • Log in to AWS ECR (as this is our Docker repository; alternative actions are available for other Docker repositories)
  • Build Docker Image & Push to Repository
  • Run Docker Image (running the image triggers the script to push the JAR to S3)
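
The steps above can be sketched as a workflow file along these lines. Treat it as an illustration under assumptions rather than our exact file: the file name, region, repository name `spark-jobs`, secret names, trigger, and action version pins are all placeholders, and it assumes the Dockerfile at the repository root runs the Gradle build.

```yaml
# .github/workflows/deploy-spark-jar.yml  (hypothetical file name)
name: Build and deploy Spark JAR

on:
  push:
    branches: [main]            # assumption: deploy on every push to main

env:
  AWS_REGION: ap-south-1        # assumption: replace with your region
  ECR_REPOSITORY: spark-jobs    # hypothetical ECR repository name

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          # Secret names are assumptions; define them in the repository settings
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Log in to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push Docker image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        run: |
          IMAGE="$ECR_REGISTRY/$ECR_REPOSITORY:$GITHUB_SHA"
          docker build -t "$IMAGE" .
          docker push "$IMAGE"

      - name: Run Docker image (entrypoint uploads the JAR to S3)
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        run: |
          # Pass the credentials exported by the configure step into the container
          docker run --rm \
            -e AWS_ACCESS_KEY_ID \
            -e AWS_SECRET_ACCESS_KEY \
            -e AWS_DEFAULT_REGION \
            "$ECR_REGISTRY/$ECR_REPOSITORY:$GITHUB_SHA"
```

The credentials step matters twice in this sketch: once for the ECR login and push, and again inside the container, where the upload to S3 needs them. Tagging the image with the commit SHA is a design choice of this sketch that makes it easy to trace a JAR back to the commit that produced it.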

The entrypoint of the Docker image is a shell script, which is responsible for pushing the JAR to S3.

That’s all you need to do to have all your production JARs in one single destination.

This was my first attempt at explaining something we learnt while adding more automation to our Spark jobs at Doubtnut.
Do share your feedback on the blog. I am happy to answer any questions you have here, or you can ping me on Twitter.
