Schedule your BigQuery jobs with Play Framework 3 and Akka
Built on Akka, Play provides predictable and minimal resource consumption (CPU, memory, threads) for highly-scalable applications.
BigQuery is a fully managed, AI-ready data analytics platform that helps you maximize value from your data and is designed to be multi-engine, multi-format, and multi-cloud.
This article will show how to implement a BigQuery job with Play Framework and Akka. The job will run every day at midnight.
Before starting, it is recommended to know the basics of Play Framework and Google Cloud BigQuery.
Sequence diagram
In the diagram below, we see that the application will execute a BigQuery query every midnight, which will update the filtered_table table. After the update, a response containing the number of bytes read is returned asynchronously.
So we will need:
- access allowing us to execute BigQuery queries from Java code
- a Java project to run our job
BigQuery Setup
Assuming you have a Google Cloud account:
a) Create a new project that we will call “Big Query Example”
b) Ensure that the BigQuery API is enabled
c) Create a service account to allow Java code to authenticate.
In IAM & Admin >> Service accounts, click Create service account
- Make sure the “Big Query Example” project is selected
- Service account name: Big Query
- Service account ID: big-query
Click Create and Continue
d) Add the BigQuery Admin role and click Done
e) You should now see “No keys” in the Key ID column, like this
f) We will add a key by selecting “Manage keys”
Then “Create new key”, choose “JSON”, and click “Create”
If all goes well, a .json file will be downloaded. Note these fields, as they will be used later: project_id, client_email, and private_key.
g) In BigQuery create a covid_data dataset.
Play Framework project
Before you start, you must install JDK 11 or higher and sbt, and know the basics of Play Framework
a) We will create a new project. For this, we will use the command
sbt new playframework/play-java-seed.g8
and fill in the prompts as in the image below
scala_version and sbt_giter8_scaffold_version may differ depending on the machine.
b) We will use the Alpakka library for Google Bigquery. Alpakka is a Reactive Enterprise Integration library for Java and Scala, based on Reactive Streams and Akka.
In the build.sbt file, adapt libraryDependencies to look like this
libraryDependencies ++= Seq(
guice,
"com.lightbend.akka" %% "akka-stream-alpakka-google-cloud-bigquery" % "8.0.0",
)
This will add the Alpakka connector (built on Akka), which will later help us communicate with BigQuery.
To avoid compatibility issues between Play 3 and the Jackson library, we will also add this
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.14.2"
Finally, to allow sbt to find the Akka repository when resolving dependencies, we will add this
resolvers += "Akka library repository".at("https://repo.akka.io/maven")
Your full build.sbt should look like this
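For reference, a full build.sbt consistent with the steps above could look like the sketch below. The project name, organization, and scalaVersion are placeholders; yours will depend on the answers you gave to the giter8 template.

```scala
name := """big-query-example"""
organization := "com.example"

version := "1.0-SNAPSHOT"

lazy val root = (project in file(".")).enablePlugins(PlayJava)

// May differ on your machine (set by the giter8 template)
scalaVersion := "2.13.12"

libraryDependencies ++= Seq(
  guice,
  "com.lightbend.akka" %% "akka-stream-alpakka-google-cloud-bigquery" % "8.0.0"
)

// Work around Play 3 / Jackson incompatibilities
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.14.2"

// Needed so sbt can resolve the Akka artifacts
resolvers += "Akka library repository".at("https://repo.akka.io/maven")
```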
To download the dependencies and make sure everything is OK, run
sbt compile
c) We will create the app/services folder and add the following classes:
IBigQueryJob.java
BigQueryJobImpl.java
The runJob() function in our example will update the “filtered_table” table with the data read from the “covid19_open_data” table.
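The original snippets were shown as images; below is a sketch of what these two classes could look like. The SQL statement, field names, and the exact Alpakka javadsl calls (BigQuery.query, BigQueryMarshallers.queryResponseUnmarshaller, QueryResponse.getTotalBytesProcessed) are my assumptions based on the Alpakka 8.x documentation, not the article's exact code; check them against your version of the connector.

```java
// app/services/IBigQueryJob.java
package services;

public interface IBigQueryJob {
    void runJob();
}
```

```java
// app/services/BigQueryJobImpl.java
package services;

import akka.actor.ActorSystem;
import akka.stream.alpakka.googlecloud.bigquery.javadsl.BigQuery;
import akka.stream.alpakka.googlecloud.bigquery.javadsl.BigQueryMarshallers;
import akka.stream.alpakka.googlecloud.bigquery.model.QueryResponse;
import akka.stream.javadsl.Sink;
import com.fasterxml.jackson.databind.JsonNode;
import java.util.concurrent.CompletionStage;
import javax.inject.Inject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BigQueryJobImpl implements IBigQueryJob {

    private static final Logger log = LoggerFactory.getLogger(BigQueryJobImpl.class);

    // Hypothetical query: rebuild filtered_table from the public covid19_open_data dataset.
    private static final String QUERY =
        "CREATE OR REPLACE TABLE covid_data.filtered_table AS "
            + "SELECT date, country_name, new_confirmed "
            + "FROM `bigquery-public-data.covid19_open_data.covid19_open_data` "
            + "WHERE new_confirmed > 0";

    private final ActorSystem system;

    @Inject
    public BigQueryJobImpl(ActorSystem system) {
        this.system = system;
    }

    @Override
    public void runJob() {
        // Run the query; the materialized value carries the query metadata,
        // including the number of bytes read.
        CompletionStage<QueryResponse<JsonNode>> response =
            BigQuery.<JsonNode>query(QUERY, false, false,
                    BigQueryMarshallers.queryResponseUnmarshaller(JsonNode.class))
                .to(Sink.ignore())
                .run(system);

        response.thenAccept(r ->
            log.info("Job done, bytes processed: {}", r.getTotalBytesProcessed()));
    }
}
```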
d) Now that we have our function to execute the job, we will write the piece of code allowing us to execute this function regularly.
We will create the app/schedulers folder and we will add the classes:
- JobSchedulerActor.java, in which we indicate the routine to be executed (every day at midnight in our case). Your code should look like this
- JobScheduler.java, our scheduler, which tells the actor to trigger every day at midnight. Your code will look like this
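Since the originals were images, here is one possible sketch of the two classes. The actor message ("RUN_JOB"), the class members, and the use of Akka's scheduleAtFixedRate are my assumptions, not the article's exact code.

```java
// app/schedulers/JobSchedulerActor.java
package schedulers;

import akka.actor.AbstractActor;
import akka.actor.Props;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import services.IBigQueryJob;

public class JobSchedulerActor extends AbstractActor {

    private static final Logger log = LoggerFactory.getLogger(JobSchedulerActor.class);
    private final IBigQueryJob bigQueryJob;

    public JobSchedulerActor(IBigQueryJob bigQueryJob) {
        this.bigQueryJob = bigQueryJob;
    }

    public static Props props(IBigQueryJob bigQueryJob) {
        return Props.create(JobSchedulerActor.class, () -> new JobSchedulerActor(bigQueryJob));
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .matchEquals("RUN_JOB", msg -> {
                log.info("Tick received, running the BigQuery job");
                bigQueryJob.runJob();
            })
            .build();
    }
}
```

```java
// app/schedulers/JobScheduler.java
package schedulers;

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import java.time.Duration;
import java.time.LocalDateTime;
import javax.inject.Inject;
import javax.inject.Singleton;
import services.IBigQueryJob;

@Singleton
public class JobScheduler {

    @Inject
    public JobScheduler(ActorSystem system, IBigQueryJob bigQueryJob) {
        ActorRef actor = system.actorOf(JobSchedulerActor.props(bigQueryJob), "jobSchedulerActor");

        // wait = time remaining until the next midnight
        LocalDateTime now = LocalDateTime.now();
        Duration wait = Duration.between(now, now.toLocalDate().plusDays(1).atStartOfDay());

        system.scheduler().scheduleAtFixedRate(
            wait,                // initial delay: first run at the coming midnight
            Duration.ofDays(1),  // then once every 24 hours
            actor,
            "RUN_JOB",
            system.dispatcher(),
            ActorRef.noSender());
    }
}
```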
The wait variable holds the time remaining until the next midnight, so that the first run triggers exactly at midnight.
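As a self-contained illustration of that computation (the class and method names here are mine, for demonstration only), java.time makes it a two-liner:

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class MidnightDelay {

    // Time remaining from 'now' until the next midnight.
    static long millisUntilNextMidnight(LocalDateTime now) {
        LocalDateTime nextMidnight = now.toLocalDate().plusDays(1).atStartOfDay();
        return Duration.between(now, nextMidnight).toMillis();
    }

    public static void main(String[] args) {
        // e.g. at 23:00 there is exactly one hour left
        System.out.println(millisUntilNextMidnight(LocalDateTime.of(2024, 1, 1, 23, 0))); // 3600000
    }
}
```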
e) We need to activate our schedulers.
- create a BigQueryModule.class in the app folder. Your code should look like this
- reference this module in the conf/application.conf in which we will add this
play.modules.enabled += "BigQueryModule"
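A minimal module consistent with the steps above might look like this. It lives directly in the app folder (default package), matching the unqualified "BigQueryModule" name in application.conf; the eager-singleton bindings are my assumption about how the scheduler gets started at boot.

```java
// app/BigQueryModule.java (no package declaration)
import com.google.inject.AbstractModule;

public class BigQueryModule extends AbstractModule {
    @Override
    protected void configure() {
        bind(services.IBigQueryJob.class).to(services.BigQueryJobImpl.class);
        // Eager singleton so the scheduler is created (and armed) at application startup
        bind(schedulers.JobScheduler.class).asEagerSingleton();
    }
}
```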
f) Add the BigQuery credentials. Still in the application.conf file, add this at the end
alpakka.google {
credentials {
service-account {
project-id = "your project-id"
client-email = "your client-email"
private-key = "your private-key"
}
}
}
Replace project-id, client-email, and private-key with the corresponding values (project_id, client_email, and private_key) from the .json file downloaded in the service account section above.
g) Finally we will go to conf/logback.xml and add this in the <configuration> tag
<logger name="schedulers" level="INFO"/>
This will allow us to display the logs of the classes found in the schedulers package.
Your final tree should look like this
Finally, run sbt run, then open http://localhost:9000/.
To test, however, you would have to wait until midnight. So, just for testing, we will modify the code so that the scheduler starts right away and fires at 5-second intervals.
The JobScheduler class becomes
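The only change needed is in the scheduling call itself; a sketch of the testing variant is below (the actor reference and the "RUN_JOB" message are placeholders for whatever your JobScheduler actually uses):

```java
// Testing only: fire immediately, then every 5 seconds instead of daily at midnight.
system.scheduler().scheduleAtFixedRate(
    Duration.ZERO,           // no initial wait
    Duration.ofSeconds(5),   // 5-second interval
    actor,
    "RUN_JOB",
    system.dispatcher(),
    ActorRef.noSender());
```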
Finally, we get the following results
and in BigQuery
Conclusion
In this article, we showed how to schedule BigQuery tasks using Play Framework 3. This approach is ideal for independent tasks. If jobs depend on each other, a better solution would be a data pipeline tool such as Airflow, Beam, Oozie, Azkaban, or Luigi.
The source code of the application is available on GitHub.
Please clap for this article if you enjoyed reading it. For more about Google Cloud, data science, data engineering, AI/ML, and software architecture, follow me on LinkedIn.