Published in StarThinker

StarThinker On Airflow / Composer

A good reusable framework deserves good infrastructure, and Airflow / Composer is exactly that: any StarThinker recipe can be deployed to Composer / Airflow in a few minutes. Wondering how? Here are the steps…

This guide assumes you have Airflow deployed locally, or via Google Cloud Composer.

Install StarThinker

Remember to add the StarThinker package to Composer / Airflow. To ensure Composer uses the latest version, pin the version number as ==X.X.X.
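For reference, here is a sketch of both install paths; the environment name, location, and the exact gcloud flag are assumptions to verify against your own setup and gcloud version:

```shell
# Cloud Composer: add the PyPI package to the environment.
# "my-environment" and "us-central1" are placeholders.
gcloud composer environments update my-environment \
  --location us-central1 \
  --update-pypi-package starthinker==X.X.X

# Local Airflow: pin the version when installing.
pip install starthinker==X.X.X
```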

Select A DAG

StarThinker uses a DAG Factory to convert recipes into Airflow modules. The factory turns each StarThinker task into an Airflow task. For example, this StarThinker Recipe is converted into this Airflow DAG. Don’t worry, you can mix any Airflow operator, native or your own, with the StarThinker DAG Factory. This sample DAG shows all three types of calls in one workflow.
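To make the one-task-per-recipe-entry idea concrete, here is a toy sketch (not the actual StarThinker API) of what a DAG factory does with a recipe: each entry in the tasks list becomes one Airflow task, keyed by its name and position.

```python
# Toy illustration only: the real factory wires these into Airflow operators.
RECIPE = {
    'tasks': [
        {'dataset': {'auth': 'service'}},   # a StarThinker task
        {'bigquery': {'auth': 'service'}},  # another StarThinker task
    ]
}

def tasks_from_recipe(recipe):
    """Yield (task_id, parameters) pairs, one per recipe task."""
    for position, task in enumerate(recipe['tasks'], start=1):
        (name, parameters), = task.items()  # each task dict has a single key
        yield '%s_%d' % (name, position), parameters

task_ids = [task_id for task_id, _ in tasks_from_recipe(RECIPE)]
print(task_ids)  # ['dataset_1', 'bigquery_2']
```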

Start with any DAG in the repository; they can all be easily extended by adding tasks from any other DAG or from the Scripts folder.

Partial list of StarThinker DAGs.

Modify The DAG

Each StarThinker DAG contains detailed instructions on how to set it up and behaves exactly like a standard Airflow DAG. All inputs to a StarThinker DAG are contained in the INPUTS dictionary, and these INPUTS are merged with the RECIPE automatically by the factory. Before uploading the DAG to Composer, you only need to fill in the INPUTS fields.

Inputs for the BigQuery Dataset DAG.

The RECIPE section typically does not need to be modified unless you are adding tasks. In that case, also add the new fields to the INPUTS section; each INPUTS name should match the corresponding RECIPE field name.
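The merge step can be sketched in plain Python. This is a minimal illustration, assuming the recipe marks inputs with a field placeholder whose name matches an INPUTS key (the exact placeholder format inside real StarThinker recipes may differ):

```python
def merge_inputs(node, inputs):
    """Recursively replace field placeholders with values from inputs."""
    if isinstance(node, dict):
        if 'field' in node and isinstance(node['field'], dict):
            field = node['field']
            # Fall back to the field's default when no input is provided.
            return inputs.get(field.get('name'), field.get('default'))
        return {key: merge_inputs(value, inputs) for key, value in node.items()}
    if isinstance(node, list):
        return [merge_inputs(value, inputs) for value in node]
    return node

INPUTS = {'auth_write': 'service', 'dataset': 'reports'}

RECIPE = {
    'tasks': [{
        'dataset': {
            'auth': {'field': {'name': 'auth_write', 'default': 'service'}},
            'dataset': {'field': {'name': 'dataset', 'default': ''}},
        }
    }]
}

resolved = merge_inputs(RECIPE, INPUTS)
print(resolved['tasks'][0]['dataset'])  # {'auth': 'service', 'dataset': 'reports'}
```

Because the INPUTS name is the lookup key, a mismatch between an INPUTS name and a RECIPE field name silently falls back to the field's default.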

Add Credentials

Most StarThinker DAGs call Google Cloud or Google API endpoints, which requires credentials. Both USER and SERVICE credentials can be used in any task, and are chosen by the “auth” parameter within the RECIPE section.

You can scan the DAG for “auth” elements and either add the required credentials or change the value to “service” or “user” depending on which credentials you have. Keep in mind, your credentials will need access to whatever API you are calling via the DAG.

In addition to the methods described below, the credentials may also be included in the RECIPE section as:

{ 'setup': {'id': [PROJECT ID], 'auth': {'user': [JSON], 'service': [JSON]}}}

Add Service Credentials

The DAG factory will check for a starthinker_service connection and extract both the project ID and the service credential. Fill in Project ID and Keyfile JSON; all other fields are not used. See service credentials instructions.
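The connection can also be created from the command line instead of the Airflow UI. A sketch, assuming the Airflow 2.x CLI and the standard Google Cloud connection extras; verify the flag and extra field names against your Airflow version:

```shell
# Create the starthinker_service connection the factory looks for.
# [PROJECT ID] and [JSON] are placeholders for your own values.
airflow connections add starthinker_service \
  --conn-type google_cloud_platform \
  --conn-extra '{"extra__google_cloud_platform__project": "[PROJECT ID]",
                 "extra__google_cloud_platform__keyfile_dict": "[JSON]"}'
```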

Example service credential connection.

Add User Credentials

The DAG factory will check for a starthinker_user connection and extract both the project ID and the user credential. Fill in Project ID and Keyfile JSON; all other fields are not used. See user credentials instructions.

Example user credential connection.

Execute the DAG

That’s it, you’re ready to run. If the DAG contains a Day / Hour entry, it will run on schedule; otherwise you will have to trigger it manually.

There are additional instructions for running and testing StarThinker DAGs from the command line. Yes, we added a helper to the factory that will print the correct airflow commands if you try to run the DAG via python directly.
