Dockstore partners with AWS AGC to make launching workflows quick and easy!

Richard-Hansen
Published in Dockstore · 6 min read · Apr 11, 2022

Launching genomics workflows is challenging, especially in the cloud. Across various descriptor languages, engine versions and compute environments, finding and running reproducible workflows takes time and effort. The Global Alliance for Genomics and Health (GA4GH) seeks to address these challenges with the Workflow Execution Service (WES) Application Programming Interface (API). The WES API is a standard for submitting workflow execution, monitoring, and output retrieval requests to compute environments. Dockstore has partnered with Amazon Web Services’ (AWS) Amazon Genomics CLI (AGC) to leverage WES in supporting a fully integrated approach for launching your Dockstore workflows in the AWS cloud with just a few simple commands.

When a server implements the WES standard, it is agreeing to support a set of API calls which define a format for sending requests and receiving responses. This allows the workflow user to run the same command in the Dockstore CLI against different WES servers, each of which will interpret and execute the request in a similar manner. Furthermore, the commands that the Dockstore CLI uses to launch workflows locally can be easily modified to launch workflows on WES servers. This makes workflow development, execution, and monitoring easily transportable between compute environments, regardless of whether they are local or remote.
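
To make "a set of API calls" concrete, here is a minimal Python sketch of the routes defined by the WES 1.0 specification and how a client might turn them into request URLs. The helper name `wes_request` and the example hostname are illustrative assumptions, not part of Dockstore or AGC.

```python
from urllib.parse import quote

# Routes defined by the GA4GH WES 1.0 specification, relative to the base URL.
WES_ROUTES = {
    "service_info": ("GET", "/service-info"),
    "list_runs": ("GET", "/runs"),
    "submit_run": ("POST", "/runs"),
    "run_log": ("GET", "/runs/{run_id}"),
    "run_status": ("GET", "/runs/{run_id}/status"),
    "cancel_run": ("POST", "/runs/{run_id}/cancel"),
}


def wes_request(base_url, operation, run_id=None):
    """Return the (HTTP method, full URL) pair for a WES operation."""
    method, path = WES_ROUTES[operation]
    if "{run_id}" in path:
        if run_id is None:
            raise ValueError(f"{operation} needs a run_id")
        path = path.replace("{run_id}", quote(run_id, safe=""))
    return method, base_url.rstrip("/") + path


# Hypothetical server; any WES implementation exposes the same routes.
print(wes_request("https://wes.example.org/ga4gh/wes/v1", "run_status", "abc-123"))
```

Because only the base URL changes between servers, the same client logic can target any WES implementation.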

WES in AWS with AGC

The Amazon Genomics CLI is a command line tool for deploying cloud infrastructure that can be used to execute genomics workflows. Upon deploying infrastructure with AGC, you can submit workflows for execution, view their status, and retrieve run logs. To reduce infrastructure costs, AGC utilizes AWS Batch for compute resources, thus only using compute power when a workflow is running, and tearing down unnecessary resources once the workflow has completed.

You can communicate with the AGC infrastructure via the AGC command line tool, or by sending WES HTTP requests directly. When you deploy infrastructure with AGC, an Amazon API Gateway endpoint is created. WES requests sent to this endpoint give access to AGC’s core functionality in a standardized format, which simplifies integration and supports reproducibility.

WES in Dockstore

Dockstore provides a command line interface (CLI) called the Dockstore CLI. The Dockstore CLI communicates directly with the Dockstore service for easy access to all your Dockstore entries and supports sending requests in accordance with the WES standard. This allows you to launch your posted Dockstore entries simply by referencing them from the command line, and passing in your runtime parameters.

Setting up the Dockstore CLI to communicate with a WES server requires updating the CLI config file located at ~/.dockstore/config with the URL of the WES server and the authorization method the Dockstore CLI should use for your requests. For example, you can configure the Dockstore CLI to send WES requests to a server hosted on AWS by using an AWS named profile, containing access and secret keys, as the method of authorization. The resulting configuration file will be similar to the following:

[WES]
url: https://test.execute-api.us-west-2.amazonaws.com/prod/ga4gh/wes/v1
authorization: aws-wes-profile
type: aws
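
If you want to check a config file programmatically, the following Python sketch reads the [WES] section with the standard library. It assumes the file is compatible with Python's INI parser, and the function name `read_wes_section` and the example URL are hypothetical; this is not how the Dockstore CLI itself loads its configuration.

```python
import configparser


def read_wes_section(config_text):
    """Return the [WES] settings from Dockstore-style config text as a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    # Keys may appear before any section header; give them a placeholder
    # section so configparser accepts the text.
    parser.read_string("[top-level]\n" + config_text)
    if not parser.has_section("WES"):
        raise KeyError("no [WES] section found")
    return dict(parser["WES"])


# Hypothetical config contents; substitute the text of ~/.dockstore/config.
sample = """\
[WES]
url: https://wes.example.org/ga4gh/wes/v1
authorization: aws-wes-profile
type: aws
"""
wes = read_wes_section(sample)
print(wes["url"], wes["authorization"], wes["type"])
```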

With the updated config file, all that’s left to do is find and launch your workflow! A simple, parameter-free workflow can be found at https://dockstore.org/workflows/github.com/dockstore-testing/dockstore-whalesay2:master?tab=info, and can be launched from the CLI by running:

dockstore workflow wes launch --entry \
github.com/dockstore-testing/dockstore-whalesay2:master

By simply referencing a Dockstore entry you can send a WES request to a server of your choosing. To retrieve the execution logs of your workflow, you can use the workflow run ID provided from the above command and run:

dockstore workflow wes logs --id 12345-67890-example-runid

This returns the workflow status and information about the run outputs.
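
If you script these steps, a small Python wrapper can assemble the two commands shown above. This is only a sketch: the function names are hypothetical, and it uses just the `--entry` and `--id` options that appear in this post.

```python
import shlex


def launch_command(entry):
    """Build the argument list for launching a Dockstore entry over WES."""
    return ["dockstore", "workflow", "wes", "launch", "--entry", entry]


def logs_command(run_id):
    """Build the argument list for fetching the logs of a WES run."""
    return ["dockstore", "workflow", "wes", "logs", "--id", run_id]


cmd = launch_command("github.com/dockstore-testing/dockstore-whalesay2:master")
print(shlex.join(cmd))
# To actually run it (requires an installed, configured Dockstore CLI):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when an entry path or run ID is passed to `subprocess.run`.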

Install AGC and the Dockstore CLI

To install AGC, follow the official AGC installation guide. To use the services provided by AGC, you will need an AWS account with the appropriate IAM roles for command execution. The minimal permissions required to use AGC are described in the AGC documentation.

To install the Dockstore CLI, follow the Dockstore CLI quick start guide.

Launch a workflow using AGC and the Dockstore CLI

To tie all the pieces together, we will walk through the steps required to:

  1. Deploy your first AGC resources
  2. Launch a workflow hosted on Dockstore

Deploy AGC infrastructure

With AGC and the Dockstore CLI installed, we’re ready to start deploying resources. We’re going to activate AGC on our AWS account and create an AGC project named dockstoreAgcTutorialProject that contains a single context, ctx1. Create a file named agc-project.yaml in your working directory and add the following contents:

name: dockstoreAgcTutorialProject
schemaVersion: 1
contexts:
  ctx1:
    engines:
      - type: wdl
        engine: cromwell
With the above AGC configuration file, we can now activate AGC on our AWS account. This may take a few minutes. Note that the following AGC commands will require you to provide your AWS credentials, such as an AWS named profile and region. From your working directory, run:

agc account activate

With AGC activated, we can deploy our first context, ctx1:

agc context deploy ctx1

Configure the Dockstore CLI

Once the context has finished deploying, all the cloud infrastructure we need to run workflows is in place. The only remaining step is to configure the Dockstore CLI so that it knows where to send WES requests. To do this, we need the WES URL associated with ctx1, which we can get by using AGC to query for information about our deployed context:

agc context describe ctx1

This will print a set of values similar to:

CONTEXT 256 ctx1 false STARTED
OUTPUTLOCATION s3://agc-123456789-us-west-2/project/dockstoreAgcTutorialProject/userid/userName123/context/ctx1
WESENDPOINT https://x.execute-api.us-west-2.amazonaws.com/prod/

The value we are interested in is the URL labeled WESENDPOINT. Append the text ga4gh/wes/v1 to the end of the URL and copy it to the Dockstore config file located at ~/.dockstore/config. Your resulting config file should have a section that looks like:

[WES]
url: https://x.execute-api.us-west-2.amazonaws.com/prod/ga4gh/wes/v1
authorization: aws-wes-profile
type: aws

The value labeled authorization in the Dockstore config file is the AWS named profile you want to use to make WES requests.
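
The URL step can also be automated. The Python sketch below assumes the `agc context describe` output has been captured as text and that the WESENDPOINT label is separated from its URL by whitespace; the function name `wes_url_from_describe` is hypothetical.

```python
def wes_url_from_describe(describe_output, suffix="ga4gh/wes/v1"):
    """Find the WESENDPOINT value and append the WES path suffix to it."""
    tokens = describe_output.split()
    # Walk adjacent token pairs looking for the label followed by its value.
    for label, value in zip(tokens, tokens[1:]):
        if label == "WESENDPOINT":
            return value.rstrip("/") + "/" + suffix
    raise ValueError("WESENDPOINT not found in output")


sample = """\
CONTEXT 256 ctx1 false STARTED
OUTPUTLOCATION s3://agc-123456789-us-west-2/project/dockstoreAgcTutorialProject/userid/userName123/context/ctx1
WESENDPOINT https://x.execute-api.us-west-2.amazonaws.com/prod/
"""
print(wes_url_from_describe(sample))
# https://x.execute-api.us-west-2.amazonaws.com/prod/ga4gh/wes/v1
```

Stripping the trailing slash before joining avoids a doubled slash in the final URL, whether or not the endpoint is printed with one.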

At this point, the Dockstore CLI is configured to talk to AGC using WES. You can verify your configuration by requesting information regarding the WES server:

dockstore workflow wes service-info

This will print JSON configuration information regarding the deployed WES server, similar to:

{
  "workflow_type_versions" : {
    "WDL" : {
      "workflow_type_version" : [ "1.0", "draft-2" ]
    }
  },
  "supported_wes_versions" : [ "1.0.0" ],
  "supported_filesystem_protocols" : null,
  "workflow_engine_versions" : null,
  "default_workflow_engine_parameters" : null,
  "system_state_counts" : null,
  "auth_instructions_url" : null,
  "contact_info_url" : null,
  "tags" : {
    "cromwell_service_health" : "True",
    "description" : "WES adapter for Cromwell workflow engine service.",
    "name" : "remote_cromwell_wes_adapter",
    "updated_at" : "2022-03-18T20:13:34.817337Z"
  }
}
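
Before launching, it can be useful to confirm that the server accepts your workflow's language version. The following Python sketch checks a service-info response for that, assuming the JSON has been saved as text; the `supports` helper is hypothetical and tolerates the null fields seen above.

```python
import json


def supports(service_info, workflow_type, version):
    """Report whether a WES service-info document lists a language version."""
    types = service_info.get("workflow_type_versions") or {}
    versions = (types.get(workflow_type) or {}).get("workflow_type_version") or []
    return version in versions


service_info = json.loads("""
{
  "workflow_type_versions": {
    "WDL": {"workflow_type_version": ["1.0", "draft-2"]}
  },
  "supported_wes_versions": ["1.0.0"]
}
""")
print(supports(service_info, "WDL", "1.0"))   # True
print(supports(service_info, "CWL", "v1.0"))  # False
```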

Launch a workflow

Now that we’ve gone through the hard part of configuring AGC and the Dockstore CLI, we can start launching workflows. We’ll launch a simple Hello World workflow hosted on Dockstore: github.com/dockstore-testing/wes-testing/agc-hello-world:v1.12. Looking at the entry on Dockstore, you can see that this workflow prints “Hello from AGC” to stdout and then exits. To launch the workflow, we just need to provide the unique identifier github.com/dockstore-testing/wes-testing/agc-hello-world:v1.12 to the Dockstore CLI, and that’s it! To make the WES run request, execute:

dockstore workflow wes launch \
--entry github.com/dockstore-testing/wes-testing/agc-hello-world:v1.12

When you launch a workflow using WES, you’ll receive a unique run ID, similar to 9977e8b9-9931-48cd-b3c1-c4aad5c9bae7. This ID allows you to request information regarding the workflow’s run status and outputs. To get the logs of the workflow we just launched, run the following (remember to replace the run ID below with the ID unique to your run):

dockstore workflow wes logs \
--id 9977e8b9-9931-48cd-b3c1-c4aad5c9bae7

The logs contain information regarding the run request, such as task logs, execution time, and outputs. The outputs section will contain the workflow outputs, which for us is the String “Hello from AGC”:

{
  .
  .
  .
  "outputs" : {
    "id" : "9977e8b9-9931-48cd-b3c1-c4aad5c9bae7",
    "outputs" : {
      "w.hello.out" : "Hello from AGC"
    }
  }
}
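
To pull the outputs out of a saved log in a script, something like the Python sketch below could work. It assumes the log JSON is available as text and has the nested "outputs" shape shown above; the helper name `workflow_outputs` is hypothetical, and other WES servers may structure this field differently.

```python
import json


def workflow_outputs(run_log):
    """Return the output name -> value mapping from a run log dictionary."""
    outer = run_log.get("outputs") or {}
    # The example response nests the actual values one level down, under a
    # second "outputs" key; fall back to the outer mapping otherwise.
    inner = outer.get("outputs")
    return inner if isinstance(inner, dict) else outer


run_log = json.loads("""
{
  "outputs": {
    "id": "9977e8b9-9931-48cd-b3c1-c4aad5c9bae7",
    "outputs": {"w.hello.out": "Hello from AGC"}
  }
}
""")
for name, value in workflow_outputs(run_log).items():
    print(f"{name} = {value}")
```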

With that, you’ve set up AGC and the Dockstore CLI, and successfully run your first Dockstore workflow in the cloud! For a more in-depth look at the Dockstore CLI’s support of WES and example execution commands with AGC, see the Dockstore documentation.

Final notes

The GA4GH WES standard provides a set of API operations that simplify the launching process of genomic workflows across different platforms, supporting ease-of-use and interoperability. Using AGC allows you to deploy all the necessary cloud infrastructure for launching genomics workflows using WES, in your own AWS account. All of this paired with the Dockstore CLI makes for an integrated pipeline to take your workflows into the cloud.
