Running workflows at Freetrade, serverless!

Alexandru Rosianu
Freetrade Blog · Feb 12, 2020

One of the challenges we have at Freetrade is running workflows regularly, automatically, and reliably.

These workflows keep the price charts in your app up to date, generate your monthly statements, reconcile transactions and much more, so it’s important that they run on time.

To make sure that happens, our serverless-driven approach led us to Cloud Composer, a hosted and fully managed version of the open-source workflow management platform Airflow. We picked Cloud Composer so that Google can focus on running Airflow while we focus on building Freetrade!

Side note, to get some acronyms out of the way: Airflow calls the workflows it runs “DAGs,” which stands for “directed acyclic graphs.” Internally, we also call them “pipelines.”
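For readers who haven't used Airflow, a DAG is just a Python file that declares some tasks and when they should run. The sketch below is a minimal, hypothetical example — the DAG id, task, and schedule are made up for illustration and are not one of our production pipelines:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def sync_prices():
    # Placeholder for the real work a task would do.
    print("Syncing price charts...")


with DAG(
    dag_id="example_price_sync",   # hypothetical name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@hourly",   # run regularly and automatically
    catchup=False,
) as dag:
    PythonOperator(task_id="sync_prices", python_callable=sync_prices)
```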

This hasn’t been without its troubles. One problem we’ve had with Airflow is that some pipelines occasionally take much longer than usual to complete, in some cases by as much as 30 minutes.

Screenshot of our stock universe pipeline that syncs the stock information in our database with our CMS (content management system). You can see we had a spike that momentarily increased the completion time from the usual 4–7 minutes to more than 15 minutes.

We identified this as a resource management problem. Some of the nodes that run our pipelines were getting overcrowded as they were assigned more pipelines than others.

After searching through the documentation provided by Airflow and Cloud Composer, we created our own mental models to help us reason about resource management when using Airflow.

One of our diagrams that helps us understand Airflow and Cloud Composer configuration.

One of the diagrams we have created shows how Airflow’s parallelism and Cloud Composer’s worker_concurrency configuration parameters relate to the resources available in our Cloud Composer environment.

This diagram summarises the following lessons:

  • We get the best results when worker_concurrency is set equal to the number of CPUs available on each node, minus 1 (to leave room for other processes, such as Airflow’s scheduler).
  • parallelism is simply that same calculation summed across all of our nodes, so that we never start more pipeline tasks than we have CPUs available (the sketch after this list shows the arithmetic).
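To make the arithmetic concrete, here is a minimal sketch. The node count and CPU figures are hypothetical rather than our production values; the comments note where the resulting options live in airflow.cfg.

```python
# Back-of-the-envelope sizing for worker_concurrency and parallelism.
# The node count and CPU figures below are illustrative only.

NODES = 3          # worker nodes in the Cloud Composer (GKE) cluster
CPUS_PER_NODE = 4  # vCPUs available on each node

# Leave one CPU per node for other processes, such as Airflow's scheduler.
worker_concurrency = CPUS_PER_NODE - 1

# Sum the same calculation across all nodes, so we never run more
# task instances than the CPUs we have set aside for them.
parallelism = worker_concurrency * NODES

print(worker_concurrency)  # 3  -> [celery] worker_concurrency
print(parallelism)         # 9  -> [core] parallelism
```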

These settings are, of course, specific to our requirements and the types of pipelines we run. Some of them are network-bound, so they don’t need a lot of processing power. Others do a lot of parsing and data processing and are thus limited by the processing power available.

Google’s “Scale your Composer environment together with your business” blog post, along with its attached spreadsheet, was exceptionally useful in helping us understand these parameters.

Are you an Airflow or Cloud Composer wizard? Do you have a passion for engineering? Do you want to be part of a startup that makes investing easy for everyone?

We’re hiring! See all the roles on our Careers page.

Freetrade does not provide investment advice and individual investors should make their own decisions or seek independent advice. The value of investments can go up as well as down and you may receive back less than your original investment. Tax laws are subject to change and may vary in how they apply depending on the circumstances.

Freetrade is a trading name of Freetrade Limited, which is a member firm of the London Stock Exchange and is authorised and regulated by the Financial Conduct Authority. Registered in England and Wales (no. 09797821).
