Building a real time quant trading engine on Google Cloud Dataflow and Apache Beam

A practical use case of GCP Dataflow

Lei He
Lei He
Aug 4, 2018 · 6 min read

  1. CoGroupByKey — join two or more key/value PCollection by the same key. In our exercise we use CoGroupByKey to group together tick data by stock symbol in assigned pair.
  2. Filter — filter an input PCollection using a lambda or function
  3. Map — 1 to 1 transforms an input PCollection to an output for each data points
  4. WindowInto — for unbounded tick stream data, we use a sliding window to group together 10 minutes worth of tick data every 1 minute. Correlation is calculated per window worth of data
python -m correlated_trading.trading_pipe \
--project $PROJECT \
--runner DataflowRunner \
--staging_location $BUCKET/staging \
--temp_location $BUCKET/temp \
--input_mode stream \--input_topic tick_data_input \
--output_topic trading_signal

Google Cloud Platform - Community

A collection of technical articles published or curated by Google Cloud Platform Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Lei He

Written by

Lei He

Software engineer, data infrastructure, blog @CloudboxLabs.com

Google Cloud Platform - Community

A collection of technical articles published or curated by Google Cloud Platform Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.