Published in AntVoice Tech

How we are streaming thousands of rows per second into BigQuery — Part II: Google Storage loading

Photo by Kimi lee on Unsplash

The cost items

The cost of our streaming pipeline came from three Google Cloud components:

  1. Pub/Sub throughput
  2. Dataflow VMs
  3. BQ streaming inserts

The basic idea

We quickly looked into loading data into BigQuery from files stored in Google Cloud Storage (abbreviated GCS from now on). Batch loading is the only other way to get data into BQ if we want to avoid streaming costs.

The new solution had to meet three requirements:

  1. Don’t impact our running services.
  2. Don’t rely on a message bus, which was expensive and had to be scaled up whenever demand grew.
  3. Write data quickly enough for it to be readable in our reporting UI. We need the data to be loaded at least every 15 minutes to keep it as fresh as possible.
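One way to reason about the 15-minute freshness target is to group files by the time window they belong to, so that each load picks up exactly one window. The sketch below only illustrates that idea and is not our production code: the `windowPrefix` helper, the `events` table name and the path layout are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// loadInterval is the freshness target stated in the requirements.
const loadInterval = 15 * time.Minute

// windowPrefix returns a hypothetical GCS object prefix for the 15-minute
// window (in UTC) that contains the given Unix timestamp in seconds.
func windowPrefix(table string, unixSeconds int64) string {
	// Truncate rounds the timestamp down to the start of its window.
	start := time.Unix(unixSeconds, 0).UTC().Truncate(loadInterval)
	return fmt.Sprintf("%s/%s/", table, start.Format("2006-01-02/1504"))
}

func main() {
	// Files written now would be grouped under this prefix.
	fmt.Println(windowPrefix("events", time.Now().Unix()))
}
```

With such a layout, every file written between 22:00:00 and 22:14:59 UTC ends up under the same `.../2200/` prefix, which a single load can then target.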

How we did it

To design our solution, we started from the end of the workflow: we knew we had to use batch file loading from GCS into BQ.
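To make the batch loading step more concrete, here is a minimal sketch of the request body of a BigQuery load job, built with the Go standard library only. The JSON field names follow the load configuration of the BigQuery REST API `jobs.insert` method. The project, dataset, table and bucket names are hypothetical, and newline-delimited JSON is an assumption, since the file format is not stated here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tableRef identifies the destination table of the load job.
type tableRef struct {
	ProjectID string `json:"projectId"`
	DatasetID string `json:"datasetId"`
	TableID   string `json:"tableId"`
}

// loadConfig mirrors a subset of the load configuration of a BigQuery job.
type loadConfig struct {
	SourceURIs       []string `json:"sourceUris"`
	SourceFormat     string   `json:"sourceFormat"`
	WriteDisposition string   `json:"writeDisposition"`
	DestinationTable tableRef `json:"destinationTable"`
}

type jobConfig struct {
	Load loadConfig `json:"load"`
}

type job struct {
	Configuration jobConfig `json:"configuration"`
}

// newLoadJob describes a job that appends the given GCS files to a table.
// The source format is an assumption made for this sketch.
func newLoadJob(project, dataset, table string, uris []string) job {
	return job{Configuration: jobConfig{Load: loadConfig{
		SourceURIs:       uris,
		SourceFormat:     "NEWLINE_DELIMITED_JSON",
		WriteDisposition: "WRITE_APPEND",
		DestinationTable: tableRef{ProjectID: project, DatasetID: dataset, TableID: table},
	}}}
}

func main() {
	// Hypothetical names; the wildcard selects every file under the prefix.
	j := newLoadJob("my-project", "analytics", "events",
		[]string{"gs://my-bucket/events/2023-11-14/2200/*"})
	body, err := json.MarshalIndent(j, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The sketch stops at building the body. Sending it requires an authenticated call to the BigQuery API, which is left out here.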

  1. Listen to directory modification events (file created, file moved) using the library FsNotify.
  2. Transfer those files to GCS
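The two steps above can be sketched as follows. To keep the example runnable with the standard library only, it swaps the FsNotify event listener for a plain directory scan, so it illustrates the transfer logic rather than our actual watcher. The `Uploader` interface, the `countingUploader` stand-in and the `transferReady` helper are hypothetical names, and no real GCS call is made.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Uploader is a hypothetical abstraction over a GCS client.
type Uploader interface {
	Upload(localPath, objectName string) error
}

// countingUploader is a stand-in that records uploads instead of calling GCS.
type countingUploader struct{ objects []string }

func (c *countingUploader) Upload(localPath, objectName string) error {
	c.objects = append(c.objects, objectName)
	return nil
}

// transferReady uploads every regular file found in dir, then deletes it
// locally once the upload has succeeded. It returns the number of files
// transferred.
func transferReady(dir, prefix string, up Uploader) (int, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue // skip sub-directories, symlinks, etc.
		}
		local := filepath.Join(dir, e.Name())
		if err := up.Upload(local, prefix+e.Name()); err != nil {
			return n, err // the file stays on disk, so a later pass can retry
		}
		if err := os.Remove(local); err != nil {
			return n, err
		}
		n++
	}
	return n, nil
}

// runDemo writes two files into a temporary directory, transfers them and
// reports how many were uploaded and how many are left on disk.
func runDemo() (uploaded, remaining int, err error) {
	dir, err := os.MkdirTemp("", "ready-")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	for _, name := range []string{"a.json", "b.json"} {
		if err := os.WriteFile(filepath.Join(dir, name), []byte("{}\n"), 0o644); err != nil {
			return 0, 0, err
		}
	}
	uploaded, err = transferReady(dir, "events/", &countingUploader{})
	if err != nil {
		return uploaded, 0, err
	}
	left, err := os.ReadDir(dir)
	return uploaded, len(left), err
}

func main() {
	uploaded, remaining, err := runDemo()
	fmt.Println(uploaded, remaining, err)
}
```

Deleting a file only after its upload succeeds means a failed transfer is retried on the next pass. A file that disappears before being uploaded, for instance during a pod restart, is still lost.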

Impacts

Costs dropped by a factor of 5 to 10 compared with the Dataflow solution, depending on the data throughput. We currently use a mix of both solutions: high-volume data goes through the new one, while critical data still goes through the old one.

We keep the old solution for critical data for two reasons:

  1. The data streams into BQ almost instantly, which allows us to use it in monitoring and reporting tools. Some services also need the data to be available as soon as possible.
  2. The Dataflow solution is lossless, whereas streaming to the file system, copying the files locally and then uploading them to GCS causes a minimal loss (mainly due to pod restarts). This loss, which is constantly monitored, was acceptable given the reduced cost and the criticality of the data. Nowadays we may be losing less than 0.01%.
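A loss figure like the one above can be obtained by comparing the number of rows written by the services with the number of rows that reached BQ. The sketch below is only an illustration of that arithmetic: the `lossRate` helper, the row counts and the use of 0.01% as an alert threshold are hypothetical, not a description of our monitoring.

```go
package main

import "fmt"

// maxLoss is a hypothetical alert threshold set to the 0.01% figure
// quoted above, expressed as a fraction.
const maxLoss = 0.0001

// lossRate returns the fraction of written rows that never reached BQ.
// It returns 0 when nothing was written or nothing is missing.
func lossRate(written, loaded int64) float64 {
	if written <= 0 || loaded >= written {
		return 0
	}
	return float64(written-loaded) / float64(written)
}

func main() {
	// Made-up counts for illustration: 500 rows missing out of 10 million.
	rate := lossRate(10_000_000, 9_999_500)
	fmt.Printf("loss: %.4f%%, acceptable: %t\n", rate*100, rate < maxLoss)
}
```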

Conclusion

Using this new way of inserting data into BQ is saving us a lot of money. It allowed us to scale up peacefully and to accept much more incoming load without seeing our bills skyrocket.


AntVoice helps brands and merchants to identify and target new clients through weak signals analysis and artificial intelligence.
