Did you know that a new feature was recently rolled out for Apache Beam that lets you execute SQL directly in your pipeline? Well, don’t worry, folks, because I missed it too. It’s called Beam SQL, and it looks pretty darn interesting.

In this article, I’ll dive into this new feature of Beam and see how it works by using a pipeline to read a data file from GCS, transform it, and then perform a basic calculation on the values contained in the file. Far from a complex pipeline, I agree, but you’ve got to start somewhere, right?


Graham Polley

Cause trouble on that cloud thing that everyone is talking about. I like BigQuery. Work @weareservian and tweet nonsense @polleyg. Moving blog to polleyg.dev
