How to quickly experiment with Dataflow (Apache Beam Python)

Lak Lakshmanan
Nov 1, 2016

One of my colleagues showed me this trick to quickly experiment with Cloud Dataflow/Apache Beam, and it’s already saved me a couple of hours. [Dataflow is Google’s autoscaling, serverless way of processing both batch and streaming data. It runs Apache Beam pipelines. If you haven’t used it, you should try it out.]

To try out a bit of Python Dataflow code, I used to create a Pipeline, read some data from a CSV file, transform it with the code I was trying out, write the result to a text file, and then look at it. A very, very slow process.

The cool new way takes advantage of the Python REPL (the command-line interpreter) and the fact that Python lists can function as a Dataflow source.

If necessary, install the Apache Beam package on your machine:

$ pip install 'apache-beam[gcp]'

Start the Python interpreter on the command line:

$ python

Import the Apache Beam package:

>>> import apache_beam as beam

Now you are ready to roll. You can create an example list and pass it to a transform:

>>> [3, 8, 12] | beam.Map(lambda x: 3*x)
[9, 24, 36]

How cool is that? No pipelines, no input/output files. Just a simple list piped to the Transform code you want to try out.

Here’s an example of trying something on a key-value pair (represented as a 2-tuple in Python Dataflow):

>>> [('Jan',3), ('Jan',8), ('Feb',12)] | beam.GroupByKey()
[('Jan', [3, 8]), ('Feb', [12])]
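As a mental model, here is roughly what GroupByKey computes on a small in-memory list, written in plain Python with no Beam involved. `group_by_key` is a hypothetical helper, not part of the Beam API, and it ignores windowing and everything else a real runner takes care of.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Plain-Python stand-in for what GroupByKey computes on a small list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        # Collect every value seen for this key, in input order.
        grouped[key].append(value)
    return list(grouped.items())


grouped = group_by_key([("Jan", 3), ("Jan", 8), ("Feb", 12)])
print(grouped)  # [('Jan', [3, 8]), ('Feb', [12])]
```

Unlike this stand-in, Beam makes no promise about the order of keys or of the values within a key.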

You can keep appending transforms:

>>> [('Jan',3), ('Jan',8), ('Feb',12)] | beam.GroupByKey() | beam.Map(lambda kv: (kv[0], len(list(kv[1]))))
[('Jan', 2), ('Feb', 1)]

Hope this trick saves you as much time as it saved me.

Happy coding!

Google Cloud - Community
