How to quickly experiment with Dataflow (Apache Beam Python)

Lak Lakshmanan
Nov 1, 2016 · 2 min read

One of my colleagues showed me this trick to quickly experiment with Cloud Dataflow/Apache Beam, and it’s already saved me a couple of hours. [Dataflow is Google’s autoscaling, serverless way of processing both batch and streaming data. It runs Apache Beam pipelines. If you haven’t used it, you should try it out.]

To try out some bit of Python Dataflow code, this is what I would do: I would create a Pipeline, read some data from a CSV file, transform it with the code I was trying out, write out the result to a text file and then look at it. Very, very sssslow process.

The cool new way takes advantage of the Python REPL (the command-line interpreter) and the fact that Python lists can function as a Dataflow source.

If necessary, install the Apache Beam package on your machine:

$ pip install 'apache-beam[gcp]'

Start the Python interpreter on the command-line:

$ python

Import the Apache Beam package:

>>> import apache_beam as beam

Now you are ready to roll. You can create an example list and pipe it into a transform:

>>> [3, 8, 12] | beam.Map(lambda x : 3*x)
[9, 24, 36]

How cool is that? No pipelines, no input/output files. Just a simple list piped to the Transform code you want to try out.
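If you are wondering why a plain list can sit on the left of `|` at all: Python lists do not define `|` themselves, so Python falls back to the reflected method `__ror__` on the right-hand object. The toy class below sketches that mechanism in plain Python. `ToyMap` is a made-up name and this is not Beam's implementation (the real transform builds and runs a local pipeline behind the scenes); it only shows the operator dispatch.

```python
class ToyMap:
    """Toy stand-in for a transform; NOT how Beam implements Map."""

    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, left):
        # Python calls this for `left | self` when `left` (here a plain
        # list) has no `|` operator of its own.
        return [self.fn(x) for x in left]


tripled = [3, 8, 12] | ToyMap(lambda x: 3 * x)
print(tripled)  # [9, 24, 36]

# The result is again a plain list, so another transform can be piped on.
chained = [3, 8, 12] | ToyMap(lambda x: 3 * x) | ToyMap(lambda x: x + 1)
print(chained)  # [10, 25, 37]
```

Because each step hands back an ordinary list, chaining works one `|` at a time from left to right.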

Here’s an example of trying something on a key-value pair (represented as a 2-tuple in Python Dataflow):

>>> [('Jan',3), ('Jan',8), ('Feb',12)] | beam.GroupByKey()
[('Jan', [3, 8]), ('Feb', [12])]

You can keep appending transforms:

>>> [('Jan',3), ('Jan',8), ('Feb',12)] | beam.GroupByKey() | beam.Map(lambda kv : (kv[0], len(kv[1])))
[('Jan', 2), ('Feb', 1)]

Hope this trick saves you as much time as it saved me.

Happy coding!

Google Cloud - Community

Google Cloud community articles and blogs
