Configure Python to run Dataflow Jobs in Cloud Shell

Arun Kumar
Cloud Techies
2 min read · May 3, 2021

Steps

  1. Update requirements.txt to list the Python modules needed to deploy Dataflow jobs from a virtualenv in Cloud Shell.
  2. Open requirements.txt in an editor and paste in the following list of modules.
  • The pinned versions ensure that the correct Python modules are installed for deploying the Python Dataflow jobs.
  • The list also includes the Faker modules and some dependencies that are required when you deploy and test a streaming Dataflow job.
vi requirements.txt

apache-beam==2.14.0
google-api-core==1.14.2
google-apitools==0.5.28
google-auth==1.6.3
google-cloud==0.34.0
google-cloud-bigquery==1.17.0
google-cloud-bigtable==0.32.2
google-cloud-core==1.0.0
google-cloud-datastore==1.9.0
google-cloud-pubsub==0.42.1
google-cloud-storage==1.17.0
google-cloud-vision==0.38.0
httplib2==0.12.0
mock==2.0.0
numpy==1.17.0
six==1.12.0
Faker==2.0.0
faker-schema==0.1.4
Cython==0.29.13
fastavro==0.21.24
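
Because every entry in the list is pinned with `==`, a paste error in the editor (a merged line, a missing version) is easy to detect before running pip. The following is a minimal sketch of such a check, not part of the original steps; the helper name `find_unpinned` and the sample lines are hypothetical, and the pattern only covers the simple `name==version` form used in this list.

```python
import re

# Matches a fully pinned requirement such as "apache-beam==2.14.0".
# This is deliberately narrower than the full pip requirement syntax.
PIN_PATTERN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([A-Za-z0-9.]+)$")


def find_unpinned(lines):
    """Return non-empty, non-comment lines that are not of the form name==version."""
    problems = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if PIN_PATTERN.match(line) is None:
            problems.append(line)
    return problems


if __name__ == "__main__":
    # Hypothetical sample: two valid pins, one version range, one bare name.
    sample = [
        "apache-beam==2.14.0",
        "google-cloud-pubsub==0.42.1",
        "Faker>=2.0.0",
        "numpy",
    ]
    print(find_unpinned(sample))
```

To check the real file, the same function can be given the open file object, for example `find_unpinned(open("requirements.txt"))`; an empty list means every line is pinned.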

  3. Enter the following command in Cloud Shell to create a virtualenv environment.

virtualenv -p `which python3.7` dataflow-env

  4. Enter the following command in Cloud Shell to activate the virtualenv environment.

source dataflow-env/bin/activate
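
After activation, it can be worth confirming that `python` now resolves to the interpreter inside dataflow-env rather than the system one. The sketch below is one way to do that from Python itself; the function name `in_virtualenv` is hypothetical, and the check relies on the standard `sys.prefix` / `sys.base_prefix` attributes (plus `sys.real_prefix`, which older virtualenv releases set).

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix on the interpreter.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make base_prefix differ from prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

Run with the environment active, the interpreter path should point into the dataflow-env directory.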

5. Enter the following command in Cloud Shell to install the Python modules in your virtualenv environment using the requirements.txt file.

pip install -r /home/$USER/professional-services/examples/dataflow-python-examples/requirements.txt

  6. Verify the installed version of Apache Beam.

pip list

The output should list apache-beam at version 2.14.0, the version pinned in requirements.txt.
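
Scanning the full `pip list` output by eye is error-prone, so the same check can be scripted. This is a minimal sketch under stated assumptions: the helper name `installed_version` and the constant `EXPECTED_BEAM_VERSION` are hypothetical, and the code falls back to `pkg_resources` (shipped with setuptools) because `importlib.metadata` is only in the standard library from Python 3.8, while the environment above uses Python 3.7.

```python
try:
    # Standard library on Python 3.8 and later.
    from importlib import metadata as importlib_metadata
except ImportError:
    importlib_metadata = None

# Assumption: the version pinned for apache-beam in requirements.txt.
EXPECTED_BEAM_VERSION = "2.14.0"


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    if importlib_metadata is not None:
        try:
            return importlib_metadata.version(package)
        except importlib_metadata.PackageNotFoundError:
            return None
    # Fallback for Python 3.7: pkg_resources is provided by setuptools.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


if __name__ == "__main__":
    found = installed_version("apache-beam")
    if found == EXPECTED_BEAM_VERSION:
        print("apache-beam", found, "matches the pinned version")
    else:
        print("apache-beam version is", found, "but expected", EXPECTED_BEAM_VERSION)
```

The script needs to be run with the interpreter from the activated environment; otherwise it reports on whatever packages the system Python has installed.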
