Configure Python to run Dataflow Jobs in Cloud Shell

Steps

  1. Update requirements.txt to specify the Python modules required to deploy the Dataflow jobs using virtualenv in Cloud Shell.
  2. Paste the following list of modules into requirements.txt.
  • This list ensures that the correct Python modules are installed so you can deploy the Python Dataflow jobs.
  • The list also includes the Faker modules and some dependencies that are required when you deploy and test a streaming Dataflow job (a short illustrative sketch follows the module list).
vi requirements.txt

apache-beam==2.14.0
google-api-core==1.14.2
google-apitools==0.5.28
google-auth==1.6.3
google-cloud==0.34.0
google-cloud-bigquery==1.17.0
google-cloud-bigtable==0.32.2
google-cloud-core==1.0.0
google-cloud-datastore==1.9.0
google-cloud-pubsub==0.42.1
google-cloud-storage==1.17.0
google-cloud-vision==0.38.0
httplib2==0.12.0
mock==2.0.0
numpy==1.17.0
six==1.12.0
Faker==2.0.0
faker-schema==0.1.4
Cython==0.29.13
fastavro==0.21.24
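
To see why the Faker modules are on this list: Faker and faker-schema can generate fake records to feed a streaming test. The sketch below is illustrative only; the file name, schema keys, and provider names are assumptions and are not part of the lab repository.

# generate_fake_events.py - illustrative sketch, not part of the lab repo
from faker_schema.faker_schema import FakerSchema

# Map each output field to a Faker provider; these field names are made up.
schema = {'user_id': 'uuid4', 'user_name': 'name', 'event_time': 'iso8601'}

faker = FakerSchema()
fake_event = faker.generate_fake(schema)
print(fake_event)  # e.g. {'user_id': '...', 'user_name': 'Jane Doe', ...}

In the streaming example, records like this would typically be serialized and published to Pub/Sub for the Dataflow job to consume.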

3. Enter the following command in Cloud Shell to create a virtualenv environment.

virtualenv -p `which python3.7` dataflow-env

4. Enter the following command in Cloud Shell to activate the virtualenv environment.

source dataflow-env/bin/activate

5. Enter the following command in Cloud Shell to install the Python modules in your virtualenv environment using the requirements.txt file.

pip install -r /home/$USER/professional-services/examples/dataflow-python-examples/requirements.txt

6. Verify the installed version of Apache Beam.

pip list

The output should show that the apache-beam version is 2.14.0, matching the version pinned in requirements.txt.
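
As an optional extra check beyond pip list, a tiny pipeline on the local DirectRunner confirms that Beam imports and runs inside the virtualenv. This is a minimal sketch; the file name and element values are made up.

# beam_smoke_test.py - hypothetical sanity check, not part of the lab repo
import apache_beam as beam

print(beam.__version__)  # should print 2.14.0

# With no runner specified, Beam runs on the local DirectRunner.
with beam.Pipeline() as p:
    (p
     | beam.Create(['hello', 'dataflow'])
     | beam.Map(str.upper)
     | beam.Map(print))

Run it with python beam_smoke_test.py; the pipeline should print HELLO and DATAFLOW without errors.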

Arun Kumar

Cloud Architect | AWS, GCP, Azure, Python, Kubernetes, Terraform, Ansible