Running a Python script using Docker, PyCharm & Google Cloud Platform
Repo backing this article:
Let's say you have been developing a Python script which runs well on your local environment. Up to now you may have been using venv to manage separate package installations for different projects. This generally works well when running on a single machine.
You then get to a point where you need to run your code in the cloud. venv could still work, but you will inevitably run into OS environment inconsistencies. This is where Docker comes in. There is a ton of material online on Docker and why this is the case, so this post will not go into the details.
In this post I will therefore:
- Show how you could set up your local environment using PyCharm, which integrates well with Docker
- Show how you could use a few of Google Cloud's products to facilitate, in my opinion, a very efficient CI/CD pipeline
- And then, once your cloud architecture is set up, show how you could run your Python script on the cloud
Initial PyCharm Setup
This section will outline how to set up your local environment.
Ensure Docker is running on your PC/Mac. If not, install it here. Note that you need to log in to or create a Docker account to download it.
Within your codebase, create a Dockerfile with the required commands to create a Docker image.
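For reference, a Dockerfile for a script like this might look something like the following. This is a minimal sketch; the base image, file names and the MY_PARAMETER default are illustrative assumptions, not taken from the repo:

```dockerfile
# Minimal sketch -- base image and file names are illustrative
FROM python:3.7-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

# Default value; can be overridden per instance via environment variables
ENV MY_PARAMETER=default-value

CMD python main.py --my-parameter $MY_PARAMETER
```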
Create the image using `docker build -t myimagename .` Remember the `.` at the end of this command: it sets the build context to the current directory. Running the `docker image ls` command will now list the image:
Since the Docker image is now ready on your local machine, you are able to use the image as the Python interpreter.
Our local environment is now set up, and any code snippets will be run within the Docker container when using the PyCharm run command:
Automated Builds for Continuous Integration
I will use Google Cloud Build for a simple CI/CD pipeline. This will essentially ensure that, in order to update our infrastructure/service, we only need to push commits to our repo.
We will be using Cloud Build Triggers to automatically build containers based on a source code change. Here is an example of a Trigger:
To use Triggers, we have set up a private Git repo using Google Cloud Source Repositories:
The above Trigger needs a cloudbuild.yaml file in your codebase.
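A cloudbuild.yaml for this setup could be as simple as a single Docker build step. Here is a hedged sketch using the built-in substitution variables; the exact image path is an assumption:

```yaml
# Sketch: build the image and push it to Container Registry
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/$REPO_NAME:$BRANCH_NAME', '.']
images:
- 'gcr.io/$PROJECT_ID/$REPO_NAME:$BRANCH_NAME'
```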
Our Google Cloud Build Trigger is set up to create an image automatically whenever it detects any repo changes.
Note that the substitution variables $PROJECT_ID, $REPO_NAME and $BRANCH_NAME will automatically name the image to match your project, repo name and branch. This is cool. Here is a list of all the substitution variables.
Now that our Trigger is set up to respond to any change in the repo, Google Cloud Build will automatically start a build on every push. Here is an example of a build initiated from a commit push to the 'r1.2' branch:
Running the Job
Now that we have a Docker image ready on Google Container Registry which 'syncs' with Cloud Source Repositories, we are ready to launch a Compute Engine instance. This is super easy since GCE has built-in support for running Docker containers through its Container-Optimized OS images.
Defining the Job
We have a simple script which takes a single parameter, 'my-parameter', and runs a set of Python code in a main.py file. This is taken care of in the container by the following line in the Dockerfile:
CMD python main.py --my-parameter $MY_PARAMETER
This uses the shell form of the CMD instruction, which runs the command through a shell and therefore expands the $MY_PARAMETER environment variable.
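For illustration, main.py could wire that parameter up with argparse, falling back to the MY_PARAMETER environment variable so the script also works when the flag is omitted. This is a sketch of what such an entry point might look like, not the actual repo code:

```python
import argparse
import os


def parse_args(argv=None):
    # --my-parameter falls back to the MY_PARAMETER env var, matching
    # the $MY_PARAMETER expansion in the Dockerfile's CMD line.
    parser = argparse.ArgumentParser(description="Example batch job")
    parser.add_argument("--my-parameter",
                        default=os.environ.get("MY_PARAMETER"))
    return parser.parse_args(argv)


def run(my_parameter):
    print(f"Running job with my-parameter={my_parameter}")


if __name__ == "__main__":
    run(parse_args().my_parameter)
```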
Deploy Container on Google Compute Engine
The simplest way to launch our new container is to use the 'Deploy to GCE' option within Container Registry.
Deploy to GCE will pre-populate the 'Create an instance' form as follows:
In the ‘Advanced container options’ one is able to set the ‘Environment variables’
In this way we can pass parameters into our Python code when containers are launched. You'll remember that the Dockerfile had the following entry:
CMD python main.py --my-parameter $MY_PARAMETER
With the above screen still open, you'll notice right at the bottom 'Equivalent REST or command line'. This is super useful. Looking at the gcloud command line, we have an easy way to launch our job:
gcloud beta compute --project=sandbox-docker instances create-with-container instance-1 \
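The truncated command above continues with container-specific flags. Here is a sketch of what the full invocation might look like; the zone, machine type, image path and parameter value are illustrative assumptions:

```shell
gcloud beta compute --project=sandbox-docker instances create-with-container instance-1 \
  --zone=europe-west1-b \
  --machine-type=n1-standard-1 \
  --container-image=gcr.io/sandbox-docker/my-repo:r1.2 \
  --container-env=MY_PARAMETER=my-value
```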
A Few Useful Features of Docker Images Running on GCE
Accessing local storage
When running a particular job we often create local temporary files. In a normal virtual machine (VM) we simply had a folder, say /temp, which we could write to and read from.
In a Docker container environment you are running as the root user and have the OS environment of the image you are running in.
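One portable approach is to let Python pick the scratch location via the standard tempfile module rather than hard-coding a /temp folder; this works the same on a plain VM and inside a container. A minimal sketch (the file name and contents are illustrative):

```python
import os
import tempfile


def write_scratch(data: str) -> str:
    # tempfile picks a writable location (e.g. /tmp inside the container)
    tmp_dir = tempfile.mkdtemp(prefix="job-")
    path = os.path.join(tmp_dir, "scratch.txt")
    with open(path, "w") as f:
        f.write(data)
    return path
```

Note that anything written this way lives only as long as the container's filesystem; for results that must outlive the job, write to a mounted volume or to Cloud Storage instead.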
Logging
With a normal VM environment we installed the Logging Agent as per the Google online documentation, which pushes all logs to Stackdriver.
In a Docker container environment running on GCE this is already set up. Here is a link with more information on how this works.
Service Accounts
One typically would use a custom service account when running jobs on a VM. This allows one to follow the principle of least privilege. Setting this up in a Docker environment is no different to setting it up on a standard VM.
Application Default Credentials (ADC)
One cool thing about using the Google client libraries is the option to use Google-managed credentials when invoking a library. This is what they refer to as Application Default Credentials (ADC). These credentials are managed in that Google takes care of rotating them. We use ADC with many of the Python libraries as per the online documentation; as long as our code runs on Google Cloud, we don't have to think about managing credentials. This allows us to focus on managing IAM policies at the service account level, ensuring we always apply the principle of least privilege when setting policies and attaching service accounts to, say, GCE instances.
ADC works out of the box with Docker containers so again no changes required when moving from running code directly on a VM to running it in a Docker container on a GCE VM.
I love it when products work well together. I'm not endorsed by or employed by Google, but I really like what is going on with regards to the tighter integration between their products.