Google Cloud Deployment Manager

“The Missing Tutorials” series

I decided to revisit an earlier post, “Deploying Docker Engine in swarm mode to GCP”, to try to take advantage of Google’s Runtime Config. I’d like to coordinate the creation of multiple workers by having them obtain the swarm join-token from a Runtime Config variable.

This gives me the opportunity to expand my knowledge of Deployment Manager as I’ll need to use the Service Management service to enable APIs, IAM to generate a Service Account (with enhanced scope) for the docker node VMs, Cloud Resource Manager to revise the project’s policy and, of course, Runtime Config to create and manage the tokens.

This post will be a stream-of-writing lessons learned as I encounter problems and solve them.

Python not JINJA

The Deployment Manager service supports Python scripts and JINJA templates and it supports mixing and matching both. I discourage you from using JINJA templates. It is likely that you’ll need to embrace the superset of functionality provided by Python and it’s likely better to have everything in Python.

Supported Resource Types

Deployment Manager (DM) scripts define an intended state of GCP resources. When you need to use DM with an unfamiliar resource, a good place to start is this page of supported resource types:

https://cloud.google.com/deployment-manager/docs/configuration/supported-resource-types

Alternatively, you can enumerate the list from the command-line:

gcloud deployment-manager types list

There’s a mapping between services, service endpoints and their resources (types) but it’s not always an obvious one. It’s also not bijective: not every service resource type is mapped to a Deployment Manager resource type; the service resource types are a superset. Thus, you can’t use Deployment Manager to manage all of GCP’s resources.

The Compute (Engine) v1 service is comprehensively and consistently mapped to Deployment Manager types:

gcloud deployment-manager types list \
--filter="name ~ ^compute\.v1\."
NAME
compute.v1.regionInstanceGroupManager
compute.v1.firewall
compute.v1.router
compute.v1.regionBackendService
compute.v1.instanceGroupManager
compute.v1.sslCertificate
compute.v1.disk
compute.v1.image
compute.v1.targetInstance
compute.v1.healthCheck
compute.v1.subnetwork
compute.v1.autoscaler
compute.v1.targetSslProxy
compute.v1.route
compute.v1.httpHealthCheck
compute.v1.vpnTunnel
compute.v1.instanceGroup
compute.v1.urlMap
compute.v1.regionAutoscaler
compute.v1.httpsHealthCheck
compute.v1.forwardingRule
compute.v1.regionInstanceGroup
compute.v1.targetPool
compute.v1.targetHttpProxy
compute.v1.address
compute.v1.globalAddress
compute.v1.targetHttpsProxy
compute.v1.globalForwardingRule
compute.v1.instanceTemplate
compute.v1.backendService
compute.v1.network
compute.v1.targetVpnGateway
compute.v1.instance
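
As I understand it, the --filter expression above is a regular-expression match against each type’s name. A minimal Python sketch of the same selection follows, assuming the type names have already been collected into a list. The filter_types helper and the short type_names sample are hypothetical and only for illustration:

```python
import re


def filter_types(type_names, pattern):
    """Return the names that match 'pattern', much like gcloud's name ~ filter."""
    regex = re.compile(pattern)
    return [name for name in type_names if regex.search(name)]


# Hypothetical sample; a real list would come from the gcloud command above
type_names = [
    'compute.v1.instance',
    'compute.v1.network',
    'iam.v1.serviceAccount',
    'runtimeconfig.v1beta1.config',
]

compute_types = filter_types(type_names, r'^compute\.v1\.')
print(compute_types)
```

The escaped dots matter: an unescaped "." would match any character, so the pattern would be looser than intended.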

The Deployment Manager v2 service has a resource type Manifests but this is not accessible as a supported Deployment Manager resource type:

gcloud deployment-manager types list --filter="name ~ ^deploymentmanager\.v2\."
No types were found for your project!

The Deployment Manager v2 endpoint is:

https://www.googleapis.com/deploymentmanager/v2/projects

However the IAM service endpoint is:

https://iam.googleapis.com

And the service defines some of its resource types with projects or organizations prefixes, e.g. projects.serviceAccounts and projects.serviceAccounts.keys:

https://cloud.google.com/iam/reference/rest/#collection-v1projectsserviceaccounts
https://cloud.google.com/iam/reference/rest/v1/projects.serviceAccounts.keys

But these map to Deployment Manager resource types without the “projects” prefix:

gcloud deployment-manager types list --filter="name ~ ^iam\.v1\."
iam.v1.serviceAccounts.key
iam.v1.serviceAccount

Creating Service Accounts

The Deployment Manager GitHub examples include the representation of service accounts (types) but this is not documented elsewhere and I will use it as an example of encountering a new resource type. In this case, we may create a service account without an accompanying key because the service account will be used by a Compute Engine VM.

Using the Cloud SDK command-line, the equivalent commands would be:

PROJECT=[[YOUR-PROJECT-ID]]
ZONE=[[YOUR-ZONE]]
INSTANCE=[[YOUR-INSTANCE]]
NAME=[[YOUR-SERVICE_ACCOUNT_NAME]]
SERVICE_ACCOUNT=${NAME}@${PROJECT}.iam.gserviceaccount.com
gcloud iam service-accounts create $NAME --project=$PROJECT
gcloud compute instances set-service-account $INSTANCE \
--service-account=${SERVICE_ACCOUNT} \
--scopes=... \
--project=$PROJECT \
--zone=$ZONE

The documentation for [projects.]serviceAccounts summarizes (some of) the properties associated with this resource. These properties represent an instantiated resource:

{
"name": string,
"projectId": string,
"uniqueId": string,
"email": string,
"displayName": string,
"etag": string,
"oauth2ClientId": string,
}

But, to create a Service Account, it’s necessary to provide an accountId and a displayName. How do I know this? The create method documents this:

https://cloud.google.com/iam/reference/rest/v1/projects.serviceAccounts/create

And, trusty API Explorer proves it:

https://developers.google.com/apis-explorer/#search/iam/iam/v1/iam.projects.serviceAccounts.create
iam.projects.serviceAccounts.create

Deployment Manager (appears to) use(s) a flat set of properties and so the API method’s hierarchy isn’t preserved:

{
accountId,
serviceAccount: {
displayName
}
}

and becomes:

{
'name': $NAME,
'type': 'iam.v1.serviceAccount',
'metadata': {
'dependsOn': [
'cloudresourcemanager',
'iam'
]
},
'properties':{
'accountId': $NAME,
'displayName': $NAME
}
}
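
Wrapped in a Python template function, the resource above might look like the following sketch. The function name GenerateServiceAccount and the FakeContext stand-in are hypothetical, and the dependsOn entries assume that resources named ‘cloudresourcemanager’ and ‘iam’ exist elsewhere in the deployment:

```python
def GenerateServiceAccount(context, name):
    """Generate an IAM Service Account resource called 'name'."""
    return {
        'name': name,
        'type': 'iam.v1.serviceAccount',
        'metadata': {
            # Assumed names of other resources in the same deployment
            'dependsOn': [
                'cloudresourcemanager',
                'iam',
            ],
        },
        'properties': {
            # Flattened from the API's accountId and serviceAccount.displayName
            'accountId': name,
            'displayName': name,
        },
    }


class FakeContext(object):
    """Hypothetical stand-in for the context Deployment Manager passes in."""
    env = {'project': 'my-project'}


service_account = GenerateServiceAccount(FakeContext(), 'swarm-robot')
print(service_account['properties'])
```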

Enabling Services

The IAM service is not enabled by default in projects. It is likely that, if you try to create a Service Account as described above, DM will balk with an error:

'{"ResourceType":"iam.v1.serviceAccount","ResourceErrorCode":"403","ResourceErrorMessage":{"code":403,"message":"Google Identity and Access Management (IAM) API has not been used in project ... before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/iam.googleapis.com/overview?project=... then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.","status":"PERMISSION_DENIED","details":...

We need to use the Service Management service to enable the IAM API, and then we need to make the Deployment Manager serviceAccount resource that we created previously depend on IAM being enabled.

You won’t find a Deployment Manager type for the Service Management service:

gcloud deployment-manager types list \
--filter="name ~ ^servicemanagement"
No types were found for your project!

Once again the GitHub samples provide the solution. I found this inadvertently when I was searching for a way to create Service Accounts. It’s not obvious though and there’s only a single result for this method:

{
'name': ...,
'type': 'deploymentmanager.v2.virtual.enableService',
'metadata': {
'dependsOn': ...
},
'properties': {
'consumerId': 'project:' + project_id,
'serviceName': ...
}
}

This code works! I don’t know why it works but it does. I don’t understand the reference to deploymentmanager.v2.virtual.X but I am able to make sense of the properties. Once again, API Explorer helps:

https://developers.google.com/apis-explorer/#search/servicemanagement/servicemanagement/v1/servicemanagement.services.enable
servicemanagement.services.enable

Flattening the API method’s required properties (serviceName, consumerId) yields the code provided in the Google sample. NB I’m also using context.env to grab the ‘project’ (== GCP Project ID) from the runtime environment as this is required for the value of the consumerId. I turned the code into a function and, in this case, I’m calling the function and passing $API == “iam” to enable iam.googleapis.com:

{
'name': $API,
'type': 'deploymentmanager.v2.virtual.enableService',
'properties': {
'consumerId': 'project:' + context.env['project'],
'serviceName': $API + '.googleapis.com'
}
}
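
A function along these lines might look like the sketch below. The name GenerateEnableService and the FakeContext class are hypothetical; the sketch only assumes that the context exposes the project ID as context.env['project'], as described above:

```python
def GenerateEnableService(context, api):
    """Generate a resource that enables '<api>.googleapis.com' for the project."""
    return {
        'name': api,
        'type': 'deploymentmanager.v2.virtual.enableService',
        'properties': {
            'consumerId': 'project:' + context.env['project'],
            'serviceName': api + '.googleapis.com',
        },
    }


class FakeContext(object):
    """Hypothetical stand-in for the context Deployment Manager passes in."""
    env = {'project': 'my-project'}


enable_iam = GenerateEnableService(FakeContext(), 'iam')
print(enable_iam['properties'])
```

Because the resource is named after the API, other resources can list ‘iam’ in their dependsOn to wait until the API has been enabled.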

This is equivalent to the following Cloud SDK command:

gcloud service-management enable iam[.googleapis.com] \
--project=$PROJECT

Mutating (IAM) Policies

Okay “changing policies”, “updating policies” but “mutating” just sounds so much more fun! I’m waiting on some sample code on the GitHub site to show how to perform this. It’s complicated by the way this service works.

Customarily, the way to mutate an IAM policy is to:

  • Get the current policy
  • Mutate it
  • Put the revised policy back to the service

The policy document not only includes the current list of policy bindings for, e.g., the project but it also includes an etag which is used as a concurrency mechanism. It’s effectively a hash of the policy. If, when you put the policy back to the service, the etag in the document you put differs from that of the service’s stored policy, the service knows that the policy was revised after you read it, and your changes are rejected rather than applied.
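
To make the etag check concrete, here is a toy in-memory sketch of the idea. PolicyStore is entirely hypothetical and is not the IAM or Cloud Resource Manager API; the real services compute and compare etags in their own way:

```python
import copy
import hashlib
import json


class PolicyStore(object):
    """Toy stand-in for a service that guards a policy with an etag."""

    def __init__(self, bindings):
        self._bindings = bindings

    def _etag(self):
        # Hash of the stored bindings, standing in for the service's etag
        payload = json.dumps(self._bindings, sort_keys=True).encode('utf-8')
        return hashlib.sha256(payload).hexdigest()

    def get_policy(self):
        return {'bindings': copy.deepcopy(self._bindings), 'etag': self._etag()}

    def set_policy(self, policy):
        # Reject the write if the stored policy changed since it was read
        if policy['etag'] != self._etag():
            raise ValueError('stale etag: policy was modified concurrently')
        self._bindings = copy.deepcopy(policy['bindings'])


store = PolicyStore([{'role': 'roles/owner', 'members': ['user:alice@example.com']}])

first = store.get_policy()
second = store.get_policy()  # read before the first writer puts its change

first['bindings'].append({'role': 'roles/editor', 'members': ['user:bob@example.com']})
store.set_policy(first)  # accepted: the etag still matches

second['bindings'].append({'role': 'roles/viewer', 'members': ['user:carol@example.com']})
try:
    store.set_policy(second)  # the first write changed the stored policy
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The second writer has to get the policy again, reapply its change and retry.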

This is a challenging mechanism to represent in Deployment Manager and the GitHub samples include a helper function that merges a service account into a policy. I understand from the Deployment Manager team that an alternative solution should be available soon.
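
I haven’t reproduced the GitHub helper here, but the “mutate” step of such a merge can be sketched in a few lines of Python. The merge_binding function, the sample policy and its etag value are all hypothetical:

```python
import copy


def merge_binding(policy, role, member):
    """Return a copy of 'policy' in which 'member' is granted 'role'."""
    merged = copy.deepcopy(policy)
    bindings = merged.setdefault('bindings', [])
    for binding in bindings:
        if binding['role'] == role:
            # Role already bound: add the member only if it's missing
            if member not in binding['members']:
                binding['members'].append(member)
            break
    else:
        # No binding for this role yet: create one
        bindings.append({'role': role, 'members': [member]})
    return merged


current = {
    'bindings': [{'role': 'roles/owner', 'members': ['user:alice@example.com']}],
    'etag': 'BwVabc123=',  # hypothetical value, carried through unchanged
}
member = 'serviceAccount:swarm-robot@my-project.iam.gserviceaccount.com'
updated = merge_binding(current, 'roles/editor', member)
print(updated['bindings'])
```

The etag is deliberately left untouched so that the service can still detect whether the policy changed between the get and the put.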

The Cloud SDK includes a convenience method add-iam-policy-binding that does the work behind the scenes:

gcloud projects add-iam-policy-binding $PROJECT \
--member=serviceAccount:${SERVICE_ACCOUNT} \
--role=roles/editor

I‘m told by the Deployment Manager team that this functionality is imminent and in the form of a new capability called ‘actions’. I’ll update this content when the feature’s released.

Runtime Config(urator)

Deployment Manager includes a feature called Runtime Config(urator). It provides functionality that is particularly relevant to Deployment Manager but it is, in fact, a standalone service. I described a way to use Runtime Config in combination with Global Scope as a way to pass configuration data to Cloud Functions.

Runtime Config provides a key improvement to deploying Docker swarm. When a swarm is initialized, the 1st (Docker Engine) node generates a manager token and a worker token that must be provided by other nodes to prove themselves when joining the cluster.

1st (genesis) node:

sudo docker swarm init
Swarm initialized: current node (73gqde43chycmq9s7f93klmmf) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join --token SWMTKN-1-243cu5kol7pfie85svs8cnvsmbypm2gp8mhe29b12izyj5cr92-e2r75b6yxdx8mk3rrwjf4kfzi 10.138.0.2:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

It’s easiest to obtain the token(s) from this genesis node by requerying it and, as a hint to how we’ll proceed, assigning the value to an environment variable:

WORKER_TOKEN=$(sudo docker swarm join-token worker --quiet)
MANAGER_TOKEN=$(sudo docker swarm join-token manager --quiet)

These commands return the token(s) only.

With these tokens, from other (Docker Engine) nodes (on other VMs), we could then:

sudo docker swarm join --token ${WORKER_TOKEN} swarm-master:2377
sudo docker swarm join --token ${MANAGER_TOKEN} swarm-master:2377

So, two outstanding questions, how do we:

  • make the tokens available to the other nodes?
  • block creation of the other swarm nodes on the tokens’ availability?

This is what Runtime Config provides us. Runtime Config is scoped to a single project and permits further scoping through namespaces. Let’s first create a namespace called ‘swarm’ for the Docker swarm mode tokens:

gcloud beta runtime-config configs create swarm

We can then create variables arbitrarily within this ‘swarm’ namespace. I chose to create a variable called ‘worker’ and another called ‘manager’ but to put these in a hierarchy under ‘token’. The ‘token’ prefix is redundant but..

gcloud beta runtime-config configs \
variables set /token/worker ${WORKER_TOKEN} \
--config-name swarm \
--is-text
gcloud beta runtime-config configs \
variables set /token/manager ${MANAGER_TOKEN} \
--config-name swarm \
--is-text

I’m using the “--is-text” flag. The tokens are plaintext (alphanumeric) and this saves having to base64 decode the values when retrieved.
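
To illustrate the difference, here is a small Python sketch. It assumes a variable is represented as a dict carrying either a ‘text’ field or a base64-encoded ‘value’ field, which is my reading of the Runtime Config REST resource. The decode_variable helper and the token are hypothetical:

```python
import base64


def decode_variable(variable):
    """Return the string held by a variable, whichever field carries it."""
    if 'text' in variable:
        # Stored as text: usable as-is
        return variable['text']
    # Stored as a value: base64 decode it first
    return base64.b64decode(variable['value']).decode('utf-8')


token = 'SWMTKN-1-example'  # hypothetical token
as_text = {'text': token}
as_value = {'value': base64.b64encode(token.encode('utf-8')).decode('ascii')}

print(decode_variable(as_text) == decode_variable(as_value))
```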

Runtime Config provides mechanisms for watching and waiting on variables but it was not immediately clear to me that either of these provides the functionality needed here. Instead — and somewhat unhappily — I decided to ‘hack’ a solution (please comment-ping me with improvements).

The Deployment Manager script creates token/manager and token/worker variables with a $DUMMY setting *before* it creates any VMs (obviously including the genesis node). The VMs all know that, if a value from the variables is $DUMMY, the genesis node is not yet ready and they block and retry after one minute.

Advantage(s)

  • Simple

Disadvantages

  • Less elegant
  • Potentially infinite blocking
  • Requires shared knowledge of the $DUMMY value

def GenerateRuntimeConfigConfig(context, name):
    """Generate a Runtime-Config Config 'name'"""
    return {
        'name': 'config-name-' + name,
        'type': 'runtimeconfig.v1beta1.config',
        'metadata': {
            'dependsOn': [
                'runtimeconfig',
            ],
        },
        'properties': {
            'config': name,
        }
    }

def GenerateRuntimeConfigVariable(context, name, variable, default):
    """Generate Runtime-Config 'variable' with 'default' text value"""
    project_id = context.env['project']
    config_name = 'config-name-' + name
    return {
        'name': 'variable-' + variable,
        'type': 'runtimeconfig.v1beta1.variable',
        'metadata': {
            'dependsOn': [
                config_name,
            ]
        },
        'properties': {
            'parent': '$(ref.' + config_name + '.name)',
            'variable': variable,
            'text': default,
        },
    }

NB The Runtime Config service must be enabled and this is what’s checked in the dependsOn when the Config is created.
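
As a rough sketch of how these pieces might be combined, a template’s GenerateConfig entry point could create the ‘swarm’ config and then one $DUMMY-valued variable per token. The compact runtime_config and runtime_variable helpers below are simplified, hypothetical variants of the functions above, and naming each variable resource after the last path segment is my assumption:

```python
DUMMY = 'DUMMY'  # placeholder that the startup scripts compare against


def runtime_config(name):
    """Runtime Config config resource; assumes a resource named 'runtimeconfig'."""
    return {
        'name': 'config-name-' + name,
        'type': 'runtimeconfig.v1beta1.config',
        'metadata': {'dependsOn': ['runtimeconfig']},
        'properties': {'config': name},
    }


def runtime_variable(name, variable, default):
    """Runtime Config variable resource holding a 'default' text value."""
    config_name = 'config-name-' + name
    return {
        # Assumption: name the resource after the last path segment
        'name': 'variable-' + variable.split('/')[-1],
        'type': 'runtimeconfig.v1beta1.variable',
        'metadata': {'dependsOn': [config_name]},
        'properties': {
            'parent': '$(ref.' + config_name + '.name)',
            'variable': variable,
            'text': default,
        },
    }


def GenerateConfig(context):
    """Entry point: one config plus a DUMMY-valued variable per token."""
    resources = [runtime_config('swarm')]
    for role in ('worker', 'manager'):
        resources.append(runtime_variable('swarm', 'token/' + role, DUMMY))
    return {'resources': resources}


config = GenerateConfig(context=None)  # the context is unused in this sketch
print([resource['name'] for resource in config['resources']])
```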

Then, in the startup script for the 1st (genesis) node, after the swarm init, the worker and manager tokens are requested and are used to replace the $DUMMY values:

sudo docker swarm init
WORKER_TOKEN=$(sudo docker swarm join-token worker --quiet)
MANAGER_TOKEN=$(sudo docker swarm join-token manager --quiet)
gcloud beta runtime-config configs variables set \
/token/worker ${WORKER_TOKEN} \
--config-name=swarm \
--is-text
gcloud beta runtime-config configs variables set \
/token/manager ${MANAGER_TOKEN} \
--config-name=swarm \
--is-text

The startup script for a worker can then pull the token’s value from the Runtime Config variable. It may be $DUMMY but, when it isn’t, it will be the correct worker token value:

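WORKER_TOKEN=$(gcloud beta runtime-config configs variables get-value /token/worker --config-name=swarm)
while [ "${WORKER_TOKEN}" == "DUMMY" ]
do
sleep 60s
# Re-read the variable; without this the loop would never see the real token
WORKER_TOKEN=$(gcloud beta runtime-config configs variables get-value /token/worker --config-name=swarm)
done
sudo docker swarm join --token ${WORKER_TOKEN} swarm-master:2377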
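
The loop above has the “potentially infinite blocking” disadvantage listed earlier. One way to bound the wait is sketched below in Python. The wait_for_token function, its parameters and the fake fetcher are all hypothetical; a real fetcher would have to call the get-value command shown above:

```python
import time


def wait_for_token(fetch, dummy='DUMMY', interval=60, timeout=1800, sleep=time.sleep):
    """Poll 'fetch' until it returns something other than 'dummy'.

    Raises RuntimeError once 'timeout' seconds of waiting have accumulated.
    """
    waited = 0
    while True:
        token = fetch()
        if token != dummy:
            return token
        if waited >= timeout:
            raise RuntimeError('token still %r after %d seconds' % (dummy, waited))
        sleep(interval)
        waited += interval


# Hypothetical fetcher: returns DUMMY twice before the real token appears
responses = iter(['DUMMY', 'DUMMY', 'SWMTKN-1-example'])
naps = []  # records the requested sleeps instead of actually sleeping
token = wait_for_token(lambda: next(responses), sleep=naps.append)
print(token, naps)
```

Passing the sleep function in as a parameter keeps the sketch testable without waiting for real minutes to pass.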

And, clearly, the startup script for a manager flips the variables:

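MANAGER_TOKEN=$(gcloud beta runtime-config configs variables get-value /token/manager --config-name=swarm)
while [ "${MANAGER_TOKEN}" == "DUMMY" ]
do
sleep 60s
# Re-read the variable; without this the loop would never see the real token
MANAGER_TOKEN=$(gcloud beta runtime-config configs variables get-value /token/manager --config-name=swarm)
done
sudo docker swarm join --token ${MANAGER_TOKEN} swarm-master:2377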

Let’s test it:

gcloud deployment-manager deployments create docker-swarm \
--config=docker_swarm.yaml \
--project=$PROJECT
The fingerprint of the deployment is ...
Waiting for update [operation-...]...done.
Update operation operation-... completed successfully.
cloudresourcemanager    deploymentmanager.v2...          COMPLETED
$PROJECT cloudresourcemanager.v1.project COMPLETED
iam deploymentmanager.v2... COMPLETED
runtimeconfig deploymentmanager.v2... COMPLETED
swarm runtimeconfig.v1beta1.config COMPLETED
swarm-manager-mig compute.v1.instanceGroupManager COMPLETED
swarm-manager-template compute.v1.instanceTemplate COMPLETED
swarm-master compute.v1.instance COMPLETED
swarm-robot iam.v1.serviceAccount COMPLETED
swarm-worker-mig compute.v1.instanceGroupManagers COMPLETED
swarm-worker-template compute.v1.instanceTemplate COMPLETED
variable-manager runtimeconfig.v1beta1.variable COMPLETED
variable-worker runtimeconfig.v1beta1.variable COMPLETED

And, ssh’ing into swarm-master:

sudo docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS
jt7vcsgb2jnst0z33aolmzs0t * swarm-master Ready Active
s3su15bvothaswr17l6fugbml swarm-master-0pj9 Ready Active
jsi7dsfbp9zjcreihrzhr5g7q swarm-master-lb6q Ready Active
ftsrgxyfqeiu1l9o5b9nhs45q swarm-master-tjpm Ready Active
w5mq06h0gvirfldd0ezm1s1j7 swarm-worker-4p1p Ready Active
b9279nr43pjhog2d4pl9crjt4 swarm-worker-r1mx Ready Active
krik9fnw8p4w0jrca3qcicw46 swarm-worker-znmt Ready Active

NB I’ve hacked the output to make it more presentable here: swarm-master is annotated as “Leader” and the three masters are all marked “Reachable”

Conclusions

Deployment Manager is powerful but would benefit from more comprehensive documentation for noobs like me. The service is well-designed but it’s not always intuitive (consistent).