Google Cloud Deployment Manager
“The Missing Tutorials” series
I decided to revisit an earlier post, “Deploying Docker Engine in swarm mode to GCP”, to try to take advantage of Google’s Runtime Config. I’d like to coordinate the generation of multiple workers by having them obtain the swarm join-token from a Runtime Config variable.
This gives me the opportunity to expand my knowledge of Deployment Manager as I’ll need to use the Service Management service to enable APIs, IAM to generate a Service Account (with enhanced scope) for the docker node VMs, Cloud Resource Manager to revise the project’s policy and, of course, Runtime Config to create and manage the tokens.
This post will be a stream-of-writing lessons learned as I encounter problems and solve them.
Python not JINJA
The Deployment Manager service supports Python scripts and JINJA templates and it supports mixing and matching both. I discourage you from using JINJA templates. It is likely that you’ll need to embrace the superset of functionality provided by Python and it’s likely better to have everything in Python.
Supported Resource Types
Deployment Manager (DM) scripts define an intended state of GCP resources. When you need to use DM with an unfamiliar resource, a good place to start is this page of supported resource types:
https://cloud.google.com/deployment-manager/docs/configuration/supported-resource-types
Alternatively, you can enumerate the list from the command-line:
gcloud deployment-manager types list
There’s a 1:1 mapping between services, service endpoints and their resources (types) but it’s not always an obvious mapping. It’s also not bijective: not all service resource types are mapped to Deployment Manager resource types; the service resource types are a superset. Thus, you can’t use Deployment Manager to manage all of GCP’s resources.
The Compute (Engine) v1 service is comprehensively and consistently mapped to Deployment Manager types:
gcloud deployment-manager types list \
--filter="name ~ ^compute\.v1\."
NAME
compute.v1.regionInstanceGroupManager
compute.v1.firewall
compute.v1.router
compute.v1.regionBackendService
compute.v1.instanceGroupManager
compute.v1.sslCertificate
compute.v1.disk
compute.v1.image
compute.v1.targetInstance
compute.v1.healthCheck
compute.v1.subnetwork
compute.v1.autoscaler
compute.v1.targetSslProxy
compute.v1.route
compute.v1.httpHealthCheck
compute.v1.vpnTunnel
compute.v1.instanceGroup
compute.v1.urlMap
compute.v1.regionAutoscaler
compute.v1.httpsHealthCheck
compute.v1.forwardingRule
compute.v1.regionInstanceGroup
compute.v1.targetPool
compute.v1.targetHttpProxy
compute.v1.address
compute.v1.globalAddress
compute.v1.targetHttpsProxy
compute.v1.globalForwardingRule
compute.v1.instanceTemplate
compute.v1.backendService
compute.v1.network
compute.v1.targetVpnGateway
compute.v1.instance
The Deployment Manager v2 service has a resource type Manifests but this is not accessible as a supported Deployment Manager resource type:
gcloud deployment-manager types list --filter="name ~ ^deploymentmanager\.v2\."
No types were found for your project!
The Deployment Manager v2 endpoint is:
https://www.googleapis.com/deploymentmanager/v2/projects
However the IAM service endpoint is:
https://iam.googleapis.com
And the service defines some of its resource types with projects or organizations prefixes, e.g. projects.serviceAccounts and projects.serviceAccounts.keys:
https://cloud.google.com/iam/reference/rest/#collection-v1projectsserviceaccounts
https://cloud.google.com/iam/reference/rest/v1/projects.serviceAccounts.keys
But these map to Deployment Manager resource types without the “projects” prefix:
gcloud deployment-manager types list --filter="name ~ ^iam\.v1\."
iam.v1.serviceAccounts.key
iam.v1.serviceAccount
Creating Service Accounts
The Deployment Manager GitHub examples include the representation of service accounts (types) but this is not documented elsewhere and I will use it as an example of encountering a new resource type. In this case, we may create a service account without an accompanying key because the service account will be used by a Compute Engine VM.
Using the Cloud SDK command-line, the equivalent commands would be:
PROJECT=[[YOUR-PROJECT-ID]]
INSTANCE=[[YOUR-INSTANCE]]
ZONE=[[YOUR-ZONE]]
NAME=[[YOUR-SERVICE_ACCOUNT_NAME]]
SERVICE_ACCOUNT=${NAME}@${PROJECT}.iam.gserviceaccount.com
gcloud iam service-accounts create $NAME --project=$PROJECT
gcloud compute instances set-service-account $INSTANCE \
--service-account=${SERVICE_ACCOUNT} \
--scopes=... \
--project=$PROJECT \
--zone=$ZONE
The documentation for [projects.]serviceAccounts summarizes (some of) the properties associated with this resource. These properties represent an instantiated resource:
{
"name": string,
"projectId": string,
"uniqueId": string,
"email": string,
"displayName": string,
"etag": string,
"oauth2ClientId": string,
}
But, to create a Service Account, it’s necessary to provide an accountId and a displayName. How do I know this? The create method documents this:
https://cloud.google.com/iam/reference/rest/v1/projects.serviceAccounts/create
And, trusty API Explorer proves it:
https://developers.google.com/apis-explorer/#search/iam/iam/v1/iam.projects.serviceAccounts.create
Deployment Manager (appears to) use(s) a flat set of properties and so the API method’s hierarchy isn’t preserved:
{
  accountId,
  serviceAccount: {
    displayName
  }
}
and becomes:
{
  'name': $NAME,
  'type': 'iam.v1.serviceAccount',
  'metadata': {
    'dependsOn': [
      'cloudresourcemanager',
      'iam'
    ]
  },
  'properties': {
    'accountId': $NAME,
    'displayName': $NAME
  }
}
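To show where a resource like this lives, here is a minimal sketch of a Python template that returns it. It assumes the usual GenerateConfig(context) entry point; the 'name' template property, the GenerateServiceAccount helper and the FakeContext class are hypothetical and exist only so the sketch can be run outside Deployment Manager.

```python
def GenerateServiceAccount(name):
  """Return a Deployment Manager resource for a key-less service account."""
  return {
      'name': name,
      'type': 'iam.v1.serviceAccount',
      'metadata': {
          # Names of the resources that enable the required APIs.
          'dependsOn': ['cloudresourcemanager', 'iam'],
      },
      'properties': {
          # Flattened from the create method: accountId plus
          # serviceAccount.displayName.
          'accountId': name,
          'displayName': name,
      },
  }


def GenerateConfig(context):
  """Template entry point; Deployment Manager passes in the context."""
  # 'name' is a hypothetical template property, not part of the original.
  name = context.properties['name']
  return {'resources': [GenerateServiceAccount(name)]}


class FakeContext(object):
  """Stand-in for the real context object, for local experiments only."""

  def __init__(self, project, properties):
    self.env = {'project': project}
    self.properties = properties


config = GenerateConfig(FakeContext('my-project', {'name': 'swarm-robot'}))
print(config['resources'][0]['type'])
```

Running the template locally like this is only a way to inspect the generated dictionary; the real context object is supplied by the service.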
Enabling Services
The IAM service is not enabled by default in projects. It is likely that, if you try to create a Service Account as described above, DM will balk with an error:
'{"ResourceType":"iam.v1.serviceAccount","ResourceErrorCode":"403","ResourceErrorMessage":{"code":403,"message":"Google Identity and Access Management (IAM) API has not been used in project ... before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/iam.googleapis.com/overview?project=... then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.","status":"PERMISSION_DENIED","details":...
We need to use the Service Management service to enable the IAM API, and then we need to make the Deployment Manager serviceAccount resource that we created previously depend on IAM being enabled.
You won’t find a Deployment Manager type for the Service Management service:
gcloud deployment-manager types list \
--filter="name ~ ^servicemanagement"
No types were found for your project!
Once again the GitHub samples provide the solution. I found this inadvertently when I was searching for a way to create Service Accounts. It’s not obvious though and there’s a single result for this method:
{
  'name': ...,
  'type': 'deploymentmanager.v2.virtual.enableService',
  'metadata': {
    'dependsOn': ...
  },
  'properties': {
    'consumerId': 'project:' + project_id,
    'serviceName': ...
  }
}
This code works! I don’t know why it works but it does. I don’t understand the reference to deploymentmanager.v2.virtual.X but I am able to make sense of the properties. Once again, API Explorer helps:
https://developers.google.com/apis-explorer/#search/servicemanagement/servicemanagement/v1/servicemanagement.services.enable
Flattening the API method’s required properties (serviceName, consumerId) yields the code provided in the Google sample. NB I’m also using context.env to grab the ‘project’ (== GCP Project ID) from the runtime environment as this is required for the value of the consumerId. I turned the code into a function and, in this case, I’m calling the function and passing $API == “iam” to enable iam.googleapis.com:
{
  'name': $API,
  'type': 'deploymentmanager.v2.virtual.enableService',
  'properties': {
    'consumerId': 'project:' + context.env['project'],
    'serviceName': $API + '.googleapis.com'
  }
}
This is equivalent to the following Cloud SDK command:
gcloud service-management enable iam[.googleapis.com] \
--project=$PROJECT
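As a rough sketch of what “turning the code into a function” could look like, the snippet below wraps the enableService resource in a helper and calls it once per API. The function name GenerateEnableService and the FakeContext class are my own placeholders; the three API names are the ones this deployment ends up depending on.

```python
def GenerateEnableService(context, api):
  """Return a resource that enables '<api>.googleapis.com' for the project."""
  return {
      'name': api,
      'type': 'deploymentmanager.v2.virtual.enableService',
      'properties': {
          'consumerId': 'project:' + context.env['project'],
          'serviceName': api + '.googleapis.com',
      },
  }


class FakeContext(object):
  """Stand-in for the real context object, for local experiments only."""

  def __init__(self, project):
    self.env = {'project': project}


context = FakeContext('my-project')
# One resource per API; other resources can then list 'iam',
# 'runtimeconfig', etc. in their dependsOn.
resources = [
    GenerateEnableService(context, api)
    for api in ('cloudresourcemanager', 'iam', 'runtimeconfig')
]
for resource in resources:
  print(resource['name'], resource['properties']['serviceName'])
```

Using the short API name as the resource name keeps the dependsOn entries elsewhere in the template easy to read.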
Mutating (IAM) Policies
Okay “changing policies”, “updating policies” but “mutating” just sounds so much more fun! I’m waiting on some sample code on the GitHub site to show how to perform this. It’s complicated by the way this service works.
Customarily, the way to mutate an IAM policy is to:
- Get the current policy
- Mutate it
- Put the revised policy back to the service
The policy document not only includes the current list of policy bindings for the e.g. project but it also includes an etag which is used as a concurrency mechanism. It’s effectively a hash of the policy. If, when you put the policy back to the service, the etag in the document you put differs from the etag of the service’s stored policy, the service knows that the policy was revised after you read it, and it rejects your changes.
This is a challenging mechanism to represent in Deployment Manager and the GitHub samples include a helper function that merges a service account into a policy. I understand from the Deployment Manager team that an alternative solution should be available soon.
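To make the get, mutate, put cycle concrete, here is a small in-memory sketch. It is not the GitHub helper; AddBinding and PutPolicy are hypothetical names, and PutPolicy is only a toy stand-in for the service’s etag comparison (a real service would also issue a fresh etag after a successful write).

```python
import copy


def AddBinding(policy, role, member):
  """Return a copy of 'policy' with 'member' added to 'role'.

  The etag is carried over unchanged so that the service can tell
  whether the policy was modified after it was read.
  """
  updated = copy.deepcopy(policy)
  bindings = updated.setdefault('bindings', [])
  for binding in bindings:
    if binding['role'] == role:
      if member not in binding['members']:
        binding['members'].append(member)
      return updated
  bindings.append({'role': role, 'members': [member]})
  return updated


def PutPolicy(stored, proposed):
  """Toy stand-in for the service: reject a write with a stale etag."""
  if proposed.get('etag') != stored.get('etag'):
    raise ValueError('etag mismatch: policy changed since it was read')
  return proposed


stored = {
    'etag': 'abc123',  # placeholder value
    'bindings': [{'role': 'roles/owner', 'members': ['user:me@example.com']}],
}
member = 'serviceAccount:swarm-robot@my-project.iam.gserviceaccount.com'
proposed = AddBinding(stored, 'roles/editor', member)
accepted = PutPolicy(stored, proposed)
print(len(accepted['bindings']))
```

If another writer had changed the stored policy in the meantime, its etag would differ and the put would have to be retried from a fresh read.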
The Cloud SDK includes a convenience method add-iam-policy-binding that does the work behind the scenes:
gcloud projects add-iam-policy-binding $PROJECT \
--member=serviceAccount:${SERVICE_ACCOUNT} \
--role=roles/editor
I‘m told by the Deployment Manager team that this functionality is imminent and in the form of a new capability called ‘actions’. I’ll update this content when the feature’s released.
Runtime Config(urator)
Deployment Manager includes a feature called Runtime Config(urator). It provides functionality that is particularly relevant to Deployment Manager but it is, in fact, a standalone service. I described a way to use Runtime Config in combination with Global Scope as a way to pass configuration data to Cloud Functions.
Runtime Config provides a key improvement to deploying Docker swarm. When a swarm is initialized, the 1st (Docker Engine) node generates a manager token and a worker token that must be provided by other nodes to prove themselves when joining the cluster.
1st (genesis) node:
sudo docker swarm init
Swarm initialized: current node (73gqde43chycmq9s7f93klmmf) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-243cu5kol7pfie85svs8cnvsmbypm2gp8mhe29b12izyj5cr92-e2r75b6yxdx8mk3rrwjf4kfzi 10.138.0.2:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
It’s easiest to obtain the token(s) from this genesis node by requerying it and, as a hint to how we’ll proceed, assigning the value to an environment variable:
WORKER_TOKEN=$(sudo docker swarm join-token worker --quiet)
MANAGER_TOKEN=$(sudo docker swarm join-token manager --quiet)
These commands return the token(s) only.
With these tokens, from other (Docker Engine) nodes (on other VMs), we could then:
sudo docker swarm join --token ${WORKER_TOKEN} swarm-master:2377
sudo docker swarm join --token ${MANAGER_TOKEN} swarm-master:2377
So, two outstanding questions, how do we:
- make the tokens available to the other nodes?
- block creation of the other swarm nodes on the tokens’ availability?
This is what Runtime Config provides us. Runtime Config is scoped to a single project and permits further scoping through namespaces. Let’s firstly create a namespace called ‘swarm’ for the Docker swarm mode tokens:
gcloud beta runtime-config configs create swarm
We can then create variables arbitrarily within this ‘swarm’ namespace. I chose to create a variable called ‘worker’ and another called ‘manager’ but to put these in a hierarchy under ‘token’. The ‘token’ prefix is redundant but..
gcloud beta runtime-config configs \
variables set /token/worker ${WORKER_TOKEN} \
--config-name swarm \
--is-text
gcloud beta runtime-config configs \
variables set /token/manager ${MANAGER_TOKEN} \
--config-name swarm \
--is-text
I’m using the “--is-text” flag. The tokens are plaintext (alphanumeric) and this saves having to base64 decode the values when retrieved.
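To illustrate what the text form saves, the sketch below reads a variable that may carry either a plain 'text' field or a base64-encoded 'value' field, which is how I understand the API to represent the two forms. The VariableText helper and the sample dictionaries are hypothetical, and the token is a placeholder.

```python
import base64


def VariableText(variable):
  """Return a variable's content as text, whichever field holds it."""
  if 'text' in variable:
    return variable['text']
  # Binary values are base64-encoded and need decoding first.
  return base64.b64decode(variable['value']).decode('utf-8')


token = 'SWMTKN-1-example'  # placeholder, not a real token
as_text = {'name': 'token/worker', 'text': token}
as_value = {
    'name': 'token/worker',
    'value': base64.b64encode(token.encode('utf-8')).decode('ascii'),
}
print(VariableText(as_text) == VariableText(as_value))
```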
Runtime Config provides mechanisms for watching and waiting on variables but it was not immediately clear to me that either of these provides the functionality needed here. Instead, and somewhat unhappily, I decided to ‘hack’ a solution (please comment-ping me with improvements).
The Deployment Manager script creates token/manager and token/worker variables with a $DUMMY setting *before* it creates any VMs (obviously including the genesis node). The VMs all know that, if the value of a variable is $DUMMY, the genesis node is not yet ready and they block and retry after one minute.
Advantage(s)
- Simple
Disadvantages
- Less elegant
- Potentially infinite blocking
- Requires shared knowledge of the $DUMMY value
def GenerateRuntimeConfigConfig(context, name):
  """Generate a Runtime-Config Config 'name'"""
  return {
      'name': 'config-name-' + name,
      'type': 'runtimeconfig.v1beta1.config',
      'metadata': {
          'dependsOn': [
              'runtimeconfig',
          ],
      },
      'properties': {
          'config': name,
      }
  }

def GenerateRuntimeConfigVariable(context, name, variable, default):
  """Generate Runtime-Config 'variable' with 'default' text value"""
  project_id = context.env['project']
  config_name = 'config-name-' + name
  return {
      'name': 'variable-' + variable,
      'type': 'runtimeconfig.v1beta1.variable',
      'metadata': {
          'dependsOn': [
              config_name,
          ]
      },
      'properties': {
          'parent': '$(ref.'+ config_name +'.name)',
          'variable': variable,
          'text': default,
      },
  }
NB The Runtime Config service must be enabled and this is what’s checked in the dependsOn when the Config is created.
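Because dependsOn entries refer to other resources by name, a misspelt or missing name is an easy mistake to make. The following is a small sketch of a local check, using a hypothetical MissingDependencies helper of my own, that lists any dependsOn entry with no matching resource before the deployment is submitted.

```python
def MissingDependencies(resources):
  """Return (resource, dependency) pairs whose dependency is not defined."""
  names = set(resource['name'] for resource in resources)
  missing = []
  for resource in resources:
    for dependency in resource.get('metadata', {}).get('dependsOn', []):
      if dependency not in names:
        missing.append((resource['name'], dependency))
  return missing


resources = [
    {
        'name': 'config-name-swarm',
        'type': 'runtimeconfig.v1beta1.config',
        'metadata': {'dependsOn': ['runtimeconfig']},
    },
    {
        'name': 'variable-worker',
        'type': 'runtimeconfig.v1beta1.variable',
        'metadata': {'dependsOn': ['config-name-swarm']},
    },
]
# The resource that enables the Runtime Config API has been left out.
before = MissingDependencies(resources)
print(before)

resources.append({
    'name': 'runtimeconfig',
    'type': 'deploymentmanager.v2.virtual.enableService',
})
after = MissingDependencies(resources)
print(after)
```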
Then, in the startup script for the 1st (genesis) node, after the swarm init, the worker and manager tokens are requested and are used to replace the $DUMMY values:
sudo docker swarm init
WORKER_TOKEN=$(sudo docker swarm join-token worker --quiet)
MANAGER_TOKEN=$(sudo docker swarm join-token manager --quiet)
gcloud beta runtime-config configs variables set \
/token/worker ${WORKER_TOKEN} \
--config-name=swarm \
--is-text
gcloud beta runtime-config configs variables set \
/token/manager ${MANAGER_TOKEN} \
--config-name=swarm \
--is-text
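The startup script for a worker can then pull the token from the Runtime Config variable. The value may still be $DUMMY, in which case the script waits a minute and reads it again; when it isn’t, it is the correct worker token value:
WORKER_TOKEN=$(gcloud beta runtime-config configs variables get-value /token/worker --config-name=swarm)
while [ "${WORKER_TOKEN}" == "DUMMY" ]
do
sleep 60s
# Re-read the variable; without this the loop would never see the real token
WORKER_TOKEN=$(gcloud beta runtime-config configs variables get-value /token/worker --config-name=swarm)
done
sudo docker swarm join --token ${WORKER_TOKEN} swarm-master:2377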
And, clearly, the startup script for a manager flips the variables:
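MANAGER_TOKEN=$(gcloud beta runtime-config configs variables get-value /token/manager --config-name=swarm)
while [ "${MANAGER_TOKEN}" == "DUMMY" ]
do
sleep 60s
# Re-read the variable; without this the loop would never see the real token
MANAGER_TOKEN=$(gcloud beta runtime-config configs variables get-value /token/manager --config-name=swarm)
done
sudo docker swarm join --token ${MANAGER_TOKEN} swarm-master:2377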
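One of the disadvantages listed earlier is potentially infinite blocking. A possible way to bound the wait is sketched below: a hypothetical WaitForToken helper polls a caller-supplied fetch function and gives up after a fixed number of attempts. In a real startup script, fetch would have to shell out to gcloud or call the API; here it is faked so that the sketch runs on its own, and the limits are arbitrary.

```python
import time


def WaitForToken(fetch, dummy='DUMMY', interval=60, attempts=30,
                 sleep=time.sleep):
  """Poll 'fetch' until it stops returning 'dummy' or attempts run out."""
  for attempt in range(attempts):
    token = fetch()
    if token != dummy:
      return token
    # No point sleeping after the final attempt.
    if attempt < attempts - 1:
      sleep(interval)
  raise RuntimeError('token still %r after %d attempts' % (dummy, attempts))


# Fake variable reads: the genesis node publishes the token on the 3rd read.
values = iter(['DUMMY', 'DUMMY', 'SWMTKN-1-example'])
token = WaitForToken(lambda: next(values), sleep=lambda seconds: None)
print(token)
```

Raising an error after the last attempt lets the startup script fail visibly instead of hanging forever.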
Let’s test it:
gcloud deployment-manager deployments create docker-swarm \
--config=docker_swarm.yaml \
--project=$PROJECT
The fingerprint of the deployment is ...
Waiting for update [operation-...]...done.
Update operation operation-... completed successfully.
cloudresourcemanager    deploymentmanager.v2...           COMPLETED
$PROJECT                cloudresourcemanager.v1.project   COMPLETED
iam                     deploymentmanager.v2...           COMPLETED
runtimeconfig           deploymentmanager.v2...           COMPLETED
swarm                   runtimeconfig.v1beta1.config      COMPLETED
swarm-manager-mig       compute.v1.instanceGroupManager   COMPLETED
swarm-manager-template  compute.v1.instanceTemplate       COMPLETED
swarm-master            compute.v1.instance               COMPLETED
swarm-robot             iam.v1.serviceAccount             COMPLETED
swarm-worker-mig        compute.v1.instanceGroupManagers  COMPLETED
swarm-worker-template   compute.v1.instanceTemplate       COMPLETED
variable-manager        runtimeconfig.v1beta1.variable    COMPLETED
variable-worker         runtimeconfig.v1beta1.variable    COMPLETED
And, ssh’ing into swarm-master:
sudo docker node ls
ID                           HOSTNAME           STATUS  AVAILABILITY  MANAGER STATUS
jt7vcsgb2jnst0z33aolmzs0t *  swarm-master       Ready   Active
s3su15bvothaswr17l6fugbml    swarm-master-0pj9  Ready   Active
jsi7dsfbp9zjcreihrzhr5g7q    swarm-master-lb6q  Ready   Active
ftsrgxyfqeiu1l9o5b9nhs45q    swarm-master-tjpm  Ready   Active
w5mq06h0gvirfldd0ezm1s1j7    swarm-worker-4p1p  Ready   Active
b9279nr43pjhog2d4pl9crjt4    swarm-worker-r1mx  Ready   Active
krik9fnw8p4w0jrca3qcicw46    swarm-worker-znmt  Ready   Active
NB I’ve hacked the output to make it more presentable here: swarm-master is annotated as “Leader” and the three masters are all marked “Reachable”
Conclusions
Deployment Manager is powerful but would benefit from more comprehensive documentation for noobs like me. The service is well-designed but it’s not always intuitive (consistent).