Exploring the Quirks of GCP’s Metadata Server
When running workloads on GCP, you’ve likely used the metadata server, often for obtaining service account access tokens via the GCP SDK. The metadata server is a key component in GCP environments, providing instance-specific data such as metadata, service account credentials, and useful configuration details. However, the official GCP documentation left me wanting to know more about its request and response structure…
By definition, the metadata server API endpoint is not accessible over the public internet, and Google justifiably goes out of its way to make it hard to accidentally expose this endpoint with a reverse proxy by making “any requests that contain the header X-Forwarded-For … automatically rejected by the metadata server.”
Therefore, out of curiosity and a wish to better understand the metadata server, I created gcpmetadataexplorer, an open source, MIT-licensed, Docker-based web application written in Go with HTMX that provides a simple interface for exploring the metadata server.
ghcr.io/unitvectory-labs/gcpmetadataexplorer:v0.1.1
The main purpose of this application is to demonstrate the various requests and their responses from the GCP metadata server.
The site shows a tree for all of the parameters available to that specific VM or container. You can then drill into each attribute and it will show the available data. This includes the URL used to request the data alongside the response that was returned by calling the endpoint. The four different query methods are:
- Metadata: the standard request to the metadata server for the given path.
- Metadata JSON: adds the ?alt=json query parameter to the request so the response is a JSON object.
- Metadata Recursive Content: adds the ?recursive=true query parameter to the request so the data sub-items are shown. This sometimes, but not always, forces the response to be JSON.
- Metadata Recursive JSON: adds the ?alt=json&recursive=true query parameters to the request so it always returns JSON as well as the sub-items.
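The four query methods can be sketched in Go as plain URL construction. This is a minimal illustration rather than the code gcpmetadataexplorer uses: the function names queryVariants and newMetadataRequest are made up for this example. The Metadata-Flavor: Google header is not discussed above, but the v1 endpoints require it on every request.

```go
package main

import (
	"fmt"
	"net/http"
)

// metadataBase is the v1 root of the metadata server.
const metadataBase = "http://metadata.google.internal/computeMetadata/v1/"

// queryVariants (hypothetical helper) returns the four URLs for one
// metadata path, keyed by the labels used in the list above.
func queryVariants(path string) map[string]string {
	u := metadataBase + path
	return map[string]string{
		"Metadata":                   u,
		"Metadata JSON":              u + "?alt=json",
		"Metadata Recursive Content": u + "?recursive=true",
		"Metadata Recursive JSON":    u + "?alt=json&recursive=true",
	}
}

// newMetadataRequest (hypothetical helper) builds a GET request carrying
// the Metadata-Flavor header that the v1 endpoints require.
func newMetadataRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Metadata-Flavor", "Google")
	return req, nil
}

func main() {
	// Only builds and prints the requests; sending them works only from
	// inside a GCP workload that can reach the metadata server.
	for name, u := range queryVariants("project/") {
		req, err := newMetadataRequest(u)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name, "->", req.Method, req.URL.String())
	}
}
```

To actually send one of these requests from a VM or container, pass it to http.DefaultClient.Do and read the response body.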
There are two special endpoints in the metadata server, one for requesting a GCP access token and another for requesting a GCP identity token.
- http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token
- http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=https%3A%2F%2Fexample.com
When using gcpmetadataexplorer, access to these endpoints is disabled by default given the potential security risk. However, if you wish to enable them, you can do so by setting the ALLOW_TOKENS environment variable to true.
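How gcpmetadataexplorer reads this variable internally is not covered here. As a hypothetical sketch of what such an opt-in gate could look like in Go, assuming that only the literal value true enables the token endpoints (the function name tokensAllowed is invented for this example):

```go
package main

import (
	"fmt"
	"os"
)

// tokensAllowed (hypothetical) reports whether the token endpoints should
// be exposed. Assumption: only the exact value "true" opts in, so an unset
// or misspelled variable keeps the safe default of disabled.
func tokensAllowed(value string) bool {
	return value == "true"
}

func main() {
	if tokensAllowed(os.Getenv("ALLOW_TOKENS")) {
		fmt.Println("token endpoints enabled")
	} else {
		fmt.Println("token endpoints disabled (default)")
	}
}
```

Defaulting to disabled means a forgotten or mistyped setting cannot expose credentials by accident.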
Then the request can be made to retrieve the GCP access token for the assigned GCP service account.
Identity tokens require an audience, which is passed to the metadata server using the audience query parameter. In the web interface, the audience can be entered in the form used to request the identity token.
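The audience has to be percent-encoded when it is placed in the URL, as in the identity endpoint listed earlier. A minimal Go sketch of building that URL follows; the helper name identityURL is an assumption made for this example.

```go
package main

import (
	"fmt"
	"net/url"
)

// identityEndpoint is the identity token endpoint for the default service account.
const identityEndpoint = "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity"

// identityURL (hypothetical helper) returns the identity token URL with the
// audience percent-encoded as a query parameter.
func identityURL(audience string) string {
	q := url.Values{}
	q.Set("audience", audience)
	return identityEndpoint + "?" + q.Encode()
}

func main() {
	// Prints the same URL shown in the endpoint list above.
	fmt.Println(identityURL("https://example.com"))
}
```

Using url.Values avoids hand-encoding characters such as the colon and slashes in the audience.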
I ran into a number of quirks while exploring the GCP metadata server that are worth mentioning: they took me by surprise, and I have not found them clearly documented.
First, the metadata server automatically converts between kebab case and camel case for URLs and request bodies, resulting in differences in how fields are returned depending on the request type. With the notable exception of the first part of the URL of computeMetadata, all camel case strings are converted to kebab case in the URLs. This can look inconsistent if you assume that the JSON body returned uses the same parameter names as the ones required to query the API.
This is confusing, so let’s look at an example. These are real responses from the API, but the values of the project id and project number have been changed to protect the innocent.
In summary, the casing of the keys depends on the request type:
- Non-recursive responses: Keys are in kebab case (e.g., “numeric-project-id”, “project-id”).
- Recursive responses: Keys switch to camel case (e.g., “numericProjectId”, “projectId”).
What makes this more challenging is that, when parsing and navigating the tree, you may be inclined to infer the query paths from the keys in the recursive JSON responses. That would be incorrect. The non-recursive responses return the names that are valid in the URL, which in this case is “project-id”. If you attempt to access the URL with “projectId”, it will not work.
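For simple keys like the two shown above, the mapping from a recursive JSON key back to a URL segment can be sketched as a camel case to kebab case conversion. This is a heuristic that I am assuming from the observed behavior, not a documented rule, and the helper name camelToKebab is made up for this example. Anything it produces should be checked against the non-recursive listing.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// camelToKebab (hypothetical helper) inserts a dash before each upper-case
// letter and lower-cases it, so "numericProjectId" becomes "numeric-project-id".
func camelToKebab(key string) string {
	var b strings.Builder
	for i, r := range key {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteByte('-')
			}
			b.WriteRune(unicode.ToLower(r))
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	for _, key := range []string{"numericProjectId", "projectId"} {
		fmt.Printf("%s -> %s\n", key, camelToKebab(key))
	}
}
```

The conversion only covers plain attribute names. It is not a safe way to reconstruct values such as service account emails, where the original string cannot be recovered reliably from its camel case form.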
The camel case conversion also extends to service account emails that may be used in URLs. If an email has dashes in it, the valid URL parameter will “un-kebab” the string into a camel case representation, so it no longer matches the service account email address!
Another thing to notice is that a recursive response is sometimes JSON and sometimes not. If there are elements nested underneath the requested element, the response is JSON because it contains the nested objects. For a terminal node such as the scopes, however, the response is not JSON even though the request is recursive.
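A client that issues recursive requests therefore has to cope with both shapes. One way to do that in Go is sketched below, under the assumption that a body is JSON only when it starts with an object or array delimiter and that plain text bodies are newline-separated. The function name decodeMetadataBody and the sample bodies are illustrative, not real responses.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// decodeMetadataBody (hypothetical helper) returns a decoded JSON value when
// the body looks like a JSON object or array, and otherwise the non-empty
// lines of the plain text body as a []string.
func decodeMetadataBody(body []byte) (any, error) {
	trimmed := bytes.TrimSpace(body)
	if len(trimmed) > 0 && (trimmed[0] == '{' || trimmed[0] == '[') {
		var v any
		if err := json.Unmarshal(trimmed, &v); err != nil {
			return nil, err
		}
		return v, nil
	}
	var lines []string
	for _, line := range strings.Split(string(trimmed), "\n") {
		if line != "" {
			lines = append(lines, line)
		}
	}
	return lines, nil
}

func main() {
	// Illustrative bodies only: one JSON object and one plain text list.
	samples := []string{
		`{"projectId":"example-project"}`,
		"https://www.googleapis.com/auth/cloud-platform\n",
	}
	for _, s := range samples {
		v, err := decodeMetadataBody([]byte(s))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%T: %v\n", v, v)
	}
}
```

Checking the first byte rather than calling json.Valid is deliberate: a plain text value such as a numeric project number is also valid JSON, so validity alone cannot tell the two response styles apart.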
An additional inconsistency arises with access tokens and identity tokens, though this one makes sense. When requesting non-recursively, “token” and “identity” are listed as options, but when listing recursively they are absent. You would not expect sensitive tokens to be included in recursive responses.
These quirks and inconsistencies can make it challenging to navigate the GCP metadata service and craft the URL that retrieves the data you want. That is where gcpmetadataexplorer is useful. This article does not outline every quirk of the metadata server, but it gives you the tools to test it on your own.