Perils of GCP’s Compute Engine default service account
In GCP, a service account is an identity that services and applications running on our Compute Engine instances can use to interact with Google Cloud APIs.
For example, if our application (deployed on an instance) reads and writes files on Cloud Storage (GCS), it must first authenticate to the Cloud Storage API. To allow that, we can create a service account, grant it Cloud Storage access, and attach it to the instance where the application will run.
We can either generate a JSON key for the service account and pass it explicitly to the application environment as ‘GOOGLE_APPLICATION_CREDENTIALS’, or let the application query the instance metadata server, obtain the details of the service account attached to the instance, and use it to authenticate to the Cloud Storage API.
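A minimal sketch of the two options, assuming the google-cloud-storage Python client library; the bucket and object names are hypothetical:

```python
from google.cloud import storage

# Option 1: explicit JSON key for the service account.
# (Equivalently, set GOOGLE_APPLICATION_CREDENTIALS to the key path and
# call storage.Client() with no arguments.)
client = storage.Client.from_service_account_json("/path/to/key.json")

# Option 2: no key at all. On a Compute Engine instance, storage.Client()
# falls back to the metadata server and authenticates as the service
# account attached to the instance.
client = storage.Client()

# Either way, the application can now read and write objects.
client.bucket("my-app-bucket").blob("report.csv").upload_from_string("id,value\n")
```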
Two types of service accounts are available to Compute Engine instances:
- Custom/User-managed service account
- Google-managed service account
User-managed service accounts include:
- new service accounts that we explicitly create — created and managed by us
- the Compute Engine default service account — created by Google when we enable the Compute Engine API in the project, but whose permissions are managed by us
In this post, we are going to look at the Compute Engine default service account as it pertains to compute instances, and why using it in our environments is not a good practice.
Compute Engine default service account
When we enable the Compute Engine API in a new project, Google creates the Compute Engine default service account and adds it to our project automatically.
It goes by the following email:
PROJECT_NUMBER-compute@developer.gserviceaccount.com
As per Google’s documentation:
The Compute Engine default service account is created with the IAM project editor role, but you can modify the service account’s roles to securely limit which Google APIs the service account can access.
So Google automatically creates this service account and grants it the project Editor role (a primitive role). When we create a compute instance via the console or the command line, this service account is attached to the instance by default.
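To see which service account an instance ended up with, the application (or anyone on the instance) only has to ask the metadata server. A minimal sketch, runnable only from inside a Compute Engine instance and assuming the requests library:

```python
import requests

# Standard metadata server endpoint for the attached service account's email.
METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/service-accounts/default/email")
email = requests.get(METADATA_URL, headers={"Metadata-Flavor": "Google"}).text
print(email)  # typically PROJECT_NUMBER-compute@developer.gserviceaccount.com
```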
This is where the problem starts.
No Least Privilege Principle
“Least Privilege Principle” — favorite words of every Infrastructure/Cybersecurity manager.
According to Saltzer and Schroeder in “Basic Principles of Information Protection”:
Least privilege: Every program and every user of the system should operate using the least set of privileges necessary to complete the job. Primarily, this principle limits the damage that can result from an accident or error.
According to Bishop in “Design Principles,” Section 13.2.1, “Principle of Least Privilege”:
The Principle of Least Privilege states that a subject should be given only those privileges needed for it to complete its task.
If a subject does not need an access right, the subject should not have that right.
Granting the Editor role to a service account by default goes completely against the least privilege principle. Our application running on the instance can access services it doesn’t need at all: it can read Cloud Storage, query BigQuery, and create or delete most resources in the project.
In the “Understanding roles” documentation, Google themselves caution against granting basic/primitive roles unless there is no alternative. Yet in every project, this default Compute Engine service account is created with the Editor role.
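To make that concrete, here is a hedged sketch of what any process on such an instance can do with the default credentials alone, assuming the google-cloud-storage and google-cloud-bigquery client libraries are installed:

```python
from google.cloud import bigquery, storage

# Both clients pick up the instance's service account automatically via
# Application Default Credentials; no key file is needed.
storage_client = storage.Client()
bigquery_client = bigquery.Client()

# A single-purpose app rarely needs either of these calls, yet with the
# Editor role both succeed across the whole project.
print([bucket.name for bucket in storage_client.list_buckets()])
print([dataset.dataset_id for dataset in bigquery_client.list_datasets()])
```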
Difficult to implement access control
At first glance, this looks like a convenient option with no operational overhead.
We create an instance; it comes with the Compute Engine default service account. The application running on the instance needs to access GCP services, and the Editor role grants the service account broad privileges, so our application seamlessly accesses whatever it requires: no friction, no access-denied errors, all fine!
Then we create multiple instances, all of them using the Compute Engine default service account, so the applications running on each instance can access all the services.
Since all the instances share a single service account, it becomes very difficult to implement access control. If we try to modify or downgrade the permissions of the service account, it affects all the running instances. And to change or remove the service account on an instance, we have to stop the instance, which causes downtime for our application.
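For illustration, here is a hedged sketch of what swapping a service account involves, using the google-api-python-client Compute Engine API; the project, zone, instance, and service account names are hypothetical. Note the stop/start around the change: that is the downtime mentioned above.

```python
from googleapiclient import discovery

compute = discovery.build("compute", "v1")
project, zone, instance = "my-project", "us-central1-a", "orders-api-vm"

# The instance must be stopped before its service account can be changed.
compute.instances().stop(project=project, zone=zone, instance=instance).execute()
# ...wait for the stop operation to complete...

compute.instances().setServiceAccount(
    project=project, zone=zone, instance=instance,
    body={
        "email": "orders-api@my-project.iam.gserviceaccount.com",
        "scopes": ["https://www.googleapis.com/auth/cloud-platform"],
    },
).execute()

compute.instances().start(project=project, zone=zone, instance=instance).execute()
```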
Not just the VM instances we create
When we talk about Compute Engine, what usually comes to mind are the VM instances we create under the Compute Engine section.
However, many GCP services use Compute Engine for their operations. In particular, services such as Dataproc, Google Kubernetes Engine (GKE), and Dataflow run on compute instances.
So when we create a Dataproc cluster without explicitly passing a service account, it is created with the Compute Engine default service account. All the primary (master, worker) and secondary (preemptible) nodes will be using it.
GKE node pools also use the Compute Engine default service account when no service account is explicitly provided. And since ‘GCE metadata’ is enabled by default in GKE, the compute metadata is exposed to pods. So even a simple ‘hello world’ application running in GKE can access all GCP services with the ‘Editor’ role’s permissions.
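A minimal sketch of what that means in practice: any pod can fetch an access token for the node’s service account from the metadata server and call GCP APIs with it. The project id below is hypothetical; the endpoints are the standard metadata server and Cloud Storage JSON API paths.

```python
import requests

# Token for the node's (default) service account, served to every pod
# when GCE metadata is exposed.
TOKEN_URL = ("http://metadata.google.internal/computeMetadata/v1/"
             "instance/service-accounts/default/token")
token = requests.get(TOKEN_URL, headers={"Metadata-Flavor": "Google"}).json()["access_token"]

# With the Editor role behind that token, even a "hello world" pod can,
# for example, list every Cloud Storage bucket in the project.
resp = requests.get(
    "https://storage.googleapis.com/storage/v1/b",
    params={"project": "my-project"},
    headers={"Authorization": f"Bearer {token}"},
)
print(resp.json())
```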
Maximum blast radius
One of the key pillars of cloud security is ‘minimising the blast radius’.
Blast radius in the cloud means: if there were an explosion in the system (a rogue application, an intruder, a human error), how far could the damage extend? As per the well-architected framework, it is recommended to design systems to minimise the blast radius.
If an intruder gets into an instance or application that uses the Compute Engine default service account, they would get access to all the GCP services in the project. So the blast radius is maximised in this case.
Best practice
Now we have seen how the Compute Engine default service account causes so many operational, management, and security issues, and this is something we don’t want to deal with in our production environment after going live.
So what’s the solution to avoid running into these issues? Custom service accounts.
All the GCP services that use Compute Engine support custom service accounts: for each resource we can create a separate custom service account, provide it explicitly during resource creation, and grant access as required.
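As a hedged sketch of that pattern with the google-api-python-client library (the project, account id, and role below are hypothetical examples): create a dedicated service account and bind only the role the workload needs.

```python
from googleapiclient import discovery

project = "my-project"

# 1. Create a dedicated service account, named after the resource it serves.
iam = discovery.build("iam", "v1")
sa = iam.projects().serviceAccounts().create(
    name=f"projects/{project}",
    body={"accountId": "orders-api", "serviceAccount": {"displayName": "orders-api"}},
).execute()

# 2. Grant it only what the application needs (here: managing GCS objects),
#    instead of the project-wide Editor role.
crm = discovery.build("cloudresourcemanager", "v1")
policy = crm.projects().getIamPolicy(resource=project, body={}).execute()
policy["bindings"].append({
    "role": "roles/storage.objectAdmin",
    "members": [f"serviceAccount:{sa['email']}"],
})
crm.projects().setIamPolicy(resource=project, body={"policy": policy}).execute()
```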
At Zeotap, we follow the practice of creating a dedicated custom service account for each instance (optional), Dataproc cluster, and GKE cluster, named the same as the resource. In addition, we use a separate custom service account for each application in GKE (if the pod accesses GCP services) and grant it the required access. This way, pods also have restricted access to GCP services.
Although managing individual service accounts for each resource seems complicated, it is super useful in the long run. With GCP’s IAM recommender, we also get to know which permissions a service account does not use and can remove or modify them accordingly.
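A hedged sketch of reading those suggestions programmatically, assuming the google-cloud-recommender client library; the project id is hypothetical:

```python
from google.cloud import recommender_v1

client = recommender_v1.RecommenderClient()
parent = ("projects/my-project/locations/global/"
          "recommenders/google.iam.policy.Recommender")

# Each recommendation points at a role binding whose permissions went
# unused, e.g. suggesting that Editor be replaced with a narrower role.
for recommendation in client.list_recommendations(parent=parent):
    print(recommendation.description)
```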
Summary
The Compute Engine default service account is convenient to use because it gives access to everything, so we don’t have to worry about the access part. But in terms of security, access control, and the least privilege principle, it causes a lot of problems. Using custom service accounts solves these problems; it may not be the most convenient option, but it is the right option in terms of best practices.