How To Commit Your Cloud Credentials To Version Control Systems
Commit your cloud credentials to version control systems securely
In this article, I will share a way to securely commit sensitive information to version control systems (VCS). The focus will be on committing AWS credentials, but the method applies to Azure and Google Cloud Platform (GCP) as well.
The code to reproduce the results can be found here.
This article is intended for data scientists who do not come from a software development background and hence may not be aware of software engineering best practices. However, judging by this search result, it seems developers in general can benefit too.
Imagine you are a junior data scientist tasked with building a pipeline to analyze text data. Naturally, this task involves moving files around S3 buckets and calling various AWS services to do things like sentiment analysis, named entity recognition, topic modeling, etc.
You are given a set of credentials that allows you to access these services. You decide to authenticate via environment variables, so you write a script called env.sh which does the following:
Should you commit this script to your VCS?
Committing this script makes it convenient for your colleagues to carry on where you left off. They do not need to search for the right credentials. They just need to clone your repo, and everything will work out-of-the-box.
However, committing this script is a security risk. Just imagine someone hacking your company’s VCS server or leaking its contents. Or maybe your company one day decides to open source this repo and accidentally exposes its private information for the whole world to see because nobody scrubbed the credentials from the repo’s history.
Is there a solution that balances convenience and security?
To make things more concrete, we will be working with the following script called main.py.
Running python main.py "this movie is really awesome!" will call Amazon Comprehend to produce a sentiment classification and confidence score for the text "this movie is really awesome!", like so:
The relevant part of main.py for this article is line 8, which creates a client for the required service and relies on the existence of the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables for authentication. Therefore, you need to execute source env.sh before running main.py.
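As a rough illustration of that dependency, here is a minimal sketch of a script in the same spirit. It is not the article's main.py, so its line numbers will not match the reference above. The helper names (missing_credentials, summarize, classify, main) and the optional region argument are assumptions made for this example; the boto3 call itself is the standard Comprehend detect_sentiment API.

```python
import os
import sys

# The two variables boto3 looks for when authenticating from the environment.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env):
    """Return the names of required variables that are absent or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def summarize(response):
    """Pick the predicted label and its confidence out of a DetectSentiment response."""
    label = response["Sentiment"]  # e.g. "POSITIVE"
    # SentimentScore is keyed by "Positive", "Negative", "Neutral", "Mixed".
    score = response["SentimentScore"][label.capitalize()]
    return label, score


def classify(text, region=None):
    """Call Amazon Comprehend; region=None falls back to the configured default."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    client = boto3.client("comprehend", region_name=region)
    return summarize(client.detect_sentiment(Text=text, LanguageCode="en"))


def main(argv, env):
    """Hypothetical entry point: fail early if `source env.sh` was skipped."""
    if len(argv) != 2:
        return "usage: python main.py TEXT"
    missing = missing_credentials(env)
    if missing:
        return "missing environment variables: " + ", ".join(missing)
    label, score = classify(argv[1])
    print(label, score)
    return 0
```

Calling sys.exit(main(sys.argv, os.environ)) at the bottom of the file would turn this into a command-line script. The early check is only a convenience: without it, boto3 would raise its own error when no credentials can be found.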
The question now is: How should we commit env.sh?
The solution involves encrypting env.sh using a tool called sops and using AWS KMS to manage the keys for the encryption and decryption process. Sops works on Windows, Linux, and Mac and integrates well with AWS KMS and similar services from Azure and GCP.
There are a few steps we need to go through before we can use sops to encrypt env.sh.
In terms of AWS services, you will need access to a customer-managed key (CMK) in AWS KMS. One way to do it is to have your account administrator create and manage this key for you and create another IAM user whose sole purpose is to use this key for file encryption and decryption. Let’s call this user sops-user.
As a regular account user, all you have to do is follow these steps:
1. Install sops. Click here to download the relevant package for your machine.
2. Update your AWS credentials file to include the credentials of sops-user.
3. Define the SOPS_KMS_ARN environment variable so that sops knows where to find the CMK:
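Before moving on, the last two steps can be sanity-checked with a few lines of Python. This is only a sketch under some assumptions: the credentials are stored under a profile section named sops-user, SOPS_KMS_ARN holds a single key ARN of the usual form arn:aws:kms:REGION:ACCOUNT_ID:key/KEY_ID, and the function names are made up for this example.

```python
import configparser
import re

# Shape of a KMS key ARN: arn:<partition>:kms:<region>:<12-digit account>:key/<key id>.
KMS_KEY_ARN = re.compile(r"^arn:aws[a-z-]*:kms:[a-z0-9-]+:\d{12}:key/[A-Za-z0-9-]+$")


def has_profile(credentials_text, profile="sops-user"):
    """Check that an AWS credentials file (INI format) defines both keys for profile."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    return (
        parser.has_section(profile)
        and parser.has_option(profile, "aws_access_key_id")
        and parser.has_option(profile, "aws_secret_access_key")
    )


def looks_like_kms_key_arn(value):
    """Return True if value has the shape of a single KMS key ARN."""
    return bool(KMS_KEY_ARN.match(value or ""))
```

In practice you would pass the contents of ~/.aws/credentials to has_profile and os.environ.get("SOPS_KMS_ARN") to looks_like_kms_key_arn. Neither check contacts AWS, so they only catch typos, not missing permissions.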
With that out of the way, we are now ready to encrypt env.sh.
Encrypting Sensitive Information
Encrypting env.sh is just a matter of executing:
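If you would rather drive this step from Python, for instance inside a project setup script, a thin wrapper around the sops command line could look like the sketch below. The function names are hypothetical. The --encrypt and --in-place flags belong to the sops CLI, and the check on SOPS_KMS_ARN reflects the setup described above rather than a requirement of sops itself.

```python
import os
import shutil
import subprocess


def encrypt_command(path):
    """Build the sops invocation that encrypts path and overwrites it."""
    return ["sops", "--encrypt", "--in-place", path]


def encrypt_in_place(path="env.sh", env=None):
    """Encrypt path with sops after checking the prerequisites from the setup steps."""
    env = os.environ if env is None else env
    if not env.get("SOPS_KMS_ARN"):
        raise RuntimeError("SOPS_KMS_ARN is not set, so sops cannot locate the CMK")
    if shutil.which("sops") is None:
        raise RuntimeError("sops was not found on PATH")
    # check=True raises CalledProcessError if sops exits with a non-zero status.
    subprocess.run(encrypt_command(path), env=dict(env), check=True)
```

Because the file is overwritten in place, it is worth confirming that decryption works before deleting any plaintext copy.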
This is what env.sh looks like after the encryption process:
Decrypting Sensitive Information
source env.sh will now throw an error as env.sh is no longer a valid shell script.
You will need to decrypt env.sh before calling source. Here’s how you do it:
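An alternative that never writes the plaintext back to disk is to decrypt in memory and hand the variables straight to the child process. The sketch below assumes the decrypted env.sh consists of simple export NAME=value lines and that sops is on your PATH; parse_exports and run_with_decrypted_env are names invented for this example.

```python
import os
import shlex
import subprocess


def parse_exports(script_text):
    """Collect NAME=value pairs from lines of the form `export NAME=value`."""
    variables = {}
    for line in script_text.splitlines():
        # shlex strips shell quoting and, with comments=True, drops # comments.
        tokens = shlex.split(line, comments=True)
        if len(tokens) >= 2 and tokens[0] == "export":
            for token in tokens[1:]:
                name, sep, value = token.partition("=")
                if sep and name.isidentifier():
                    variables[name] = value
    return variables


def run_with_decrypted_env(command, encrypted_path="env.sh"):
    """Decrypt with sops, then run command with the credentials in its environment."""
    decrypted = subprocess.run(
        ["sops", "--decrypt", encrypted_path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    env = {**os.environ, **parse_exports(decrypted)}
    return subprocess.run(command, env=env, check=True)
```

For example, run_with_decrypted_env(["python", "main.py", "this movie is really awesome!"]) would run the script with the credentials set only for that one process. The parser does not expand shell variables or command substitutions, so it only suits scripts as simple as the one assumed here.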
Let’s take a moment to consider what the solution in the preceding section has achieved.
The encrypted form of env.sh (Figure 7) is still human-readable but is practically gibberish. As such, it can be safely committed to your company’s VCS.
What happens if this repo is hacked or leaked to the public? It is nearly impossible to recover the credentials stored in env.sh without the right key to decrypt it, which only sops-user has access to. Note that the credentials of sops-user are never committed to the repo.
In terms of collaboration, your colleagues need only know the credentials of sops-user to decrypt env.sh and continue developing main.py.
Therefore, this solution is a good compromise between convenience and security.
This article has presented a simple method to commit sensitive information by using a tool named sops and leveraging the services offered by cloud service providers.
Let me know in the comments if you have any questions or would like to share alternative methods.