Push AWS S3 files to Google Drive
In this tutorial I will demonstrate how to integrate AWS with Google Drive. My client requested that Excel files containing AWS cost and usage report data and charts be pushed to Google Drive on a weekly schedule, and I hope to provide some insight on how I went about solving the problem. I’ll have a follow-up post demonstrating how to use Python with Boto3 and pandas to produce Excel files with AWS cost and usage data, but for now the focus is integrating AWS with Google Drive.
When I was given this assignment I figured it should be simple enough to find a decent example online for how to use the Google Drive API. I quickly discovered conflicting information and half-baked code snippets, and some of the confusion came from Google’s own documentation. So I went directly to the source for the Google Drive API to see how to push a file, and then worked backwards from there to settle on a language for the AWS Lambda function to be created. My personal language choices are Node, Python, Go, or C#, and I eventually settled on Python for its simplicity and the availability of information online. When I was nearly done with this assignment I did stumble upon PyDrive as a viable solution to the problem, but I ended up using the Google API Python Client.
I created a demo solution, google-drive-with-aws, on GitHub, which I will walk through in a series of steps to highlight the important pieces. The solution on GitHub is not fully automated, but it’s about 95% there, with a few manual modifications required to run the necessary scripts. So let’s get started.
From the interaction diagram shown above you can see a few things for both AWS and Google. The basic steps are:
- Create a GCP Service Account
- Create a Google Drive Shared Folder and give access to the service account
- Deploy a CloudFormation stack to create an S3 bucket and Parameter Store parameters
- Manually update Parameter Store with the Google service account credentials and the Google Drive Shared Folder ID
- Set up test files in S3
- Package the Lambda code and layer
- Deploy a CloudFormation stack to create the Lambda resources
- Manually test the Lambda function
- Clean up the resources
Step 1 — GCP Service Account
I’m not going into the details of using GCP beyond what is required for this tutorial, but if you do not already have a GCP account, you can create one free here.
Once you have your GCP account the first item of business is the creation of a Project. A Project in GCP creates a boundary where GCP workloads can be isolated from one another, very much in the same way an AWS account or Azure subscription operates.
You can choose any name you like, but I chose google-drive-access to distinguish it from other projects I already had in my GCP account.
After the project is ready a GCP service account needs to be created. The service account is what allows you to programmatically utilize GCP resources as permitted by the permissions on the service account. For our purposes this service account will only be interacting with Google Drive, and not any Compute, Networking, or Storage resources in GCP.
After the service account is created, a key needs to be created and downloaded, which contains the credentials information in the form of JSON.
You will need the credentials file later on to let AWS Lambda access Google Drive.
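Before handing the key to AWS, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch using only the standard library; the function name `load_service_account_key` and the exact set of fields checked are illustrative assumptions, not part of the demo repo.

```python
import json

# Fields a downloaded service account key file is expected to contain.
# (The real file has a few more; these are the ones this sketch checks.)
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email", "token_uri"}


def load_service_account_key(raw_json: str) -> dict:
    """Parse the key JSON and fail early if it does not look like a service account key."""
    info = json.loads(raw_json)
    missing = REQUIRED_KEYS - info.keys()
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {info['type']!r}")
    return info
```

The `client_email` field of the parsed key is the service account's email address, which is the address used when sharing the Google Drive folder in the next step.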
Step 2 — Google Drive
Now that the GCP Service Account is created, you need to do two things: Enable the Google Drive API and allow the service account to access Google Drive.
First up is the Google Drive API. Back in the Google Console you can search for ‘Google Drive API’ to find the page where you enable it. Conveniently, if you forget this step, the Lambda function will fail later on with a nice, clean error message that includes a link to the page where you can enable it.
Next up is giving the service account permission to access Google Drive. Go to Google Drive and create a folder.
Then click on that folder and choose Share.
Then give the service account permission to write to this folder. You will use the email address associated with the service account, which is available under the Details page for the service account back in Google Console.
Step 3 — AWS CloudFormation
I’m assuming that you found this tutorial because you’re working with AWS and need to push files to Google Drive, but if you’re not already in-flight with AWS you can create a free account here.
When I created this solution for my client, the end result was a bit different from what I have laid out in my GitHub solution, which is intended purely to demonstrate the key points of this tutorial. What I decided on for the tutorial was a series of steps to quickly get the solution up and running with minimal pain, so hopefully I will accomplish just that.
To get things going, I would recommend using AWS CloudShell. It provides a shell with 1 GB of persistent storage and numerous pre-installed libraries for AWS CLI, PowerShell, Bash, Python, and more. The only requirement is that the AWS account you will be using to run the scripts in this tutorial has the proper permissions to access the shell and the resources associated with this tutorial. For my own purposes I quickly created a group with the attached policies shown below, and then assigned that group to my user. It is definitely giving too much permission, so I don’t recommend leaving your account in this state after you get this solution running.
Now that your AWS account has the necessary permissions, start CloudShell and get to work. The first thing you’ll want to do is clone my repo.
git clone https://github.com/dspenard/google-drive-with-aws.git
Then change to the scripts folder.
Then run the Bash script build-bucket-and-ssm-resources.sh. Before doing so, you have to modify the script to choose a bucket name. This bucket will hold both the files to be pushed to Google Drive and the Lambda code and Layer zip files. For simplicity’s sake I chose to have just one bucket, but in a real production system you would normally keep the Lambda code and Layer zip files in a dedicated S3 bucket or some other artifact repository.
aws cloudformation create-stack \
--stack-name google-drive-bucket-and-ssm \
--template-body file://bucket-and-ssm-resources.yaml \
--capabilities CAPABILITY_AUTO_EXPAND \
--parameters ParameterKey=LambdaAndReportBucket,ParameterValue=your-bucket-name \
ParameterKey=GoogleServiceAccountCredentials,ParameterValue=google-service-account-credentials \
ParameterKey=GoogleDriveFolderId,ParameterValue=google-drive-folder-id
Once you have chosen a bucket name and run the script, go into the AWS Console to check the stack’s status.
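For readers who would rather drive the deployment from Python than from the Bash script, here is a rough Boto3 equivalent of the command above. It is a sketch under assumptions: the helper names are hypothetical, and the `cfn` argument is expected to be a client created with `boto3.client("cloudformation")`. The stack name, capability, and parameter keys are copied from the CLI call.

```python
STACK_NAME = "google-drive-bucket-and-ssm"


def to_cfn_parameters(values: dict) -> list:
    """Convert a plain dict into the list-of-dicts shape create_stack expects."""
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


def create_bucket_and_ssm_stack(cfn, template_body: str, bucket_name: str) -> str:
    """Create the stack, block until creation completes, and return the stack ID.

    `cfn` is assumed to be boto3.client("cloudformation").
    """
    response = cfn.create_stack(
        StackName=STACK_NAME,
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_AUTO_EXPAND"],
        Parameters=to_cfn_parameters({
            "LambdaAndReportBucket": bucket_name,
            "GoogleServiceAccountCredentials": "google-service-account-credentials",
            "GoogleDriveFolderId": "google-drive-folder-id",
        }),
    )
    # The waiter polls the stack status, so there is no need to watch the console.
    cfn.get_waiter("stack_create_complete").wait(StackName=STACK_NAME)
    return response["StackId"]
```

A call would look like `create_bucket_and_ssm_stack(boto3.client("cloudformation"), open("bucket-and-ssm-resources.yaml").read(), "your-bucket-name")`.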
Step 4 — AWS Parameter Store
Once the CloudFormation stack is created, a couple of manual updates to Parameter Store are needed. The credentials JSON for the Google Service Account and the ID of the Google Drive Shared Folder need to be dropped into their respective parameters. Note that the data types are String and not SecureString because CloudFormation cannot create SecureString parameters. This is just a tutorial, so please do not use cleartext strings for any secret information in a real production system. To find the Google Drive Shared Folder ID, open your folder in Google Drive and look at the URL in the address bar: the ID is the last segment of the path, after folders/.
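The two manual updates can also be scripted. The sketch below pulls the folder ID out of a Google Drive folder URL and writes both values with Boto3. It makes assumptions: the function names are hypothetical, `ssm` is expected to be `boto3.client("ssm")`, and the parameter names are passed in by the caller because the names the template actually creates should be checked in Parameter Store first.

```python
from urllib.parse import urlparse


def folder_id_from_url(url: str) -> str:
    """Return the path segment that follows 'folders' in a Drive folder URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "folders" not in parts:
        raise ValueError(f"not a Google Drive folder URL: {url}")
    index = parts.index("folders") + 1
    if index >= len(parts):
        raise ValueError(f"no folder ID found in URL: {url}")
    return parts[index]


def store_settings(ssm, credentials_param, folder_param, key_json, folder_id):
    """Overwrite the two existing String parameters with the real values.

    `ssm` is assumed to be boto3.client("ssm").
    """
    for name, value in ((credentials_param, key_json), (folder_param, folder_id)):
        ssm.put_parameter(Name=name, Value=value, Type="String", Overwrite=True)
```

`Overwrite=True` is needed because the parameters already exist after the stack deployment; without it `put_parameter` refuses to replace an existing value.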
Step 5 — S3 test files
Because I was working with Excel files when I created this solution for my client, I chose to keep the same file type for this tutorial. There are a few dummy Excel files that can be copied to S3 with a script. Again, you have to specify the S3 bucket name in the script. Yes, I could have fully scripted all of this out, but the focus of this tutorial is Google Drive with S3 and Lambda, deployed with CloudFormation, and not Bash scripting or proper deployment pipelines.
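If you would rather copy the test files from Python, something along these lines should work. It is only a sketch: `upload_test_files` and its `prefix` argument are hypothetical, the local folder layout is an assumption, and `s3` is expected to be `boto3.client("s3")`.

```python
from pathlib import Path


def upload_test_files(s3, bucket: str, folder: str, prefix: str = "") -> list:
    """Upload every .xlsx file in `folder` to the bucket and return the S3 keys used.

    `s3` is assumed to be boto3.client("s3").
    """
    uploaded = []
    for path in sorted(Path(folder).glob("*.xlsx")):
        key = f"{prefix}{path.name}"
        s3.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded
```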
Step 6 — Lambda
I’m not going to discuss in detail the functionality in the Python script, but the basics are:
- Get the Google Service Account credentials and Google Drive folder ID from Parameter Store
- Connect to Google Drive using the credentials
- Using Boto3, grab the specified files from S3, then push them to Google Drive using the Google Drive API create command
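The three steps above can be sketched roughly as follows. This is not the code from the repo; it is a minimal illustration using the Google API Python Client and google-auth, with hypothetical function and parameter names. The `ssm` and `s3` arguments are assumed to be Boto3 clients, and the Google libraries are imported inside the functions that need them so the Parameter Store helper stays usable on its own.

```python
import io
import json

DRIVE_SCOPES = ["https://www.googleapis.com/auth/drive"]
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def load_settings(ssm, credentials_param: str, folder_param: str):
    """Step 1: read the key JSON and folder ID from Parameter Store."""
    raw_key = ssm.get_parameter(Name=credentials_param)["Parameter"]["Value"]
    folder_id = ssm.get_parameter(Name=folder_param)["Parameter"]["Value"]
    return json.loads(raw_key), folder_id.strip()


def build_drive_service(key_info: dict):
    """Step 2: build a Drive v3 client from the service account key."""
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_info(
        key_info, scopes=DRIVE_SCOPES
    )
    return build("drive", "v3", credentials=creds)


def push_file(s3, drive, bucket: str, key: str, folder_id: str) -> str:
    """Step 3: download one object from S3 and create it in the Drive folder."""
    from googleapiclient.http import MediaIoBaseUpload

    data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    media = MediaIoBaseUpload(io.BytesIO(data), mimetype=XLSX_MIME)
    # Use the last part of the S3 key as the file name shown in Drive.
    metadata = {"name": key.rsplit("/", 1)[-1], "parents": [folder_id]}
    created = drive.files().create(
        body=metadata, media_body=media, fields="id"
    ).execute()
    return created["id"]
```

Reading the whole object into memory keeps the sketch short and is fine for small Excel reports; very large files would call for a streamed or resumable upload instead.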
If you examine the code you will see it’s very easy to follow if you understand basic Python, and I even left some tidbits on things to improve. One such case is calling the Google Drive API update command when a file already exists. You will notice that file names are not considered distinct in Google Drive, and if you call create numerous times for files with the same name, duplicates will occur. Also, some defensive coding techniques should be added to tighten things up.
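One way to avoid the duplicates is to look for an existing file with the same name in the folder and call update instead of create when one is found. The sketch below illustrates that idea under assumptions: the function names are hypothetical, `drive` is a Drive v3 service object from the Google API Python Client, and `media` is an upload object such as `MediaIoBaseUpload`.

```python
def drive_name_query(name: str, folder_id: str) -> str:
    """Build a Drive search query for a file name inside one folder."""
    # Backslashes and single quotes must be escaped inside a Drive query string.
    escaped = name.replace("\\", "\\\\").replace("'", "\\'")
    return f"name = '{escaped}' and '{folder_id}' in parents and trashed = false"


def find_existing_file_id(drive, name: str, folder_id: str):
    """Return the ID of the first matching file, or None if there is none."""
    response = drive.files().list(
        q=drive_name_query(name, folder_id), fields="files(id, name)", pageSize=1
    ).execute()
    files = response.get("files", [])
    return files[0]["id"] if files else None


def create_or_update(drive, name: str, folder_id: str, media) -> str:
    """Replace the content of an existing file, or create a new one."""
    file_id = find_existing_file_id(drive, name, folder_id)
    if file_id:
        drive.files().update(fileId=file_id, media_body=media).execute()
        return file_id
    created = drive.files().create(
        body={"name": name, "parents": [folder_id]}, media_body=media, fields="id"
    ).execute()
    return created["id"]
```

If the folder already contains several files with the same name, this only updates the first one the search returns, so existing duplicates would still need to be cleaned up by hand.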
To get the Lambda code and its Layer zipped up and ready for deployment, run the script build-packages.sh. And again, modify the bucket name before doing so, to match the bucket you created with the first CloudFormation template. I chose to include the entire google-api-python-client library for the Lambda layer, which is a bit beefy at over 4 MB, so feel free to only include the absolute minimum for your Layer if doing this in a production system.
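For reference, a Python Lambda layer zip needs its libraries under a top-level python/ directory so the runtime can import them. The sketch below shows that layout using only the standard library. It is not what build-packages.sh does line for line; `zip_layer` is a hypothetical helper, and it assumes the dependencies were already installed into a local folder (for example with `pip install google-api-python-client --target <folder>`).

```python
import zipfile
from pathlib import Path


def zip_layer(packages_dir: str, zip_path: str) -> str:
    """Zip every file under packages_dir beneath a top-level 'python/' folder."""
    root = Path(packages_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                arcname = (Path("python") / path.relative_to(root)).as_posix()
                archive.write(path, arcname)
    return zip_path
```

The output zip should be written somewhere outside `packages_dir`, otherwise the archive would try to include itself.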
Step 7 — More AWS CloudFormation
At this point the S3 bucket holds the dummy test files, the Lambda code zip file, and the Layer zip file, and Parameter Store has what the Lambda needs to connect to Google Drive. Now it’s time to deploy the Lambda function in a child stack that builds on the parent stack from Step 3. For this script, build-lambda-resources.sh, you only have to modify the parent stack name, and only if you changed the default name in the script that deployed the first stack in Step 3.
aws cloudformation create-stack \
--stack-name google-drive-lambda \
--template-body file://lambda-resources.yaml \
--capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM \
Once you confirm the parent stack name is correct and run the script to deploy the stack, go into the AWS Console to check its status.
Step 8 — Test the Lambda
Now on to the best part: seeing if this thing actually works. In the AWS Console, go to the Lambda function google-drive-lambda and create a test event with the proper payload. Call it whatever you want, then click Test and you should be good to go. Note that for this tutorial I chose to have the payload specify a list of files to pull from S3 and push to Google Drive, but in a real production system you could trigger the Lambda from an S3 event notification or choose some other path.
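To make the two triggering options concrete, here is a small sketch of how a handler could accept either a manual test payload or an S3 event notification. The payload shape with a `files` key is purely hypothetical and may not match what the demo Lambda expects, so check the repo for the real format. The `Records` structure, on the other hand, is the standard S3 notification layout.

```python
from urllib.parse import unquote_plus


def files_from_event(event: dict, default_bucket: str) -> list:
    """Return (bucket, key) pairs from a manual payload or an S3 notification."""
    if "Records" in event:
        # S3 notifications URL-encode object keys, with spaces sent as '+'.
        return [
            (record["s3"]["bucket"]["name"], unquote_plus(record["s3"]["object"]["key"]))
            for record in event["Records"]
            if "s3" in record
        ]
    # Hypothetical manual payload: {"files": ["report-1.xlsx", "report-2.xlsx"]}
    return [(default_bucket, name) for name in event.get("files", [])]
```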
Voila, you just pushed files from S3 to Google Drive!
Step 9 — Cleanup
I didn’t script out the AWS resource cleanup process, but it’s very simple. In the AWS Console, manually delete the files from the S3 bucket, then go into CloudFormation and delete the two stacks, starting with the child stack, google-drive-lambda, followed by the parent stack, google-drive-bucket-and-ssm. That should do it. Happy coding!