An Easy-to-Follow Guide for Beginners: Setting up an Automated Backup Procedure on Google Cloud Platform (GCP)

Harshal Shukla
Google Cloud - Community
9 min read · Feb 15, 2024

Have you ever wondered how Google Photos knows what's in your gallery? It's because of the backup process. When it comes to backups, we immediately think of the cloud. We won't go into depth on how Google Photos works, as that is out of scope here.

In this blog, we will work on a small use case that automatically takes a backup from MongoDB Atlas and stores it in Google Cloud Storage.

Architecture Diagram:

Prerequisites:

  1. Google Cloud Platform (GCP) Account: You’ll need access to a GCP account. If you don’t have one, you can create a free trial account, which includes a $300 credit to get started.
  2. MERN-based Website: You should have a basic understanding of the MERN stack. You are free to utilize your own website.
  3. MongoDB Atlas Account: You’ll need access to a MongoDB Atlas account. You should have a basic understanding of MongoDB Atlas.

After you’ve completed these requirements, you should be ready to go through detailed instructions on how to perform an automated Backup Procedure in Google Cloud. Come along for the thrilling ride as we set out!

Steps to be followed:

  • Create a Compute Engine Instance
  • Create a cluster on MongoDB Atlas
  • Host a website on Compute Engine
  • Create a Firewall
  • Create a Cloud Storage Bucket
  • Create a Cloud Function
  • Create a Cloud Scheduler
  • Testing
  • Conclusion

Before moving towards the steps, go to your web browser and visit Google Cloud Console.

Step 1: Create a Compute Engine Instance

  1. In the Cloud Console, on the Navigation menu, click Compute Engine >> VM instances, and then click Create Instance.
  2. Set the Name to backup-vm
  3. For Region: us-central1 (Iowa), and Zone: us-central1-a
  4. In the Machine Configuration section, for Series select E2.
  5. For Firewall rules, check all available rules.
  6. Click Create, then wait for backup-vm to be created.
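
If you prefer the gcloud CLI, a roughly equivalent command is sketched below. The e2-medium machine type and the Debian image are assumptions (the console defaults for the E2 series), and the network tags stand in for the firewall rules checked above.

# Sketch: create the VM from Cloud Shell or any terminal with the gcloud CLI configured
gcloud compute instances create backup-vm \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --image-family=debian-12 \
  --image-project=debian-cloud \
  --tags=http-server,https-server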

Step 2: Create a cluster on MongoDB Atlas

  1. Go to your web browser and visit MongoDB Atlas.
  2. Under Organization >> Click on New Project >> Set the Name to project-mongo >> Next >> Create Project
  3. Inside project-mongo >> Under Data Services >> click on Create.
  4. Select Type: M0 (FREE) >> Provider: Google Cloud >> Region: Mumbai (asia-south1) >> Name: cluster-mongo >> Create
  5. Under Username and Password, create the user >> click on Create >> select My Local Environment.
  6. Fill up 0.0.0.0/0 in the IP Address box >> Add Entry >> Finish and Close.

Step 3: Host a website on Compute Engine

  1. In the Cloud Console, click on the Navigation menu, and click Compute Engine >> VM instances.
  2. Under Connect, click on SSH. (Note: It takes some time to establish the session)
  3. Run the following command in the secure shell.
sudo apt-get update -y
sudo apt-get install git -y

4. Now, you can clone the git repository where your application code is available. (Website Credits: Mugilan E.S.)

git clone https://github.com/Mugilan-Codes/dev-book.git

5. Run the command ls to list the available files/folders. Now, you can see your cloned repository within SSH.

6. Run the command cd dev-book/ to change the current working directory. It should look like this:

7. Install Node.js and npm:

sudo apt install nodejs npm -y

8. Install the dependencies listed in package.json by running npm install (you can also add individual packages with npm install <dependency> --save), for example:
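
A typical sequence is sketched below; it assumes the repository keeps its server dependencies in the root package.json and its React client in client/ (which has its own package.json).

# Install server dependencies from the repository root
cd ~/dev-book
npm install

# Install the client's dependencies
cd client
npm install
cd ..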

9. Run the command cd config/ to change the current working directory, and run the command nano db.js to open the file and hit Enter:

10. Go to your web browser and visit MongoDB Atlas.

11. Click on CONNECT >> Drivers >> Under Add your connection string into your application code >> Copy the connection string >> Close.

12. Again, come to the SSH Tab and replace mongoURI with your actual connection string. It should look like this:

13. Replace <username> and <password> with your actual user credentials.

14. Press Ctrl + O, and hit Enter to save the changes. Now, press Ctrl + X to close the file.

15. Run the command cd .. and then node server.js, and hit Enter:

Now, your application server is up.
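
Note that node server.js keeps this SSH session busy, which is why the next steps open a second SSH tab. As an optional alternative (not part of the original walkthrough), you could run the API server in the background instead:

# Run the server in the background and log its output to server.log
cd ~/dev-book
nohup node server.js > server.log 2>&1 &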

16. Go to the Cloud Console, On the Navigation menu >> click Compute Engine >> VM instances.

17. Under Connect, click on SSH. (Note: It takes some time to establish the session)

18. Run the command cd dev-book/client/ and then npm start in the new SSH tab, as given below:

Now, your website is running on port 3000 (http://localhost:3000 from the VM itself).

Step 4: Create a Firewall

  1. In the Cloud Console, on the Navigation menu, click VPC network >> VPC networks, then click on the default network >> Firewalls >> Click on ADD FIREWALL RULE.
  2. Fill out as shown below:
Name: "backup-http"
Targets: "All instances in the network"
Source filter: "IPv4 ranges"
Source IPv4 ranges: "0.0.0.0/0"
Second source filter: "None"
Destination filter: "IPv4 ranges"
Destination IPv4 ranges: "<backup-vm_Internal_IP>/32"
Protocols and ports: Specified protocols and ports

3. Replace <backup-vm_Internal_IP> with the Internal IP of the VM. You can find it here:

4. Fill out as shown below and click on CREATE.
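
If you prefer the CLI, a roughly equivalent gcloud command is sketched below. TCP port 3000 is an assumption based on the port the application serves; replace <backup-vm_Internal_IP> as described in step 3.

# Sketch: create the firewall rule from the CLI
gcloud compute firewall-rules create backup-http \
  --network=default \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:3000 \
  --source-ranges=0.0.0.0/0 \
  --destination-ranges=<backup-vm_Internal_IP>/32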

5. Go to your web browser, and visit the website http://<backup-vm_External_IP>:3000. It will look like this:

Woohoo!!! You have successfully deployed your website on Compute Engine.

Step 5: Create a Cloud Storage Bucket

  1. In the Cloud Console, On the Navigation menu >> Cloud Storage >> Select Buckets >> Click on CREATE.
  2. Fill out as shown below and click on CREATE: (Note: While selecting a name for Bucket, remember that <BUCKET_NAME> must be globally unique.)
Name your bucket: <BUCKET_NAME>
Location type: Region (us-central1 (Iowa))
Storage class: Set a default class (Standard)

3. If a pop-up window appears, select Confirm >> Create.
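
The same bucket can also be created from the CLI; here is a minimal sketch (replace <BUCKET_NAME> with your globally unique name):

# Create a Standard-class regional bucket in us-central1
gsutil mb -l us-central1 -c standard gs://<BUCKET_NAME>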

Step 6: Create a Cloud Function

  1. In the Cloud Console >> Type “Cloud Functions” in the Search Box >> under PRODUCTS & PAGES >> Select Cloud Functions.
  2. Click CREATE FUNCTION >> Fill out as shown below:
Environment: 2nd gen
Function name: "backup-func"
Region: us-central1 (Iowa)
Trigger type: Cloud Pub/Sub

3. For Cloud Pub/Sub topic* >> Click CREATE A TOPIC >> Type “backup-topic” in Topic ID* >> Click CREATE.

4. For Cloud Pub/Sub topic* >> Select projects/<PROJECT_ID>/topics/backup-topic >> Click NEXT. Fill out as shown below:

Runtime: Node.js 20
Source Code: Inline Editor
Entry point: backup

5. Replace index.js with:

const { MongoClient } = require('mongodb');
const { Storage } = require('@google-cloud/storage');

const MONGO_URI = 'mongoURI';
const COLLECTION_NAME = 'COLLECTION_NAME';
const BUCKET_NAME = 'BUCKET_NAME';
const TIMESTAMP_FIELD = 'FIELD_NAME'; // Field holding each document's timestamp

// Cloud Function entry point, triggered by the Pub/Sub message from Cloud Scheduler
exports.backup = async (event, context) => {
  const mongoClient = await MongoClient.connect(MONGO_URI);
  const db = mongoClient.db();
  const collection = db.collection(COLLECTION_NAME);

  const storage = new Storage({ projectId: 'PROJECT_ID' });
  const bucket = storage.bucket(BUCKET_NAME);

  try {
    // Full backup on the first run, i.e. when timestamp.json does not yet exist in the bucket
    const [timestampExists] = await bucket.file('timestamp.json').exists();
    if (!timestampExists) {
      console.log('Full backup required');
      const allDocuments = await collection.find().toArray();
      await storeBackup(bucket, 'backup.json', allDocuments);
      await updateLastBackup(bucket, new Date());
    } else {
      console.log('Incremental backup');
      const lastBackupTimestamp = await getLastBackupTimestamp(bucket);
      const newDocuments = await collection
        .find({ [TIMESTAMP_FIELD]: { $gt: lastBackupTimestamp } })
        .toArray();
      if (newDocuments.length > 0) {
        await appendToFullBackup(bucket, 'backup.json', newDocuments);
        await updateLastBackup(bucket, newDocuments[newDocuments.length - 1][TIMESTAMP_FIELD]);
      } else {
        console.log('No new data since last backup');
      }
    }
  } catch (error) {
    console.error('Error during backup:', error);
  } finally {
    await mongoClient.close();
  }
};

// Write the documents to the bucket as a JSON file
async function storeBackup(bucket, filename, documents) {
  await bucket.file(filename).save(JSON.stringify(documents));
  console.log(`Backup saved to ${filename}`);
}

// Read the timestamp of the last backup from timestamp.json
async function getLastBackupTimestamp(bucket) {
  const [data] = await bucket.file('timestamp.json').download();
  return new Date(JSON.parse(data.toString()) || 0);
}

// Persist the timestamp of the latest backed-up document
async function updateLastBackup(bucket, timestamp) {
  await bucket.file('timestamp.json').save(JSON.stringify(timestamp.getTime()));
  console.log(`Last backup timestamp updated to ${timestamp.getTime()}`);
}

// Append newly found documents to the existing backup.json
async function appendToFullBackup(bucket, filename, newDocuments) {
  const [data] = await bucket.file(filename).download();
  const existingDocuments = JSON.parse(data.toString());
  const allDocuments = existingDocuments.concat(newDocuments);
  await bucket.file(filename).save(JSON.stringify(allDocuments));
  console.log(`${newDocuments.length} documents appended to ${filename}`);
}

Make sure to replace mongoURI, COLLECTION_NAME, BUCKET_NAME, PROJECT_ID, and FIELD_NAME with your actual MongoDB connection URI, database collection name, Cloud Storage bucket name, Google Cloud project ID, and the document field that stores the timestamp.

6. Replace package.json with:

{
  "dependencies": {
    "@google-cloud/functions-framework": "^3.0.0",
    "@google-cloud/storage": "^5.8.5",
    "mongodb": "^4.0.0"
  }
}

7. Click DEPLOY (Note: As a best practice, first test your function and then deploy it).
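
For reference, the topic and the function can also be created from the CLI. This is a sketch that assumes the index.js and package.json above are saved in the current directory:

# Create the Pub/Sub topic (skip if it already exists from the console)
gcloud pubsub topics create backup-topic

# Deploy the 2nd gen function and wire it to the topic
gcloud functions deploy backup-func \
  --gen2 \
  --runtime=nodejs20 \
  --region=us-central1 \
  --entry-point=backup \
  --trigger-topic=backup-topic \
  --source=.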

Step 7: Create a Cloud Scheduler

  1. In the Cloud Console >> Type “Cloud Scheduler” in the Search Box >> under PRODUCTS & PAGES >> Select Cloud Scheduler.
  2. Click CREATE JOB >> Fill out as shown below:
NAME: backup-sche
Region: us-central1 (Iowa)
Frequency: * * * * * (every minute)
Time Zone: India Standard Time (IST) UTC+5:30 (Kolkata)
Target type: Pub/Sub
Select a Cloud Pub/Sub topic: projects/<PROJECT_ID>/topics/backup-topic
Message body: "Backup it"

3. Click CREATE.
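
The equivalent CLI call is sketched below; Asia/Kolkata is the time-zone identifier for the IST (Kolkata) time zone selected above.

# Publish "Backup it" to backup-topic every minute
gcloud scheduler jobs create pubsub backup-sche \
  --location=us-central1 \
  --schedule="* * * * *" \
  --time-zone="Asia/Kolkata" \
  --topic=backup-topic \
  --message-body="Backup it"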

Step 8: Testing

  1. Go to the website (http://<backup-vm_External_IP>:3000), and try to register users.
  2. To find the user data in the MongoDB Atlas collection, visit MongoDB Atlas >> under Database >> select users.

3. Go to the Cloud Scheduler >> select backup-sche >> Click FORCE RUN.

Now, backup-sche will publish the message to backup-topic, and every minute it will trigger the function backup-func. On the initial trigger, backup-func will look for user data and store it in Google Cloud Storage as a file backup.json. On subsequent triggers, it will search for new data in the database that is not yet present in backup.json. You will also see another file named timestamp.json, which stores the most recent backup timestamp.

4. Go to the Google Cloud Storage bucket, and you will find both files.
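
You can also verify the backup and trigger the function from the CLI; a minimal sketch (replace <BUCKET_NAME> with your bucket name):

# List the objects the function has written
gsutil ls gs://<BUCKET_NAME>

# Inspect the backed-up documents and the last-backup timestamp
gsutil cat gs://<BUCKET_NAME>/backup.json
gsutil cat gs://<BUCKET_NAME>/timestamp.json

# Trigger the function manually without waiting for the scheduler
gcloud pubsub topics publish backup-topic --message="Backup it"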

5. Run the following test cases:

Test Case 1:

  1. After the initial trigger, try to delete any user data from the MongoDB Atlas Database.
  2. Wait for the next trigger.
  3. Go to the bucket and check whether the deleted user's data is still present in backup.json. (It will still be there, because the function only appends new documents; deletions in Atlas are not propagated to the backup.)

Test Case 2:

  1. Between the two triggers, try to add two or more than two users.
  2. Wait for the next trigger.
  3. Go to the bucket and check whether the newly added users were appended to backup.json after the trigger.

For pricing, visit the official Google Cloud documentation given below:
1. Compute Engine
2. Cloud Storage
3. Cloud Function
4. Cloud Pub/Sub
5. Cloud Scheduler
6. Virtual Private Cloud

Conclusion:
Implementing an automated backup procedure on the Google Cloud Platform is an essential step toward ensuring the security and resilience of your valuable data. Through careful planning, implementation, and monitoring, businesses can confidently navigate the digital landscape, knowing that their data is securely backed up and readily accessible when needed.

Congratulations!!! You have successfully performed an automated Backup Procedure on Google Cloud Platform (GCP).

If you have any questions or need help with anything regarding the article, feel free to connect with me on LinkedIn.

I’d like to give a shout-out to my team Guysinthecloud for all the support.

Thank You,
Harshal Shukla

