Configuration of a Postgres Database on Google Cloud

Paul Nwosu
Feb 3, 2023

In this project, we will set up and connect to a Postgres database in two ways: locally (on the remote instance we created), and using Google’s Platform-as-a-Service solution, Cloud SQL. The major aim of this project is to show you the various options you can use to set up a database, and how to access the database from both internal and external networks.

This is the fourth part in a series I started called “Building your First Google Cloud Analytics Project.”

Project Structure

  • Introduction
  • Set up of Docker Compose
  • Setting up pgcli for Connectivity test
  • Local Set Up of Postgres
  • Local Set Up for Pgadmin
  • Cloud SQL Set Up
  • Connecting to the Database — ports, firewall rules
  • Best Practices when Configuring a Database
  • Summary

Introduction

PostgreSQL is a popular open-source relational database management system that provides powerful and flexible data storage and management capabilities. Connecting to a PostgreSQL database is a crucial step in accessing and utilizing the data stored in it. This article will provide a comprehensive guide on how to configure and connect to a PostgreSQL database, including the various methods and tools available for establishing a connection and interacting with the database. This part of the project will be more of a practical step-by-step guide than a complete code project.

Set up of Docker Compose

In the first article in this series, we talked about how to install docker and set it up for use. Let’s now talk about docker-compose and see how it differs from docker. Docker-compose allows us to launch multiple containers using a single configuration file so that we don’t have to run multiple complex docker run commands separately.

Docker compose makes use of YAML files. YAML (YAML Ain’t Markup Language) is a human-readable data serialization format used for storing and exchanging data. It is often used for configuration files and data exchange between different systems and applications. YAML files use indentation to denote structure and data types, making them easy to read and write compared to other formats such as XML and JSON. Using a docker-compose yaml file, we can do a lot of the things we do with docker and more.

The basic difference between docker-compose and docker

Docker is a platform for building, running, and managing containers. A container is a lightweight and portable software package that includes all the dependencies and libraries needed to run a specific application.

Docker Compose, on the other hand, is a tool for defining and running multi-container Docker applications. With Docker Compose, you can define your application’s services, networks, and volumes in a single file, then start and stop all services from this file. This makes it easier to manage and automate the deployment of multi-container applications. In other words, Docker Compose simplifies the process of managing multiple containers by allowing you to define and run all of your containers in a single place.

When building a container image with docker, you need to set up a Dockerfile; for docker-compose, you need the yaml configuration file. You can, however, always reference a Dockerfile from the docker-compose yaml file and use a docker-compose command to build the image.
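To make this concrete, here is a minimal docker-compose.yml sketch that defines the two services we will set up individually with docker run later in this article. The service names, credentials, and volume paths are illustrative assumptions, not required values:

version: "3"
services:
  pgdatabase:
    image: postgres:13
    environment:
      - POSTGRES_USER=root        # change this
      - POSTGRES_PASSWORD=root    # change this
      - POSTGRES_DB=test_db
    volumes:
      # unlike docker run, compose resolves relative paths against this file
      - "./test_db_postgres_data:/var/lib/postgresql/data"
    ports:
      - "5432:5432"
  pgadmin:
    image: dpage/pgadmin4
    environment:
      - PGADMIN_DEFAULT_EMAIL=admin@admin.com
      - PGADMIN_DEFAULT_PASSWORD=root
    ports:
      - "8080:80"

With this file in the current directory, a single docker-compose up -d starts both services at once.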

Here are some basic commands for docker-compose

Building the Image:

docker-compose build

Note: Docker-compose is mainly used to run multiple docker containers, not to build images. To build a docker image with docker-compose, you still need to reference the Dockerfile in the yaml configuration file.

Running the container images:

docker-compose up -d
  • docker-compose up is the command used to start the services defined in the docker-compose.yml file.
  • -d (or --detach) is an optional flag that runs the services in the background and returns control of the terminal back to the user. This allows the services to continue running even after the terminal is closed.

To shut down the services, you use docker-compose down. This command must be run in the same directory as the yaml file.

Those are the basic commands to help you get started with docker-compose. In the next project, we will be making use of docker-compose.

Setting up pgcli for connectivity test

pgcli is a command-line interface (CLI) tool for PostgreSQL. It is designed to be a drop-in replacement for the standard psql client, with additional features and improved performance. You do not need to have PostgreSQL installed on your local machine to use pgcli; it connects to a PostgreSQL database over a network, so as long as you have access to a remote PostgreSQL server and the necessary credentials, you can use pgcli to interact with the database. This makes it a convenient option for those who want to work with PostgreSQL databases but do not want to set up and maintain a local PostgreSQL installation.

Installation

To install pgcli on the Linux remote instance that we are working on, use the commands below:

sudo apt-get install python3-dev libpq-dev libevent-dev

sudo apt-get install python3-pip

sudo pip3 install pgcli

To test that the installation worked correctly, use pgcli --version

With pgcli installed on the remote machine, we have the ability to connect to any PostgreSQL database, provided we have the correct credentials and the server we are connecting from has been granted access to the database.

To use pgcli, you need the following information about the database:

pgcli -h hostname -p port -u user -d database_name

  • -h stands for the hostname, which can be the internal or external IP address of the server hosting the database
  • -p stands for the port, which is usually 5432
  • -u stands for the name of a user you have predefined
  • -d stands for the name of the database you have predefined

Once this goes through, you will be asked for the password; type it in and press Enter. (Note: nothing will show on screen while you are typing the password.)
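pgcli also accepts a single connection URI in place of the individual flags. Here is a sketch with placeholder credentials:

pgcli postgresql://user:password@hostname:5432/database_name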

Local setup for Postgres

This setup entails using Docker to run a Postgres image as a container on your local machine (the remote instance). With Docker, we don’t have to install Postgres on our machine to start using it; we can just pull the image from Docker Hub and run it on our local machine.

Here is a docker command that pulls a postgres image from docker hub and starts it up.

docker run -it \
  -e POSTGRES_USER='root' \
  -e POSTGRES_PASSWORD='root' \
  -e POSTGRES_DB='test_db' \
  -v $(pwd)/test_db_postgres_data:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:13

The container needs 3 environment variables:

  • POSTGRES_USER is the username for logging into the database. We chose root as the username. [Change to something else]
  • POSTGRES_PASSWORD is the password for the database. We chose root as the password. [Change to something else]
  • POSTGRES_DB is the name that we will give the database. We chose test_db as the name of the database.

-v points to the volume directory. The colon (:) separates the first part (the path to the folder on the host computer) from the second part (the path to the folder inside the container). This is useful because we need to map the location where the data is stored inside the Docker container to a location where it will persist on the host computer.

  • Path names must be absolute. If you’re in a UNIX-like system, you can use pwd to print your local folder as a shortcut; this example should work with both bash and zsh shells, but fish will require you to remove the $.
  • This command will only work if you run it from a directory that contains the test_db_postgres_data subdirectory, which you can create as shown below.

The -p is for port mapping. We map the default Postgres port 5432 in the container to the same port 5432 in the host. In a case where you want to have two postgres databases running on your host machine at the same time, you can run another docker command but this time map the port to 5431 on your host machine because you can’t have multiple services running on the same port on your host machine.

The last argument is the image name and tag. We run the official Postgres image on version 13.
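Before the first run, create the host directory for the volume, and afterwards confirm the container came up. A quick sketch (the directory name matches the command above):

# Create the host directory for the volume before the first run
mkdir -p "$(pwd)/test_db_postgres_data"

# Confirm the container is running and the 5432:5432 mapping took effect
docker ps --filter "ancestor=postgres:13"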

Local Setup for Pgadmin

pgAdmin is a free and open-source administration and management tool for the PostgreSQL database. Unlike pgcli, it provides a graphical user interface for managing PostgreSQL databases, allowing users to interact with databases in an intuitive and user-friendly way. pgAdmin includes features such as a SQL editor, server management, backup and restore functionality, and a visual query builder.

Just like Postgres, you can also use docker to set up pgAdmin using the command below:

docker run -it \
  -e PGADMIN_DEFAULT_EMAIL="admin@admin.com" \
  -e PGADMIN_DEFAULT_PASSWORD="root" \
  -e PGADMIN_COOKIE_SECRET=53tegevd66efdvfff \
  -v "$(pwd)/data_pgadmin:/var/lib/pgadmin" \
  -p 8080:80 \
  dpage/pgadmin4

The container needs 3 environment variables:

  • PGADMIN_DEFAULT_EMAIL is the email for logging into pgAdmin.
  • PGADMIN_DEFAULT_PASSWORD is the password for logging into pgAdmin.
  • PGADMIN_COOKIE_SECRET is the secret key for the secure cookie used by pgAdmin to store session information.

-v points to the volume directory. This is useful because we need to mount the storage location where pgAdmin’s state is stored onto the host computer.

The last argument is the image name dpage/pgadmin4.

Note: We are going to be using pgAdmin in the next project to interact with the database. We can also launch multiple instances of pgAdmin by running them on different ports on our host machine, as shown below.
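For example, here is a sketch of a second pgAdmin instance mapped to host port 8081; the email and password are placeholders:

docker run -it \
  -e PGADMIN_DEFAULT_EMAIL="admin2@admin.com" \
  -e PGADMIN_DEFAULT_PASSWORD="root" \
  -p 8081:80 \
  -d \
  dpage/pgadmin4

Once a container is running, you can open pgAdmin in a browser at http://external_ip_address:8080 (or 8081), provided your firewall rules allow traffic on that port.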

Cloud SQL Set Up

Cloud SQL is a managed relational database service provided by Google, hosted on the cloud. With Cloud SQL, you can create a database instance and connect it to your applications. Cloud SQL offers three main database options: MySQL, Postgres, and SQL Server. In this project, we will create a Postgres database instance on Cloud SQL and set up a connection to it. A Cloud SQL instance is similar to a Compute Engine instance, but with a database already installed and no direct access to the underlying hosting machine. It is fully managed by Google.

In the previous section, we successfully set up Postgres using a Docker image on our remote host machine, showcasing the concept of building your own core infrastructure. In contrast, Google Cloud SQL eliminates the need for manual installation and management of the Postgres database on your instance by offering a managed platform for database management.

Here is a step-by-step guide on how to set up a postgres database on Google Cloud.

  1. Open the Cloud Console: Log in to the Google Cloud Console and select the project that contains your other resources.
  2. Create a Cloud SQL instance: In the Cloud Console, go to the Cloud SQL instances page (Navigation Menu > SQL). Click the Create instance button. (Provisioning the instance can take a while.)
  3. Configure your instance: Choose PostgreSQL as the database engine, select a region, and provide a name for your instance. You can also configure settings such as disk size, availability, and backups.
  4. Create a database: In the Cloud Console, go to the Cloud SQL instances page and click on your instance’s name. On the instance’s details page, click on the “Databases” tab and then click on the “Create database” button.
  5. Create a user: On the instance’s details page, click on the “Users” tab and then click on the “Add user account” button. Add a username and a password.

Click on the Overview tab to view the external IP address of the Cloud SQL instance. By the time you are done, you should have the following information:

  • Private IP address, which can serve as the hostname
  • Public IP address, which can serve as the hostname
  • Username
  • Password
  • Database name
  • Port: by default, Postgres runs on port 5432
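If you prefer the command line, the same setup can be sketched with gcloud; the instance, database, user, and password names below are hypothetical, and the tier and region are examples:

# Create the Postgres instance (tier and region are example choices)
gcloud sql instances create my-pg-instance \
  --database-version=POSTGRES_13 \
  --tier=db-f1-micro \
  --region=us-central1

# Create a database and a user on that instance
gcloud sql databases create test_db --instance=my-pg-instance
gcloud sql users create my_user --instance=my-pg-instance --password=my_password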

Connecting to the Databases

Connecting to databases relies heavily on communicating over IP addresses and ports. In this section, we are going to examine the various ways of connecting to our database, both on the host machine (the remote instance) and on the Cloud SQL instance.

Connecting to a Postgres Docker Container Running on a Virtual Machine

Here, a Postgres Docker image is used to run one or more containers on a virtual machine. This is similar to having Postgres installed locally on the virtual machine.

We are going to review three basic ways of setting up a connection:

  1. Internal Connection within the host machine

Internal connections can be made on the host machine using localhost and the port that was mapped to the Postgres container when you ran the docker command to start it.

For instance, to test that a connection can be made to a Postgres database running on port 5432 on your host machine, run the docker command below to spin up the Postgres container. (The -d flag runs the container in detached mode.)

docker run -it \
  -e POSTGRES_USER='root' \
  -e POSTGRES_PASSWORD='root' \
  -e POSTGRES_DB='test_db' \
  -p 5432:5432 \
  -d \
  postgres:13

Test the connection:

pgcli -h localhost -p 5432 -u root -d test_db

If the command above works, you will be asked to input the password you initially set, which is root. You will then have access to the database. Press Ctrl + D to exit pgcli and return to your terminal.
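Once connected, a quick sanity check inside the pgcli prompt confirms which server you are talking to (SELECT works as normal SQL, and \l lists the databases just as in psql):

SELECT version();
\l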

Here is where the advantage of Docker comes in: with Docker, you can run multiple Postgres databases on your host machine without worrying about any form of interference or complex configurations.

Run the following command on your host machine to spin up another Postgres database.

docker run -it \
-e POSTGRES_USER='root2' \
-e POSTGRES_PASSWORD='root2' \
-e POSTGRES_DB='test_db2' \
-p 5431:5432 \
-d \
postgres:13

Now, connect to it using the command below:

pgcli -h localhost -p 5431 -u root2 -d test_db2

Notice that I mapped the port of the second database to 5431 on the host machine. This is because you cannot have multiple services running on the same port on your host machine.

2. Connections via Instances in the same VPC network as the Host VM

To better understand this, go to the Google Cloud Console > Navigation Menu > Compute Engine. Create a new instance (ensure this is in the same project as your other resources). Set up your ssh configuration on your local machine as shown in part 1 of this series and connect to the VM.

When you create a virtual machine (VM), it is assigned an internal IP address from the default VPC network. This enables internal communication among resources within your Google Cloud project. As a result, two remote instances can communicate with each other as they are within the same VPC network.

Steps to Follow:

  • Ensure that you still have the two Postgres containers running on the first remote instance (one running on port 5432, the other on 5431).
  • Log into the second remote instance, and run the following command: sudo apt-get update && sudo apt-get install pgcli
  • Test connectivity from the second remote instance to the two databases using the commands below.

pgcli -h internal_ip_address -p 5432 -u root -d test_db

pgcli -h internal_ip_address -p 5431 -u root2 -d test_db2

The internal IP address referred to in the command above is the internal IP address of the first remote instance that is hosting the Postgres database. To obtain the IP address, navigate to the Compute Engine tab in the Google Cloud Console, and copy the internal IP of the host virtual machine (VM). Additionally, make a note of the external IP address, as it will be used later.
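If you prefer the CLI, both addresses are visible from a single command:

# Lists each VM with its INTERNAL_IP and EXTERNAL_IP columns
gcloud compute instances list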

3. Connections via networks outside the VPC network of the Host VM

Typically, you would want to connect to the database from a network outside the VPC network that hosts the Postgres database. In such scenarios, we need to update the firewall rules of the VPC network connected to the VM hosting the database to allow external connections.

Log in to the Google Cloud Console.

  1. Navigate to the VPC networks page by clicking on the “VPC network” option in the navigation menu.
  2. Click on the “Firewall” tab, and then click on “Create a firewall rule”.
  3. In the “Name” field, enter a name for the firewall rule. Leave the network as default.
  4. Under Targets, select All instances in the network. Leave Source filter as IPv4 ranges.
  5. In the “Source IP ranges” field, enter “0.0.0.0/0” to allow connections from any IP address. (This is not best practice)
  6. In the “Allowed protocols and ports” field, click on the TCP checkbox and enter “5432, 5431” to allow connections on both ports on your host machine.
  7. Click on the “Save” button at the bottom of the page to create the firewall rule.
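The same rule can also be created from the command line; the rule name below is an arbitrary example, and 0.0.0.0/0 is again shown only for demonstration:

gcloud compute firewall-rules create allow-postgres \
  --network=default \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:5432,tcp:5431 \
  --source-ranges=0.0.0.0/0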

With this all set, you can then connect to the database using the external IP address of the host VM.

pgcli -h external_ip_address -p 5432 -u root -d test_db

pgcli -h external_ip_address -p 5431 -u root2 -d test_db2

Connecting to a Postgres Database Hosted on Google CloudSQL

The Cloud SQL instance is just like a VM hosting the Postgres database, except that you don’t have direct access to alter its configuration. This is what we refer to as a managed service.

  1. Internal Connection within the VPC Network

Cloud SQL, just like a Compute Engine instance, has both an internal IP address and an external IP address. With its internal IP address, you can connect to the Cloud SQL instance from any virtual machine on the same VPC network.

pgcli -h internal_ip_address_of_cloudsql_instance -p 5432 -u user -d db

You can view the internal (private) and external (public) IP addresses of your Cloud SQL instances in the Google Cloud Console by navigating to the SQL tab on the Navigation Menu.

2. External Connections outside the VPC network of the Cloud SQL Instance

This involves configuring the Cloud SQL instance’s authorized networks so that external networks are allowed to connect to it.

Navigate into the Cloud SQL instance you just created by clicking on its Instance ID. Go to the “Connections” section of the Cloud SQL instance. Scroll down to Authorized networks, click on Add Network, add a name (anything you want), and add 0.0.0.0/0 as the Network. This basically means any external connection now has access to the database, provided it has the login credentials.

pgcli -h external_ip_address_of_cloudsql_instance -p 5432 -u user -d db
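The same authorization can be applied from the command line; the instance name is hypothetical, and 0.0.0.0/0 is again shown only for demonstration:

gcloud sql instances patch my-pg-instance \
  --authorized-networks=0.0.0.0/0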

Note:

Even after you have added 0.0.0.0/0 as an authorized network, clients still need valid credentials to successfully log in to your instance. This prefix merely allows any IPv4 client to pass the network firewall and make login attempts, including clients you did not intend to allow.

In the previous setup, we were able to run multiple Postgres databases using Docker images on the same virtual machine without encountering any dependency issues because they were running on different ports (5432 and 5431). With Cloud SQL, each instance runs a single Postgres server on port 5432, but you can create multiple Cloud SQL instances and connect to each of them separately.

Best Practices when Configuring a Database

When configuring a database, there are certain things you should watch out for and best practices to follow to help you get started.

Creation of Read Replicas: Read replicas are read-only copies of your database instance. When configuring your database, you have the option to set up a read replica, creating another instance of the database. This instance is assigned both an internal and external IP address, which can be used for connecting to it.

To create a read replica, navigate to the instance you have created, and then click on the “Replicas” tab. Select the “Create Read Replica” option at the top to initiate the process. Once the read replica is created, you can configure its own connection settings, which will be distinct from the original database that has both read and write access.
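Here is a sketch of the same operation from the command line, with hypothetical instance names:

# Create a read replica of an existing primary instance
gcloud sql instances create my-pg-replica \
  --master-instance-name=my-pg-instance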

Exposing the Database to External Connections: In the previous section, we explored a method for external networks to connect to your database instances by adding 0.0.0.0/0 as an authorized network. However, this is not considered a secure best practice. A more pragmatic approach is to add the specific IP address of the device you are using to access the database, such as another virtual machine that you have access to.
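For example, to authorize only a single machine, patch the instance with that machine’s address as a /32 range (203.0.113.7 is a documentation placeholder; substitute the external IP of the machine you trust):

gcloud sql instances patch my-pg-instance \
  --authorized-networks=203.0.113.7/32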

These are just two of the many factors to consider when configuring a database; it is very important that you focus on building resilient systems that are reliable and can withstand zone failures.

Summary

In this article, we were able to set up multiple Postgres databases using the Infrastructure as a Service model (IaaS) and the Platform as a Service model (PaaS). The IaaS model requires us to install Docker on the host machine and run the container from a Postgres image. With this method, you can spin up multiple Postgres containers from a single image, and manage their configurations individually or with a docker-compose.yaml file.

We also utilized Google Cloud SQL, which is a PaaS solution, and used it to create a database to which you can connect. I also covered some best practices for allowing both internal and external networks to connect to the databases. Lastly, I showed how you can use pgcli to run connectivity tests against your database.

In the next project, which is a sequel to this, we will build a data pipeline that will read data from the web and write that data into our postgres database.
