Migrating a Self-Managed Database to Cloud SQL

Amarachi Ogu
10 min read · Sep 8, 2022



Working on a migration project can be challenging and requires careful planning and execution. In this series, we will cover the migration of a PostgreSQL database to Google Cloud SQL.

Like other migrations, databases can be hard to move because applications often connect to them and rely on them for data, so moving the database may break those connections. There are also security concerns: you need to connect to the database securely, allowing access to the dependent applications and the appropriate users, and no one else. The potential downtime associated with a database migration also needs to be considered and minimized, if not avoided entirely. All of this complicates a migration project significantly.

Migrating to a fully managed database like Cloud SQL is very rewarding, however, it is important to note that each migration journey is unique. A systematic approach to data migration will reduce concerns and stresses typically associated with a migration project and set you up for success.

In this series, we will go through the process of migrating a stand-alone PostgreSQL database running on a Google Compute Engine (GCE) virtual machine (VM) to Cloud SQL for PostgreSQL using the pg_dump method, the standard import/export method native to Cloud SQL, and the Database Migration Service. This article focuses on setting up a stand-alone PostgreSQL database on Compute Engine and loading data into it.

Outline

  1. Why migrate to Cloud SQL.
  2. The migration process.
  3. Setting up a PostgreSQL server on GCE to simulate on-prem setup.
    Tasks:
    a. Creating a Compute-Engine virtual machine
    b. Installing PostgreSQL and setting the default user password
    c. Editing the configuration file
    d. Configuring a Google Cloud firewall to open a port
    e. Creating a database and tables
  4. Next steps.

Why migrate to Cloud SQL

You may be asking yourself why you should move to Cloud SQL and whether it is necessary. Well, Cloud SQL is a fully managed database service for MySQL, PostgreSQL, and SQL Server. With Cloud SQL you don't need to wear your system admin hat and spend hours setting up servers, provisioning hardware, or worrying about backups and patches.

Cloud SQL takes care of most of the operational tasks associated with running an RDBMS, such as server maintenance, OS installation, patches, upgrades, replication, and backups. It also leverages the power of the cloud for high availability and scalability.

You may consider moving to Cloud SQL to take advantage of these and other benefits which include:

  1. Easily create read replicas, which enable horizontal scaling of your application reads. You can create cross-region replicas for disaster recovery scenarios and manually promote a secondary instance to a primary with ease. There is also a built-in high-availability option with automatic failover (see the example command after this list).
  2. Automatic storage increase adds more storage as you near capacity. This may cause small interruptions, but it is still a smoother process than having to migrate to more powerful hardware manually.
  3. Backup seamlessly — You can use the on-demand backup to create a backup anytime you want or automated backup to schedule and automate your backups.
  4. Cloud SQL Insights — use the Cloud SQL Insights feature to easily troubleshoot and resolve database performance issues. Pre-built dashboards and visual query plans help developers identify the root cause of problems. At the time of this writing, Cloud SQL Insights is available for PostgreSQL, with MySQL in preview.
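
For instance, creating a cross-region read replica is a single gcloud command. The sketch below assumes a primary instance named sales-primary already exists; the instance names and region are hypothetical:

gcloud sql instances create sales-replica \
    --master-instance-name=sales-primary \
    --region=us-east1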

Visit the Cloud SQL page for further details.

The migration process

The diagram below illustrates the process of migrating an on-premises (on-prem) database to Cloud SQL. Note that it is for illustrative purposes only and not a definitive migration path. As stated earlier, each migration journey is unique and there are many paths to follow; we chose this one because it meets our requirements and is the most common use case.

Here we have a self-managed PostgreSQL database running on a virtual machine. This could be any database that you’d typically find on-premises or on a virtual machine in any other cloud.

Follow the Google-recommended migration methodology, which consists of four phases: assess, plan, execute, and optimize. This will help you build a comprehensive inventory of your databases and applications and determine their requirements and dependencies.

Setting up a PostgreSQL server on GCE to simulate on-prem setup

In this section, we will install and configure PostgreSQL on a GCE instance to simulate an on-prem setup, then create a database and two tables that will be used to illustrate migrating an on-prem database to Cloud SQL.

For the purposes of this article, the f1-micro machine type works just fine. For your own use case, determine how much computing power your application requires. In general, database systems tend to be constrained more by I/O bottlenecks and disk speed than by CPU capability.

Tasks

  1. Creating a Compute Engine virtual machine
  2. Installing PostgreSQL and setting the default user password
  3. Editing the configuration file
  4. Configuring a Google Cloud firewall to open a port
  5. Creating a database and tables

Creating a Compute Engine virtual machine

To create a VM of the f1-micro machine type with the network tag postgres, in the us-central1-a zone (all other settings left as default), run the following command:

gcloud compute instances create postgres-vm --machine-type=f1-micro --zone=us-central1-a --tags=postgres

Example output (illustrative; the project ID and IP addresses below are placeholders):
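
Created [https://www.googleapis.com/compute/v1/projects/your-project/zones/us-central1-a/instances/postgres-vm].
NAME         ZONE           MACHINE_TYPE  PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP   STATUS
postgres-vm  us-central1-a  f1-micro                   10.128.0.2   203.0.113.10  RUNNING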

In the Cloud Console, on the Navigation menu, click Compute Engine > VM instances. The new instance should be listed and running.

Installing PostgreSQL and setting the default user password

The VM has been created; now we will install PostgreSQL and the pglogical database extension. pglogical is a logical replication system implemented entirely as a PostgreSQL extension: fully integrated, it requires no triggers or external programs. For more details, check it out on GitHub.

Install PostgreSQL

  1. In the list of virtual machine instances, click the SSH button in the row of the postgres-vm instance to connect to it. Or run the command:
gcloud compute ssh postgres-vm --zone=us-central1-a

2. Update the packages. In the SSH terminal, enter the following command:

sudo apt update

3. Install PostgreSQL 13 along with the pglogical extension (the postgresql-13-pglogical package pulls in PostgreSQL 13 as a dependency):

sudo apt install postgresql-13-pglogical
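
Optionally, confirm the installation before continuing. This quick sanity check is not part of the original walkthrough, but both commands are available on Debian-based images:

psql --version                    # should report psql (PostgreSQL) 13.x
sudo systemctl status postgresql  # the postgresql service should be active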

Set the default user password

During installation, PostgreSQL created a default user, named postgres. This user doesn’t yet have a password, so we need to set one.

  1. Run the following command to open psql as the postgres user:
sudo -u postgres psql postgres

The command prompt should look like this: postgres=#

2. Enter the following command to set the password:

\password postgres

When prompted, enter and confirm the password you’ve chosen.

It is important to use a strong password to help keep the system secure. For the purposes of this article, set the password to PASSWORD!

3. Exit psql with the command:

\q

Editing the configuration file

By default, local connections to PostgreSQL use the peer authentication method. We will edit the pg_hba.conf file and change peer authentication to md5 password authentication so that we are prompted for a password. Read more about the pg_hba.conf file and authentication methods.

We also need to edit the postgresql.conf file to enable Postgres to listen on all IP addresses.

Edit the pg_hba.conf file

  1. In the SSH terminal window, edit pg_hba.conf. Here we use the nano editor, but you can substitute your favorite one. For PostgreSQL 13, run the command:
sudo nano /etc/postgresql/13/main/pg_hba.conf

Scroll past the comment block to the authentication rules near the bottom of the file.
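
On a default Debian installation the rules look roughly like this; treat it as a sketch, since the exact contents vary by version:

# Database administrative login by Unix domain socket
local   all             postgres                                peer

# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   all             all                                     peer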

2. Edit the local rule that applies to all users, changing its authentication method

from
local all all peer
to
local all all md5

After the edit, the line should read local all all md5.

3. Save the file and exit the editor. In nano, press Control+o, press enter or return, then press Control+x
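
Note that the local rules above only affect connections made on the VM itself. If you also plan to connect from a remote client, which the firewall rule created later assumes, add a host rule for that client's address before saving. Here 1.2.3.4/32 is a placeholder for your machine's public IP:

host    all             all             1.2.3.4/32              md5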

Edit the postgresql.conf file

To enable Postgres to listen on all IP addresses, the most commonly used configuration is to set listen_addresses to '*' in the postgresql.conf file. Keep in mind that the rules in pg_hba.conf still control which clients may connect; listening on all addresses does not by itself open the database to remote hosts.

  1. In the SSH terminal window, edit postgresql.conf. For PostgreSQL version 13, run the following command:
sudo nano /etc/postgresql/13/main/postgresql.conf

2. Under CONNECTIONS AND AUTHENTICATION, look for the line that begins with #listen_addresses = 'localhost'. In nano, press Ctrl+W, type listen_addresses, and press Enter to search for it.

The cursor should land on the line #listen_addresses = 'localhost'.

3. Delete the # character to uncomment the line.

4. Replace localhost with *

When you are done, the line should read:
listen_addresses = '*'          # what IP address(es) to listen on

5. Save the file and exit the editor. In nano, press Control+o, press enter or return, then press Control+x

6. Restart the database server with the following command:

sudo service postgresql restart
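
To verify that the new setting took effect, you can check which address Postgres is listening on. This is an optional check; ss ships with current Debian images:

sudo ss -tlnp | grep 5432

The output should show a LISTEN socket bound to 0.0.0.0:5432 rather than 127.0.0.1:5432.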

Configure a Google Cloud firewall to open a port

PostgreSQL accepts remote connections on port 5432. In this step, we will create a firewall rule that enables traffic on this port. A firewall rule is a global resource, so this rule will apply to all instances with the tag “postgres”.

Follow these steps to add a firewall rule in the Cloud Console. An equivalent gcloud command is shown after the list.

  1. In the Cloud Console, navigate to VPC network > Firewall. At the top of the page, click + CREATE FIREWALL RULE.
  2. In the Name field, name the firewall rule. In this example, we use access-postgres.
  3. In the Network field, leave the network as default.
  4. In the Targets menu, select Specified target tags.
  5. In the Target tags field, enter the network tag postgres that was used to create the instance.
  6. In the Source filter menu, select IPv4 ranges.
  7. In the Source IPv4 ranges field, enter the same IP address that you used in the pg_hba.conf host rule. This is the IP address of your local computer. Remember to include the /32 suffix, for example: 1.2.3.4/32.
  8. In Specified protocols and ports, check tcp and enter 5432 as the value.
  9. Leave other configuration settings as default.
  10. Click Create.
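
Alternatively, you can create the same rule from the command line. The values below mirror the console settings above; replace 1.2.3.4/32 with your own IP address:

gcloud compute firewall-rules create access-postgres \
    --network=default \
    --allow=tcp:5432 \
    --source-ranges=1.2.3.4/32 \
    --target-tags=postgres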

Create a database and tables

We have successfully installed and configured PostgreSQL on our GCE VM instance. Now let's create a database and two tables, which we will use to illustrate the database migration.

  1. In the SSH terminal window, launch psql and connect to the default database named postgres:
sudo -u postgres psql postgres

2. List the PostgreSQL databases on the server: \l

3. Create a new database called sales with the following command:

CREATE DATABASE sales;

The console prints CREATE DATABASE when successful.

4. List the PostgreSQL databases on the server to see the created sales database:

\l

Create the tables

Now, let's connect to the sales database, create two tables named orders and agents, and insert data into them.

Create the orders table and insert data

  1. Connect to the sales database with the command:
\c sales

2. Create the orders table. The console prints CREATE TABLE when successful.

CREATE TABLE ORDERS(
ID INT PRIMARY KEY NOT NULL,
NAME TEXT NOT NULL,
QUANTITY INT NOT NULL);

3. Insert data into the orders table. The console prints INSERT 0 8 when successful.

INSERT INTO ORDERS (ID,NAME,QUANTITY) VALUES
(1, 'iPhone', 2),
(2, 'MacBook Pro', 1),
(3, 'Notepad', 4),
(4, 'Pen', 15),
(5, 'Pencil', 8),
(6, 'iPhone', 8),
(7, 'Pen', 1),
(8, 'Notepad', 9);

4. To see all the data in the orders table, run the command:

SELECT * FROM ORDERS;
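
Based on the rows inserted above, psql should render a table like this (formatting may vary slightly):

 id |    name     | quantity
----+-------------+----------
  1 | iPhone      |        2
  2 | MacBook Pro |        1
  3 | Notepad     |        4
  4 | Pen         |       15
  5 | Pencil      |        8
  6 | iPhone      |        8
  7 | Pen         |        1
  8 | Notepad     |        9
(8 rows)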

Create the agents table and insert data

  1. Create the agents table. The console prints CREATE TABLE when successful.
CREATE TABLE AGENTS(
AGENT_CODE TEXT PRIMARY KEY NOT NULL,
AGENT_NAME TEXT NOT NULL,
COMMISSION INT NOT NULL);

2. Insert data into the agents table. The console prints INSERT 0 8 when successful.

INSERT INTO AGENTS (AGENT_CODE,AGENT_NAME,COMMISSION) VALUES 
('a001', 'Alex', 300),
('a002', 'Benjamin', 240),
('a003', 'Lucida', 440),
('a004', 'Anderson', 360),
('a005', 'McDen', 500),
('a007', 'Kuma', 280),
('a008', 'Mark', 600),
('a009', 'David', 410);

3. To view all the data in the agents table, run the command:

SELECT * FROM AGENTS;

4. Exit psql with the following command, then type exit to close the SSH session:

\q

We have successfully set up a stand-alone PostgreSQL server on a Compute Engine virtual machine and loaded it with sample data.

Next steps

In the next articles in this series, we will migrate the sales database to Cloud SQL for PostgreSQL, first using the pg_dump and Cloud SQL import/export method, and then using the Database Migration Service.