Migrate Data — PostgreSQL to BigQuery

Arif Zainurrohman · Nerd For Tech · Feb 20, 2021

PostgreSQL, also known as Postgres, is a highly stable database backed by more than 20 years of development by the open-source community, and it is one of the most popular tools for analyzing data at scale.

However, as data volume and velocity increase, it takes considerable effort and expertise to optimize the database so that analytical queries finish within the desired time, and those queries can impact your transactional workloads directly or indirectly. This is why most organizations prefer a separate system for analytical workloads.

The solution may be in the cloud, which brings additional benefits such as elastic scaling and minimal upfront cost. Hence, many companies choose to move data from PostgreSQL to BigQuery.

BigQuery, a leading contender in this domain, is provided by Google Cloud Platform. It is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data, a Platform as a Service (PaaS) that supports querying using ANSI SQL, and it has built-in machine learning capabilities. Announced in May 2010 and made generally available in November 2011, BigQuery is a serverless, highly scalable, and cost-effective multi-cloud data warehouse designed for business agility.

Method to Perform PostgreSQL to BigQuery Migration

We can use the command line to move data between PostgreSQL tables and standard file-system files, and we can use ordinary SQL statements with SQL functions or PL/pgSQL procedures, which gives a lot of flexibility to extract data as a full dump or incrementally.

These are the high-level steps to follow to migrate Postgres to BigQuery:

1. Extract data from PostgreSQL

2. Optionally, clean and transform the data

3. Upload to Google Cloud Storage (GCS)

4. Upload to the BigQuery table from GCS

Let’s take a detailed look at each step.

Step 1: Data Extraction

The COPY command is the most efficient way to move data between PostgreSQL tables and the standard file system. The COPY TO command copies the results of a SELECT query or the contents of a table to a file.

COPY actor TO '/your/folder/actor.csv' WITH DELIMITER ',' CSV HEADER;
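
Note that COPY TO writes the file on the database server, so it requires server file access. If you only have client access, psql's \copy or a client library gives the same result. Below is a minimal sketch using Python and psycopg2; the connection parameters, table name, and timestamp column are placeholders, and the commented WHERE clause only illustrates how an incremental extract could look.

import psycopg2  # pip install psycopg2-binary

# Client-side equivalent of COPY TO: stream a table (or a query) into a
# local CSV file. Connection details and names below are placeholders.
conn = psycopg2.connect(
    host="localhost", dbname="dvdrental", user="postgres", password="secret"
)
sql = """
    COPY (
        SELECT * FROM actor
        -- For incremental extraction, filter on a modified-timestamp column:
        -- WHERE last_update >= '2021-02-01'
    ) TO STDOUT WITH (FORMAT CSV, HEADER)
"""
with conn, conn.cursor() as cur, open("actor.csv", "w", encoding="utf-8") as f:
    cur.copy_expert(sql, f)
conn.close()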

Step 2: Clean and Transform Data

Apart from transformations to accommodate business logic, there are some basic things to keep in mind while preparing data to be loaded into BigQuery:

  1. BigQuery expects CSV data to be UTF-8 encoded (a minimal re-encoding sketch follows after this list).
  2. Column types differ slightly between Postgres and BigQuery, but most Postgres types have an equivalent or easily convertible BigQuery type.
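
If the exported file is not already UTF-8 (for example, when the database uses a LATIN1 encoding), it can be re-encoded before upload. A minimal sketch, assuming a latin-1 source file; the file names and source encoding are placeholders:

# Re-encode the exported CSV to UTF-8 before sending it to BigQuery.
# "latin-1" is an assumption; check your database's client_encoding first.
with open("actor.csv", "r", encoding="latin-1") as src, \
        open("actor_utf8.csv", "w", encoding="utf-8", newline="") as dst:
    for line in src:
        dst.write(line)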

Step 3: Upload to Google Cloud Storage (GCS)

  1. First, log in to your GCP account. On the left sidebar, click Storage and go to the Browser.

2. Create a bucket if you don't already have one, then select the GCS bucket you want to upload the file to and click on it.

3. On the bucket details page, click the Upload files button and select the relevant files from your system. Once the upload is complete, the files are listed in the bucket.
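
The upload can also be scripted instead of done through the console. A minimal sketch using the google-cloud-storage client library; the bucket and object names are placeholders:

from google.cloud import storage  # pip install google-cloud-storage

# Upload the (re-encoded) CSV to a GCS bucket.
# "my-migration-bucket" and the file names are placeholders.
client = storage.Client()
bucket = client.bucket("my-migration-bucket")
blob = bucket.blob("actor.csv")
blob.upload_from_filename("actor_utf8.csv")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")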

Step 4: Upload to the BigQuery table from GCS

You can use the web console UI.

  1. Go to the BigQuery console from the left side panel.

2. Create a dataset if one is not already present.

3. A Create table option will appear below the Query editor; use it and choose the uploaded GCS file as the source data.

BigQuery offers three options to write to an existing table (see the sketch after this list):

  1. Write if empty
  2. Append to the table
  3. Overwrite the table
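
The same load can be scripted with the google-cloud-bigquery client library, where write_disposition corresponds to the three options above. A minimal sketch; the project, dataset, table, and bucket names are placeholders, and schema autodetection is used instead of an explicit schema:

from google.cloud import bigquery  # pip install google-cloud-bigquery

# Load the CSV from GCS into a BigQuery table.
client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the CSV header row
    autodetect=True,       # or supply an explicit schema
    # Maps to the three options above:
    # WRITE_EMPTY (write if empty), WRITE_APPEND, WRITE_TRUNCATE (overwrite)
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
load_job = client.load_table_from_uri(
    "gs://my-migration-bucket/actor.csv",
    "my-project.my_dataset.actor",
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
print(f"Loaded {load_job.output_rows} rows")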

Step 5: Run SQL Script in BigQuery

Once the data is loaded, you can run SQL against the new table from the BigQuery console or from any client to validate the migration and start analyzing.
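
For example, a quick row-count check with the Python client; the table name is a placeholder:

from google.cloud import bigquery

# Run a simple validation query against the newly loaded table.
client = bigquery.Client()
query = "SELECT COUNT(*) AS row_count FROM `my-project.my_dataset.actor`"
for row in client.query(query).result():
    print(f"Rows in BigQuery: {row.row_count}")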

In the modern era of data-driven business, getting value from data should be quick, with minimum friction, and data warehouses like BigQuery play an extremely important role in this. However, moving data to the cloud takes multiple steps, such as extracting the data, cleaning and transforming it, and uploading it, and it takes real engineering resources and time to ensure data correctness and consistency across the process. That effort, though, is matched by the value stakeholders obtain from it.

Conclusion

Technology has evolved very quickly, and there are many tools that can be used to migrate data. We suggest you explore the methods for moving data from PostgreSQL to BigQuery and make the choice that best suits your needs.


Arif Zainurrohman
Nerd For Tech

Corporate Data Analytics. Enthusiast in all things data, personal finance, and Fintech.