Data Migration: How to Migrate Data from Google Sheets to PostgreSQL Using Python

Ayomide Omotosho
Published in Data Epic
8 min read · Dec 7, 2023

Data migration is the process of transferring data from one storage system to another, as well as between different data formats and applications. The process also includes data preparation, extraction, and transformation, and it is usually conducted when introducing new systems or processes in an organization. How efficiently it is carried out has far-reaching implications for the organization.

Efficient data migration is crucial for organizations as it minimizes downtime, reduces data loss risk, ensures business continuity, and maintains data integrity for informed decision-making, while inefficient migration can lead to significant costs and data quality issues.

Introduction to Google Sheets and PostgreSQL

Google Sheets is a widely used online spreadsheet application, part of the Google Drive suite. It allows users to create, edit, and collaborate on spreadsheets in real-time. Its accessibility, user-friendly interface, and integration with other Google services make it a popular choice for data storage and preliminary analysis.

PostgreSQL, on the other hand, is an advanced, open-source relational database management system, known for its robustness, scalability, and compliance with SQL standards. It is capable of handling large volumes of data and complex queries, making it a preferred database for applications requiring reliable data storage and sophisticated data manipulation capabilities.

Why Migrate from Google Sheets to PostgreSQL?

Spreadsheets are commonly used to store tabular data in day-to-day work. They can serve as a database for relatively small-scale projects, to collect and fetch data from users. But for reasons like data consistency, an organized and extensible data structure, and operational convenience, a spreadsheet becomes unmanageable for even slightly complicated data structures; a relational database is required in these cases. Other reasons include:

  • Performance Speed: Google Sheets performs well as a database for fetching and serving data at low volumes, but performance degrades noticeably as the dataset grows large.
  • Storage Limit: Google Sheets has a limit of 10 million cells per spreadsheet, which seriously caps the amount of data that can be stored in a company’s database as data keeps growing.
  • Limited functionality: Spreadsheet applications are not designed to be used like databases, so the functionality of Google Sheets is limited compared to more advanced database management systems. For example, you can’t store videos or images or add rich field types, and there’s no built-in interface to query, filter, or update data.
  • Not scalable: Due to limited storage, Google Sheets is not scalable. Also, if many team members are working on the spreadsheet at the same time, the software will slow down, and you may encounter more errors.
  • Cannot see beyond a table: No other views like Kanban or Calendar are supported in Google Sheets.

These are a few of the many reasons why large-scale companies migrate their data from Google Sheets to an RDBMS like PostgreSQL.

Why Use Python for this Data Migration?

The use of Python for data migration, especially from Google Sheets to PostgreSQL, offers several compelling advantages:

  • Ease of Use and Readability: Python’s syntax is clear and concise, making it easy to write, read, and maintain code. This readability is particularly beneficial for complex data migration tasks, as it simplifies debugging and collaborative development.
  • Robust Libraries and Frameworks: Python provides a rich ecosystem of libraries and frameworks suited for data manipulation and database interaction. Libraries like gspread for Google Sheets, pandas for data processing, and psycopg2 for PostgreSQL interaction, streamline the migration process.
  • Flexibility and Scalability: Python’s flexibility allows it to handle various data types and structures encountered in Google Sheets and PostgreSQL. It can easily scale to accommodate large datasets, a critical factor in data migration projects.
  • Automation and Scripting: Python is excellent for writing scripts that automate repetitive tasks. In the context of data migration, it can automate data extraction, transformation, and loading processes, reducing manual effort and the potential for errors.

Establishing Connections Using Python

Before migrating data, you need to establish connections with the Google Sheets API and PostgreSQL. To do that, you first have to set up access to your Google Sheet and create a database in PostgreSQL. Both can be done by following these steps:

  1. Accessing the Google Sheet Using the Google Sheets API

To establish a connection with Google Sheets, you need a GCP account, which you can use to create a project. From the project console, navigate to APIs & Services and enable the Google Sheets API.

After enabling the API, create a service account in your Google Cloud project and download the JSON key file. This file contains the credentials that authenticate your Python script with Google’s API; it looks something like this:
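A redacted sketch of the key file (every value below is a placeholder):

```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "0123456789abcdef",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
  "client_email": "your-service-account@your-project-id.iam.gserviceaccount.com",
  "client_id": "123456789012345678901",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```

Keep this file out of version control, since the private key grants access to your Google Cloud resources.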

Next, share the Google Sheet with the email address associated with your service account. Viewer access is enough for reading the data; grant Editor access only if you also need to write back. This step is necessary for the service account to be able to access the sheet.

2. Creating a PostgreSQL Database and Table

If not already installed, download and install PostgreSQL from the link below:

PostgreSQL: Downloads

Follow the installation instructions for your operating system.

After completing the installation, open pgAdmin. You can create a new database either by running a SQL script in the Query Tool or manually through the pgAdmin interface. For example, in SQL, you can create a database with:
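A minimal statement, using `migration_db` as a placeholder name:

```sql
CREATE DATABASE migration_db;
```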

When your database is set up, design a table schema in PostgreSQL that corresponds to the data structure in your Google Sheets. Determine the columns and their data types; if the types differ between the two systems, perform a transformation to get the right data type before ingestion.
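As a sketch of that type-mapping step, a small helper (hypothetical, not part of any library) can infer a reasonable PostgreSQL column type from the sample values in each column:

```python
# Hypothetical helper: map the Python values read from a sheet
# to PostgreSQL column types.
def infer_pg_type(values):
    """Pick a PostgreSQL data type that fits every sample value in a column."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "TEXT"  # nothing to go on; TEXT is the safest default
    if all(isinstance(v, bool) for v in non_empty):
        return "BOOLEAN"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in non_empty):
        return "INTEGER"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_empty):
        return "NUMERIC"
    return "TEXT"

# Example: rows shaped like gspread's get_all_records output (list of dicts)
rows = [
    {"name": "Ada", "age": 36, "score": 9.5},
    {"name": "Grace", "age": 45, "score": 8.0},
]
schema = {col: infer_pg_type([row[col] for row in rows]) for col in rows[0]}
print(schema)  # {'name': 'TEXT', 'age': 'INTEGER', 'score': 'NUMERIC'}
```

A production migration would need more cases (dates, booleans stored as "TRUE"/"FALSE" strings, and so on), but the idea is the same: inspect the values, then emit a matching column definition.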

You can create the table in the query editor or through the pgAdmin interface. To create it using the editor, simply execute the SQL statement below:
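A generic template, with placeholder table, column, and type names:

```sql
CREATE TABLE your_table_name (
    column1 datatype1,
    column2 datatype2,
    column3 datatype3
);
```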

You can replace your_table_name, column1, datatype1, etc., with your specific table and column names and data types.

After setting up access to the Google Sheet and creating a table in the database, we can now write our Python script in the following steps.

What does the Python script look like?

1. Creating a Connection with the PostgreSQL Database in the Python Script

Before you start writing your script, ensure you have the necessary credentials to connect to your PostgreSQL database, including the host, database name, username, and password. If you haven’t already, install the psycopg2 library, which is a PostgreSQL adapter for Python. You can use pip to install it from your terminal:
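The `psycopg2-binary` package ships precompiled binaries, which is the most convenient option for local development:

```shell
pip install psycopg2-binary
```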

In your Python environment, write a script to establish a connection to the PostgreSQL database. For example:
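A minimal sketch, assuming a local server and a hypothetical database called `migration_db` (all credentials are placeholders):

```python
import psycopg2

# Placeholder credentials: replace with your actual host, database,
# user, and password before running.
conn = psycopg2.connect(
    host="localhost",
    dbname="migration_db",
    user="postgres",
    password="your_password",
    port=5432,
)

# Run a simple query to confirm the connection works
cur = conn.cursor()
cur.execute("SELECT version();")
print(cur.fetchone())

cur.close()
conn.close()
```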

Replace the parameters with your actual database credentials.

Execute a simple query to test the connection, such as retrieving data or the table schema.

2. Getting data from the worksheet using Gspread library

With your credentials, you can use the gspread Python library to connect to your Google Sheets document.

Install it using pip and authenticate using the credentials obtained from your Google service account. Once authenticated, use gspread to open the desired spreadsheet by its name and fetch the required worksheet. You can then read the data into a Python list or a pandas DataFrame for easier manipulation. This can be done using the script below:
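A sketch of the extraction step; the spreadsheet name, worksheet title, and key-file path are placeholders:

```python
import gspread
import pandas as pd

# Path to the JSON key file from your Google Cloud service account
# (in production, store this path in an environment variable)
service_account_file = "path/to/service_account.json"

# Authenticate with the service-account credentials
gc = gspread.service_account(filename=service_account_file)

# Open the spreadsheet by name and select the worksheet by title
spreadsheet = gc.open("name_of_spreadsheet")
worksheet = spreadsheet.worksheet("worksheet_title")

# Fetch all rows as a list of dictionaries, one dict per row
records = worksheet.get_all_records()

# Optionally load into a DataFrame for easier manipulation
df = pd.DataFrame(records)
print(df.head())
```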

Replace name_of_spreadsheet with the actual name of your Google Sheets file, and replace worksheet_title with the title of the worksheet you want to extract data from. The get_all_records method is a function of the gspread library that retrieves all the data from the worksheet and returns it as a list of dictionaries, where each dictionary represents a row in the sheet. In a more professional setting, it is advisable to keep things like service_account_file in an environment variable.

For more info on gspread library you can access the documentation through this link:

Examples of gspread Usage — gspread 5.10.0 documentation

3. Ingesting the data into Postgres Database

After performing every transformation you want to make on the data, you can ingest it into the table in your PostgreSQL database by running a short Python script.
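A sketch of the loading step, reusing the records list from the gspread step; the table name and credentials are placeholders:

```python
import psycopg2
from psycopg2.extras import execute_values

# records is the list of dictionaries returned by get_all_records();
# shown inline here as sample data
records = [
    {"name": "Ada", "age": 36},
    {"name": "Grace", "age": 45},
]

# Placeholder credentials: replace with your own
conn = psycopg2.connect(
    host="localhost",
    dbname="migration_db",
    user="postgres",
    password="your_password",
)
cur = conn.cursor()

# Derive the column list from the first record and build the row values
columns = list(records[0].keys())
values = [[row[col] for col in columns] for row in records]

# execute_values expands VALUES %s into one efficient multi-row INSERT
insert_sql = f"INSERT INTO your_table_name ({', '.join(columns)}) VALUES %s"
execute_values(cur, insert_sql, values)

conn.commit()
cur.close()
conn.close()
```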

Once executed, the script inserts the transformed data into your PostgreSQL database, making it available for analysis, reporting, and further use within your data ecosystem, so you can derive insights and make informed decisions from the migrated dataset.

Conclusion

Efficient data migration from Google Sheets to PostgreSQL is a critical aspect of modern data management, particularly when dealing with larger datasets or when advanced data manipulation capabilities are required.

By following the steps above for establishing connections, authenticating with Google Sheets and PostgreSQL, and employing Python scripts for the migration, organizations can ensure a seamless transition of data, maintain data integrity, and enable data-driven decision-making.

References

Data Migration — Types, Definition, Process, Practices (corporatefinanceinstitute.com)

Google Sheets as a Database: Pros & Cons (stackby.com)
