Pandas: Write a Pandas DataFrame to Google Cloud Storage in Python

Agusmahari
2 min read · Mar 16, 2023


Data is an essential component of any business or organization. Companies are producing and consuming more data than ever before, and managing this data is becoming increasingly complex. One of the most popular tools for managing data is Pandas, a Python library that provides powerful data manipulation and analysis capabilities.

In this blog post, we’ll explore how to write a Pandas DataFrame to Google Cloud Storage in Python. Google Cloud Storage is a popular cloud-based storage solution that provides a simple and scalable way to store and retrieve data in the cloud. By leveraging the power of Google Cloud Storage, we can easily store and manage large datasets in a cost-effective and scalable manner.

Prerequisites

To follow along with this guide, please make sure to have:

  • created a service account and downloaded the private key (JSON file)
  • installed the Python client library:

To get started, we’ll need to install the google-cloud-storage package. This can be done using pip:

pip install google-cloud-storage

Once we have the package installed, we can create a client to connect to our Google Cloud Storage bucket. We do this by setting up a service account key with the proper permissions and passing it to the client constructor.
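For illustration, here is a minimal sketch of two common ways to construct the client (the key path below is a placeholder, not a real file):

from google.cloud import storage

# Option 1: point the client at the key file directly
client = storage.Client.from_service_account_json('path/to/service-account-key.json')

# Option 2: let the library discover the key via the
# GOOGLE_APPLICATION_CREDENTIALS environment variable, e.g.
#   export GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account-key.json
client = storage.Client()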

Writing Pandas DataFrame to Google Cloud Storage as a CSV file

Consider the following Pandas DataFrame:

import pandas as pd
df = pd.DataFrame({'A': [3, 4], 'B': [5, 6]})
df.head()
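
In a notebook, df.head() displays:

   A  B
0  3  5
1  4  6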

Case when you already have a bucket

To write this Pandas DataFrame to Google Cloud Storage (GCS) as a CSV file, use the blob’s upload_from_string() method:

from google.cloud import storage
path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)

# The bucket on GCS in which to write the CSV file
bucket = client.bucket('test-bucket-skytowner')
# The name assigned to the CSV file on GCS
blob = bucket.blob('my_data.csv')
blob.upload_from_string(df.to_csv(), 'text/csv')

Replace './gcs-project-354207-099ef6796af6.json' with the path to your own service account key, and replace 'test-bucket-skytowner' with the name of your bucket.

Note the following:

  • if a bucket with the specified name does not exist, an error will be thrown (see the sketch after this list for creating the bucket on the fly)
  • the DataFrame’s to_csv() method converts the DataFrame into a CSV string:

df.to_csv()  # returns ',A,B\n0,3,5\n1,4,6\n'
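
If the bucket might not exist yet, one option is to catch the lookup error and create the bucket on the fly. A minimal sketch, assuming the same client and bucket name as above (bucket names must be globally unique):

from google.api_core.exceptions import NotFound

try:
    # get_bucket() raises NotFound if the bucket is absent
    bucket = client.get_bucket('test-bucket-skytowner')
except NotFound:
    # Provision a new bucket under the client's project
    bucket = client.create_bucket('test-bucket-skytowner')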

After running this code, we can see on the GCS web console that my_data.csv has been written to our test-bucket-skytowner bucket.
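
To verify the upload from Python rather than the web console, we can download the blob and parse it back into a DataFrame. A quick sketch reusing the blob object from above:

import io

# Download the CSV content as a string and rebuild the DataFrame
csv_text = blob.download_as_text()
df_roundtrip = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_roundtrip)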


Agusmahari

Data Engineer | Big Data Platform at PT. BANK NEGARA INDONESIA (Persero) Tbk. Let's connect on LinkedIn: https://www.linkedin.com/in/agus-mahari/