Pandas: write a Pandas DataFrame to Google Cloud Storage in Python
Data is an essential component of any business or organization. Companies are producing and consuming more data than ever before, and managing it is becoming increasingly complex. One of the most popular tools for working with data is Pandas, a Python library that provides powerful data manipulation and analysis capabilities.
In this blog post, we’ll explore how to write a Pandas DataFrame to Google Cloud Storage in Python. Google Cloud Storage is a popular cloud-based storage solution that provides a simple and scalable way to store and retrieve data in the cloud. By leveraging the power of Google Cloud Storage, we can easily store and manage large datasets in a cost-effective and scalable manner.
Prerequisites
To follow along with this guide, please make sure to have:
- created a service account and downloaded the private key (JSON file)
- installed the google-cloud-storage Python client library, which can be done using pip:
pip install google-cloud-storage
Once the package is installed, we can create a client that connects to our Google Cloud Storage bucket by passing a service account key with the proper permissions to the client constructor.
Writing Pandas DataFrame to Google Cloud Storage as a CSV file
Consider the following Pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'A':[3,4],'B':[5,6]})
df.head()
Case when you already have a bucket
To write this Pandas DataFrame to Google Cloud Storage (GCS) as a CSV file, use the blob's upload_from_string(~) method:
from google.cloud import storage
path_to_private_key = './gcs-project-354207-099ef6796af6.json'
client = storage.Client.from_service_account_json(json_credentials_path=path_to_private_key)
# The bucket on GCS in which to write the CSV file
bucket = client.bucket('test-bucket-skytowner')
# The name assigned to the CSV file on GCS
blob = bucket.blob('my_data.csv')
blob.upload_from_string(df.to_csv(), 'text/csv')
Replace ./gcs-project-354207-099ef6796af6.json with the path to your own service account key, and replace test-bucket-skytowner with the name of your bucket.
Note the following:
- if a bucket with the specified name does not exist, an error will be raised
- the DataFrame's to_csv() method converts the DataFrame into a CSV string:
df.to_csv()
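To see exactly what gets uploaded, we can inspect the string that to_csv() returns. A quick local sketch (no GCS connection needed) — note that the row index is included by default, and can be dropped with index=False:

```python
import io

import pandas as pd

df = pd.DataFrame({'A': [3, 4], 'B': [5, 6]})

# to_csv() with no path argument returns the CSV content as a string,
# including the row index by default
csv_str = df.to_csv()

# pass index=False to omit the index column from the file
csv_no_index = df.to_csv(index=False)

# round-trip check: parsing the string back yields the same data
df_back = pd.read_csv(io.StringIO(csv_no_index))
```

Whichever string you choose is what ends up as the contents of my_data.csv on GCS, so it is worth deciding here whether the index column belongs in the file.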
After running this code, we can confirm on the GCS web console that my_data.csv has been written to our test-bucket-skytowner bucket.