Pagination, Batching & Restricting Result Data with the Google Cloud Storage Python Client Library

Pavan Kumar Kattamuri
Published in Analytics Vidhya
Sep 25, 2019

This post demonstrates some best practices for working with the Cloud Storage Python client library. Specifically, I will discuss how to send batch API requests, how to restrict the result data, and how to paginate through the results.

Batching

Instead of performing many similar API operations individually, batch the API calls whenever applicable. Batching reduces the number of HTTP connections your client has to make to Cloud Storage, which means less operational overhead and better performance.

For example, suppose you need to delete a large number of files from a GCS bucket. One way is to delete them one by one: the bucket.delete_blobs() method accepts a list of blobs, but internally it calls delete_blob() for each individual blob, so every object costs a separate request.
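The one-by-one approach could look roughly like the sketch below. The function name delete_one_by_one and the prefix argument are hypothetical; bucket is assumed to be a google.cloud.storage Bucket, and list_blobs() and delete_blobs() are the library methods being relied on.

```python
def delete_one_by_one(bucket, prefix):
    """Delete every object under `prefix`, one request per object.

    `bucket` is assumed to be a google.cloud.storage Bucket
    (hypothetical helper, not part of the library).
    """
    # Materialize the listing so the objects can be counted afterwards.
    blobs = list(bucket.list_blobs(prefix=prefix))

    # delete_blobs() walks the list and calls delete_blob() for each item,
    # so the objects are removed sequentially.
    bucket.delete_blobs(blobs)
    return len(blobs)
```

With the real library, the bucket would come from something like storage.Client().bucket("my-bucket"), where the bucket name is a placeholder.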

The better option is to make a batch API call that deletes them all at once, though the number of API calls in a single batch request is limited to 1000. The approach is therefore to delete up to 1000 objects per batch request and repeat until no objects are left to delete.
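A minimal sketch of the batched delete follows. It assumes client is a google.cloud.storage Client and bucket one of its buckets; the function name delete_in_batches and the batch_size parameter are hypothetical. The default of 1000 simply mirrors the figure quoted above, so check the limit documented for your library version before relying on it.

```python
def delete_in_batches(client, bucket, prefix, batch_size=1000):
    """Delete every object under `prefix`, `batch_size` objects per batch.

    `client` and `bucket` are assumed to be google.cloud.storage objects
    (hypothetical helper, not part of the library).
    """
    # List first, outside any batch, so the listing is a normal request.
    blobs = list(bucket.list_blobs(prefix=prefix))

    for start in range(0, len(blobs), batch_size):
        chunk = blobs[start:start + batch_size]
        # Calls made inside the context manager are deferred and sent
        # together when the block exits.
        with client.batch():
            for blob in chunk:
                blob.delete()
    return len(blobs)
```

Passing a smaller batch_size is an easy way to stay under whatever per-request limit applies in your environment.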

In a test deleting 10,000 objects, the sequential delete took around 3238 seconds (~54 min), while the batch delete took only 121 seconds (~2 min) to do the same work.

Pagination

Include nextPageToken in the fields parameter to paginate through the results, so that each response carries the token needed to request the next page. This is useful when you want to show a limited number of results at a time and split the returned data across multiple pages.
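One way to fetch a single page at a time is sketched below. The function name fetch_page and its arguments are hypothetical; bucket is assumed to be a google.cloud.storage Bucket, and the sketch relies on the max_results, page_token and fields arguments of list_blobs() and on the iterator's pages and next_page_token attributes.

```python
def fetch_page(bucket, prefix, page_size, page_token=None):
    """Return (names, next_token) for one page of objects under `prefix`.

    `bucket` is assumed to be a google.cloud.storage Bucket
    (hypothetical helper, not part of the library).
    """
    iterator = bucket.list_blobs(
        prefix=prefix,
        max_results=page_size,    # cap the number of results returned
        page_token=page_token,    # None means "start from the beginning"
        # Ask only for object names plus the token for the next page.
        fields="items(name),nextPageToken",
    )
    page = next(iterator.pages)   # triggers the request for this page
    names = [blob.name for blob in page]
    # The token is only known after the page has been fetched.
    return names, iterator.next_page_token
```

A caller would keep the returned token (for example in a "next" link) and pass it back as page_token to get the following page; a token of None means there are no more results.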

Partial Response

When you request a list of blobs that match a prefix, the server by default sends back the full representation of each blob. For better performance, you can ask the server to send only the fields you really need and get a partial response instead. For example, the fields parameter can request only the name and contentType of the matched objects.
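A partial-response listing might be written as in the sketch below. The function name list_names_and_types is hypothetical and bucket is assumed to be a google.cloud.storage Bucket. Keeping nextPageToken in the field mask is my own addition, so that listings longer than one page can still continue.

```python
def list_names_and_types(bucket, prefix):
    """Return (name, content_type) pairs for objects under `prefix`.

    `bucket` is assumed to be a google.cloud.storage Bucket
    (hypothetical helper, not part of the library).
    """
    blobs = bucket.list_blobs(
        prefix=prefix,
        # Request only these two properties per object. nextPageToken is
        # kept (an assumption of this sketch) so multi-page listings work.
        fields="items(name,contentType),nextPageToken",
    )
    return [(blob.name, blob.content_type) for blob in blobs]
```

Properties left out of the field mask are not returned, so they should not be read from the resulting blob objects.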

Patch (Partial Update)

Avoid sending unnecessary data when modifying resources. To send updated data only for the specific fields that you're changing, use the PATCH method: you supply data only for the fields you want to change, and fields that you omit are not cleared. PUT, by contrast, replaces the whole resource, so use it only if you want to clear previously set data. This makes patch the safer choice. For example, the Content-Type metadata of a set of objects can be set to image/png by sending a batch patch request.
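A batched patch could be sketched as follows. The function name set_content_type and its parameters are hypothetical; client and bucket are assumed to be google.cloud.storage objects. As far as I understand the library, blob.patch() sends only the properties changed locally, which is what makes it a partial update. The batch_size default mirrors the 1000-call figure from the Batching section and should be checked against your library version.

```python
def set_content_type(client, bucket, prefix,
                     content_type="image/png", batch_size=1000):
    """Patch the Content-Type of every object under `prefix` in batches.

    `client` and `bucket` are assumed to be google.cloud.storage objects
    (hypothetical helper, not part of the library).
    """
    # List outside the batch so the listing is a normal request.
    blobs = list(bucket.list_blobs(prefix=prefix))

    for start in range(0, len(blobs), batch_size):
        # Patch calls inside the context manager are deferred and sent
        # together when the block exits.
        with client.batch():
            for blob in blobs[start:start + batch_size]:
                blob.content_type = content_type  # the only changed field
                blob.patch()                      # partial update
    return len(blobs)
```

Other metadata on the objects, such as custom key-value pairs, is left untouched because it is never part of the patch payload.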
