Google Drive API with Python II

Akshay Sharma
Published in The HumAIn Blog
Jun 1, 2021

This is the second part of my blog series on the Google Drive API. If you have not read the first part yet, I suggest giving it a try here.

In the first part, we set up the Drive API and got it working on our system. In this part, we will mainly cover:

  1. How to fetch more detailed data about files and folders
  2. Revision history for files
  3. Metadata extraction
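As a rough sketch of the first and third items, the function below pages through `files.list` in Drive API v3 and requests a handful of metadata fields for every file. It assumes `service` is the client built in part one (for example with `googleapiclient.discovery.build("drive", "v3", credentials=creds)`); the helper name `list_file_metadata` and the particular field selection are my own illustrative choices, not code from the original post.

```python
from typing import Any, Dict, List, Optional

# Fields to request for every file; a tight `fields` mask keeps responses small.
FILE_FIELDS = (
    "nextPageToken,"
    "files(id,name,mimeType,createdTime,modifiedTime,"
    "owners(displayName),lastModifyingUser(displayName))"
)


def list_file_metadata(service: Any, query: Optional[str] = None,
                       page_size: int = 100) -> List[Dict[str, Any]]:
    """Hypothetical helper: return metadata dicts for every file visible to the user."""
    files: List[Dict[str, Any]] = []
    page_token: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"pageSize": page_size, "fields": FILE_FIELDS}
        if query:
            # Optional Drive search filter, e.g. "trashed = false"
            params["q"] = query
        if page_token:
            params["pageToken"] = page_token
        response = service.files().list(**params).execute()
        files.extend(response.get("files", []))
        page_token = response.get("nextPageToken")
        if not page_token:  # no further pages to fetch
            return files
```

Each returned dict holds only the requested keys, so the result can be fed straight into a table or a plotting library.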

Then we will use this data to build some analytical representations with graphs, to make it look cool!

About the API

One of the most useful features of Google Docs is collaborative writing, where multiple people can edit the same document at the same time. Google captures data about the changes made to a document, and some of it is available through the API. This can be useful for identifying who has contributed to a document, how much they contributed, and when.
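To make "who, how much and when" concrete, here is a small sketch that tallies a list of revision resources (the dicts returned by `revisions.list`) by editor and by day. It relies only on the `modifiedTime` and `lastModifyingUser.displayName` fields of the v3 Revision resource; the function name `tally_contributions` is hypothetical, and a revision count is only a rough proxy for contribution, given the API limitations discussed below.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Dict, Iterable, Tuple


def tally_contributions(revisions: Iterable[Dict[str, Any]]) -> Tuple[Counter, Counter]:
    """Hypothetical helper: count revisions per editor and per calendar day (UTC)."""
    per_editor: Counter = Counter()
    per_day: Counter = Counter()
    for rev in revisions:
        # Revisions without a recorded user are grouped under "unknown"
        editor = rev.get("lastModifyingUser", {}).get("displayName", "unknown")
        per_editor[editor] += 1
        stamp = rev.get("modifiedTime")
        if stamp:
            # Drive returns RFC 3339 timestamps such as "2021-06-01T10:15:30.123Z";
            # swapping "Z" for "+00:00" lets fromisoformat parse it on Python < 3.11 too.
            moment = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
            per_day[moment.date().isoformat()] += 1
    return per_editor, per_day
```

`per_editor.most_common()` then lists editors from most to fewest revisions, and `per_day` is ready to plot as a simple activity timeline.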

Before we get into the details, it’s useful to know the limitations of the API. Our experiments indicate that when many users edit at the same time, we only get the name of the first editor active in that session. The others are not captured. This means we cannot expect very fine-grained revision data from documents that had many simultaneous edits. Similarly, when many edits happen quickly, these are combined by Google and we cannot see them individually via the API. So the temporal resolution of the data provided by the API is limited. We have not yet investigated the exact resolution. Finally, older edits are deleted by Google to save space. We do not know what their rules are for this.

To get started, let’s connect to our Google Doc and get a list of its revisions. We need the ID of the Doc, which appears in its URL. The `revisions` object is a list in which each item contains metadata about a specific revision of the document, so the number of revisions is simply `len(revisions)`. For the meaning of each metadata field, take a look at https://developers.google.com/drive/api/v3/reference/revisions#resource
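A minimal sketch of this step, assuming `service` is the Drive v3 client from part one: pull the ID out of a Docs URL of the form `https://docs.google.com/document/d/<ID>/edit`, then page through `revisions.list` until there is no `nextPageToken`. The helper names and the `fields` selection are illustrative assumptions; `exportLinks` is requested because the export step needs it.

```python
import re
from typing import Any, Dict, List

# Ask only for the revision fields we need, plus the token used for paging.
REVISION_FIELDS = (
    "nextPageToken,"
    "revisions(id,modifiedTime,lastModifyingUser(displayName,emailAddress),exportLinks)"
)


def doc_id_from_url(url: str) -> str:
    """Hypothetical helper: extract the file ID from .../document/d/<ID>/edit."""
    match = re.search(r"/d/([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"no document ID found in {url!r}")
    return match.group(1)


def list_revisions(service: Any, file_id: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: return every revision resource, following nextPageToken."""
    revisions: List[Dict[str, Any]] = []
    params: Dict[str, Any] = {"fileId": file_id, "fields": REVISION_FIELDS}
    while True:
        response = service.revisions().list(**params).execute()
        revisions.extend(response.get("revisions", []))
        token = response.get("nextPageToken")
        if not token:
            return revisions
        params["pageToken"] = token  # request the next page on the following loop
```

Usage: `revisions = list_revisions(service, doc_id_from_url(doc_url))`, after which `len(revisions)` gives the revision count.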

Export the contents of the Google Doc at a given revision

Now we can write a little custom function to export the contents of the Google Doc at a specific revision. We choose to export the doc as a plain text file for simplicity. We can use this function to contact the Google Drive API and get the content of each revision of the Google Doc.
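One way to sketch such a function: a Google Docs revision resource carries an `exportLinks` map from MIME type to download URL (provided `exportLinks` is included in the `fields` mask of the `revisions.list` call), so we can fetch the `text/plain` link with an authorised HTTP session. The session is assumed to be `google.auth.transport.requests.AuthorizedSession(creds)` from the google-auth library; `export_revision` and `export_all_revisions` are hypothetical names.

```python
from typing import Any, Dict, List

PLAIN_TEXT = "text/plain"


def export_revision(session: Any, revision: Dict[str, Any],
                    mime_type: str = PLAIN_TEXT) -> str:
    """Hypothetical helper: download one revision of a Google Doc in the given format."""
    links = revision.get("exportLinks") or {}
    if mime_type not in links:
        raise KeyError(f"revision {revision.get('id')} has no {mime_type} export link")
    response = session.get(links[mime_type])  # authorised GET against the export URL
    response.raise_for_status()               # surface 4xx/5xx errors instead of bad text
    return response.text


def export_all_revisions(session: Any, revisions: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map revision ID to raw exported text, skipping revisions with no plain-text link."""
    return {
        rev["id"]: export_revision(session, rev)
        for rev in revisions
        if PLAIN_TEXT in (rev.get("exportLinks") or {})
    }
```

Plain text keeps the example simple; passing another MIME type listed in `exportLinks` (for example `application/pdf`) to `export_revision` would download that format instead.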

Finally, we can convert the responses from the API into plain text and tidy them a little, ready for some exploratory data analysis.
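A small tidying sketch, under the assumption that the exported text may begin with a byte-order mark (U+FEFF) and use Windows line endings: strip the mark, normalise newlines, collapse runs of blank lines, and then join each revision's word count with its metadata. The helper names and the chosen summary columns are illustrative, not the author's original code.

```python
import re
from typing import Any, Dict, List


def tidy_text(raw: str) -> str:
    """Hypothetical helper: light clean-up of one exported revision."""
    text = raw.lstrip("\ufeff")            # drop a leading byte-order mark, if any
    text = text.replace("\r\n", "\n")      # normalise Windows line endings
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip()


def summarise(revisions: List[Dict[str, Any]], texts: Dict[str, str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: one row per revision with editor, timestamp and word count."""
    rows = []
    for rev in revisions:
        text = tidy_text(texts.get(rev["id"], ""))
        rows.append({
            "id": rev["id"],
            "modified": rev.get("modifiedTime"),
            "editor": rev.get("lastModifyingUser", {}).get("displayName", "unknown"),
            "words": len(text.split()),
        })
    return rows
```

Each row is a plain dict, so the list can be loaded into a DataFrame or passed to a plotting library to chart document growth over time.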

For my private doc, I get

This is one way to collect data that already exists and put a familiar toolset to work on it. In this example, we used metadata that Drive already keeps for every file and folder of every user; we only collected it through the API and then used it to understand more about our files.

Thanks for reading!

Until next time.
