“Drive Magic”

Balázs Németh, Senior software developer @Doctusoft

Google Drive. You can’t really exist on the web without ever encountering it. You are most likely an avid user if you are reading this. From a user’s point of view it is obviously a great product. Its initial release happened over 10 years ago, when the now widely used Drive moniker was nowhere to be found and the product was simply Docs, Sheets and Slides, and it made online collaboration widespread.

Due to the tight integration between the various G Suite products, the need for enterprise solutions that use it, or actually build upon it, has increased a lot. I was lucky enough (well, that is questionable, and you will see later why ;)) to work with it for a rather long time. Well, “long time” seems a bit exaggerated given that I only passed the 5-year mark a few months ago, but due to the age of the whole product I still feel like a Methuselah. Looking back, we created solutions I never imagined we would, but it came with our fair share of mindf*ck along the way.

I think Google was surprised as well by how popular it became in just a few years. Sure, that certainly wasn’t the view it projected to users, or what you could see in the marketing bs (sorry, material) for Google Apps (the maiden name of G Suite), but if you have ever tried to integrate with it you could certainly see the drawbacks, and the indications that this wasn’t planned as an enterprise product at a scale like this. Having just written that sentence, I can already hear many of you cry out about how unjust I am to Drive, given that none of the competitors is better. Well, I don’t think I am. I just mean enterprise-level integration with a wide feature set and huge throughput, and preferably projects that started years ago, as the issues I’m about to describe resemble the rain forests: they tend to decrease over time. At least in this case that’s good news :).

Although I have worked with some older services (like Documents List API, Spreadsheets API) and some newer ones (like Drive API v3, Sheets API v4), most of my experience comes from Drive API v2… and well, let’s just forget the GData era. It’s better for everybody :)

I thought about ordering and prioritizing the issues, but I couldn’t come up with a decent enough rule for it. Basically every issue could be just as crucial to one app as it is irrelevant to another, so I’ll go as it feels right. Also, I’m sure any ordering would have been heavily biased, as I do remember every single hour these issues had me thinking about my career choices :D So excuse me in advance if I ramble, or even rant, but that is how I “cope”… it’s cheaper than drinking ;). … and please consider that many of the experiences I describe here happened before the behavior was documented and/or fixed properly.

Probably one of the most often needed features that is missing: you can’t search in a folder hierarchy through the API. You can search globally, or in a specific folder explicitly, but not inside its subfolders. Although it’s present on the Drive UI (it has been released recently), there is no API method for it. It’s most likely caused by the original underlying architecture. Back in the day documents had labels instead of folders and weren’t actually represented as being in folders as they are today. It was similar to how Gmail still represents labels. When it was rebranded some functionality was added, but the core concept is pretty much the same.
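A common workaround is to walk the hierarchy yourself, one folder at a time. Here is a minimal sketch of that idea, not an official recipe. The `list_files` callable is hypothetical: it stands for whatever wrapper you have around the files list call, and I assume it takes a Drive query string and returns file resources as dicts with `id`, `title` and `mimeType` (the v2 field names). The function name `search_tree` and the choice to match on a title substring are mine as well.

```python
from collections import deque
from typing import Callable, Dict, Iterator, List

FOLDER_MIME = "application/vnd.google-apps.folder"

# Hypothetical callable: takes a Drive query string and returns the matching
# file resources as dicts with at least "id", "title" and "mimeType".
ListFiles = Callable[[str], List[Dict[str, str]]]


def search_tree(list_files: ListFiles, root_id: str,
                title_part: str) -> Iterator[Dict[str, str]]:
    """Yield non-folder files at any depth under root_id whose title
    contains title_part (case-insensitive)."""
    queue = deque([root_id])
    # A file or folder can have several parents, so remember every id we
    # have already handled to avoid duplicates and endless walks.
    seen = {root_id}
    while queue:
        folder_id = queue.popleft()
        # One request per folder: fetch its direct children, filter locally.
        query = f"'{folder_id}' in parents and trashed = false"
        for item in list_files(query):
            if item["id"] in seen:
                continue
            seen.add(item["id"])
            if item["mimeType"] == FOLDER_MIME:
                queue.append(item["id"])  # descend into it later
            elif title_part.lower() in item["title"].lower():
                yield item
```

Keep in mind that this costs one request per folder, so on a deep tree it is slow and eats into your quota, which is exactly why a proper API method would be nice.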

If you check your network activity while using the Drive UI you can spot many /drive/v2internal/ calls. That isn’t the published v2, as it contains more functionality. Unfortunately many of those features won’t get promoted to the public API, or at least only very slowly.

For example:

If a file is shared with the domain, but the user has never opened it (which is expected for new accounts and/or new files, for example), then it won’t show up in the default result list when you search for it on Drive. You have to change the “Location” from “Anywhere” to “Visible to anyone in yourdomain.com”. I can’t imagine a scenario where excluding it from the default results makes sense business-wise, apart from the possibility that it’s a direct effect of an architecture that wasn’t originally built for this.

It’s a rather frequent use case to subscribe to, aka “watch”, change notifications (https://developers.google.com/drive/v2/reference/changes/watch). We certainly do it a lot. It’s also common to further modify the file you received notifications about. Going on with our example, imagine that you don’t always check the current state on Drive, or the content of the patch you just built (whether it’s empty or not); you just execute the request. It happens. Developers are lazy. So what you did was essentially a no-op change, yet we encountered multiple cases where requests like this triggered a new change notification on the very same file. Which we processed. Again. Executed a no-op request. Again. I think you can see where this is going :) This issue had an even trickier occurrence. We wanted to actually change something, but we couldn’t. For the sake of the example, imagine we try to modify the title and the parents in the same request, but we no longer have edit access on the folder we are trying to move the file to. The request fails, which is actually expected. What wasn’t expected is that it triggered a change notification, which ensured we tried to do this again :). Avoiding the loop required storing the fact that we had already failed in an unsolvable way.
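To make the two guards concrete, here is a rough sketch of the idea, under a few assumptions of mine rather than a copy of what we actually ran. `ChangeHandler` and `send_patch` are hypothetical names: `send_patch` stands for your own wrapper around the real patch request, and I use Python’s built-in `PermissionError` as a stand-in for whatever your client raises on a permission failure. The failures are kept in memory here; a real service would have to persist them.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Set


class ChangeHandler:
    """Applies a desired state to a file without feeding a notification loop."""

    def __init__(self, send_patch: Callable[[str, Dict[str, Any]], None]) -> None:
        self._send_patch = send_patch    # hypothetical: performs the real API call
        self._failed: Set[str] = set()   # fingerprints of patches that already failed

    @staticmethod
    def _fingerprint(file_id: str, patch: Dict[str, Any]) -> str:
        # Stable hash of "this exact patch on this exact file".
        payload = json.dumps([file_id, patch], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def handle(self, current: Dict[str, Any], desired: Dict[str, Any]) -> str:
        # Guard 1: keep only the fields that would actually change,
        # and never send an empty patch.
        patch = {k: v for k, v in desired.items() if current.get(k) != v}
        if not patch:
            return "skipped-noop"

        # Guard 2: do not retry a patch that already failed for good.
        key = self._fingerprint(current["id"], patch)
        if key in self._failed:
            return "skipped-known-failure"

        try:
            self._send_patch(current["id"], patch)
        except PermissionError:
            self._failed.add(key)
            return "failed"
        return "patched"
```

The fingerprint covers both the file id and the patch content, so a different change on the same file still goes through, while the identical doomed request is sent only once.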

There is no such method on the Drive API as getEmailForId, and even getIdForEmail hasn’t been added to v3 and is only present in v2 (https://developers.google.com/drive/v2/reference/permissions/getIdForEmail). There must be some kind of reasoning behind this decision, but I can’t see it. It must be that they consider the id->email mapping need-to-know information, but how could that knowledge be exploited? It could be used to list emails for spamming, but given the length of the ids there are easy ways to avoid that without completely getting rid of a useful method. Meanwhile its absence can make debugging much more complicated, for example when an account gets modified to have a new email address.

That’s it for today, but don’t you worry. *I’ll be back*… with rate limiting, batching and so on.


Originally published at medium.com on January 30, 2018.