How I managed to delete 1 billion files from my Google Drive account.

Mukesh Sharma
3 min read · Feb 25, 2018


I had a use-case where I stored files on my Google Drive account as a backup option. I would upload about 10k-20k files per day to Drive using the gdrive client (https://github.com/prasmussen/gdrive), and every 3 months I would permanently delete (from Trash as well) the parent folder containing those files. Since I had bought a 1 TB subscription, I didn't pay attention to how much space was consumed and released after deleting the folders.

After about 1 year, I stopped the paid subscription, but found that Google Drive showed 997 GB of usage even though there was no folder or file under "My Drive" or "Trash". After going through various product support forums, I found that other people had faced a similar issue, but there was no proper solution to the problem.

Some people suggested checking the "Quota" section to find the files that were taking up space. To my amazement, files dated February 2017 were still listed there. My guess was that the files were never actually deleted when I deleted the folder, probably because there were too many of them. People also mentioned the concept of orphaned files in Google Drive, i.e. files without a parent folder. I found the link https://drive.google.com/drive/search?q=is:unorganized%20owner:me that can list orphaned files, but in my case it only showed "The server encountered an error. Please try again later.".

Then I explored the Google Drive APIs to list these orphaned files programmatically, and found that https://www.googleapis.com/drive/v3/files accepts a q parameter to list files matching different search criteria. I tried hitting this API and was able to list all those files, and also to delete them using https://www.googleapis.com/drive/v3/files/fileId.
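The calls look roughly like the sketch below. This is a minimal sketch using the Python requests library, not my original code; the ACCESS_TOKEN value and the example q filter are placeholders you would replace with your own.

```python
# Minimal sketch: list files via the Drive v3 files endpoint, then delete them.
# ACCESS_TOKEN is a placeholder for a valid OAuth2 token with Drive scope.
import requests

ACCESS_TOKEN = "ya29...."
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# List files I own that are not in Trash (q supports many other filters).
resp = requests.get(
    "https://www.googleapis.com/drive/v3/files",
    headers=HEADERS,
    params={
        "q": "'me' in owners and trashed = false",
        "pageSize": 1000,
        "fields": "nextPageToken, files(id, name)",
    },
)
files = resp.json().get("files", [])

# Delete each file permanently (files.delete bypasses Trash).
for f in files:
    requests.delete(
        f"https://www.googleapis.com/drive/v3/files/{f['id']}",
        headers=HEADERS,
    )
```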

But there were far too many files, so I came up with the idea of pulling the list of files on one side and feeding it to an engine that deletes them. I wrote a Node.js program that queries the Google Drive API and writes batches of 1,000 fileIds to a file, and another program that reads these fileIds one by one and hits the delete API, roughly as sketched below.
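My originals were Node.js programs; the following is only a Python sketch of the same two-part idea, assuming a placeholder ACCESS_TOKEN and batch_N.txt file names: one function pages through files.list and writes fileIds in chunks of 1,000, the other reads a chunk file and deletes the ids one by one.

```python
# Python sketch of the lister/remover split (the originals were Node.js programs).
# ACCESS_TOKEN and the batch_N.txt names are illustrative placeholders.
import requests

ACCESS_TOKEN = "ya29...."
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
FILES_URL = "https://www.googleapis.com/drive/v3/files"

def list_file_ids(batch_size=1000):
    """Page through files.list and write fileIds to batch_N.txt files."""
    page_token, ids, batch_no = None, [], 0
    while True:
        params = {"pageSize": 1000, "fields": "nextPageToken, files(id)"}
        if page_token:
            params["pageToken"] = page_token
        data = requests.get(FILES_URL, headers=HEADERS, params=params).json()
        ids.extend(f["id"] for f in data.get("files", []))
        while len(ids) >= batch_size:
            with open(f"batch_{batch_no}.txt", "w") as fh:
                fh.write("\n".join(ids[:batch_size]))
            ids, batch_no = ids[batch_size:], batch_no + 1
        page_token = data.get("nextPageToken")
        if not page_token:
            break

def remove_file_ids(batch_file):
    """Read one batch file and delete the listed files one by one."""
    with open(batch_file) as fh:
        for line in fh:
            file_id = line.strip()
            if file_id:
                requests.delete(f"{FILES_URL}/{file_id}", headers=HEADERS)
```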


But the problem with the above solution was that it was painfully slow: it deleted approximately 2 files per second. For 1 billion files, that would take approximately 15.85 years. I had to come up with a different solution that could delete those files sooner. I tried parallel processing by running multiple instances of remover.js, each deleting a different batch of fileIds. I was able to scale it to 7-8 remover instances by using a different access token per instance to avoid the usage limit. At that speed it was still going to take 1-2 years :D.
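In Python terms, the parallel setup looked roughly like this sketch: one worker per access token, each working through its own batch file. The TOKENS and BATCH_FILES values are placeholders, not my original configuration.

```python
# Sketch of running several remover workers in parallel, each with its own
# OAuth token and its own batch file of fileIds (all names are placeholders).
import concurrent.futures
import requests

TOKENS = ["token-for-client-1", "token-for-client-2"]
BATCH_FILES = ["batch_0.txt", "batch_1.txt"]

def run_remover(token, batch_file):
    headers = {"Authorization": f"Bearer {token}"}
    with open(batch_file) as fh:
        for line in fh:
            file_id = line.strip()
            if file_id:
                requests.delete(
                    f"https://www.googleapis.com/drive/v3/files/{file_id}",
                    headers=headers,
                )

# One thread per token; each thread deletes its own batch of fileIds.
with concurrent.futures.ThreadPoolExecutor(max_workers=len(TOKENS)) as pool:
    pool.map(run_remover, TOKENS, BATCH_FILES)
```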

After googling, I learned that Google Drive supports batch requests: I could bundle up to 100 requests into a single HTTP request to the Google Drive servers. That way, I could avoid the usage limit restriction and delete files much faster. I explored client library support for Drive batch requests and found that the Python library supports it. It was my first time using Python, and no wonder people praise the simplicity of the language. I wrote a simple Python client that sends batches of 100 delete requests.
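The core of that client looks roughly like the sketch below, built on the google-api-python-client library. It is a hedged sketch rather than my original remove.py: `creds` is assumed to be an already-authorized OAuth2 credentials object, and `file_ids` a list of ids to delete.

```python
# Minimal sketch of batched deletes with google-api-python-client.
# `creds` and `file_ids` are placeholders, not the original remove.py inputs.
from googleapiclient.discovery import build

service = build("drive", "v3", credentials=creds)

def on_delete(request_id, response, exception):
    # files.delete returns an empty body on success; only log failures.
    if exception is not None:
        print(f"Failed to delete {request_id}: {exception}")

# Drive batch requests accept up to 100 sub-requests per batch.
for start in range(0, len(file_ids), 100):
    batch = service.new_batch_http_request(callback=on_delete)
    for file_id in file_ids[start:start + 100]:
        batch.add(service.files().delete(fileId=file_id), request_id=file_id)
    batch.execute()
```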

I was able to delete approximately 1 billion files, totalling about 1 TB, in 10 days; remove.py deleted around 100 GB of data per day. It was exciting to watch the usage drop from 997 GB to 0.16 GB. Google Drive batch requests are a true time saver.

Thanks for reading. Kindly let me know how I can improve.
