GSoC 2021 with SCoRe Lab — Week 10

Published in SCoRe Lab, Aug 4, 2021

tl;dr — This is the eleventh article of my journey into the Google Summer of Code 2021 with SCoRe Lab. Here I discuss week ten (26th of July to 1st of August) of my GSoC experience.

So what happened this week?

This week, my task was to create the endpoints that handle downloading scan result files. In our DNSTool system, a user can submit scans and the system runs them against the provided resources. The results are stored in Google Cloud Storage buckets, which are not accessible to the general public. So my task was to let users download their own scan results without interacting with Google Cloud Storage directly. For this, as I mentioned in my previous blog post, I created a custom service account file: I generated a private key for the specific user and their specific scan, and stored the corresponding public key in our Firebase Realtime Database.

In our DNSTool-CLI, the user provides their service account JSON file and downloads the result files for their scan. To support this feature, I had to implement the following two endpoints.

GET /list-downloads
GET /download/<path:path>

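Suppose the scan results in Google Cloud Storage consist of 5 files. The user provides their service account JSON file and should be able to download all of them. However, there is a big problem with doing this in a single request: HTTP's request/response model gives one response, with a single body, per request, so in practice one request returns one file. Multipart responses are possible in principle, but as the questions below discuss, they are awkward to produce and to consume.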

Issue returning multiple downloads from one Flask route using MultipartEncoder (stackoverflow.com)

Browser support of multipart responses (stackoverflow.com)

The Stack Overflow questions above give a good idea of why we would need to zip all the files before sending the response. However, zipping everything into a single file has its own problem, because we can't make assumptions about the size of the scan results. If the results are a few kilobytes, we can easily compress them in memory and return the zipped file to the user. If they are a few gigabytes, we can't do this in memory (for that we would need a really good server 😁).

To resolve this, I implemented an intermediate route, GET /list-downloads. The CLI first calls it to see which files the user needs, and then passes each file name to the GET /download/<path:path> route to download the files one at a time.

I faced another issue when implementing the GET /download/<path:path> route: it has to fetch a file of unknown size from Google Cloud Storage and return it to the user. Luckily, the Google Cloud Storage Python documentation provides an easy way to download data from a bucket in chunks.
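To give a feel for what downloading in chunks means, here is a minimal sketch of the arithmetic behind it: a large object is typically fetched as a series of byte ranges, one request per range, so only one chunk has to sit in memory at a time. This is not the library's code. The function names, and the assumption that the total size is already known (for example from the object's metadata), are mine for illustration.

```python
def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte offsets that cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 0
    while start < total_size:
        # The last range may be shorter than chunk_size.
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def range_header(start, end):
    """Format one range as the value of an HTTP Range request header."""
    return f"bytes={start}-{end}"


# Hypothetical 2500-byte object fetched 1000 bytes at a time.
ranges = list(byte_ranges(total_size=2500, chunk_size=1000))
headers = [range_header(start, end) for start, end in ranges]
```

With a chunk size of a few megabytes, the same loop covers a multi-gigabyte file without ever holding more than one chunk in memory.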

google-resumable-media (googleapis.dev)

Using this method together with Flask's streaming support (the "Streaming Contents" pattern in the Flask documentation), I was able to implement the above route. Now the user can download a results file whether it is as small as a few kilobytes or as large as a few gigabytes 😁.
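The streaming idea can be sketched roughly as follows. This is not the code from the PR: `iter_chunks`, `create_app` and `open_result` are hypothetical names, and a plain file-like object stands in for the Google Cloud Storage download. The point is only that Flask accepts a generator as the response body and sends each yielded chunk as it is produced, so the server never holds the whole file in memory.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk; an arbitrary choice for this sketch


def iter_chunks(fileobj, chunk_size=CHUNK_SIZE):
    """Yield the contents of a binary file-like object one chunk at a time."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def create_app(open_result):
    """Build a Flask app around open_result(path), a hypothetical helper that
    returns a binary file-like object for a scan result, or None if missing."""
    from flask import Flask, Response, abort  # lazy import; requires Flask

    app = Flask(__name__)

    @app.route("/download/<path:path>")
    def download(path):
        fileobj = open_result(path)
        if fileobj is None:
            abort(404)
        # Passing a generator makes Flask stream the body chunk by chunk.
        return Response(iter_chunks(fileobj), mimetype="application/octet-stream")

    return app
```

For a quick local check, `create_app(lambda path: io.BytesIO(b"demo"))` builds an app whose download route streams four bytes for any path. In the real service the helper would wrap the chunked download from the bucket.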

So with these implementations, I submitted a PR,

Implement scan results file download API endpoint in DNSTool-Middleware-API[API-GATEWAY] by Niweera… (github.com)

Add mock download request blueprint GET `/list-downloads` GET `/download`

and it got merged into the main repository.

So in the coming two weeks, I will be working further on the DNSTool-Middleware-API[API-GATEWAY], and until we meet again, happy coding…