Faster CSV downloads using Enumerator
At Reflektive, our customers download different kinds of reports, such as a dump of all the feedback posted during a review cycle
or all the responses to a poll conducted last week, to help with their own data processing.
A typical request for a report involves the following steps:
1. Execute a SQL query to fetch all the data that the report requires
2. Generate a CSV using the fetched data in the format the customer requires
3. Send the CSV in the response.
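The three steps above can be sketched in plain Ruby as follows. This is only an illustration of the approach, not the actual application code: `fetch_rows`, `generate_csv`, and the column names are hypothetical stand-ins, and the array of hashes takes the place of a real SQL result.

```ruby
require "csv"

# Step 1 (hypothetical stand-in): pretend these rows came back from the SQL query.
def fetch_rows
  [
    { id: 1, author: "alice", feedback: "Great work" },
    { id: 2, author: "bob", feedback: "Needs detail" }
  ]
end

# Step 2: build the whole CSV in memory as one String.
def generate_csv(rows)
  CSV.generate do |csv|
    csv << %w[id author feedback]
    rows.each { |row| csv << row.values_at(:id, :author, :feedback) }
  end
end

# Step 3: the complete String is what gets sent as the response body.
csv_string = generate_csv(fetch_rows)
```

Nothing can be sent to the customer until `generate_csv` has returned, so the wait grows with the number of rows.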
However, this approach does not scale well with large amounts of data.
The time taken by each step in this process is roughly proportional to the size of the data that the customer requests.
In terms of time complexity, that is O(n).
To reduce this time, we would need one or more strategies, such as de-normalizing the data to speed up the first step.
But perhaps it’s not entirely about how absolutely fast your application is, and more about how fast it feels to your customers.
Better and faster, thanks to Rack!
We do not have to download the full movie first to start watching it thanks to the concept of ‘media streaming’. In simple terms, we can download chunks of the movie in sequence while we’re watching the downloaded chunks.
Similarly, we can stream the CSV to the customer, rather than make them wait for the complete CSV to be generated.
This is possible because Rack, the web server interface that Rails is built on, requires the response body to respond to #each. The server calls #each on the body and writes every string it yields to the client, chunk by chunk.
In the current scenario, the CSV that’s generated (see step 2 of the process) is a String. String does not support #each, so all of it has to be sent in one shot, after it’s ready.
So, Rack requires the response body to support #each, and Ruby’s Enumerator fits this requirement.
Here’s a Fibonacci Enumerator:
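A minimal sketch of such an Enumerator, using only Ruby's core library (the variable names are illustrative):

```ruby
# An infinite Fibonacci sequence. The block only runs as far as the
# consumer asks, so the endless loop never gets ahead of the caller.
fib = Enumerator.new do |yielder|
  a, b = 0, 1
  loop do
    yielder << a
    a, b = b, a + b
  end
end

first_ten = fib.take(10) # => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Each value is computed only when it is requested, which is the property that makes an Enumerator a good fit for a streamed response body.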
In a similar fashion, we can rewrite the method in Step 2 to use the Enumerator:
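One way this could look is sketched below. The method name `csv_enumerator` and the column names are assumptions made for illustration; `rows` can be any object that responds to #each.

```ruby
require "csv"

# Returns an Enumerator that yields the CSV one line at a time
# instead of building a single large String.
def csv_enumerator(rows)
  Enumerator.new do |yielder|
    # Header line first.
    yielder << CSV.generate_line(%w[id author feedback])
    # Then one CSV line per row, generated only when the consumer asks for it.
    rows.each do |row|
      yielder << CSV.generate_line(row.values_at(:id, :author, :feedback))
    end
  end
end

# Hypothetical sample data standing in for the query result.
sample_rows = [
  { id: 1, author: "alice", feedback: "Great work" },
  { id: 2, author: "bob", feedback: "Needs detail" }
]
lines = csv_enumerator(sample_rows).to_a
```

In a Rails application, `rows` could plausibly be fed by a batched query (for example with ActiveRecord's `find_each`), so that the rows are not all loaded up front either.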
This does not produce the entire CSV string in one go; it generates one row at a time, as Rack iterates over the Enumerator using #each.
This requires some changes in Step 3 too, to prepare the response body for the CSV Enumerator.
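The sketch below shows the idea with a bare Rack-style application rather than a Rails controller, so that it runs without any framework. The header values, the file name, and the two sample rows are assumptions. The important parts are that no content-length is set (the final size is not known up front) and that the body is the Enumerator itself.

```ruby
require "csv"

# Hypothetical row-by-row CSV body, standing in for the Enumerator from step 2.
csv_body = Enumerator.new do |yielder|
  yielder << CSV.generate_line(%w[id author])
  yielder << CSV.generate_line([1, "alice"])
  yielder << CSV.generate_line([2, "bob"])
end

# A Rack application is anything that responds to #call and returns
# [status, headers, body], where body responds to #each.
app = lambda do |_env|
  headers = {
    "content-type" => "text/csv",
    "content-disposition" => 'attachment; filename="report.csv"',
    # Discourage caches and buffering proxies (nginx reads x-accel-buffering)
    # from holding the response back until it is complete.
    "cache-control" => "no-cache",
    "x-accel-buffering" => "no"
  }
  [200, headers, csv_body]
end

# Simulate what a server does with the response: iterate the body with #each.
status, headers, body = app.call({})
chunks = []
body.each { |chunk| chunks << chunk }
```

In a Rails controller, the equivalent would likely be to set the same headers on the response and assign the Enumerator with `self.response_body = ...`. Whether the chunks actually reach the browser one by one also depends on the web server and on any middleware or proxy that buffers responses.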
With these changes, a CSV download starts almost immediately instead of waiting for the whole file to be generated, which drastically improves the user experience.
The code of the sample app shown in the above video is available here for reference.