Elixir: Export, Download, Zip, and Email

Jebin
Sep 2, 2018

Recently, one of my clients wanted a feature in their Elixir Phoenix application that exports data to a .csv file, downloads the associated images from an external store such as S3, zips them, and emails everything to a given address.

It took a day’s effort to pull it all together. I’ll share the thought process and the code here.

Prerequisites: familiarity with Elixir, Phoenix, GenServer, Task, and Agent.

The feature is exposed through an API consumed by the client applications. A controller action performs basic parameter validation and then hands off to the real business logic.

The whole operation can take a considerable amount of time. We cannot keep the API request waiting while we export the data, download the images, zip them, and send the email. Instead, we return a response immediately with an “acceptance message” and run the whole operation asynchronously, as a job handled by a job manager.

If you need persistent job state, a Redis-backed job manager such as exq is the better choice. In this example, the state is kept in memory with the Agent module, so it does not survive a restart of the node.

The following is the code for the API.

The AdminController has an export action. It gets the data to be dumped as CSV from User.get_dump(), a model function. The returned data will look like the following:

This data, along with the email address, is passed to the Export module, a GenServer that acts as the job manager. The following line of code is the entry point:

Example.Export.export_email(data, email)

export_email is the client interface exposed by the Export module. Let’s look at its body.

def export_email(data, email) do
  GenServer.cast(__MODULE__, {:export_email, {data, email}})
end

GenServer supports synchronous requests through call. Here we use cast because we don’t need to wait for a response: it is a fire-and-forget strategy. If you want to know the difference between call, cast, and info, look into this post. The controller action calls export_email to submit a job request to the Export GenServer, and because it uses cast, the call returns :ok immediately.
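To make the fire-and-forget behaviour concrete, here is a minimal sketch of a GenServer with the same client interface as export_email above. It is not the post’s Export module: the slow job is simulated with Process.sleep/1, and the `notify` pid passed to start_link is a hypothetical addition so the caller can see when the job finishes.

```elixir
defmodule ExportCastSketch do
  use GenServer

  # Registered under the module name, matching GenServer.cast(__MODULE__, ...) in the post.
  # `notify` is a hypothetical pid that receives {:done, email} when a job finishes.
  def start_link(notify) do
    GenServer.start_link(__MODULE__, notify, name: __MODULE__)
  end

  # Client API: GenServer.cast/2 always returns :ok straight away.
  def export_email(data, email) do
    GenServer.cast(__MODULE__, {:export_email, {data, email}})
  end

  @impl true
  def init(notify), do: {:ok, %{notify: notify}}

  # Server callback: runs inside the GenServer process, after the caller has moved on.
  @impl true
  def handle_cast({:export_email, {_data, email}}, state) do
    # Stand-in for the real export/download/zip/email work.
    Process.sleep(200)
    send(state.notify, {:done, email})
    {:noreply, state}
  end
end
```

The caller gets :ok back long before the 200 ms job is done. Swapping cast for GenServer.call would make the controller wait for the whole job, and call’s default 5-second timeout would make the request crash for any job that takes longer than that.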

Let’s look at the handle_cast function, which handles the cast above.

I’ve created a module attribute @agent to identify the Agent process in which we store the state. One might ask why we don’t use the GenServer’s own state. We want to update the state from various helper functions, which I found cumbersome to do with the GenServer’s state, so I used an Agent instead.

In the start_link function, which is called by the application supervisor, we start the GenServer process and the Agent process named :export_email_state.

The handle_cast function does the actual job. To avoid redundant jobs, it first checks whether a job is already running for the given email: the email is the key in the Agent’s state that identifies a duplicate request. If a job is already running, we get a tuple like {:started, 1535715729295}, which is handled by the second clause of the case block.

The first clause of the case block is where the real job is started. If the Agent state returns nil for the email, no job is running for it, so we start a fresh one. Before starting, we add an entry to the Agent state so that duplicate jobs are not created. It is done with the following line of code:

Agent.update(@agent, fn agent_state -> Map.put(agent_state, key, {:started, System.system_time(:millisecond)}) end)
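The post reads the entry and then writes it with the Agent.update line above. A variation worth knowing is Agent.get_and_update, which does the check and the write in a single step inside the Agent, so two callers can never both see nil. The sketch below is an assumption-labelled alternative, not the post’s code; claim/1 and release/1 are hypothetical helper names, and only the :export_email_state name and the {:started, timestamp} tuple come from the post.

```elixir
defmodule ExportJobRegistrySketch do
  # Same registered name as the post's Agent.
  @agent :export_email_state

  def start_link, do: Agent.start_link(fn -> %{} end, name: @agent)

  # Hypothetical helper: atomically check for a running job and claim the slot if free.
  def claim(email) do
    Agent.get_and_update(@agent, fn state ->
      case Map.get(state, email) do
        nil ->
          {:ok, Map.put(state, email, {:started, System.system_time(:millisecond)})}

        {:started, started_at} ->
          {{:error, {:already_running, started_at}}, state}
      end
    end)
  end

  # Hypothetical helper: drop the entry when the job ends so the email can be used again.
  def release(email), do: Agent.update(@agent, &Map.delete(&1, email))
end
```

Calling release/1 from an `after` block around the job keeps a failed job from blocking that email address until the node restarts.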

Then we call the export_download_zip_email function, which runs each sub-task in turn.

There are 4 sub-tasks here.

  1. Export the data to a CSV file
  2. Download the images asynchronously in batches (batches run one after another; images within a batch download in parallel)
  3. Zip the downloaded images
  4. Email the zip and the CSV file

Note: all the snippets below go into the Export module.

The export_download_zip_email function looks like the following.

We create a temporary directory, tmp, inside the application directory, and inside it a directory named users_ followed by a random string for each valid job. This is where we create the CSV file, store the downloaded images, and build the zip file. We use File.mkdir_p! to create the directories, and at the end we remove the job-specific temporary directory.
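The directory handling described above can be sketched as a small wrapper that creates the job directory, hands it to the sub-tasks, and removes it with try/after so it is cleaned up even when a step raises. This is a sketch under assumptions: with_job_dir/2 is a hypothetical name, the base directory is a parameter (the post uses the application directory, e.g. File.cwd!()), and System.unique_integer/1 stands in for the post’s random string.

```elixir
defmodule JobDirSketch do
  # Creates <base>/tmp/users_<unique>, runs `fun` with it, and always removes it afterwards.
  def with_job_dir(base, fun) do
    # Assumption: a unique integer (base 36) instead of the post's random string.
    suffix = Integer.to_string(System.unique_integer([:positive]), 36)
    dir = Path.join([base, "tmp", "users_" <> suffix])
    File.mkdir_p!(dir)

    try do
      # The four sub-tasks (CSV, download, zip, email) would run inside `fun`.
      fun.(dir)
    after
      # Runs even if a sub-task raises, so failed jobs don't leave files behind.
      File.rm_rf!(dir)
    end
  end
end
```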

Let’s look at each sub-task in detail.

Export data to CSV file

We have to build a string of CSV data and write it to a file in the desired location.

That’s pretty straightforward. Iterate over the data and build a line of comma-separated values for each record, prepend the header, join the lines, and write the result to a file. The file goes into the job’s temporary directory, which makes it easy to attach to the email later. The function returns the file path.
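A minimal sketch of that CSV step follows. The actual shape of User.get_dump() is not shown, so the name, email, and image_url columns are hypothetical, as is the users.csv file name. Unlike a plain join, it quotes values that contain commas, quotes, or newlines (RFC 4180 style), since an unescaped comma in a user’s name would shift every later column.

```elixir
defmodule CsvExportSketch do
  # Hypothetical columns; the real User.get_dump/0 fields are not shown in the post.
  @header ["name", "email", "image_url"]

  # Writes <dir>/users.csv and returns its path.
  def export_csv(rows, dir) do
    lines =
      Enum.map(rows, fn row ->
        [row.name, row.email, row.image_url]
        |> Enum.map(&escape/1)
        |> Enum.join(",")
      end)

    path = Path.join(dir, "users.csv")
    File.write!(path, Enum.join([Enum.join(@header, ",") | lines], "\n") <> "\n")
    path
  end

  # Quote a field if it contains a comma, a double quote, or a newline;
  # embedded double quotes are doubled. nil becomes an empty field.
  defp escape(value) do
    value = to_string(value)

    if String.contains?(value, [",", "\"", "\n"]) do
      "\"" <> String.replace(value, "\"", "\"\"") <> "\""
    else
      value
    end
  end
end
```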

Download images

Of the four sub-tasks, this is the most complex. The number of images to download depends on the data and could be 1, 10, 100, or more. Note: this approach might not suit a large number of image downloads. If the number of images goes beyond a certain limit, it is better to avoid such a feature altogether and instead let the client browse the images in the application. Back to the code: the following gist is all the code needed to download images.

Let’s walk through the code. We create a list of maps with url and name keys. The name is used to save each image under the user’s name, which keeps the files readable. If the list is not empty, we create a directory called images inside the job’s temporary directory. This is where all the images for the job are downloaded.

A bit of explanation is needed to understand what happens next. Image downloads consume both time and resources. If there are 100 images and we download them one after another in a serial, blocking fashion, the total time will be huge. Erlang is built for concurrent processing, and not taking advantage of that is wasteful. On the other hand, downloading all 100 at once in parallel can be a disaster, because it may exhaust the system’s resources. Batch processing is the middle ground.

We split the list into batches of 5 and download the 5 images of each batch in parallel. The batches run one after another, while the items within a batch run in parallel, so we don’t exhaust the system’s resources. In Elixir, Stream helps us here. Why not use Enum to chunk the list and execute the batches? Enum is eager while Stream is lazy: Stream.chunk_every hands out one chunk at a time, and the next chunk is produced only after the previous batch has been processed, instead of the whole list of chunks being built up front. Strictly speaking, Enum.chunk_every followed by Enum.each would also run the batches one after another, since each batch waits for its own tasks before the next one starts; the batches would only fire together if every task were started before any of them was awaited.

Looking again at download_images_in_batch, the batches are produced by Stream.chunk_every and processed by Stream.each, which lazily calls the given function: in our case, an anonymous function that calls download_images with the current batch of 5. Because streams are lazy, nothing is downloaded until the stream is consumed, for example with Stream.run/1.
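The batching pattern can be sketched with the standard library alone. This is not the post’s gist: the `fetch` argument is a hypothetical stand-in for the HTTPotion-based download_image and must return {:ok, file_name, body}, and the 30-second Task.await timeout is an assumption (the default is 5 seconds, which a slow image download can exceed).

```elixir
defmodule BatchDownloadSketch do
  @batch_size 5

  # `fetch` is a hypothetical stand-in for download_image; it must return
  # {:ok, file_name, body} for each image map.
  def download_images_in_batch(images, dir, fetch, batch_size \\ @batch_size) do
    images
    |> Stream.chunk_every(batch_size)
    |> Stream.each(fn batch -> download_images(batch, dir, fetch) end)
    # Streams are lazy: nothing runs until the stream is consumed.
    |> Stream.run()
  end

  defp download_images(batch, dir, fetch) do
    batch
    # Start one task per image so the whole batch downloads in parallel.
    |> Enum.map(fn image -> Task.async(fn -> fetch.(image) end) end)
    # Block until every task in this batch has finished (assumed 30 s limit each).
    |> Enum.map(&Task.await(&1, 30_000))
    # Write each response to <dir>/<file_name>.
    |> Enum.each(fn {:ok, file_name, body} -> File.write!(Path.join(dir, file_name), body) end)
  end
end
```

One trade-off of fixed batches is that each batch waits for its slowest image. Task.async_stream/3 with `max_concurrency: 5` is an alternative that keeps up to 5 downloads in flight at all times.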

In download_images, we start a Task.async for each image in the batch, each calling download_image, so that all 5 downloads happen in parallel. download_image uses HTTPotion to fetch the image. We don’t use HTTPotion’s own async download because Task.async already gives us the concurrency.

We wait for the tasks with Task.await, and once all the tasks in the batch are done, each downloaded response is written to the given directory. As per the code, download_image returns a tuple {:ok, name <> "." <> get_image_extension(url), body} holding the file name (with extension) and the response body, which we use when writing the file. Keep in mind that Task.await times out after 5 seconds by default, so slow downloads may need a larger timeout.
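The body of get_image_extension isn’t shown, so here is one hypothetical way to write it. It takes the extension from the URL’s path only, so a query string (as on pre-signed S3 URLs) doesn’t end up in the file name, and falls back to an assumed default of "jpg" when the path has no extension. file_name/2 mirrors the name <> "." <> get_image_extension(url) expression above.

```elixir
defmodule ImageNameSketch do
  # Assumed fallback when the URL path has no usable extension.
  @default_ext "jpg"

  # Reads the extension from the URL's path, ignoring any query string.
  def get_image_extension(url) do
    path = URI.parse(url).path || ""

    case Path.extname(path) do
      ext when ext in ["", "."] -> @default_ext
      "." <> ext -> String.downcase(ext)
    end
  end

  # File name used when writing the downloaded body, e.g. "jane.png".
  def file_name(name, url), do: name <> "." <> get_image_extension(url)
end
```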

That concludes the batch download task.

Create ZIP file

Once all the images are downloaded, we create a zip file from that directory. Elixir’s standard library has no zip module, so we use Erlang’s :zip module.

In the above snippet, you can see that we change directories before and after creating the zip. The zip creation interface is not very flexible: it doesn’t take a source directory and create a zip in a given destination directory. We have to change the current working directory to the directory in which the zip file should be created. So we File.cd! into that directory and call :zip.create, which takes three params: the name of the zip file to be created, the directory name under which the files should be placed inside the zip file, and the path under which the directory from the second param is located. After creating the zip, we go back to the old working directory and return the zip path.
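One thing to be aware of with File.cd!: the working directory belongs to the Erlang file server, so changing it affects the whole node, not just the current process. If two export jobs ever zip at the same time, they could step on each other. :zip.create/3 accepts a {:cwd, dir} option that resolves the listed files relative to dir without touching the global working directory. The sketch below is a variation under that assumption, not the post’s code: zip_images/2 is a hypothetical name, the images are assumed to sit in <job_dir>/images, and the archive is written to <job_dir>/images.zip, given as an absolute path.

```elixir
defmodule ZipSketch do
  # Zips <job_dir>/<images_dir_name>/* into <job_dir>/<images_dir_name>.zip,
  # storing entries as "images/<file>". Returns the absolute zip path.
  def zip_images(job_dir, images_dir_name \\ "images") do
    zip_path = Path.join(job_dir, images_dir_name <> ".zip") |> Path.expand()

    # Entry names are relative to job_dir, so they appear as "images/..." in the archive.
    entries =
      job_dir
      |> Path.join(images_dir_name)
      |> File.ls!()
      |> Enum.map(fn file -> String.to_charlist(Path.join(images_dir_name, file)) end)

    # The :cwd option replaces File.cd!: the node-wide working directory is never changed.
    {:ok, _} =
      :zip.create(String.to_charlist(zip_path), entries, cwd: String.to_charlist(job_dir))

    zip_path
  end
end
```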

Sending Email with attachment

Once the zip file is created, we send an email with the CSV and zip files attached. We used the Bamboo Elixir package to send the email.

You need a recent Bamboo version to send attachments; older versions didn’t support them. Here we compose an email with the attachments, given their paths, and call the Mailer.deliver_later function. Bamboo requires choosing an adapter, and each adapter takes its own configuration. You can read about them in Bamboo’s documentation.

If the attachments are large, the email might not be delivered. For example, SendGrid allows at most 20 MB for the whole email, including the headers, body, and attachments. So keep the attachment size in check.
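A simple guard before calling deliver_later is to estimate the encoded size of the attachments. Email attachments are base64-encoded, which turns every 3 bytes into 4 characters, so the files grow by about a third on the wire. The sketch below is hypothetical (fits_in_email?/2 is not part of Bamboo), and it assumes the 20 MB limit means 20 * 1024 * 1024 bytes; check your provider’s documentation for the exact figure and leave headroom for headers and body.

```elixir
defmodule AttachmentSizeSketch do
  # Assumption: the provider's 20 MB cap, read as 20 * 1024 * 1024 bytes.
  @limit_bytes 20 * 1024 * 1024

  # Estimates the base64-encoded size of the files (4 chars per 3 bytes, rounded up)
  # and compares it with the limit. MIME line breaks, headers, and the body add a bit more.
  def fits_in_email?(paths, limit \\ @limit_bytes) do
    encoded =
      paths
      |> Enum.map(fn path -> div(File.stat!(path).size + 2, 3) * 4 end)
      |> Enum.sum()

    encoded <= limit
  end
end
```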

That is all. We have exported a CSV, downloaded the images in batches, created a zip file from the images, and emailed them.

Man! That is a long post. I hope it is useful to someone solving a similar problem.
