Parsing Excel files with Active Job

Walter Reyes
2 min readMar 6, 2017

--

This is a common case: users fill excel files to document data, they could be documenting inspection reports, surveys, calculating something, you name it. There is a wide variety of stuff you could be doing with excel. Then the user will upload this file to your system, so it gets stored, shared and ready to download. Sooner or later excel files will get into your system.

In my situation users were uploading excel files but nothing was being done with them… yet. Now it was time to parse those excel files, get the data into a database, analyze it and display some neat graphs.

The Approach

I needed to parse many excel files stored inAWS, and I needed to keep parsing newly added files as they are upload.

I used ActiveJob to write a job that parsed a single file, next I could enqueue all existing files and let Sidekiq do the hard work. Finally I could queue a new job each time a file is uploaded.

I used Roo to parse the excel spreadsheets, this gem was very straigthforward to use. And it was able to open each file from an url, which is great because I did’nt have to download them.

You can do the following with Roo:

roo_parser = Roo::Spreadsheet.open(excel_url), select the sheet you need and access cells roo_parser.sheet("SheetYouNeed").cell("A,4"). As you can see it is very straigthforward. You can read the documentation to see everything you can do.

The Job

The job that needed to be done was simple. Every excel file is linked to an ActiveRecord model, which stores the url file, name, etc. The ActiveJob would take the excel_report_model as a parameter and it will scrap it.

Let’s generate the job:

rails generate job excel_file_parser

The job should do the following:

  1. Open the excel file
  2. Scrap the data you need and store it in the database
  3. Close the excel file to free memory
  4. Mark the excel file as parsed for future reference.

The job looks like this:

Now we can queue all existing excel files:

This assumes that you already have Sidekiq up and running.

That would take care of parsing all the existing files, and for every new excel file we can add a create callback to queue the parser job.

And that’s it. Happy parsing!

Originally published at gist.github.com.

--

--