Ruby on Rails — Importing Data from an Excel File

Gabriel Martin
Echobind
Published in
4 min readJan 13, 2020
Photo by Rich Tervet on Unsplash

Importing data from an Excel spreadsheet doesn’t need to be difficult. While Rails doesn’t include a native utility to handle these file types, there are several gems that make it quite easy to read/write Excel spreadsheets.

In this post, we will create a rake task and use the roogem to read from an Excel file and import data into the database.

Setup

We’ll start by adding the roo gem to our project. If you’re using Bundler 1.15 or higher, you can use the bundle add command.

bundle add roo

This will add roo to the Gemfile and install it. Alternatively, you can update the Gemfile and install it manually.

Add this line to your Gemfile.

gem 'roo', '~> 2.8'

And install.

bundle install

Creating the Rake Task

Now that we’ve added roo, we can start working on the actual feature.

Generate the rake task.

rails g task import data

This will generate the file import.rakein lib/tasks where “import" is the namespace and “data" is the task name.

Now that our task is created, let’s update the description. We’ll also add a simple puts command and ensure that it runs.

namespace :import do
desc "Import data from spreadsheet" # update this line
task data: :environment do
puts 'Importing Data' # add this line
end
end

Run this command in your console.

bundle exec rails import:data

You should see the text “Importing Data” print to the console.

Importing the Data

To keep it simple, I have a very basic user schema. Where a user has two fields, name and email. I’ve added the spreadsheet file at lib/data.xlsx which contains several rows, each representing a new user to be created.

As you can see, the first row of the spreadsheet represents the headers. Everything else is the actual data that we need. Let’s start implementing roo so we can map over this data and create the users.

First, we will require roo and open the spreadsheet within the task.

require 'roo'namespace :import do
desc "Import data from spreadsheet"
task data: :environment do
puts 'Importing Data' # add this line
data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet endend

Next, lets grab the first row of the spreadsheet since we know this is the header row. We’ll use this later to create a hash when mapping over the rows with the data we need.

data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet
headers = data.row(1) # get header row

Now we can map over the spreadsheet rows and extract the user data. Here is what the code looks like.

data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet
headers = data.row(1) # get header row
data.each_with_index do |row, idx|
next if idx == 0 # skip header
# create hash from headers and cells
user_data = Hash[[headers, row].transpose]
if User.exists?(email: user_data['email'])
puts "User with email '#{user_data['email']}' already exists"
next
end

user = User.new(user_data)
puts "Saving User with email #{user.email}"
user.save!
end

Let’ walk through this.

We map over each row in the spreadsheet using data.each_with_index.

If we’re on the first row (idx == 0), we want to continue to the next iteration because this is the header row.

Next, we create a new hash that contains the data from the current row. We build this hash using the headers array that we grabbed earlier and the current row that we are on in the loop. So, what is this line doing?

Let’s start with Array#transpose. I’m not going to dive into the transpose method but it essentially turns columns into rows when you have a multi-dimensional array. Here’s a quick visual.

This is what the array looks like before calling transpose.

[
['name', 'email'],
['john', 'john@test.com']
]

And this is what is returned when calling transpose.

[
['name', 'john'],
['email', 'john@test.com']
]

When passing the above array to Hash[], we get a back a new hash where the key is the first item in each nested array and its value is the second. In our case, the hash matches the user fields we need to create a new database entry.

user_data = Hash[[["name", "John"], ["email", "john@test.com"]]]
p user_data # {"name"=>"John", "email"=>"john@test.com"}

Then we check to see if a user already exists with the current email address. If it does, we print some text to the console and move on to the next iteration without saving.

# next if user exists
if User.exists?(email: user_data['email'])
puts "User with email #{user_data['email']} already exists"
next
end

And finally, if the user doesn’t already exist, we can create a new user instance with the fields generated from the current row and save the new user in the database.

user = User.new(user_data)puts "Saving User with email '#{user.email}'"
user.save!

The complete code should look something like this.

require 'roo'namespace :import do
desc "Import data from spreadsheet"
task data: :environment do
puts 'Importing Data'
data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet
headers = data.row(1) # get header row
data.each_with_index do |row, idx|
next if idx == 0 # skip header row
# create hash from headers and cells
user_data = Hash[[headers, row].transpose]
# next if user exists
if User.exists?(email: user_data['email'])
puts "User with email #{user_data['email']} already exists"
next
end

user = User.new(user_data)
puts "Saving User with email '#{user.email}'"
user.save!
end
end
end

That’s it! Now you can run this task and import users from the spreadsheet.

While this approach uses a very simple data structure, it’s a good starting point even for more complex situations.

Contributor’s Bio

Gabriel is a software and summertime enthusiast. He is an Associate Engineer at Echobind.

--

--