Ruby on Rails — Importing Data from an Excel File
Importing data from an Excel spreadsheet doesn’t need to be difficult. While Rails doesn’t include a native utility to handle these file types, there are several gems that make it quite easy to read/write Excel spreadsheets.
In this post, we will create a rake task and use the roo
gem to read from an Excel file and import data into the database.
Setup
We’ll start by adding the roo gem to our project. If you’re using Bundler 1.15 or higher, you can use the bundle add
command.
bundle add roo
This will add roo to the Gemfile
and install it. Alternatively, you can update the Gemfile
and install it manually.
Add this line to your Gemfile
.
gem 'roo', '~> 2.8'
And install.
bundle install
Creating the Rake Task
Now that we’ve added roo, we can start working on the actual feature.
Generate the rake task.
rails g task import data
This will generate the file import.rake
in lib/tasks
where “import" is the namespace and “data" is the task name.
Now that our task is created, let’s update the description. We’ll also add a simple puts
command and ensure that it runs.
namespace :import do
desc "Import data from spreadsheet" # update this line
task data: :environment do
puts 'Importing Data' # add this line
end
end
Run this command in your console.
bundle exec rails import:data
You should see the text “Importing Data” print to the console.
Importing the Data
To keep it simple, I have a very basic user
schema. Where a user has two fields, name and email. I’ve added the spreadsheet file at lib/data.xlsx
which contains several rows, each representing a new user to be created.
As you can see, the first row of the spreadsheet represents the headers. Everything else is the actual data that we need. Let’s start implementing roo so we can map over this data and create the users.
First, we will require roo and open the spreadsheet within the task.
require 'roo'namespace :import do
desc "Import data from spreadsheet"
task data: :environment do
puts 'Importing Data' # add this line data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet endend
Next, lets grab the first row of the spreadsheet since we know this is the header row. We’ll use this later to create a hash when mapping over the rows with the data we need.
data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet
headers = data.row(1) # get header row
Now we can map over the spreadsheet rows and extract the user data. Here is what the code looks like.
data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet
headers = data.row(1) # get header rowdata.each_with_index do |row, idx|
next if idx == 0 # skip header # create hash from headers and cells
user_data = Hash[[headers, row].transpose] if User.exists?(email: user_data['email'])
puts "User with email '#{user_data['email']}' already exists"
next
end
user = User.new(user_data) puts "Saving User with email #{user.email}"
user.save!
end
Let’ walk through this.
We map over each row in the spreadsheet using data.each_with_index
.
If we’re on the first row (idx == 0
), we want to continue to the next iteration because this is the header row.
Next, we create a new hash that contains the data from the current row. We build this hash using the headers
array that we grabbed earlier and the current row
that we are on in the loop. So, what is this line doing?
Let’s start with Array#transpose. I’m not going to dive into the transpose method but it essentially turns columns into rows when you have a multi-dimensional array. Here’s a quick visual.
This is what the array looks like before calling transpose
.
[
['name', 'email'],
['john', 'john@test.com']
]
And this is what is returned when calling transpose
.
[
['name', 'john'],
['email', 'john@test.com']
]
When passing the above array to Hash[]
, we get a back a new hash where the key
is the first item in each nested array and its value is the second. In our case, the hash matches the user fields we need to create a new database entry.
user_data = Hash[[["name", "John"], ["email", "john@test.com"]]]
p user_data # {"name"=>"John", "email"=>"john@test.com"}
Then we check to see if a user already exists with the current email address. If it does, we print some text to the console and move on to the next iteration without saving.
# next if user exists
if User.exists?(email: user_data['email'])
puts "User with email #{user_data['email']} already exists"
next
end
And finally, if the user doesn’t already exist, we can create a new user instance with the fields generated from the current row and save the new user in the database.
user = User.new(user_data)puts "Saving User with email '#{user.email}'"
user.save!
The complete code should look something like this.
require 'roo'namespace :import do
desc "Import data from spreadsheet"
task data: :environment do
puts 'Importing Data' data = Roo::Spreadsheet.open('lib/data.xlsx') # open spreadsheet
headers = data.row(1) # get header row data.each_with_index do |row, idx|
next if idx == 0 # skip header row # create hash from headers and cells
user_data = Hash[[headers, row].transpose] # next if user exists
if User.exists?(email: user_data['email'])
puts "User with email #{user_data['email']} already exists"
next
end
user = User.new(user_data) puts "Saving User with email '#{user.email}'"
user.save!
end
end
end
That’s it! Now you can run this task and import users from the spreadsheet.
While this approach uses a very simple data structure, it’s a good starting point even for more complex situations.
Contributor’s Bio
Gabriel is a software and summertime enthusiast. He is an Associate Engineer at Echobind.