Working with CSVs in Ruby
A while back, I wrote a post, Working with Files in Ruby, reviewing some fundamental methods for reading and writing text files in Ruby. Here I want to turn our attention to building out similar functionality for working with CSVs.
We will be using the films.csv found here. Go ahead and clone that down so you can play along. In your editor make sure to require "csv"
so that you can access Ruby’s CSV class.
Reading from CSV in Ruby
The Ruby Docs tend to be an excellent resource and are quite user friendly for new developers. However, I find the docs for CSV to be a little less straightforward. There are a number of ways to read a CSV in Ruby, with options for structuring your output however works best for your use case.
Return as an array of arrays
If you’re looking to read a CSV’s data into a nested array I recommend using .foreach which would look like this:
films_info = []
CSV.foreach("films.csv") do |row|
  films_info << row
end
This reads in the “films.csv” file one line at a time, yielding each row as an array of that row’s values. Ultimately films_info ends up as a nested array, with the first element being the array of headers and the following elements being arrays of CSV row information, which might look similar to this:
films_info=> [
["Rank", "Title", "Genre", "Description", "Director", "Actors", "Year", "Runtime (Minutes)", "Rating", "Votes", "Revenue (Millions)", "Metascore"],
["1","Guardians of the Galaxy", "Action,Adventure,Sci-Fi", "A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe.", "James Gunn", "Chris Pratt, Vin Diesel, Bradley Cooper, Zoe Saldana", "2014", "121", "8.1", "757074", "333.13", "76"],
...
]
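Since the header row comes back as just another inner array, it can help to peel it off before working with the data. Here is a minimal sketch of that idea. It assumes a tiny stand-in for films.csv written to a temp file (the second data row is made up), so it runs without the real file:

```ruby
require "csv"
require "tempfile"

# Stand-in for films.csv: the first data row is trimmed from the real file,
# the second is made up for illustration.
sample = <<~DATA
  Rank,Title,Genre
  1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi"
  2,Example Film,Drama
DATA

rows = Tempfile.create(["films", ".csv"]) do |file|
  file.write(sample)
  file.flush
  # Without a block, CSV.foreach returns an Enumerator over the rows
  CSV.foreach(file.path).to_a
end

# The first inner array is the header row; the rest are data rows
headers, *films = rows

puts headers.inspect     # ["Rank", "Title", "Genre"]
puts films.first.inspect # ["1", "Guardians of the Galaxy", "Action,Adventure,Sci-Fi"]
```

Note that the quoted genre field comes back as a single string, commas and all.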
Return as CSV objects
If you’re looking to read a CSV’s data into an array of CSV Ruby objects I recommend using .foreach with the headers: true
option as well as the header_converters: :symbol
option, which might look like this:
films_info = []
headers = nil
CSV.foreach("films.csv", headers: true, header_converters: :symbol) do |row|
  headers ||= row.headers
  films_info << row
end
The headers: true
option will make it so each row
is now a CSV::Row object, with the first line of the file treated as the headers rather than as data.
films_info=> [#<CSV::Row
rank:"1"
title:"Guardians of the Galaxy"
genre:"Action,Adventure,Sci-Fi"
description:"A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe."
director:"James Gunn"
actors:"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe Saldana"
year:"2014"
runtime_minutes:"121"
rating:"8.1"
votes:"757074"
revenue_millions:"333.13"
metascore:"76"
>,
...
]
You can interact with a CSV object very similarly to a hash, so you might call row[:title]
to get the title of that particular entry. You can also override a value on the CSV object, for example if you need to sanitize the data: row[:title] = row[:title].upcase
. Now row[:title] would return "GUARDIANS OF THE GALAXY".
Another key method available to you on the CSV object is the #headers
method such as row.headers
so you can pull out the headers for the CSV you are reading if you don’t know them already.
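To try those calls without the real file, here is a small sketch. It assumes an inline sample string parsed with CSV.parse (swapped in for CSV.foreach purely so the example is self-contained) and uses only a few of the columns:

```ruby
require "csv"

# Inline stand-in for films.csv with a subset of its columns
sample = <<~DATA
  Rank,Title,Runtime (Minutes)
  1,Guardians of the Galaxy,121
DATA

table = CSV.parse(sample, headers: true, header_converters: :symbol)
row = table[0] # the first data row, as a CSV::Row

puts row.headers.inspect # [:rank, :title, :runtime_minutes]
puts row[:title]         # Guardians of the Galaxy

# Override a value in place, just like assigning to a hash key
row[:title] = row[:title].upcase
puts row[:title]         # GUARDIANS OF THE GALAXY
```

The :symbol header converter is what turns "Runtime (Minutes)" into :runtime_minutes.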
Return as an array of hashes
If you’re looking to read a CSV’s data into an array of hashes, it would look pretty similar to the above CSV object creation. I recommend using .foreach with the headers: true
option as well as the header_converters: :symbol
option, which might look like this:
films_info = []
headers = nil
CSV.foreach("films.csv", headers: true, header_converters: :symbol) do |row|
  headers ||= row.headers
  films_info << row.to_h
end
The data coming out might look like:
films_info=> [{:rank=>"1",
:title=>"Guardians of the Galaxy",
:genre=>"Action,Adventure,Sci-Fi",
:description=>
"A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe.",
:director=>"James Gunn",
:actors=>"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe Saldana",
:year=>"2014",
:runtime_minutes=>"121",
:rating=>"8.1",
:votes=>"757074",
:revenue_millions=>"333.13",
:metascore=>"76"},
...
]
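Once the rows are plain hashes, the usual Enumerable methods apply. A rough sketch, again assuming an inline sample parsed with CSV.parse instead of the real file (the second row is made up):

```ruby
require "csv"

sample = <<~DATA
  Rank,Title,Year
  1,Guardians of the Galaxy,2014
  2,Example Film,2099
DATA

# With headers: true, CSV.parse returns a CSV::Table whose elements are CSV::Rows
table = CSV.parse(sample, headers: true, header_converters: :symbol)
films_info = table.map(&:to_h)

titles = films_info.map { |film| film[:title] }
# Values stay strings unless you convert them yourself
from_2014 = films_info.select { |film| film[:year] == "2014" }

puts titles.inspect    # ["Guardians of the Galaxy", "Example Film"]
puts from_2014.inspect
```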
Each of the above options has its pros and cons, but I much prefer working with the CSV objects Ruby has made for us. They are clear to read and intuitive to interact with, so I almost always reach for the CSV object option.
Writing to CSV in Ruby
Similar to reading from a CSV, writing to a CSV can be a bit intimidating if given just the Ruby Docs. Ruby Docs are usually so helpful and intuitive, but I have always found them to be a bit opaque and overwhelming on this topic.
Write from an array of arrays
This honestly seems to be the most common strategy used. However, every time I use it, I worry about keeping the values in the same order as the headers. If a column gets added later that makes more sense in the middle rather than at the end, you have to go back and re-format everything and risk botching your data. You will also want to pay attention to whether your source data array has a headers row as its first element (which is likely the case if you’ve just read this data from a CSV) or if you need to define the headers yourself.
# films_info is an array of arrays
headers = ["Rank", "Title", "Genre", "Description", "Director", "Actors", "Year", "Runtime (Minutes)", "Rating", "Votes", "Revenue (Millions)", "Metascore"]
CSV.open("new_films.csv", "w") do |csv|
  csv << headers
  films_info.each do |movie|
    csv << movie
  end
end
This will create a new CSV file with the name new_films.csv,
located in the directory you run your script from (the path is relative to the current working directory). The "w"
mode opens the file for writing, creating it if it doesn’t exist and overwriting its contents if it does. From here you need to pass in the headers and then you can shovel in each following row as an array of strings.
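One way to handle the "does my data already include a header row?" question is to check the first element before writing. This is only a sketch: it writes into a temporary directory (an assumption, so the example cleans up after itself) and uses a trimmed set of columns:

```ruby
require "csv"
require "tmpdir"

headers = ["Rank", "Title", "Genre"]
films_info = [
  ["Rank", "Title", "Genre"], # header row, as you would get from reading a CSV
  ["1", "Guardians of the Galaxy", "Action,Adventure,Sci-Fi"]
]

# Drop the header row if the source data already starts with it,
# so it is not written twice
data_rows = films_info.first == headers ? films_info.drop(1) : films_info

written = Dir.mktmpdir do |dir|
  path = File.join(dir, "new_films.csv")
  CSV.open(path, "w") do |csv|
    csv << headers
    data_rows.each { |movie| csv << movie }
  end
  File.read(path) # raw file contents, to see what actually got written
end

puts written
```

Fields that contain commas, like the genre list, are wrapped in quotes for you when written.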
Write from an array of CSV objects
As mentioned earlier, I really like working with CSV objects and writing to a CSV with them is no different!
If you already have CSV objects, that might look like this:
# films_info is an array of CSV objects
# &. falls back to the hard-coded headers when films_info is empty
headers = films_info.first&.headers || ["Rank", "Title", "Genre", "Description", "Director", "Actors", "Year", "Runtime (Minutes)", "Rating", "Votes", "Revenue (Millions)", "Metascore"]
CSV.open("new_films.csv", "w") do |csv|
  csv << headers
  films_info.each do |movie|
    csv << movie
  end
end
If you need to make CSV objects out of pre-formatted data, such as an array of hashes, that might look more like this:
# films_info is an array of hashes
# &. falls back to the hard-coded headers when films_info is empty
headers = films_info.first&.keys || ["Rank", "Title", "Genre", "Description", "Director", "Actors", "Year", "Runtime (Minutes)", "Rating", "Votes", "Revenue (Millions)", "Metascore"]
CSV.open("new_films.csv", "w") do |csv|
  csv << headers
  films_info.each do |movie|
    csv << CSV::Row.new(movie.keys, movie.values)
  end
end
You will see that we don’t get away from the issue of making sure header order matches up with data order, as the headers
passed in to CSV::Row.new
need to be in the same order as the values
passed in. However, since we pull the headers by calling .keys and then the values by calling .values on the same hash, it is a little more reliable. I really like that when using a CSV object, I’m passing in a pairing of information: here is the header and here is the associated value.
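To see that pairing in isolation, here is a short sketch that builds one CSV::Row from a hash and writes it to an in-memory string with CSV.generate (used here in place of CSV.open only so nothing touches disk). The hash is a trimmed, illustrative version of a film record:

```ruby
require "csv"

movie = { rank: "1", title: "Guardians of the Galaxy", year: "2014" }

# Headers and values come from the same hash, so they line up by position
row = CSV::Row.new(movie.keys, movie.values)

puts row[:title]        # Guardians of the Galaxy
puts row.fields.inspect # ["1", "Guardians of the Galaxy", "2014"]

output = CSV.generate do |csv|
  csv << row.headers
  csv << row # a CSV::Row is written as its fields, in order
end

puts output
```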
Write from an array of hashes
Given the second CSV object above starts from a hash, you can certainly use that strategy when your source data comes to you as a hash. If you don’t want to translate the data over to a CSV object first then you can pass in movie.values directly!
# films_info is an array of hashes
# &. falls back to the hard-coded headers when films_info is empty
headers = films_info.first&.keys || ["Rank", "Title", "Genre", "Description", "Director", "Actors", "Year", "Runtime (Minutes)", "Rating", "Votes", "Revenue (Millions)", "Metascore"]
CSV.open("new_films.csv", "w") do |csv|
  csv << headers
  films_info.each do |movie|
    csv << movie.values
  end
end
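One caveat with movie.values is that it follows each hash’s own key order, so hashes built in different orders can end up with values under the wrong header. A possible guard, sketched here with made-up sample data and an in-memory CSV.generate, is to pull the values by header with values_at:

```ruby
require "csv"

films_info = [
  { title: "Guardians of the Galaxy", rank: "1", year: "2014" },
  { year: "2099", rank: "2", title: "Example Film" } # made up, keys in a different order
]

headers = films_info.first.keys # [:title, :rank, :year]

output = CSV.generate do |csv|
  csv << headers
  films_info.each do |movie|
    # values_at pins each value to the header order, whatever order the hash was built in
    csv << movie.values_at(*headers)
  end
end

puts output
```

With plain movie.values, the second row would have come out as 2099,2,Example Film under the title,rank,year headers.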
Dynamically creating Headers
Ultimately you’ll be a lot safer when writing if you can dynamically create your list of headers.
If you are creating a CSV from a single ActiveRecord model, you could also get your headers by calling, for example, Movie.column_names
which would give you something like ["rank", "title", "genre",...]
to use as your header values.
If you are creating a CSV from an ActiveRecord Result Object, depending on your ActiveRecord version you might be able to call result.fields
, result.columns
, or result.column_names
to get a dynamic set of headers.
If you are creating a CSV from an array of Structs, you can call .members on your Struct class (or on any instance of it)
to get you an array of attribute names such as [:title, :gross_sales, :year,...]
to use as your dynamically created headers.
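The Struct case might look something like the sketch below. Film is a hypothetical Struct defined just for this example, and the output goes to an in-memory string via CSV.generate rather than a file:

```ruby
require "csv"

# Hypothetical Struct for illustration; your own Struct will have its own members
Film = Struct.new(:title, :gross_sales, :year)

films = [Film.new("Guardians of the Galaxy", "333.13", "2014")]

headers = Film.members # [:title, :gross_sales, :year]

output = CSV.generate do |csv|
  csv << headers
  # to_a returns the values in the same order as members
  films.each { |film| csv << film.to_a }
end

puts output
```

Because members and to_a share the same ordering, the headers and values cannot drift apart.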
Reading and Writing CSVs in Ruby
The Ruby CSV docs have a lot of information in them, and sometimes having a bit of a cheat sheet to be able to quickly read or write from a CSV in Ruby is quite handy!
Reading
I usually use .foreach with headers: true
, then iterate over the rows and do my thing!
Writing
I usually use .open with the flag "w"
, and prefer to use dynamically created headers no matter what data type I end up using so that my headers are in the same order as my values.
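Put together, a read, tweak, and write round trip could be sketched as follows. It assumes a throwaway temp directory and a one-row stand-in for films.csv so that it runs anywhere:

```ruby
require "csv"
require "tmpdir"

result = Dir.mktmpdir do |dir|
  source = File.join(dir, "films.csv")
  target = File.join(dir, "new_films.csv")
  # One-row stand-in for the real films.csv
  File.write(source, "Rank,Title,Year\n1,Guardians of the Galaxy,2014\n")

  # Reading: foreach with headers gives one CSV::Row per data line
  films_info = []
  CSV.foreach(source, headers: true, header_converters: :symbol) do |row|
    row[:title] = row[:title].upcase
    films_info << row
  end

  # Writing: headers come from the data itself, so they match the value order
  headers = films_info.first&.headers || []
  CSV.open(target, "w") do |csv|
    csv << headers
    films_info.each { |movie| csv << movie }
  end

  CSV.read(target)
end

puts result.inspect
```

Keep in mind that the headers written out are the converted symbols (rank, title, year), not the original capitalized names.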
Happy CSVing!