Working with CSVs in Ruby

A while back, I wrote a post, Working with Files in Ruby, reviewing some fundamental methods for reading and writing text files in Ruby. Here I want to turn our attention to building out similar functionality for working with CSVs.

We will be using the films.csv found here. Go ahead and clone that down so you can play along. In your editor, make sure to require "csv" so that you can access Ruby’s CSV class.


Reading from CSV in Ruby

The Ruby Docs tend to be an excellent resource that is quite user-friendly for new developers. However, I find the docs for CSV a little less straightforward. There are a number of ways to read a CSV in Ruby, with options for structuring your output in whatever way best fits your use case.

Return as an array of arrays

If you’re looking to read a CSV’s data into a nested array, I recommend using .foreach, which would look like this:

require "csv"

films_info = []
CSV.foreach("films.csv") do |row|
  films_info << row # each row arrives as an array of strings
end

This reads the “films.csv” file one line at a time, turning each row into an array of that row’s values. Once the loop finishes, films_info holds a nested array whose first element is the array of headers, followed by one array per CSV row. It might look similar to this:

films_info=> [
["Rank", "Title", "Genre", "Description", "Director", "Actors", "Year", "Runtime (Minutes)", "Rating", "Votes", "Revenue (Millions)", "Metascore"],
["1","Guardians of the Galaxy", "Action,Adventure,Sci-Fi", "A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe.", "James Gunn", "Chris Pratt, Vin Diesel, Bradley Cooper, Zoe Saldana", "2014", "121", "8.1", "757074", "333.13", "76"],
...
]
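If all you need is the nested array, CSV.read should give you the same result in a single call. The sketch below is self-contained: it writes a tiny stand-in for films.csv into a temporary directory (the two-line contents are hypothetical) and compares the .foreach approach with CSV.read.

```ruby
require "csv"
require "tmpdir"

# Returns [rows built with CSV.foreach, rows from CSV.read] for the same file
films_info, all_rows = Dir.mktmpdir do |dir|
  path = File.join(dir, "films.csv")
  # Hypothetical two-line stand-in for the real films.csv
  File.write(path, "Rank,Title,Year\n1,Guardians of the Galaxy,2014\n")

  # The approach from above: shovel each row into an array
  rows = []
  CSV.foreach(path) { |row| rows << row }

  # One-call alternative: CSV.read returns the whole file as an array of arrays
  [rows, CSV.read(path)]
end

p films_info == all_rows # => true
p all_rows.first         # => ["Rank", "Title", "Year"]
```

CSV.read loads the whole file into memory at once, so .foreach remains the better fit when you want to process a large file row by row.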

Return as CSV objects

If you’re looking to read a CSV’s data into an array of CSV::Row objects, I recommend using .foreach with the headers: true option as well as the header_converters: :symbol option, which might look like this:

require "csv"

films_info = []
headers = nil
CSV.foreach("films.csv", headers: true, header_converters: :symbol) do |row|
  headers ||= row.headers # capture the header list once, from the first row
  films_info << row
end

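The headers: true option makes each row a CSV::Row object instead of a plain array, and header_converters: :symbol turns each header into a lowercase, underscored symbol, so "Runtime (Minutes)" becomes :runtime_minutes.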

films_info=> [#<CSV::Row 
rank:"1"
title:"Guardians of the Galaxy"
genre:"Action,Adventure,Sci-Fi"
description:"A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe."
director:"James Gunn"
actors:"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe Saldana"
year:"2014"
runtime_minutes:"121"
rating:"8.1"
votes:"757074"
revenue_millions:"333.13"
metascore:"76"
>,
...
]

You can interact with a CSV::Row much like a hash, so you might call row[:title] to get the title of that particular entry. You can also overwrite a value on the row, for example to sanitize the data: row[:title] = row[:title].upcase. After that, row[:title] returns "GUARDIANS OF THE GALAXY". Another key method available on a CSV::Row is #headers: calling row.headers lets you pull out the headers of the CSV you are reading if you don’t know them already.
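Here is a small sketch of those CSV::Row operations. To keep it self-contained it uses CSV.parse on an inline string instead of reading films.csv; the three-column sample data is hypothetical, but the options are the same ones passed to .foreach above.

```ruby
require "csv"

# Hypothetical inline data standing in for films.csv
data = "Rank,Title,Runtime (Minutes)\n1,Guardians of the Galaxy,121\n"

# CSV.parse accepts the same options as CSV.foreach, but reads from a String
table = CSV.parse(data, headers: true, header_converters: :symbol)
row = table.first # a CSV::Row

p row.headers # => [:rank, :title, :runtime_minutes]
p row[:title] # => "Guardians of the Galaxy"

# Overwrite a field in place, e.g. to sanitize it
row[:title] = row[:title].upcase
p row[:title] # => "GUARDIANS OF THE GALAXY"
```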

Return as an array of hashes

If you’re looking to read a CSV’s data into an array of hashes, the code looks very similar to the CSV object version above. Again, I recommend using .foreach with the headers: true and header_converters: :symbol options, then calling .to_h on each row:

films_info = []
headers = nil
CSV.foreach("films.csv", headers: true, header_converters: :symbol) do |row|
  headers ||= row.headers
  films_info << row.to_h # convert each CSV::Row into a plain Hash
end

The data coming out might look like:

films_info=> [{:rank=>"1",
:title=>"Guardians of the Galaxy",
:genre=>"Action,Adventure,Sci-Fi",
:description=>
"A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe.",
:director=>"James Gunn",
:actors=>"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe Saldana",
:year=>"2014",
:runtime_minutes=>"121",
:rating=>"8.1",
:votes=>"757074",
:revenue_millions=>"333.13",
:metascore=>"76"},
...
]
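Since each CSV::Row responds to .to_h, a more compact variant is to map over the parsed rows. This is a sketch assuming the data fits comfortably in memory; the inline sample string is hypothetical and stands in for films.csv.

```ruby
require "csv"

# Hypothetical inline data standing in for films.csv
data = "Rank,Title,Year\n1,Guardians of the Galaxy,2014\n2,Prometheus,2012\n"

# CSV.parse with headers: true returns a CSV::Table; map turns each row into a Hash
films_info = CSV.parse(data, headers: true, header_converters: :symbol).map(&:to_h)

p films_info.length        # => 2
p films_info.first[:title] # => "Guardians of the Galaxy"
```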

Each of the above options has its pros and cons; however, I much prefer working with the CSV objects Ruby has made for us. They are clear to read and intuitive to interact with, so I almost always reach for the CSV object option.

Writing to CSV in Ruby

As with reading, writing to a CSV can be a bit intimidating if all you have is the Ruby Docs. They are usually so helpful and intuitive, but I have always found them a bit opaque and overwhelming on this topic.

Write from an array of arrays

This honestly seems to be the most common strategy. However, every time I use it, I worry about keeping the values in the same order as the headers. If a column gets added later and belongs in the middle rather than at the end, you have to go back and reorder everything, and you risk botching your data. You will also want to check whether the first element of your source array is a headers row (which is likely the case if you’ve just read this data from a CSV) or whether you need to define the headers yourself.

require "csv"

# films_info is an array of arrays containing only data rows (no headers row)
headers = ["Rank", "Title", "Genre", "Description", "Director", "Actors", "Year", "Runtime (Minutes)", "Rating", "Votes", "Revenue (Millions)", "Metascore"]
CSV.open("new_films.csv", "w") do |csv|
  csv << headers
  films_info.each do |movie|
    csv << movie # each movie is an array of values in the same order as headers
  end
end

This creates a new file named new_films.csv in the current working directory (usually the directory you ran the script from). The "w" argument is the file mode: it opens the file for writing, creating it if needed and overwriting it if it already exists. From there, you pass in the headers and then shovel in each following row as an array of strings.
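If your array came straight from reading a CSV and already starts with its headers row, one option is to split that row off with array destructuring so it is written exactly once. This sketch uses hypothetical sample data and a temporary directory so it can run anywhere, then reads the file back to show what was written.

```ruby
require "csv"
require "tmpdir"

# Hypothetical source data in the shape CSV.read returns: headers first, then rows
films_info = [
  ["Rank", "Title", "Year"],
  ["1", "Guardians of the Galaxy", "2014"],
]

written = Dir.mktmpdir do |dir|
  path = File.join(dir, "new_films.csv")

  # Split off the headers row so it is only written once
  headers, *movies = films_info

  # "w" opens the file for writing, truncating it if it already exists
  CSV.open(path, "w") do |csv|
    csv << headers
    movies.each { |movie| csv << movie }
  end

  File.read(path)
end

puts written
# Rank,Title,Year
# 1,Guardians of the Galaxy,2014
```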

Write from an array of CSV objects

As mentioned earlier, I really like working with CSV objects and writing to a CSV with them is no different!

If you already have CSV objects, that might look like this:

require "csv"

# films_info is an array of CSV::Row objects
# Use the first row's headers, or fall back to a hard-coded list if films_info is empty
headers = films_info.first&.headers || ["Rank", "Title", "Genre", "Description", "Director", "Actors", "Year", "Runtime (Minutes)", "Rating", "Votes", "Revenue (Millions)", "Metascore"]
CSV.open("new_films.csv", "w") do |csv|
  csv << headers
  films_info.each do |movie|
    csv << movie # CSV#<< writes the row's fields
  end
end

If you need to make CSV objects out of pre-formatted data, such as an array of hashes, that might look more like this:

require "csv"

# films_info is an array of hashes
# Use the first hash's keys, or fall back to a hard-coded list if films_info is empty
headers = films_info.first&.keys || ["Rank", "Title", "Genre", "Description", "Director", "Actors", "Year", "Runtime (Minutes)", "Rating", "Votes", "Revenue (Millions)", "Metascore"]
CSV.open("new_films.csv", "w") do |csv|
  csv << headers
  films_info.each do |movie|
    csv << CSV::Row.new(movie.keys, movie.values) # pair each header with its value
  end
end

You will see that we don’t escape the problem of matching header order to data order: the headers passed to CSV::Row.new need to be in the same order as the values. Because Ruby hashes preserve insertion order, calling .keys and .values on the same hash always lines them up within a row, which makes this a little more reliable. Note, however, that csv << writes each row’s fields in that row’s own order, so every hash still needs its keys in the same order as the headers row. I really like that a CSV object gives me a pairing of information: here is the header, and here is the associated value.
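One way to make the CSV::Row version safe against hashes whose keys arrive in a different order is to look values up by header with Hash#values_at instead of relying on .values. A sketch, with hypothetical sample hashes and CSV.generate (which builds the CSV in a String) standing in for CSV.open so the output is easy to inspect:

```ruby
require "csv"

# Hypothetical hashes; the second one lists its keys in a different order
films_info = [
  { rank: "1", title: "Guardians of the Galaxy", year: "2014" },
  { year: "2012", title: "Prometheus", rank: "2" },
]

headers = films_info.first&.keys || [:rank, :title, :year]

output = CSV.generate do |csv|
  csv << headers
  films_info.each do |movie|
    # values_at pulls the values in header order, whatever order the hash uses
    csv << CSV::Row.new(headers, movie.values_at(*headers))
  end
end

puts output
# rank,title,year
# 1,Guardians of the Galaxy,2014
# 2,Prometheus,2012
```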

Write from an array of hashes

Given that the second CSV object example above starts from a hash, you can certainly use that strategy when your source data comes to you as hashes. If you don’t want to translate the data into CSV objects first, you can pass in movie.values directly!

require "csv"

# films_info is an array of hashes
# Use the first hash's keys, or fall back to a hard-coded list if films_info is empty
headers = films_info.first&.keys || ["Rank", "Title", "Genre", "Description", "Director", "Actors", "Year", "Runtime (Minutes)", "Rating", "Votes", "Revenue (Millions)", "Metascore"]
CSV.open("new_films.csv", "w") do |csv|
  csv << headers
  films_info.each do |movie|
    csv << movie.values # relies on every hash listing its keys in header order
  end
end
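Alternatively, you can let CSV handle the ordering: when you pass the headers: option for writing, CSV#<< accepts a Hash and emits its values in header order, and write_headers: true writes the headers row for you. A sketch with hypothetical data, again using CSV.generate in place of CSV.open:

```ruby
require "csv"

# Hypothetical hashes; the second one lists its keys in a different order
films_info = [
  { rank: "1", title: "Guardians of the Galaxy", year: "2014" },
  { year: "2012", title: "Prometheus", rank: "2" },
]
headers = films_info.first&.keys || [:rank, :title, :year]

# With headers: set, each Hash is written as a row in header order;
# write_headers: true emits the headers row before the first data row
output = CSV.generate(headers: headers, write_headers: true) do |csv|
  films_info.each { |movie| csv << movie }
end

puts output
# rank,title,year
# 1,Guardians of the Galaxy,2014
# 2,Prometheus,2012
```

The same options can be passed to CSV.open("new_films.csv", "w", headers: headers, write_headers: true) when writing to a file.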

Dynamically creating Headers

Ultimately you’ll be a lot safer when writing if you can dynamically create your list of headers.

If you are creating a CSV from a single ActiveRecord model, you could also get your headers by calling, for example, Movie.column_names, which would give you something like ["rank", "title", "genre", ...] to use as your header values.

If you are creating a CSV from an ActiveRecord Result object, depending on your ActiveRecord version you might be able to call result.fields, result.columns, or result.column_names to get a dynamic set of headers.

If you are creating a CSV from an array of Structs, you can call .members on your Struct class (for example, Movie.members) to get an array of attribute names such as [:title, :gross_sales, :year, ...] to use as your dynamically created headers.
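A sketch of the Struct approach, assuming a hypothetical Movie struct and sample values: Movie.members returns the attribute names as symbols in definition order, and Struct#to_a returns the values in that same order, so headers and values cannot drift apart.

```ruby
require "csv"

# Hypothetical Struct standing in for your own data type
Movie = Struct.new(:title, :gross_sales, :year)

movies = [
  Movie.new("Guardians of the Galaxy", "333.13", "2014"),
]

output = CSV.generate do |csv|
  csv << Movie.members # headers derived from the Struct definition
  movies.each { |movie| csv << movie.to_a } # values in the same member order
end

puts output
# title,gross_sales,year
# Guardians of the Galaxy,333.13,2014
```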

Reading and Writing CSVs in Ruby

The Ruby CSV docs have a lot of information in them, and sometimes having a bit of a cheat sheet to be able to quickly read or write from a CSV in Ruby is quite handy!

Reading

I usually use .foreach with headers: true, then iterate and do my thing!

Writing

I usually use .open with the "w" mode, and I prefer dynamically created headers no matter what data type I end up using, so that my headers are in the same order as my values.

Happy CSVing!

Software Engineer at TaxJar
