Ruby — Processing CSV with Metaprogramming
It’s been 2 years since I first came into contact with the Ruby programming language. I have to say that it has been a beautiful 2-year relationship: Ruby is just like a beautiful spouse with all the useful skills to handle everything we need to keep us together (sounds so geeky, huh?).
One thing that I love so much about Ruby is its capability to do something called metaprogramming. I learned this technique from the book ‘Metaprogramming Ruby’ by Paolo Perrotta. This book made me love Ruby even more, and made my style of programming (subjectively speaking) even better.
In this article I want to show you an example of metaprogramming that I have already applied in my own code to handle CSV processing. I found it very useful, and I hope to share what I learned with other beginner Rubyists who want to get the most out of their Ruby.
Oh, before we start, I want to make clear that this article will not explain what metaprogramming is; that is out of scope. I will explain some of it when I feel I need to, but not in detail. If you are not familiar with metaprogramming, don’t worry. I hope this article will motivate you to learn more about it.
So here is the case: my uncle has an online store selling various stuff, and he has decided to promote it to get more traffic by using the services of several digital marketing vendors. He asked for my help to handle it.
Each digital marketing vendor, of course, needs my uncle’s whole product database to do dynamic marketing. My uncle said that he can serve the data via a CSV file that I can fetch every day. The content of the file will look like this:
|id |name |price |discount|cat_1 |cat_2 |img_url|url|
|299|A doll |300000 | 30000|hobbies|toys |http:/.|...|
|390|A phone|2500000 | 0|phone |smartphone|http:/.|...|
|499|A ball |450000 | 0|sport |football |http:/.|...|
I haven’t written out all the data completely, especially in the url column, but I’m sure you get the point.
Unfortunately, each digital marketing vendor has its own requirements for the CSV file it receives so that it can process it for dynamic marketing. Say that we use two digital marketing vendors named Bolton and Stark.
Bolton asks us to serve the data every day in the following columns:
The deal_price column will contain the price after discount, and category_tree will contain the category tree of the product, separated by a > sign from the topmost parent down to its child. For example, for the first record of our product data above, the category_tree value will be hobbies > toys.
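To make these two derived columns concrete, here is a small sketch of how they could be computed from one source row. The helper names and the hash standing in for a CSV row are my own illustration, not part of the original code.

```ruby
# Hypothetical helpers, only to illustrate the two derived columns.
def deal_price(price, discount)
  # CSV values arrive as strings, so convert before subtracting.
  price.to_i - discount.to_i
end

def category_tree(*categories)
  # Join from the topmost parent down to the child, skipping blanks.
  categories.compact.reject(&:empty?).join(' > ')
end

# First record of the sample data, as a plain hash.
row = { 'price' => '300000', 'discount' => '30000',
        'cat_1' => 'hobbies', 'cat_2' => 'toys' }

puts deal_price(row['price'], row['discount'])   # 270000
puts category_tree(row['cat_1'], row['cat_2'])   # hobbies > toys
```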
And Stark asks us to serve the data every day in the following columns:
Now my task is to convert the CSV given to me by my uncle into a new CSV file for each marketing vendor.
Without knowing about Ruby metaprogramming, my solution would probably look like this:
Let’s call the newly generated CSV file a ‘feed’.
In the code above, I defined each generator in its own class and manually mapped every column from the source CSV file to the new CSV file that serves as the product feed. Pretty straightforward.
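As a rough sketch of this conventional style for a single vendor (not the original listing), something like the following would do. The class name BoltonFeed and the id and name columns are my assumptions; only deal_price and category_tree come from the requirements above. I parse from a string here so the example runs on its own.

```ruby
require 'csv'

# Sample source data mirroring the table above (URLs are placeholders).
SOURCE = <<~ROWS
  id,name,price,discount,cat_1,cat_2,img_url,url
  299,A doll,300000,30000,hobbies,toys,http://example.com/doll.jpg,http://example.com/doll
ROWS

# Conventional style: one class per vendor, every column mapped by hand.
class BoltonFeed
  # Assumed column list; only deal_price and category_tree are from the text.
  COLUMNS = %w[id name deal_price category_tree].freeze

  def generate(source_csv)
    CSV.generate do |feed|
      feed << COLUMNS
      CSV.parse(source_csv, headers: true).each do |row|
        feed << [
          row['id'],
          row['name'],
          row['price'].to_i - row['discount'].to_i,
          [row['cat_1'], row['cat_2']].join(' > ')
        ]
      end
    end
  end
end

puts BoltonFeed.new.generate(SOURCE)
```

A second vendor would get a second class with its own hand-written mapping, which is where the repetition starts.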
But with some of the metaprogramming techniques that Ruby provides, it can be done in another way. In a way that is, well, more meta. Let’s look at the code below as an alternative. Don’t panic while you read it if you’re not familiar with metaprogramming; we will discuss what is written here afterwards.
Alright, it looks very different now. What happened here? What is that instance_eval thing? What does that define_singleton_method mean? Calm down, we will take a look at it.
First, we defined a class called FeedGenerator. This class will be the parent of the other feed generator classes. Creating an instance of this class requires passing two parameters: the file name of the source file, and feed_name, the file name of the file that will be our new product feed.
The FeedGenerator class has three attributes: the source attribute is where we put our source file to be processed, the feed attribute acts as the new feed CSV processor, and the current_row attribute acts as a ‘pointer’ that keeps track of the current row while we iterate over the CSV.
Inside the FeedGenerator class, we defined an inner class called Row. This class exists to provide the dynamic methods for each CSV row that we iterate over. The current_row attribute of the FeedGenerator class will store an object of this Row class.
Let’s talk about what happens in initialize of the FeedGenerator class. We assign an object of class CSV to the variable source, which exists as an attribute of it. After that, we assign an object of the inner class Row to the variable current_row, which also exists as an attribute.
Then, here is where the magic happens. This block of code:
# define method by header of source file
@source.headers.each do |column|
will define dynamic methods for @current_row. In that block, we iterate over all the header columns of our source file, which happen to be:
id name price discount cat_1 cat_2 img_url url
and define a method for each of them on @current_row. Doing so lets us ‘reopen’ its class and define new special methods (or modify existing ones) that are available exclusively to the object stored in @current_row. Here, we define methods called ‘id’, ‘name’ and so on, so we’ll be able to call methods like @current_row.id, which will return the value of self.row['id'] (I hope it’s clear enough, pardon me XD).
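Here is a minimal, standalone sketch of that idea, assuming define_singleton_method is used to attach the methods. The Row class below is a stand-in for the inner class described above, and the header list is shortened for brevity.

```ruby
# Minimal stand-in for the inner Row class.
class Row
  attr_accessor :row
end

headers = %w[id name price]
current_row = Row.new

# Define one method per header on this single object only.
# Inside the block, self is current_row, so `row` is its accessor.
headers.each do |column|
  current_row.define_singleton_method(column) { row[column] }
end

current_row.row = { 'id' => '299', 'name' => 'A doll', 'price' => '300000' }

puts current_row.name             # A doll
puts Row.new.respond_to?(:name)   # false
```

The last line shows the ‘exclusive’ part: other Row objects do not get these methods, only the one object we defined them on.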
After that block, we assign an object of class CSV to @feed and append the header row, which comes from columns:
@feed = CSV.open(feed_name, 'wb+')
@feed << columns
That’s it for the initializer. Now let’s move on and talk about the method that generates the feed:
@source.each do |row|
@current_row.row = row
@feed << columns.map do |c|
In the method above, we iterate over the variable source, which contains an object of class CSV, by calling the method each and passing a block to it. Each row of the file is yielded in turn and assigned to the variable row. In that block, we assign it to an attribute of @current_row that is also called row. Recall from the initializer above that @current_row now has the methods ‘id’, ‘name’, and so on. In this block:
@feed << columns.map do |c|
we append a new row to the new CSV that we stored in the variable @feed. What happens here is that, for each row, we map the result of the columns method (which returns an array of the columns of our new CSV) to new values that we get by calling @current_row.send(c). For those of you who are not yet familiar with metaprogramming in Ruby, calling the method send on an object and passing a parameter to it means calling the method whose name matches the parameter we pass. So this line of code below:
will do the same thing as this line:
Now you probably have an idea of how dynamic Ruby can be. So in the block that we talked about, we dynamically generate each row of our new CSV product feed by calling methods of @current_row.
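If send is new to you, this tiny sketch shows the equivalence on its own. The Product struct is a hypothetical stand-in for @current_row, used only so the example runs by itself.

```ruby
# Hypothetical stand-in object with two reader methods.
Product = Struct.new(:id, :name)
product = Product.new(299, 'A doll')

# Calling a method by name through send...
puts product.send(:name)   # A doll
# ...does the same as calling it directly.
puts product.name          # A doll

# send also accepts the name as a string, which is what makes
# mapping over a list of column names possible.
columns = %w[id name]
p columns.map { |c| product.send(c) }   # [299, "A doll"]
```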
For completeness, every child class of FeedGenerator has to define a method columns that returns the columns of the new CSV file that will be generated.
Lastly, we generate the feeds we need by creating an instance of each child class of FeedGenerator and calling its feed-generating method.
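Putting the pieces together, a minimal sketch of such a generator could look like the following. Treat it as one possible reconstruction under assumptions rather than the exact listing: the parameter name source_name, the method name generate, the use of CSV.read and define_singleton_method, and the id and name columns in BoltonFeed are all my guesses. The file names are temporary placeholders.

```ruby
require 'csv'
require 'tmpdir'

class FeedGenerator
  # Inner class; its instance gets one reader method per source header.
  class Row
    attr_accessor :row

    # Mapping methods for columns that do not exist in the source.
    def deal_price
      price.to_i - discount.to_i
    end

    def category_tree
      [cat_1, cat_2].join(' > ')
    end
  end

  attr_reader :source, :feed, :current_row

  # source_name is an assumed parameter name.
  def initialize(source_name, feed_name)
    @source = CSV.read(source_name, headers: true)
    @current_row = Row.new
    # define method by header of source file
    @source.headers.each do |column|
      @current_row.define_singleton_method(column) { row[column] }
    end
    @feed = CSV.open(feed_name, 'wb+')
    @feed << columns
  end

  # generate is an assumed method name.
  def generate
    @source.each do |row|
      @current_row.row = row
      @feed << columns.map { |c| @current_row.send(c) }
    end
    @feed.close
  end
end

# Each child class only declares which columns it wants.
class BoltonFeed < FeedGenerator
  def columns
    %w[id name deal_price category_tree]
  end
end

Dir.mktmpdir do |dir|
  source = File.join(dir, 'products.csv')
  File.write(source, "id,name,price,discount,cat_1,cat_2\n" \
                     "299,A doll,300000,30000,hobbies,toys\n")
  feed = File.join(dir, 'bolton.csv')
  BoltonFeed.new(source, feed).generate
  puts File.read(feed)
end
```

With this shape, supporting another vendor mostly means adding a small child class with its own columns method.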
So, what’s good about it? Why don’t we just do it the ‘conventional’ way, without metaprogramming techniques? Comparing the two versions of the code above, the conventional one even has fewer lines than the other.
Well, I would say that the code with metaprogramming is more flexible than the conventional one. In this case, if you handle product feeds for more than two marketing vendors, you will start to see the power of metaprogramming.
Say there are two more vendors called Lannister and Greyjoy that my uncle also wants to use to promote his online store. As usual, Lannister asks us to serve the product feed with the following columns:
and Greyjoy asks us to serve the product feed with the following columns:
In the conventional way, our final code will be:
But in the metaprogramming way, all we need to do is add two new classes with the method columns defined as below:
and add one more mapping method for the column title to the inner class Row:
def title; name; end
The final version of the code with metaprogramming will be:
Now we see that handling it with some metaprogramming gives us more compact code, and a fancy style ;-)
Side note: you will probably notice that some parts of the code still look duplicated, and there are ways to remove that duplication so the code is as DRY as it can be. Well, it’s up to you to do it that way. But I hope you get my point about using metaprogramming to handle CSV.
We have seen that some metaprogramming techniques can save us several lines and give us the flexibility to adapt our code dynamically. In this article we saw how this technique can help us serve various CSV product feeds from one source CSV file.
However, metaprogramming is a double-edged sword, or perhaps it is more fitting to say that it is like magic. Don’t overuse it or your code will be hard to read. Just like with magic, you have to be wise, and I believe that as you learn to apply it in many ways, you will know when the best time to use the magic is.
Thanks for reading. I hope you found it useful. Happy coding.