Faking Your Way to a Data Set

As every beginner or seasoned developer will tell you, naming is hard. Naming classes, naming methods, naming attributes. Hard. All of it. However, do you know what’s even HARDER than writing a 1–2 word descriptor of something that represents a simple and concise concept or behavior? Creating an entire database of words or phrases that mimic real-world data for development purposes. Although it can be oh-so-much fun spending hours or days hard-coding all of your data, and maybe throwing in an Easter Egg or two for an unsuspecting colleague or other curious developer, it can be a brutal task. Fortunately for us, there is a simpler way: the Faker Gem.

For those who are not fluent in Ruby, a Ruby Gem is a software package that contains a Ruby application or a library that extends the functionality of Ruby applications. Many are open source and available to the global community of Rubyists.

Faker Gem can increase productivity by giving us a quick and easy method for generating nearly any kind of data imaginable for our application: email addresses, countries, credit card numbers, avatar images, Michael Scott quotes. You name it.

I’ll teach you about this specific Gem through example, using Rails.

Oh, no. We need a name for our example. Hmm. Give me a minute. Ok, let’s say we’re creating a database of our Neighbors. For each neighbor, we want to be able to include important information, such as their name, a phone number, an email, a job (you never know when you need a free AC consultation), and their favorite beer (in case they ever watch your dog or fix your AC).

Let’s begin in our console and create a new Rails app.

I’m going to use PostgreSQL here due to the current tools at my disposal to show you my seeded database a little later. Once Rails has done its magic, let’s open up the Gemfile, and within our Development gems, let’s add our new friend ‘faker’.
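As a rough sketch, the Gemfile addition could look like the snippet below. I’m assuming the default development group that Rails generates; your Gemfile may combine development and test into a single group instead.

```ruby
# Gemfile (excerpt)
group :development do
  # Generates fake names, emails, jobs, beers, and much more for seed data
  gem 'faker'
end
```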

Let’s bundle all these fancy gems, and then create and run our migration. (You may need to drop your database if you forgot about that other table of neighbors you have in there.)

We should have a schema (/db/schema.rb) that looks something like this:
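Here is a sketch of what that schema might contain. The column names come from the attributes listed earlier, but the string types, the timestamps, and the version number (a placeholder) are my assumptions, so expect yours to differ a little.

```ruby
# db/schema.rb (sketch; Rails generates this file for you)
ActiveRecord::Schema.define(version: 2018_01_01_000000) do # placeholder version
  # Present because we chose PostgreSQL
  enable_extension "plpgsql"

  create_table "neighbors", force: :cascade do |t|
    t.string "name"
    t.string "phone"
    t.string "email"
    t.string "job"
    t.string "favorite_beer"
    # Assumed: timestamps were included in the migration
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  end
end
```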

Nothing too surprising yet, I assume. To prevent an error we’ll be sure to encounter when implementing the next step, let’s build our Neighbor Model (/app/models/neighbor.rb). We can leave it empty for the sake of this example.
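A minimal version of that model, assuming a Rails 5 or newer app where models inherit from ApplicationRecord, could be as short as this:

```ruby
# app/models/neighbor.rb
# Empty on purpose: Rails maps this class to the neighbors table by convention.
class Neighbor < ApplicationRecord
end
```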

IT’S THE TIME YOU’VE ALL BEEN READING FOR. Let’s create our seed data using Faker. As soon as you begin to browse the Faker Repository, you’ll notice the abundance of options to choose from. If you think it’s overwhelming, you’re correct. Here’s an example of the Dog Module. (Dang, I should have added a column for the neighbor’s dog. Next time.)

This step may be as difficult as our naming quandary, but it’s worth the time spent [read: wasted] looking through and playing around with them. If you are seeding some data for your conservative grandparents, I would probably avoid using Faker::SiliconValley.quote.

Let’s navigate to our seed file /db/seeds.rb and create 10 rows of data for our development db. For our example, I have done the heavy lifting and have already found a set of Fakers that fits our intended data set PERFECTLY.

  • Faker::FamilyGuy.character for the ‘name’ attribute
  • Faker::PhoneNumber.cell_phone for the ‘phone’ attribute
  • Faker::Internet.free_email for the ‘email’ attribute
  • Faker::Job.title for the ‘job’ attribute
  • Faker::Beer.name for the ‘favorite_beer’ attribute

Here’s the syntax:
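A seed file along these lines should do the trick. Treat it as a sketch: it assumes the Neighbor model and columns described earlier, and it uses create! (my choice) so that a failed insert raises instead of passing silently. The Faker names are the ones from the list above. Newer Faker releases moved some of them (Family Guy, for example, now lives under Faker::TVShows), so check the docs for the version you have installed.

```ruby
# db/seeds.rb
# Builds 10 neighbors, asking Faker for a fresh random value per attribute.
10.times do
  Neighbor.create!(
    name:          Faker::FamilyGuy.character,
    phone:         Faker::PhoneNumber.cell_phone,
    email:         Faker::Internet.free_email,
    job:           Faker::Job.title,
    favorite_beer: Faker::Beer.name
  )
end
```

Ten rows times five attributes gives us the 50 cells we are after.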

Let’s seed our database and see what we get!

(Here, I’ll use Postico to view my database)

Look at that! Instantly we have 50 cells of data, randomly generated, that are a little easier to look at than “Name_1”, “Name_2”, “Phone_1”, “Phone_2”, etc. It’s likely to be much more entertaining, too (bonus points).

You may have noticed a problem we have here: duplicate data. Dang those Pewterschmidts! Faker picks its values at random, so even a module with dozens of options or more is not guaranteed to produce unique data, and some databases will require unique inputs. You could do what I did when my younger, naive self first used Faker, and, well, just ignore it. OR you could utilize the #unique method to give us a unique data set. (One caveat: once a Faker runs out of distinct values, unique raises an error rather than repeating itself.)
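If it helps to see why the duplicates show up, here is a plain-Ruby sketch of the idea. It does not use Faker at all: CHARACTERS is a made-up five-name list standing in for a Faker word list, and UniquePicker is a hypothetical class of my own, not Faker’s actual implementation. It only illustrates the general approach of remembering what has been handed out and retrying.

```ruby
require "set"

# Hypothetical stand-in for a (very small) Faker word list.
CHARACTERS = [
  "Peter Griffin", "Lois Griffin", "Stewie Griffin", "Brian Griffin", "Evil Monkey"
].freeze

# Plain random picks: 10 draws from 5 names must repeat at least once.
random_names = Array.new(10) { CHARACTERS.sample }

# Remembers every value it has returned and retries until it finds a new one.
class UniquePicker
  class Exhausted < StandardError; end

  def initialize(pool, max_retries: 1_000)
    @pool = pool
    @max_retries = max_retries
    @seen = Set.new
  end

  def pick
    @max_retries.times do
      candidate = @pool.sample
      # Set#add? returns nil when the value was already in the set.
      return candidate if @seen.add?(candidate)
    end
    raise Exhausted, "no unused value found after #{@max_retries} tries"
  end
end

picker = UniquePicker.new(CHARACTERS)
unique_names = Array.new(CHARACTERS.size) { picker.pick }
```

Asking this picker for a sixth name raises Exhausted, which is the same trade-off to keep in mind with a real Faker: a unique generator can only hand out as many values as its list contains.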

You know the drill.
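Spelled out, the drill could look like the sketch below: the same seed file as before, with unique chained in front of the Family Guy character call. The destroy_all line is my own addition so that the first batch of neighbors doesn’t stick around; dropping and recreating the database would work just as well.

```ruby
# db/seeds.rb
# Assumption: wipe the earlier rows so we end up with 10 neighbors, not 20.
Neighbor.destroy_all

10.times do
  Neighbor.create!(
    # unique makes Faker skip any character it has already returned in this run
    name:          Faker::FamilyGuy.unique.character,
    phone:         Faker::PhoneNumber.cell_phone,
    email:         Faker::Internet.free_email,
    job:           Faker::Job.title,
    favorite_beer: Faker::Beer.name
  )
end
```

The same unique call can go in front of any of the other attributes that need to be distinct, such as the email.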


That’s better. (Note the Evil Monkey. Isn’t this fantastic?)

Explore the various Faker modules, and consider where else they could be used. Use Faker Gem in conjunction with the flavorless Factory Bot Gem for a more interesting testing experience! Give this a read for more information.

Faker is an efficient, clean, and extremely entertaining gem for solving the age-old dilemma of creating that boring development database. Play around with it next time you’re in need of extensive test data, and maybe even consider contributing to it (How isn’t there an Anchorman quote Faker? *Hint hint*).

If you missed it above, here’s a link to the entire repo of Faker modules. Enjoy!