Understanding What’s Happening “Under the Hood” with Source Codes and Existing Methods

Samuel Guo
Published in The Startup
Jan 13, 2020

Background

Seed data is an essential part of the development process. Seed data can be dummy data that your code uses to confirm it is doing exactly what it is supposed to do. This is slightly different from tests, because the data can be used throughout your codebase wherever it is needed. Seed data fits the mindset of behavior-driven development: depending on what your seed data produces, you may need to tweak your code accordingly.

Although creating seed data is usually straightforward, you can still run into issues along the way. I ran into one while creating seed data for users whose passwords are secured with BCrypt. bcrypt is a Ruby gem (the one Rails' has_secure_password relies on) that salts and hashes your password, making it unreadable. For example, if my password is 123, the salted and hashed password may look like $2a$04$iOfhwahFymCs5weB3BNH/uXkTG65HR.qpW.bNhEjFP3ftli3o5DQC.

I wouldn’t worry about the concept of salting and hashing for this blog, or about the BCrypt gem itself. You can loosely think of salting and hashing as encryption for your password, with one difference: hashing is one-way, so the original password cannot be recovered from the hash. Although I will use these terms, the majority of the blog will focus on password, password_digest, find_or_create_by, and has_secure_password.

Setup For Usernames and Passwords

To kick this off, below are the schema, the User model, and the seed data for this example.

# Schema (db/schema.rb)
create_table "users", force: :cascade do |t|
  t.string "username"
  t.string "password_digest"
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
end

# User model (app/models/user.rb)
class User < ApplicationRecord
  has_secure_password
end

# Seed data (db/seeds.rb)
for x in 1..5
  User.find_or_create_by(username: "Seed Data #{x}", password_digest: "123#{x}")
end

In the schema, you can ignore created_at and updated_at: Rails adds these timestamp columns automatically (via t.timestamps) in the migrations it generates from the command line.

In the model, the only thing we need here is has_secure_password. Directly from the documentation for has_secure_password, it

Adds methods to set and authenticate against a BCrypt password. This mechanism requires you to have a password_digest attribute.

In other words, has_secure_password hashes the password a user supplies whenever it is assigned, and adds an authenticate method that checks a plain-text password (i.e., the password the user originally chose) against the stored hash. Only the hashed version, password_digest, is saved in the database.
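
To make the mechanism concrete, here is a plain-Ruby sketch of the idea behind has_secure_password. It is not the Rails implementation: SketchUser is a hypothetical class, and a random salt plus SHA-256 from Ruby's standard library stands in for BCrypt so the example runs without any gems. It shows password as a virtual attribute whose writer stores a hash in password_digest, an authenticate method that checks a plain-text password against that hash, and what happens when password_digest is assigned directly, as the seed file above does.

```ruby
require "digest"
require "securerandom"

# Plain-Ruby sketch of the idea behind has_secure_password (NOT the Rails code).
# Assumption: a random salt plus SHA-256 stands in for BCrypt so this runs
# without gems. Do not use this scheme for real passwords.
class SketchUser
  # password_digest is the only password-related value that would be saved.
  attr_accessor :username, :password_digest
  # password is a virtual attribute: it has a reader but no database column.
  attr_reader :password

  # Called by `user.password = "..."`; stores a salted hash and skips blank input.
  def password=(unencrypted_password)
    return if unencrypted_password.nil? || unencrypted_password.strip.empty?

    @password = unencrypted_password
    salt = SecureRandom.hex(8)
    self.password_digest = "#{salt}$#{Digest::SHA256.hexdigest(salt + unencrypted_password)}"
  end

  # Returns self if the plain-text password matches the stored hash, else false.
  def authenticate(unencrypted_password)
    salt, expected = password_digest.to_s.split("$", 2)
    return false if expected.nil?

    Digest::SHA256.hexdigest(salt + unencrypted_password) == expected && self
  end
end

hashed = SketchUser.new
hashed.password = "1231"     # goes through password=, so a hash is stored

raw = SketchUser.new
raw.password_digest = "1231" # bypasses password=, so the raw string is stored
```

In this sketch, raw.authenticate("1231") returns false because the stored value was never hashed; the real Rails method handles malformed digests differently, so treat that detail as specific to the sketch.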

In the seed data, we create five instances of User using a for loop and the find_or_create_by method. Personally, I like to use find_or_create_by because it first looks for a record with the given attributes and only creates one if none is found. This prevents duplicated rows in my seed data, even if I run the seed file more than once, and follows the notion that user data should not be duplicated.

Now that we have everything set up, let’s start creating the seed data.

Creating Seed Data

As in any Rails application, we can run the seed file with rails db:seed on the command line. To double-check that we got what we expected, let’s look at the data in the console: run rails c on the command line, then type User.all to list every User. These should all be our seed rows. Below is the output:

2.6.1 :001 > User.all
User Load (2.5ms)  SELECT  "users".* FROM "users" LIMIT ?  [["LIMIT", 11]]
=> #<ActiveRecord::Relation [
  #<User id: 1, username: "Seed Data 1", password_digest: "1231", created_at: "2020-01-13 01:38:04", updated_at: "2020-01-13 01:38:04">,
  #<User id: 2, username: "Seed Data 2", password_digest: "1232", created_at: "2020-01-13 01:38:04", updated_at: "2020-01-13 01:38:04">,
  #<User id: 3, username: "Seed Data 3", password_digest: "1233", created_at: "2020-01-13 01:38:04", updated_at: "2020-01-13 01:38:04">,
  #<User id: 4, username: "Seed Data 4", password_digest: "1234", created_at: "2020-01-13 01:38:04", updated_at: "2020-01-13 01:38:04">,
  #<User id: 5, username: "Seed Data 5", password_digest: "1235", created_at: "2020-01-13 01:38:04", updated_at: "2020-01-13 01:38:04">]>
2.6.1 :002 >

Hm, that seems strange. Although the users were created, none of the passwords were hashed: each password_digest is just the plain string from the seed file. The seed data does not accurately reflect an actual user, which means the seed code needs to be tweaked somewhere.

Looking at the Source Code

Unfortunately, the documentation doesn’t directly address what happens when the password is not hashed at the point of creation. Below is a snippet from the example in the documentation.

# Schema: User(name:string, password_digest:string)
class User < ActiveRecord::Base
  has_secure_password
end

user = User.new(name: 'david', password: '', password_confirmation: 'nomatch')

It seems that when creating a new user, the attribute to pass is password rather than password_digest. But according to the comment on the first line, the schema has a password_digest column and no password column. There seems to be a weird disconnect: typically, the attributes used to create an instance of a model are reflected in the schema, yet here they are not. The documentation doesn’t explain this disconnect. However, it does link to the source code for has_secure_password, so let’s open that up.

I won’t copy and paste the entire source code, but below are the parts relevant to creating our seed data.

# File activemodel/lib/active_model/secure_password.rb, line 42
def has_secure_password(options = {})
  # Load bcrypt-ruby only when has_secure_password is used.
  # This is to avoid ActiveModel (and by extension the entire framework)
  # being dependent on a binary library.
  begin
    gem 'bcrypt-ruby', '~> 3.1.2'
    require 'bcrypt'
  rescue LoadError
    $stderr.puts "You don't have bcrypt-ruby installed in your application. Please add it to your Gemfile and run bundle install"
    raise
  end

  attr_reader :password

  include InstanceMethodsOnActivation

  ...
end
end

module InstanceMethodsOnActivation
  ...

  # Encrypts the password into the +password_digest+ attribute, only if the
  # new password is not blank.
  #
  #   class User < ActiveRecord::Base
  #     has_secure_password validations: false
  #   end
  #
  #   user = User.new
  #   user.password = nil
  #   user.password_digest # => nil
  #   user.password = 'mUc3m00RsqyRe'
  #   user.password_digest # => "$2a$10$4LEA7r4YmNHtvlAvHhsYAeZmk/xeUVtMTYqwIvYY76EW5GUqDiP4."
  def password=(unencrypted_password)
    unless unencrypted_password.blank?
      @password = unencrypted_password
      cost = ActiveModel::SecurePassword.min_cost ? BCrypt::Engine::MIN_COST : BCrypt::Engine.cost
      self.password_digest = BCrypt::Password.create(unencrypted_password, cost: cost)
    end
  end

  ...
end
end

This may be a lot to take in at first, but it’s actually not that bad. If you understand Ruby methods and rescues, everything else is straightforward.

The has_secure_password method starts with a begin/rescue block that tries to load bcrypt. If the gem is not installed, it prints a message telling you to add it to your Gemfile and run bundle install, then re-raises the LoadError. Next, attr_reader :password defines a reader for a password attribute (an instance variable, not a database column), and include InstanceMethodsOnActivation mixes in that module’s instance methods. This is where clarity on the disconnect comes in.
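
The begin/rescue LoadError pattern at the top of has_secure_password is ordinary Ruby. Here is a minimal, runnable sketch of the same idea; load_library is a hypothetical helper, json is a library that ships with Ruby, and surely_missing_gem_xyz is a made-up name assumed not to be installed. Unlike has_secure_password, which re-raises the LoadError after printing its hint, this sketch returns false so the caller can decide what to do.

```ruby
# Sketch of the load-or-explain pattern (load_library is a hypothetical helper).
def load_library(name)
  require name
  true
rescue LoadError
  # has_secure_password prints a hint like this and then re-raises;
  # this sketch returns false instead.
  warn "You don't have #{name} installed. Please add it to your Gemfile and run bundle install"
  false
end

json_loaded = load_library("json")                      # ships with Ruby
missing_loaded = load_library("surely_missing_gem_xyz") # assumed not installed
```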

The password= method has an equal sign after its name, which I was unfamiliar with. After a quick Google search, I learned that this is how Ruby defines a writer (setter) method: a method whose name ends in = and that Ruby calls when you assign to that attribute.

Whenever a password is assigned (i.e., user.password = ...), Ruby invokes the password= method. Blank passwords are skipped entirely. The line that assigns cost only chooses how much work BCrypt does (BCrypt’s minimum cost when ActiveModel::SecurePassword.min_cost is turned on, otherwise BCrypt’s default cost), so rest assured that it is not relevant to the topic of this blog. The actual hashing is handed off to the BCrypt library.
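
The cost is visible in the digest itself. A BCrypt hash string in the common $2a$ format packs together the algorithm version, a two-digit cost, a 22-character salt, and a 31-character checksum, and the cost is an exponent: BCrypt runs 2^cost rounds of its expensive key setup. The sketch below (parse_bcrypt and BcryptParts are hypothetical names, and the layout is stated as an assumption in the comments) splits the example hash from the beginning of this post, which has cost 04; the digests in the final console output of this post start with $2a$12$, i.e. cost 12.

```ruby
# Hypothetical container for the pieces of a BCrypt hash string.
BcryptParts = Struct.new(:version, :cost, :salt, :checksum)

# Assumed layout: "$" version "$" two-digit cost "$" 22-char salt + 31-char checksum.
def parse_bcrypt(digest)
  _, version, cost, rest = digest.split("$", 4)
  raise ArgumentError, "not a bcrypt hash" unless rest && rest.length == 53

  BcryptParts.new(version, Integer(cost, 10), rest[0, 22], rest[22, 31])
end

parts = parse_bcrypt('$2a$04$iOfhwahFymCs5weB3BNH/uXkTG65HR.qpW.bNhEjFP3ftli3o5DQC')
rounds = 2**parts.cost # the cost is an exponent: 2**4 = 16 rounds of key setup
```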

However, we are interested in the last line of the password= method: self.password_digest = BCrypt::Password.create(unencrypted_password, cost: cost). Here self is the user instance, so this line assigns the password_digest attribute the hash created by the BCrypt library. What matters is this side effect rather than the method’s return value: when you write user.password = "123", Ruby evaluates the assignment to "123" no matter what the method returns.
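
The difference between a writer method’s return value and the value of an assignment is easy to check in plain Ruby. In the sketch below, Account and secret= are hypothetical names, and reversing the string is only a stand-in for hashing. Writing account.secret = "abc" calls secret=, which sets both attributes, yet the expression still evaluates to "abc"; calling the method through public_send exposes its own return value (its last expression).

```ruby
# Hypothetical class; reversing the string is only a stand-in for hashing.
class Account
  attr_reader :secret, :secret_digest

  # A writer method: its name ends in "=", and `account.secret = x` calls it.
  def secret=(value)
    @secret = value
    @secret_digest = value.reverse
    :writer_return_value # the method's own return value (its last expression)
  end
end

account = Account.new
# Assignment syntax evaluates to the right-hand side, not the method's return value.
assigned = (account.secret = "abc")
# Calling the writer like any other method exposes its real return value.
returned = account.public_send(:secret=, "xyz")
```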

So, in a nutshell, has_secure_password includes InstanceMethodsOnActivation, whose password= method takes the plain-text password attribute from the seed data and stores its hash in password_digest, the column reflected in the schema.

With this new discovery, let’s change the seed data attribute from password_digest to password .

# Revised seed data
for x in 1..5
  User.find_or_create_by(username: "Seed Data #{x}", password: "123#{x}")
end

Let’s run this seed data! And… this is what appears:

rails aborted!
ActiveRecord::StatementInvalid: SQLite3::SQLException: no such column: users.password: SELECT  "users".* FROM "users" WHERE "users"."username" = ? AND "users"."password" = ? LIMIT ?

The error above appeared!

Order of Operations With find_or_create_by

According to the documentation for find_or_create_by :

Finds the first record with the given attributes, or creates a record with the attributes if one is not found:

Finding the first record with the given attributes matches the error we received. find_or_create_by puts every attribute we pass into the WHERE clause, including the virtual password attribute, so the SQL statement selects rows where users.username and users.password match our input. But there is no column named password, because our schema only contains username and password_digest (plus the timestamps).
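
The find-then-create order can be sketched in plain Ruby. InMemoryUsers and NoSuchColumn below are hypothetical stand-ins for the users table and the SQLite error, not Rails classes. The lookup checks every key against the table’s columns, so passing password fails during the find step and create is never reached, while a lookup that uses only real columns finds the existing row instead of duplicating it.

```ruby
# Hypothetical error standing in for SQLite's "no such column".
class NoSuchColumn < StandardError; end

# Hypothetical in-memory stand-in for the users table.
class InMemoryUsers
  COLUMNS = %i[username password_digest].freeze

  def initialize
    @rows = []
  end

  # Like a SQL WHERE clause: every key must be a real column.
  def find_by(attrs)
    unknown = attrs.keys - COLUMNS
    raise NoSuchColumn, "no such column: users.#{unknown.first}" unless unknown.empty?

    @rows.find { |row| attrs.all? { |key, value| row[key] == value } }
  end

  # Find first; create is only reached if find_by returns nil without raising.
  def find_or_create_by(attrs)
    find_by(attrs) || create(attrs)
  end

  def create(attrs)
    row = attrs.dup
    @rows << row
    row
  end

  def count
    @rows.size
  end
end

users = InMemoryUsers.new
users.find_or_create_by(username: "Seed Data 1", password_digest: "1231")
users.find_or_create_by(username: "Seed Data 1", password_digest: "1231") # found, not duplicated

error = begin
  users.find_or_create_by(username: "Seed Data 1", password: "1231")
  nil
rescue NoSuchColumn => e
  e
end
```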

Since the SQL statement fails at the find part, it will never reach the create part of the find_or_create_by method. However, as mentioned earlier in this blog, we needed to change the attribute of the seed data from password_digest to password because of the has_secure_password method.

To circumvent the error, I changed find_or_create_by to create. I had originally used find_or_create_by to avoid duplicates, but because BCrypt mixes a random salt into every hash, the same password never produces the same password_digest twice, so no two rows would have exactly the same attributes anyway. Keep in mind, though, that create does not look for existing rows at all: running rails db:seed a second time will insert another five users with the same usernames.
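
Why the digests never repeat can be shown with a stand-in. BCrypt mixes a fresh random salt into every hash it creates; the sketch below imitates that with SecureRandom and SHA-256 from Ruby’s standard library (salted_digest and matches? are hypothetical helpers, and this is not how BCrypt itself computes hashes). Hashing "1231" twice gives two different digests, yet both still verify against the original password because each digest carries its own salt.

```ruby
require "digest"
require "securerandom"

# Stand-in for BCrypt's salting (assumption: SHA-256 + random salt, not BCrypt).
# A digest is stored as "salt$hash" so it can be verified later.
def salted_digest(password, salt = SecureRandom.hex(8))
  "#{salt}$#{Digest::SHA256.hexdigest(salt + password)}"
end

# Re-hashes the candidate password with the salt stored in the digest.
def matches?(digest, password)
  salt = digest.split("$", 2).first
  salted_digest(password, salt) == digest
end

first = salted_digest("1231")
second = salted_digest("1231")
```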

The new revised code for the seed data now looks like the following:

# Revised seed data (again)
for x in 1..5
  User.create(username: "Seed Data #{x}", password: "123#{x}")
end
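
If you would rather be able to re-run rails db:seed without piling up extra users, Rails’ find_or_create_by also accepts a block that is called only when a new record is created. That lets you look users up by username alone and set the password inside the block, for example User.find_or_create_by(username: "Seed Data #{x}") { |user| user.password = "123#{x}" }; treat this as a suggestion rather than something tested against this app. The plain-Ruby sketch below mimics those semantics with hypothetical SeedUser and InMemoryUserStore stand-ins (the "hashing" is fake) and runs the seed loop twice to show that the second run creates nothing new.

```ruby
# Hypothetical stand-in for the User model; password= fakes the hashing.
SeedUser = Struct.new(:username, :password_digest) do
  def password=(plain)
    self.password_digest = "hashed(#{plain})" # stand-in for BCrypt
  end
end

# Hypothetical stand-in for the users table.
class InMemoryUserStore
  def initialize
    @users = []
  end

  # Finds by username only; the block runs only when a new user is created.
  def find_or_create_by(username:)
    existing = @users.find { |user| user.username == username }
    return existing if existing

    user = SeedUser.new(username)
    yield user if block_given?
    @users << user
    user
  end

  def size
    @users.size
  end
end

store = InMemoryUserStore.new
2.times do # run the "seed file" twice
  (1..5).each do |x|
    store.find_or_create_by(username: "Seed Data #{x}") { |user| user.password = "123#{x}" }
  end
end
```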

Let’s run this and see what happens!

It looks like rails db:seed finished without errors, but let’s double-check by entering the Rails console with rails c and typing User.all.

// ♥ rails c
Running via Spring preloader in process 24545
Loading development environment (Rails 5.2.4.1)
2.6.1 :001 > User.all
User Load (4.5ms)  SELECT  "users".* FROM "users" LIMIT ?  [["LIMIT", 11]]
=> #<ActiveRecord::Relation [
  #<User id: 1, username: "Seed Data 1", password_digest: "$2a$12$Gk1Hruq4rpHshJ.dJNxo/u0BKJce32ytHwoBIOFAUP1...", created_at: "2020-01-13 03:48:32", updated_at: "2020-01-13 03:48:32">,
  #<User id: 2, username: "Seed Data 2", password_digest: "$2a$12$jUBRmNiFp4NxK.nPWdB45ulg8LzB3Vk1Kg5d9PKg0bv...", created_at: "2020-01-13 03:48:32", updated_at: "2020-01-13 03:48:32">,
  #<User id: 3, username: "Seed Data 3", password_digest: "$2a$12$PjTfpqjndExTDWDBsmDpne4T0trwnPrKcxM16FAq4xG...", created_at: "2020-01-13 03:48:32", updated_at: "2020-01-13 03:48:32">,
  #<User id: 4, username: "Seed Data 4", password_digest: "$2a$12$Vr6IFEBoSEM4slCnlUM5kOZNIyeUXKojOoB04QH0maq...", created_at: "2020-01-13 03:48:33", updated_at: "2020-01-13 03:48:33">,
  #<User id: 5, username: "Seed Data 5", password_digest: "$2a$12$RiklTVCJDw4E9CV14E6eUeyXWFn/UbEciDa0HKwceQk...", created_at: "2020-01-13 03:48:33", updated_at: "2020-01-13 03:48:33">]>

Success! We were able to create seed data with encrypted passwords!

Key Takeaway

Understand the portions of the source code of third party libraries or existing methods you are using that are relevant to your code (i.e., know what’s happening “under the hood”). Although it’s impossible to understand every single line of code, understanding the relevant parts will immensely help you debug after you’ve exhausted all other options.

Full stack software developer with experience in Javascript, React, Redux, and Ruby on Rails and a background in Mechanical Engineering.