Understanding ruby load, require, gems, bundler and rails autoloading from the bottom up

cstack
May 30, 2017


If I have a file foo.rb that contains:

puts("foo.rb loaded!")
$FOO = 2

Then I can fire up a ruby console with irb and load it:

> load('/Users/cstack/foo.rb')
foo.rb loaded!
=> true
> $FOO
=> 2

What does `load` do?

load is defined in the Kernel module (documentation). Pass it the absolute path to a ruby file and it will execute the code in that file. load always returns true (if the file could not be loaded it raises a `LoadError`). Global variables, classes, constants and methods are all imported, but not local variables:

# foo.rb
$FOO_GLOBAL_VARIABLE = 2
class FooClass; end
FOO_CONSTANT = 3
def foo_method; end
foo_local_variable = 4

Then in irb

> load('/Users/cstack/foo.rb')
=> true
> $FOO_GLOBAL_VARIABLE
=> 2
> FooClass
=> FooClass
> FOO_CONSTANT
=> 3
> foo_method
=> nil
> foo_local_variable
NameError: undefined local variable or method `foo_local_variable'
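
The same check can be scripted. This is only a rough sketch: it writes a throwaway file into a temporary directory, loads it, and reports what the calling program can see afterwards. The helper name visible_after_load, the file name load_demo.rb and the DEMO-prefixed names are made up for illustration.

```ruby
require 'tmpdir'

# Hypothetical helper: writes a small file, loads it, and reports what the
# calling program can see afterwards.
def visible_after_load
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'load_demo.rb')
    File.write(path, <<~RUBY)
      $DEMO_GLOBAL = 2
      class DemoClass; end
      DEMO_CONSTANT = 3
      def demo_method; end
      demo_local = 4
    RUBY

    load(path)

    {
      global:   $DEMO_GLOBAL == 2,
      class:    Object.const_defined?(:DemoClass),
      constant: Object.const_defined?(:DEMO_CONSTANT),
      method:   respond_to?(:demo_method, true), # top-level methods are private
      local:    binding.local_variable_defined?(:demo_local)
    }
  end
end

p visible_after_load
```

Every entry in the printed hash should be true except :local, since the local variable stays inside the loaded file.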

Calling `load` multiple times

Calling load twice on the same file will execute the code in that file twice. Since foo.rb defines a constant, it will define that constant twice, which produces a warning. Assume foo.rb contains:

# foo.rb
puts("foo.rb loaded!")
FOO_CONSTANT = 3

Then in irb

> load('/Users/cstack/foo.rb')
foo.rb loaded!
=> true
> load('/Users/cstack/foo.rb')
foo.rb loaded!
/Users/cstack/foo.rb:2: warning: already initialized constant FOO_CONSTANT
=> true

Calling `load` with relative paths

You can also pass a relative path to load. Assuming you are in the same directory as foo.rb, you can do this:

> load('./foo.rb')
foo.rb loaded!
=> true

If you are in a different directory, load will not find the file:

> load('./foo.rb')
LoadError: cannot load such file -- foo.rb

And if you change the ruby process’s working directory, it won’t find the file either.

> load('./foo.rb')
foo.rb loaded!
=> true
> Dir.chdir('..')
=> 0
> load('./foo.rb')
LoadError: cannot load such file -- foo.rb

$LOAD_PATH

$LOAD_PATH is an array of absolute paths to directories. If you pass load just a file name, it will loop through $LOAD_PATH and search for the file in each directory.

> $LOAD_PATH.push("/Users/cstack")
> load('foo.rb')
foo.rb loaded!
=> true

The name $LOAD_PATH is a reference to the Unix environment variable $PATH, which also stores a list of directories. Just as Unix will loop through $PATH to find the executable for a given command, Ruby will loop through $LOAD_PATH to find a ruby file with the given name.
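
To make the lookup concrete, here is a minimal sketch of that loop. find_in_load_path is a hypothetical helper, not something Ruby provides, and it ignores details the real load handles (such as paths relative to the working directory).

```ruby
# Hypothetical helper mirroring the lookup described above: return the first
# file called `name` found in a $LOAD_PATH directory, or nil if none has it.
def find_in_load_path(name)
  $LOAD_PATH.each do |dir|
    candidate = File.join(dir.to_s, name)
    return candidate if File.file?(candidate)
  end
  nil
end

# Prints the full path if some $LOAD_PATH directory contains foo.rb, else nil.
p find_in_load_path('foo.rb')
```

Because the loop returns on the first hit, directories earlier in $LOAD_PATH win over later ones.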

In addition to all the directories listed in $LOAD_PATH, load will implicitly look in the current directory:

> Dir.chdir("/Users/cstack")
=> 0
> load('foo.rb')
foo.rb loaded!
=> true

What does `require` do?

require is similar to load, with a few differences:

Calling require on the same file twice will only execute it once. require returns true if the file was executed and false if it wasn’t.

> $LOAD_PATH.push('/Users/cstack')
=> ["/Users/cstack"]
> require('foo.rb')
foo.rb loaded!
=> true
> require('foo.rb')
=> false

require keeps track of which files have been loaded already in the global variable $LOADED_FEATURES. It’s also smart enough not to load the same file twice if you refer to it once with a relative path and once with an absolute path.
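
A small experiment shows the difference between the two. This is a sketch under my own assumptions: the helper name load_vs_require_demo, the file counter_demo.rb and the global $demo_runs are invented for the example.

```ruby
require 'tmpdir'

# Hypothetical demo: a file that counts how many times it has been executed.
def load_vs_require_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'counter_demo.rb')
    File.write(path, "$demo_runs += 1\n")

    $demo_runs = 0
    load(path)
    load(path)
    runs_after_loads = $demo_runs # load executed the file both times

    require_results = [require(path), require(path)]

    {
      runs_after_loads:    runs_after_loads,
      runs_after_requires: $demo_runs, # require executed it only once more
      require_results:     require_results,
      recorded: $LOADED_FEATURES.any? { |f| f.end_with?('counter_demo.rb') }
    }
  end
end

p load_vs_require_demo
```

The two load calls run the file twice, the two require calls run it once (returning true and then false), and the file ends up recorded in $LOADED_FEATURES.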

You don’t need to include the file extension:

> $LOAD_PATH.push('/Users/cstack')
=> ["/Users/cstack"]
> require('foo')
foo.rb loaded!
=> true

require will look for foo.rb, but also dynamic library files, like foo.so, foo.o, or foo.dll. This is how you can call C code from ruby.

It’s also worth noting that require does not check the current directory, since the current directory is by default not in $LOAD_PATH:

> File.exist?('foo.rb')
=> true
> require('foo')
LoadError: cannot load such file -- foo

What does `require_relative` do?

This works like require, but it takes a path relative to the current file, not the working directory of the process.

Let’s say I have two files, foo.rb and bar.rb in /Users/cstack

# foo.rb
puts("foo.rb loaded!")
load('bar.rb')

and

# bar.rb
puts("bar.rb loaded!")

In a different directory, say /, I start up irb and load foo.rb:

> load('/Users/cstack/foo.rb')
foo.rb loaded!
LoadError: cannot load such file -- bar.rb

foo.rb loads fine because I gave load an absolute path. But the load('bar.rb') call inside foo.rb fails, because bar.rb is in /Users/cstack while the working directory is /. If we use require_relative instead, Ruby looks for bar.rb in the same directory as foo.rb:

# foo.rb
puts("foo.rb loaded!")
require_relative('bar.rb')

Then

> load('/Users/cstack/foo.rb')
foo.rb loaded!
bar.rb loaded!
=> true
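
Here is the same scenario as a self-contained script, as a sketch only. The helper require_relative_demo and the file names rr_foo.rb and rr_bar.rb are invented; the script deliberately runs from a different working directory than the one holding the files.

```ruby
require 'tmpdir'

# Hypothetical demo: rr_foo.rb pulls in its sibling rr_bar.rb with
# require_relative while the process works from another directory.
def require_relative_demo
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'rr_bar.rb'), "$rr_bar_loaded = true\n")
    File.write(File.join(dir, 'rr_foo.rb'), "require_relative('rr_bar')\n")

    $rr_bar_loaded = false
    # Work from the parent directory, so './rr_bar.rb' would not resolve.
    Dir.chdir(File.dirname(dir)) do
      load(File.join(dir, 'rr_foo.rb'))
    end
    $rr_bar_loaded
  end
end

p require_relative_demo
```

The result is true because require_relative resolves rr_bar against the directory of rr_foo.rb, not against the working directory.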

What are gems?

A gem is a ruby package used by the RubyGems package manager. More concretely, it’s a zip file containing a bunch of ruby files and/or dynamic library files that can be imported by your code, along with some metadata.

For example, json is a gem that contains code for parsing and generating JSON, and it has a page on rubygems.org. To see where the gem is stored on my computer, I run:

~ gem which json
/Users/cstack/.rvm/rubies/ruby-2.3.1/lib/ruby/2.3.0/json.rb

How do you require gems?

If you know the absolute path of the gem, you can load or require it, just like we did above:

> load('/Users/cstack/.rvm/rubies/ruby-2.3.1/lib/ruby/2.3.0/json.rb')
=> true
> JSON
=> JSON

But RubyGems has some code that makes it easier to require gems. If you look at $LOADED_FEATURES immediately after starting irb you’ll see that some RubyGems code has already been loaded:

> puts $LOADED_FEATURES
...
/Users/cstack/.rvm/rubies/ruby-2.3.1/lib/ruby/2.3.0/rubygems.rb
...

This RubyGems code replaces the default require method with its own version. That version looks through your installed gems in addition to the directories in $LOAD_PATH. If RubyGems finds the file in one of your gems, it adds that gem's directories to $LOAD_PATH:

> puts $LOAD_PATH.grep(/json/) # json is initially not in $LOAD_PATH
=> nil
> JSON # JSON is initially not loaded
NameError: uninitialized constant JSON

> require('json') # RubyGems searches through your installed gems
=> true
> puts $LOAD_PATH.grep(/json/) # RubyGems adds entries to $LOAD_PATH
/Users/cstack/.rvm/gems/ruby-2.3.1/gems/json-2.1.0/lib
/Users/cstack/.rvm/gems/ruby-2.3.1/extensions/x86_64-darwin-16/2.3.0/json-2.1.0
=> nil
> JSON # Now all the code from the json gem is loaded
=> JSON

How does all the code in a gem get loaded?

  • You call require('json') in your code, asking to find, read and execute a file called json.rb (or .so or .dll)
  • require needs to figure out where the file is. It first looks in all the directories in $LOAD_PATH
  • If it can’t be found, RubyGems will look for an installed gem which has a file called json.rb.
  • RubyGems adds that gem’s directories to $LOAD_PATH. The directories it adds are defined by the require_paths option in that gem’s gemspec. By convention, most gems add only their lib directory.
  • require tries looking for the file again, but this time it’s able to find a file called json.rb in the gem’s lib directory.
  • json.rb in the json gem defines a module called JSON and calls require on the other files in the gem, e.g. require 'json/common'. Now that the json gem’s lib directory is in $LOAD_PATH, it’s able to find json/common.rb inside that directory
  • Control returns to your program. All the files in the json gem have been loaded, and the JSON module is defined!
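
The search order in the steps above can be sketched roughly like this. sketch_locate is a hypothetical helper written for this article, not how RubyGems is actually implemented: it only looks for .rb files and only reports where a feature would come from, without loading anything. Gem::Specification.find_by_path is a real RubyGems method that returns the installed spec containing a requirable file.

```ruby
require 'rubygems'

# Hypothetical helper: report where `require(feature)` would find its file.
# Returns [:load_path, file], [:gem, gem_full_name, require_dirs], or nil.
def sketch_locate(feature)
  # Step 1: look in every directory already on $LOAD_PATH.
  $LOAD_PATH.each do |dir|
    file = File.join(dir.to_s, "#{feature}.rb")
    return [:load_path, file] if File.file?(file)
  end

  # Step 2: ask RubyGems for an installed gem that contains the file.
  spec = Gem::Specification.find_by_path(feature)
  return nil unless spec

  # These are the directories RubyGems would add to $LOAD_PATH on activation.
  [:gem, spec.full_name, spec.full_require_paths]
end

p sketch_locate('json')
```

What this prints depends on your Ruby installation, so no output is shown here.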

How do you install a gem?

If you don’t have the json gem installed, you can install it like so:

~ gem install json
Fetching: json-2.1.0.gem (140800B)
Building native extensions. This could take a while...
Successfully installed json-2.1.0
1 gem installed

This command queries the gem server to see if there is a gem named json. It finds it on rubygems.org, then downloads the gem.

It then compiles any C code into dynamic library files, and gives a success message.

To see where your gems are saved, look at the output of gem environment:

...
- INSTALLATION DIRECTORY: /Users/cstack/.rvm/gems/ruby-2.3.1
...

The output will also show where gems are downloaded from:

...
- REMOTE SOURCES:
- https://rubygems.org/
...

The default gem server is rubygems.org. The gem server stores copies of many gems. When you run gem install json, it downloads the json gem from your configured gem server and saves it to your configured gems directory. It also builds any native extensions (compiled C code).

What if one gem requires another gem?

domain_name is a gem that parses domain names. Somewhere in that gem is the line require 'unf' (a library for dealing with unicode strings). This would raise an exception unless the unf gem was also installed. And in fact, unf in turn requires unf_ext. Luckily, gem install domain_name figures out all of these dependencies and installs them. In fact, it installs all of a gem’s dependencies before it installs the gem itself:

~ gem install domain_name
Fetching: unf_ext-0.0.7.4.gem (100%)
Building native extensions. This could take a while...
Successfully installed unf_ext-0.0.7.4
Fetching: unf-0.1.4.gem (100%)
Successfully installed unf-0.1.4
Fetching: domain_name-0.5.20170404.gem (100%)
Successfully installed domain_name-0.5.20170404
Parsing documentation for unf_ext-0.0.7.4
Installing ri documentation for unf_ext-0.0.7.4
Parsing documentation for unf-0.1.4
Installing ri documentation for unf-0.1.4
Parsing documentation for domain_name-0.5.20170404
Installing ri documentation for domain_name-0.5.20170404
Done installing documentation for unf_ext, unf, domain_name after 6 seconds
3 gems installed
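
The dependencies-first ordering can be modeled in a few lines. This is a toy sketch, not RubyGems' real resolver: install_order is a hypothetical helper, the dependency data is copied from the install log above, and it assumes there are no dependency cycles.

```ruby
# Hypothetical, simplified model of dependency-first installation:
# visit a gem's dependencies before adding the gem itself.
def install_order(gem_name, dependencies, installed = [])
  return installed if installed.include?(gem_name)

  dependencies.fetch(gem_name, []).each do |dep|
    install_order(dep, dependencies, installed)
  end
  installed << gem_name
end

deps = {
  'domain_name' => ['unf'],
  'unf'         => ['unf_ext'],
  'unf_ext'     => []
}

p install_order('domain_name', deps) # => ["unf_ext", "unf", "domain_name"]
```

The order matches the log: unf_ext first, then unf, then domain_name.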

Gem Versions

Every gem you install has a version. I see that my version of domain_name is 0.5.20170404 (presumably released on 2017-04-04):

~ gem list domain_name

*** LOCAL GEMS ***

domain_name (0.5.20170404)

I can install a second version with the -v option:

~ gem install domain_name -v 0.5.20160826
Fetching: domain_name-0.5.20160826.gem (100%)
Successfully installed domain_name-0.5.20160826
Parsing documentation for domain_name-0.5.20160826
Installing ri documentation for domain_name-0.5.20160826
Done installing documentation for domain_name after 3 seconds
1 gem installed

Now I see that I have two versions installed:

~ gem list domain_name

*** LOCAL GEMS ***

domain_name (0.5.20170404, 0.5.20160826)

I can see which version is being used with:

~ gem which domain_name
/usr/local/rvm/gems/ruby-2.2.1/gems/domain_name-0.5.20170404/lib/domain_name.rb

This listed a file in the directory domain_name-0.5.20170404, so the version is 0.5.20170404. So RubyGems will give me the more recent version if I require domain_name.
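
Picking the most recent version is a matter of comparing version numbers, which RubyGems does with the Gem::Version class. A minimal sketch, where newest_version is a hypothetical helper of mine:

```ruby
require 'rubygems'

# Hypothetical helper: pick the highest of several version strings,
# comparing them segment by segment the way RubyGems does.
def newest_version(version_strings)
  version_strings.map { |v| Gem::Version.new(v) }.max.to_s
end

puts newest_version(['0.5.20160826', '0.5.20170404']) # prints 0.5.20170404
```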

How do you require an older version of a gem?

The gem method (defined by RubyGems) lets you specify the version you want for a gem. It adds to $LOAD_PATH the directory of that specific gem version and its dependencies:

> gem('domain_name', '0.5.20160826')
=> true
> puts $LOAD_PATH.first(4) # domain_name and dependencies
/usr/local/rvm/gems/ruby-2.2.1/gems/unf_ext-0.0.7.4/lib
/usr/local/rvm/gems/ruby-2.2.1/extensions/x86-linux/2.2.0/unf_ext-0.0.7.4
/usr/local/rvm/gems/ruby-2.2.1/gems/unf-0.1.4/lib
/usr/local/rvm/gems/ruby-2.2.1/gems/domain_name-0.5.20160826/lib
=> nil
> require('domain_name')
=> true

You can see from $LOAD_PATH that the directory for the older version (domain_name-0.5.20160826) is the one that was added, so that is the version require loads.

What does “activating a gem spec” mean?

Calling the gem method above “activates” the “spec” for a gem. “Spec” is short for “specification” and refers to a particular version of a gem. “Activating” means adding its directories to $LOAD_PATH and recording that it was activated. Once a spec is activated, its files can be loaded with load or require. RubyGems records the list of activated specs so it can raise an error if you try to use two versions of the same gem.
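
As a mental model, activation can be sketched like this. SketchActivator is a hypothetical class invented for this article: it keeps its own array in place of $LOAD_PATH and is far simpler than what RubyGems really does.

```ruby
# Hypothetical, much-simplified model of spec activation.
class SketchActivator
  class ConflictError < StandardError; end

  attr_reader :load_path, :activated

  def initialize
    @load_path = [] # stand-in for $LOAD_PATH
    @activated = {} # gem name => activated version
  end

  # Returns true if newly activated, false if that exact version already was.
  # Raises if a different version of the same gem is already active.
  def activate(name, version, require_paths)
    current = @activated[name]
    return false if current == version
    if current
      raise ConflictError,
            "can't activate #{name}-#{version}, already activated #{name}-#{current}"
    end

    @activated[name] = version
    @load_path.unshift(*require_paths)
    true
  end
end

activator = SketchActivator.new
activator.activate('domain_name', '0.5.20160826', ['gems/domain_name-0.5.20160826/lib'])
p activator.load_path # => ["gems/domain_name-0.5.20160826/lib"]
```

Asking the same activator for domain_name 0.5.20170404 afterwards would raise ConflictError, which is the "two versions of the same gem" case.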

What is Bundler?

Bundler lets you specify all the gems your project needs, and optionally what versions of those gems. Then the bundle command installs all those gems and their dependencies.

You specify which gems you need in a file called Gemfile. Here’s a simple Gemfile:

# Gemfile
gem 'domain_name'

If you run bundle, it will generate a file called Gemfile.lock:

# Gemfile.lock
GEM
  specs:
    domain_name (0.5.20170404)
      unf (>= 0.0.5, < 1.0.0)
    unf (0.1.4)
      unf_ext
    unf_ext (0.0.7.4)

PLATFORMS
  ruby

DEPENDENCIES
  domain_name

BUNDLED WITH
   1.10.6

The specs: section of the file lists each gem that should be installed, the version, which gems it depends on, and what versions of those gems it will accept. If Bundler did its job correctly, the chosen version for each gem should satisfy the version requirements imposed by all other gems in the file.
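
You can check a requirement like unf (>= 0.0.5, < 1.0.0) by hand with RubyGems' Gem::Requirement class. A small sketch, where satisfies? is a hypothetical helper:

```ruby
require 'rubygems'

# Hypothetical helper: does `version` meet every one of the constraints?
def satisfies?(version, *constraints)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

p satisfies?('0.1.4', '>= 0.0.5', '< 1.0.0') # => true
p satisfies?('1.2.0', '>= 0.0.5', '< 1.0.0') # => false
```

The locked unf version 0.1.4 falls inside the range that domain_name asks for, so the lock file is consistent.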

In addition to generating Gemfile.lock, bundle also installs those gems at those specific versions.

What does `bundle exec` do?

Putting bundle exec before a command, e.g. bundle exec rspec, ensures that require will load the version of a gem specified in your Gemfile.lock as opposed to the most recent version.

Going back to the previous example, I have two versions of domain_name installed:

~ gem list domain_name

*** LOCAL GEMS ***

domain_name (0.5.20170404, 0.5.20160826)

And my Gemfile specifies the older version:

# Gemfile
gem 'domain_name', '0.5.20160826'

My Gemfile.lock should also show the older version:

# Gemfile.lock
GEM
  specs:
    domain_name (0.5.20160826)
      unf (>= 0.0.5, < 1.0.0)
    unf (0.1.4)
      unf_ext
    unf_ext (0.0.7.4)

PLATFORMS
  ruby

DEPENDENCIES
  domain_name (= 0.5.20160826)

BUNDLED WITH
   1.10.6

I modified both gems to print out their version when they are loaded, then wrote this ruby script:

# foo.rb
require('domain_name')

If I just do ruby foo.rb, it loads the newer version:

~ ruby foo.rb
loaded '0.5.20170404' !

If I use bundle exec, it loads the version in my Gemfile.lock:

~ bundle exec ruby foo.rb
loaded '0.5.20160826' !

How does Rails load all my gems?

The Rails guides cover the boot process in detail, but the important part is the file config/boot.rb, which contains:

ENV['BUNDLE_GEMFILE'] ||= File.expand_path('../Gemfile', __dir__)
require 'bundler/setup' # Set up gems listed in the Gemfile

This assumes the bundler gem is installed, so RubyGems can intercept the call to require 'bundler/setup' and load setup.rb from the bundler gem. setup.rb is responsible for reading Gemfile.lock and calling the gem method for each gem with the correct version (thus “activating” that version of the gem). It will raise an exception if the gem version is not installed.

Later, in application.rb, we call Bundler.require:

# Require the gems listed in Gemfile, including any gems
# you've limited to :test, :development, or :production.
Bundler.require(:default, Rails.env)

This calls Kernel.require for each gem listed in your Gemfile (in the given groups). Since $LOAD_PATH already points to the correct version of each gem, this loads the version your application needs.

What does the `:require => false` option do in a Gemfile?

By default, calling Bundler.require will require every gem from your Gemfile. If the line in the Gemfile says gem 'foo', :require => false then foo will still be installed by bundle, but Bundler.require won’t call Kernel.require for foo. You’ll have to call require('foo') in your application if you want to use the gem.
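
To make the rule concrete, here is a toy model of the decision. It is not Bundler's code: features_to_require is a hypothetical helper and the hashes are a made-up stand-in for Gemfile lines. It only covers the group filter, :require => false, and an explicit :require file name.

```ruby
# Hypothetical model of which features Bundler.require would ask
# `require` to load for the given groups.
def features_to_require(gemfile_entries, groups)
  gemfile_entries.flat_map do |entry|
    entry_groups = entry.fetch(:groups, [:default])
    next [] if (entry_groups & groups).empty? # not in a requested group

    case entry.fetch(:require, true)
    when false then []             # :require => false, skip it
    when true  then [entry[:name]] # default: require the gem's own name
    else Array(entry[:require])    # explicit file name(s)
    end
  end
end

entries = [
  { name: 'rails' },
  { name: 'foo', require: false },
  { name: 'rspec', groups: [:test] }
]

p features_to_require(entries, [:default, :development]) # => ["rails"]
```

In this model foo is installed but never required, and rspec is skipped because the :test group was not requested.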

Why don’t I have to `require` most constants in Rails?

This is a perfectly normal file in a rails app:

# app/controllers/posts_controller.rb
class PostsController < ApplicationController
  def index
    @posts = Post.all
  end
end

You don’t need require('application_controller') or require('post') at the top because of Rails autoloading, which the Rails guides cover in full.

Rails changes the way constants (like ApplicationController) are looked up, so that it looks through a list of directories for a file matching the name of the constant. For ApplicationController, it looks for a file named application_controller.rb. It checks to see if that file defines a constant called ApplicationController and if so uses that. If not, it raises an exception.

The list of directories that are searched when autoloading is controlled by the rails config variable autoload_paths.
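
The idea can be sketched in plain Ruby with the const_missing hook, which Ruby calls when a constant lookup fails. This is only an illustration of the mechanism described above, not Rails' implementation: $sketch_autoload_paths and sketch_underscore are invented names, and the name conversion is much cruder than the one Rails uses.

```ruby
# Stand-in for the autoload_paths setting (hypothetical global for this sketch).
$sketch_autoload_paths = []

# Crude conversion, e.g. "PostsController" -> "posts_controller".
def sketch_underscore(const_name)
  const_name.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Ruby calls this hook when a constant lookup fails.
def Object.const_missing(name)
  file_name = "#{sketch_underscore(name)}.rb"
  $sketch_autoload_paths.each do |dir|
    path = File.join(dir, file_name)
    next unless File.file?(path)

    load(path)
    return Object.const_get(name) if Object.const_defined?(name)
    raise NameError, "expected #{path} to define #{name}"
  end
  super # nothing matched: fall back to the usual NameError
end
```

With a directory containing posts_controller.rb added to $sketch_autoload_paths, the first reference to PostsController would load that file and return the class it defines.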

