Understanding ruby load, require, gems, bundler and rails autoloading from the bottom up
If I have a file foo.rb that contains:
puts("foo.rb loaded!")
$FOO = 2
Then I can fire up a ruby console with irb and load it:
> load('/Users/cstack/foo.rb')
foo.rb loaded!
=> true
> $FOO
=> 2
What does `load` do?
load is defined in the Kernel module (documentation). Pass it the absolute path to a ruby file and it will execute the code in that file. load always returns true (if the file could not be loaded it raises a `LoadError`). Global variables, classes, constants and methods are all imported, but not local variables:
# foo.rb
$FOO_GLOBAL_VARIABLE = 2
class FooClass; end
FOO_CONSTANT = 3
def foo_method; end
foo_local_variable = 4
Then in irb:
> load('/Users/cstack/foo.rb')
=> true
> $FOO_GLOBAL_VARIABLE
=> 2
> FooClass
=> FooClass
> FOO_CONSTANT
=> 3
> foo_method
=> nil
> foo_local_variable
NameError: undefined local variable or method `foo_local_variable'
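Here is a minimal sketch that reproduces the same experiment as a single script, so it can be run without irb. The temporary directory and every name in it (load_demo.rb, DemoClass, demo_method and so on) are made up for the demo.

```ruby
require 'tmpdir'

# Throwaway file for the demo; the directory and all names are made up.
DEMO_FILE = File.join(Dir.mktmpdir, 'load_demo.rb')
File.write(DEMO_FILE, <<~RUBY)
  $demo_global = 2
  class DemoClass; end
  DEMO_CONSTANT = 3
  def demo_method; :called; end
  demo_local = 4
RUBY

load(DEMO_FILE)

p $demo_global          # global variable: visible after load
p DemoClass             # class: visible after load
p DEMO_CONSTANT         # constant: visible after load
p demo_method           # method: visible after load
p defined?(demo_local)  # local variable: not visible, so this prints nil
```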
Calling `load` multiple times
Calling load twice on the same file will execute the code in that file twice. Since foo.rb defines a constant, it will define that constant twice, which produces a warning. Assume foo.rb contains:
# foo.rb
puts("foo.rb loaded!")
FOO_CONSTANT = 3
Then in irb:
> load('/Users/cstack/foo.rb')
foo.rb loaded!
=> true
> load('/Users/cstack/foo.rb')
foo.rb loaded!
/Users/cstack/foo.rb:2: warning: already initialized constant FOO_CONSTANT
=> true
Calling `load` with relative paths
You can also pass a relative path to load. Assuming you are in the same directory as foo.rb, you can do this:
> load('./foo.rb')
foo.rb loaded!
=> true
If you are in a different directory, load will not find the file:
> load('./foo.rb')
LoadError: cannot load such file -- foo.rb
And if you change the ruby process’s working directory, it won’t find the file either:
> load('./foo.rb')
foo.rb loaded!
=> true
> Dir.chdir('..')
=> 0
> load('./foo.rb')
LoadError: cannot load such file -- foo.rb
$LOAD_PATH
$LOAD_PATH is an array of absolute paths to directories. If you pass load just a file name, it will loop through $LOAD_PATH and search for the file in each directory.
> $LOAD_PATH.push("/Users/cstack")
> load('foo.rb')
foo.rb loaded!
=> true
The name $LOAD_PATH is a reference to the Unix environment variable $PATH, which also stores a list of directories. Just as Unix will loop through $PATH to find the executable for a given command, Ruby will loop through $LOAD_PATH to find a ruby file with the given name.
In addition to all the directories listed in $LOAD_PATH, load will implicitly look in the current directory:
> Dir.chdir("/Users/cstack")
=> 0
> load('foo.rb')
foo.rb loaded!
=> true
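To make the search concrete, here is a simplified sketch of looping through $LOAD_PATH. It is not Ruby's actual implementation (which handles more cases); find_in_load_path and search_demo.rb are names I made up for illustration.

```ruby
require 'tmpdir'

# Simplified model of the lookup: return the first directory entry in the
# load path that contains the file, or nil if none does.
def find_in_load_path(file_name, load_path = $LOAD_PATH)
  load_path.each do |dir|
    candidate = File.join(dir.to_s, file_name)
    return candidate if File.file?(candidate)
  end
  nil
end

# Throwaway directory and file for the demo.
SEARCH_DIR = Dir.mktmpdir
File.write(File.join(SEARCH_DIR, 'search_demo.rb'), "$search_demo_loaded = true\n")

p find_in_load_path('search_demo.rb')  # nil: the directory is not on the path yet
$LOAD_PATH.push(SEARCH_DIR)
p find_in_load_path('search_demo.rb')  # now the full path inside SEARCH_DIR
load('search_demo.rb')                 # the real load finds it the same way
```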
What does `require` do?
require is similar to load, with a few differences:
Calling require on the same file twice will only execute it once. require returns true if the file was executed and false if it wasn’t.
> $LOAD_PATH.push('/Users/cstack')
=> ["/Users/cstack"]
> require('foo.rb')
foo.rb loaded!
=> true
> require('foo.rb')
=> false
require keeps track of which files have been loaded already in the global variable $LOADED_FEATURES. It’s also smart enough not to load the same file twice if you refer to it once with a relative path and once with an absolute path.
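The difference between load and require is easy to check with a file that counts how often it runs. This is just a sketch; counted_file.rb and $run_count are made-up names, and the file lives in a temporary directory.

```ruby
require 'tmpdir'

# Throwaway file that increments a counter every time it is executed.
$run_count = 0
COUNTED_FILE = File.join(Dir.mktmpdir, 'counted_file.rb')
File.write(COUNTED_FILE, "$run_count += 1\n")

load(COUNTED_FILE)
load(COUNTED_FILE)
RUNS_AFTER_LOADS = $run_count

FIRST_REQUIRE  = require(COUNTED_FILE)
SECOND_REQUIRE = require(COUNTED_FILE)
RUNS_AFTER_REQUIRES = $run_count

p RUNS_AFTER_LOADS     # 2: load ran the file both times
p FIRST_REQUIRE        # true: require ran it once more
p SECOND_REQUIRE       # false: already recorded in $LOADED_FEATURES
p RUNS_AFTER_REQUIRES  # 3
p $LOADED_FEATURES.any? { |feature| feature.end_with?('counted_file.rb') } # true
```

Note that load does not record anything in $LOADED_FEATURES, which is why the first require still executes the file.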
You don’t need to include the file extension:
> $LOAD_PATH.push('/Users/cstack')
=> ["/Users/cstack"]
> require('foo')
foo.rb loaded!
=> true
require will look for foo.rb, but also dynamic library files, like foo.so, foo.o, or foo.dll. This is how you can call C code from ruby.
It’s also worth noting that require does not check the current directory, since the current directory is by default not in $LOAD_PATH:
> File.exist?('foo.rb')
=> true
> require('foo')
LoadError: cannot load such file -- foo
What does `require_relative` do?
This works like require, but it takes a path relative to the current file, not the working directory of the process.
Let’s say I have two files, foo.rb and bar.rb, in /Users/cstack:
# foo.rb
puts("foo.rb loaded!")
load('bar.rb')
and
# bar.rb
puts("bar.rb loaded!")
In a different directory, say /, I start up irb and load foo.rb:
> load('/Users/cstack/foo.rb')
foo.rb loaded!
LoadError: cannot load such file -- bar.rb
foo.rb is loaded just fine because I gave it an absolute path. But foo can’t call load('bar.rb') like this because bar.rb is in /Users/cstack but the working directory is actually /. If we use require_relative, it will look for bar.rb in the same directory as foo.rb:
# foo.rb
puts("foo.rb loaded!")
require_relative('bar.rb')
Then
> load('/Users/cstack/foo.rb')
foo.rb loaded!
bar.rb loaded!
=> true
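The same experiment can be sketched as one script. The files rel_foo.rb and rel_bar.rb are made-up stand-ins for foo.rb and bar.rb, created in a temporary directory, and the script deliberately runs from a different working directory.

```ruby
require 'tmpdir'

# Two throwaway files in the same directory (names are made up for the demo).
REL_DIR = Dir.mktmpdir
File.write(File.join(REL_DIR, 'rel_bar.rb'), "$rel_bar_loaded = true\n")
# Roughly equivalent inside rel_foo.rb:
#   require(File.expand_path('rel_bar', __dir__))
File.write(File.join(REL_DIR, 'rel_foo.rb'), "require_relative('rel_bar')\n")

# Run from a different working directory on purpose.
Dir.chdir(Dir.tmpdir) do
  load(File.join(REL_DIR, 'rel_foo.rb'))
end

p $rel_bar_loaded  # true: rel_bar.rb was found next to rel_foo.rb
```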
What are gems?
A gem is a ruby package used by the RubyGems package manager. More concretely, it’s a zip file containing a bunch of ruby files and/or dynamic library files that can be imported by your code, along with some metadata.
For example, json is a gem that contains code for parsing and generating JSON. Here is the rubygems page for the json gem. To see where the gem is stored on my computer, I run:
~ gem which json
/Users/cstack/.rvm/rubies/ruby-2.3.1/lib/ruby/2.3.0/json.rb
How do you require gems?
If you know the absolute path of the gem, you can load or require it, just like we did above:
> load('/Users/cstack/.rvm/rubies/ruby-2.3.1/lib/ruby/2.3.0/json.rb')
=> true
> JSON
=> JSON
But RubyGems has some code that makes it easier to require gems. If you look at $LOADED_FEATURES immediately after starting irb you’ll see that some RubyGems code has already been loaded:
> puts $LOADED_FEATURES
...
/Users/cstack/.rvm/rubies/ruby-2.3.1/lib/ruby/2.3.0/rubygems.rb
...
This RubyGems code actually replaces the default require method with its own version. That version will look through your installed gems in addition to the directories in $LOAD_PATH. If RubyGems finds the file in your gems, it will add that gem’s directories to your $LOAD_PATH:
> puts $LOAD_PATH.grep(/json/) # json is initially not in $LOAD_PATH
=> nil
> JSON # JSON is initially not loaded
NameError: uninitialized constant JSON
> require('json') # RubyGems searches through your installed gems
=> true
> puts $LOAD_PATH.grep(/json/) # RubyGems adds entries to $LOAD_PATH
/Users/cstack/.rvm/gems/ruby-2.3.1/gems/json-2.1.0/lib
/Users/cstack/.rvm/gems/ruby-2.3.1/extensions/x86_64-darwin-16/2.3.0/json-2.1.0
=> nil
> JSON # Now all the code from the json gem is loaded
=> JSON
How does all the code in a gem get loaded?
- You call require('json') in your code, asking to find, read and execute a file called json.rb (or .so or .dll).
- require needs to figure out where the file is. It first looks in all the directories in $LOAD_PATH.
- If it can’t be found, RubyGems will look for an installed gem which has a file called json.rb.
- RubyGems adds that gem’s directories to $LOAD_PATH. The directories it adds are defined by that gem’s gemspec require_paths option. By convention, most gems add only their lib directory.
- require tries looking for the file again, but this time it’s able to find a file called json.rb in the gem’s lib directory.
- json.rb in the json gem defines a module called JSON and calls require on the other files in the gem, e.g. require 'json/common'. Now that the json gem’s lib directory is in $LOAD_PATH, it’s able to find json/common.rb inside that directory.
- Control returns to your program. All the files in the json gem have been loaded, and the JSON module is defined!
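The steps above can be sketched as a toy model. This is not RubyGems' code: toy_find, toy_require, the toy-1.0.0 gem and its directory layout are all hypothetical, and the "gem" is just a file in a temporary directory. It only illustrates the fallback from $LOAD_PATH to installed gems.

```ruby
require 'tmpdir'
require 'fileutils'

# Look for "<name>.rb" in each directory of the given load path.
def toy_find(name, load_path)
  load_path.map { |dir| File.join(dir, "#{name}.rb") }.find { |f| File.file?(f) }
end

# Toy version of the fallback: try the load path first; if that fails, find a
# gem lib directory containing the file, add it to the load path, and retry.
def toy_require(name, load_path, gem_lib_dirs)
  found = toy_find(name, load_path)
  unless found
    gem_dir = gem_lib_dirs.find { |dir| File.file?(File.join(dir, "#{name}.rb")) }
    raise LoadError, "cannot load such file -- #{name}" unless gem_dir
    load_path.push(gem_dir)
    found = toy_find(name, load_path)
  end
  found
end

# A pretend installed gem with only a lib directory.
TOY_LIB = File.join(Dir.mktmpdir, 'toy-1.0.0', 'lib')
FileUtils.mkdir_p(TOY_LIB)
File.write(File.join(TOY_LIB, 'toy.rb'), "# pretend gem entry point\n")

TOY_LOAD_PATH = []
puts toy_require('toy', TOY_LOAD_PATH, [TOY_LIB])  # path to toy.rb inside the gem
p TOY_LOAD_PATH == [TOY_LIB]                       # true: the lib dir was added
```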
How do you install a gem?
If you don’t have the json gem installed, you can install it like so:
~ gem install json
Fetching: json-2.1.0.gem (140800B)
Building native extensions. This could take a while...
Successfully installed json-2.1.0
1 gem installed
This command queries the gem server to see if there is a gem named json. It finds it on rubygems.org, then downloads the gem.
It then compiles any C code into dynamic library files, and gives a success message.
To see where your gems are saved, look at the output of gem environment:
...
- INSTALLATION DIRECTORY: /Users/cstack/.rvm/gems/ruby-2.3.1
...
The output will also show where gems are downloaded from:
...
- REMOTE SOURCES:
- https://rubygems.org/
...
The default gem server is rubygems.org. The gem server stores copies of many gems. When you run gem install json, it downloads the json gem from your configured gem server and saves it to your configured gems directory. It also builds any native extensions (compiled C code).
What if one gem requires another gem?
domain_name is a gem that parses domain names. Somewhere in that gem is the line require 'unf' (a library for dealing with unicode strings). This would raise an exception unless the unf gem was also installed. And in fact, unf in turn requires unf_ext. Luckily, gem install domain_name figures out all of these dependencies and installs them. In fact, it installs all of a gem’s dependencies before it installs the gem itself:
~ gem install domain_name
Fetching: unf_ext-0.0.7.4.gem (100%)
Building native extensions. This could take a while...
Successfully installed unf_ext-0.0.7.4
Fetching: unf-0.1.4.gem (100%)
Successfully installed unf-0.1.4
Fetching: domain_name-0.5.20170404.gem (100%)
Successfully installed domain_name-0.5.20170404
Parsing documentation for unf_ext-0.0.7.4
Installing ri documentation for unf_ext-0.0.7.4
Parsing documentation for unf-0.1.4
Installing ri documentation for unf-0.1.4
Parsing documentation for domain_name-0.5.20170404
Installing ri documentation for domain_name-0.5.20170404
Done installing documentation for unf_ext, unf, domain_name after 6 seconds
3 gems installed
Gem Versions
Every gem you install has a version. I see that my version of domain_name is 0.5.20170404 (presumably released on 2017–04–04):
~ gem list domain_name
*** LOCAL GEMS ***
domain_name (0.5.20170404)
I can install a second version with the -v option:
~ gem install domain_name -v 0.5.20160826
Fetching: domain_name-0.5.20160826.gem (100%)
Successfully installed domain_name-0.5.20160826
Parsing documentation for domain_name-0.5.20160826
Installing ri documentation for domain_name-0.5.20160826
Done installing documentation for domain_name after 3 seconds
1 gem installed
Now I see that I have two versions installed:
~ gem list domain_name
*** LOCAL GEMS ***
domain_name (0.5.20170404, 0.5.20160826)
I can see which version is being used with:
~ gem which domain_name
/usr/local/rvm/gems/ruby-2.2.1/gems/domain_name-0.5.20170404/lib/domain_name.rb
This listed a file in the directory domain_name-0.5.20170404, so the version is 0.5.20170404. So RubyGems will give me the more recent version if I require domain_name.
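As a small aside, "more recent" here is decided by comparing version numbers segment by segment, not as plain strings. A minimal sketch using RubyGems' Gem::Version class (the constant names are mine):

```ruby
require 'rubygems'

# The two versions of domain_name installed above.
INSTALLED_VERSIONS = %w[0.5.20160826 0.5.20170404].map { |v| Gem::Version.new(v) }
NEWEST_VERSION = INSTALLED_VERSIONS.max

puts NEWEST_VERSION  # 0.5.20170404

# Segment-wise comparison differs from string comparison:
p Gem::Version.new('0.10.0') > Gem::Version.new('0.9.0')  # true
p '0.10.0' > '0.9.0'                                      # false
```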
How do you require an older version of a gem?
The gem method (defined by RubyGems) lets you specify the version you want for a gem. It adds to $LOAD_PATH the directory of that specific gem version and its dependencies:
> gem('domain_name', '0.5.20160826')
=> true
> puts $LOAD_PATH.first(4) # domain_name and dependencies
/usr/local/rvm/gems/ruby-2.2.1/gems/unf_ext-0.0.7.4/lib
/usr/local/rvm/gems/ruby-2.2.1/extensions/x86-linux/2.2.0/unf_ext-0.0.7.4
/usr/local/rvm/gems/ruby-2.2.1/gems/unf-0.1.4/lib
/usr/local/rvm/gems/ruby-2.2.1/gems/domain_name-0.5.20160826/lib
=> nil
> require('domain_name')
=> true
You can see in the output of $LOAD_PATH that the older version (domain_name-0.5.20160826) is loaded.
What does “activating a gem spec” mean?
Calling the gem method above “activates” the “spec” for a gem. “Spec” is short for “specification” and refers to a particular version of a gem. “Activating” means adding its directories to $LOAD_PATH and recording that it was activated. After activating a spec, it can be loaded with load or require. RubyGems records a list of specs that have been activated so it can raise an error if you try to use two versions of the same gem.
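To illustrate the bookkeeping, here is a toy model of activation. ToyActivator is a hypothetical class, not part of RubyGems, and the directories are made-up strings; it only shows the two effects described above: adding a directory to a load path and refusing a second version of the same gem.

```ruby
# Toy stand-in for the activation record; not RubyGems' implementation.
class ToyActivator
  attr_reader :activated, :load_path

  def initialize
    @activated = {}  # gem name => activated version string
    @load_path = []  # stand-in for $LOAD_PATH
  end

  # Returns true if newly activated, false if this version was already active.
  # Raises if a different version of the same gem is already active.
  def activate(name, version, lib_dir)
    current = @activated[name]
    if current && current != version
      raise LoadError, "can't activate #{name}-#{version}, already activated #{name}-#{current}"
    end
    return false if current

    @load_path.unshift(lib_dir)
    @activated[name] = version
    true
  end
end

activator = ToyActivator.new
p activator.activate('domain_name', '0.5.20160826', '/gems/domain_name-0.5.20160826/lib') # true
p activator.activate('domain_name', '0.5.20160826', '/gems/domain_name-0.5.20160826/lib') # false
begin
  activator.activate('domain_name', '0.5.20170404', '/gems/domain_name-0.5.20170404/lib')
rescue LoadError => e
  puts e.message  # the conflict is reported instead of loading a second version
end
```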
What is Bundler?
Bundler lets you specify all the gems your project needs, and optionally what versions of those gems. Then the bundle command installs all those gems and their dependencies.
You specify which gems you need in a file called Gemfile. Here’s a simple Gemfile:
# Gemfile
gem 'domain_name'
If you run bundle, it will generate a file called Gemfile.lock:
# Gemfile.lock
GEM
  specs:
    domain_name (0.5.20170404)
      unf (>= 0.0.5, < 1.0.0)
    unf (0.1.4)
      unf_ext
    unf_ext (0.0.7.4)
PLATFORMS
  ruby
DEPENDENCIES
  domain_name
BUNDLED WITH
   1.10.6
The specs: section of the file lists each gem that should be installed, the version, which gems it depends on, and what versions of those gems it will accept. If Bundler did its job correctly, the chosen version for each gem should satisfy the version requirements imposed by all other gems in the file.
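You can check a version against a requirement yourself with RubyGems' Gem::Requirement and Gem::Version classes. This small sketch uses the unf constraint from the Gemfile.lock above; the constant name is mine, and version 1.2.0 is just a made-up counterexample.

```ruby
require 'rubygems'

# Constraint copied from the lock file: unf (>= 0.0.5, < 1.0.0)
UNF_REQUIREMENT = Gem::Requirement.new('>= 0.0.5', '< 1.0.0')

p UNF_REQUIREMENT.satisfied_by?(Gem::Version.new('0.1.4'))  # true: the locked version fits
p UNF_REQUIREMENT.satisfied_by?(Gem::Version.new('1.2.0'))  # false: outside the allowed range
```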
In addition to generating Gemfile.lock, bundle also installs those gems at those specific versions.
What does `bundle exec` do?
Putting bundle exec before a command, e.g. bundle exec rspec, ensures that require will load the version of a gem specified in your Gemfile.lock as opposed to the most recent version.
Going back to the previous example, I have two versions of domain_name installed:
~ gem list domain_name
*** LOCAL GEMS ***
domain_name (0.5.20170404, 0.5.20160826)
And my Gemfile specifies the older version:
# Gemfile
gem 'domain_name', '0.5.20160826'
My Gemfile.lock should also show the older version:
# Gemfile.lock
GEM
  specs:
    domain_name (0.5.20160826)
      unf (>= 0.0.5, < 1.0.0)
    unf (0.1.4)
      unf_ext
    unf_ext (0.0.7.4)
PLATFORMS
  ruby
DEPENDENCIES
  domain_name (= 0.5.20160826)
BUNDLED WITH
   1.10.6
I modified both gems to print out their version when they are loaded, then wrote this ruby script:
# foo.rb
require('domain_name')
If I just do ruby foo.rb, it loads the newer version:
~ ruby foo.rb
loaded '0.5.20170404' !
If I use bundle exec, it loads the version in my Gemfile.lock:
~ bundle exec ruby foo.rb
loaded '0.5.20160826' !
How does Rails load all my gems?
There’s a detailed guide here covering how rails boots but the important part is the file config/boot.rb which contains:
ENV['BUNDLE_GEMFILE'] ||= File.expand_path('../Gemfile', __dir__)
require 'bundler/setup' # Set up gems listed in the Gemfile
This assumes the bundler gem is installed, so RubyGems can intercept the call to require 'bundler/setup' and load setup.rb from the bundler gem. setup.rb is responsible for reading Gemfile.lock and calling the gem method for each gem with the correct version (thus “activating” that version of the gem). It will raise an exception if the gem version is not installed.
Later, in application.rb, we call Bundler.require:
# Require the gems listed in Gemfile, including any gems
# you've limited to :test, :development, or :production.
Bundler.require(:default, Rails.env)
This calls Kernel.require for each gem listed in your Gemfile for those groups. Since we already set up $LOAD_PATH to point to the correct version for each gem, this requires the version needed by your application.
What does the `:require => false` option do in a Gemfile?
By default, calling Bundler.require will require every gem from your Gemfile. If the line in the Gemfile says gem 'foo', :require => false then foo will still be installed by bundle, but Bundler.require won’t call Kernel.require for foo. You’ll have to call require('foo') in your application if you want to use the gem.
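Here is a toy sketch of that decision. It is not Bundler's code: ToyDependency and files_to_require are hypothetical names, and the gems foo, bar and baz are made up. It only models which names would be passed to require for a given :require option.

```ruby
# Hypothetical in-memory stand-in for a line in a Gemfile.
ToyDependency = Struct.new(:name, :require_option)

# Decide which names a Bundler.require-like step would pass to require.
def files_to_require(dependencies)
  dependencies.flat_map do |dep|
    case dep.require_option
    when false then []                # :require => false, so skip this gem
    when nil   then [dep.name]        # no option given: require the gem's own name
    else Array(dep.require_option)    # :require => 'some/path' names the file explicitly
    end
  end
end

TOY_GEMFILE = [
  ToyDependency.new('foo', false),
  ToyDependency.new('bar', nil),
  ToyDependency.new('baz', 'baz/rails')
]

p files_to_require(TOY_GEMFILE)  # ["bar", "baz/rails"]
```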
Why don’t I have to `require` most constants in Rails?
This is a perfectly normal file in a rails app:
# app/controllers/posts_controller.rb
class PostsController < ApplicationController
  def index
    @posts = Post.all
  end
end
You don’t need to do require('application_controller') or require('post') at the top because of Rails autoloading (full article here).
Rails changes the way constants (like ApplicationController) are looked up, so that it looks through a list of directories for a file matching the name of the constant. For ApplicationController, it looks for a file named application_controller.rb. It checks to see if that file defines a constant called ApplicationController and if so uses that. If not, it raises an exception.
The list of directories that are searched when autoloading is controlled by the rails config variable autoload_paths.
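The general idea can be sketched with Ruby's const_missing hook. To be clear, this is a toy and not Rails' implementation: ToyApp, its AUTOLOAD_PATHS array and the post.rb file in a temporary directory are all made up, and the hook is scoped to one module so it doesn't affect anything else.

```ruby
require 'tmpdir'

module ToyApp
  AUTOLOAD_PATHS = []  # stand-in for Rails' autoload_paths

  # Ruby calls this hook when ToyApp::SomeName is referenced but not defined.
  def self.const_missing(name)
    # PostsController -> posts_controller.rb
    file_name = name.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase + '.rb'
    dir = AUTOLOAD_PATHS.find { |d| File.file?(File.join(d, file_name)) }
    raise NameError, "uninitialized constant #{self}::#{name}" unless dir

    load File.join(dir, file_name)
    # The file must actually define the constant we were looking for.
    unless const_defined?(name, false)
      raise NameError, "#{file_name} did not define #{name}"
    end
    const_get(name, false)
  end
end

# A pretend app directory with one file that nobody requires explicitly.
TOY_APP_DIR = Dir.mktmpdir
File.write(File.join(TOY_APP_DIR, 'post.rb'), "module ToyApp\n  class Post; end\nend\n")
ToyApp::AUTOLOAD_PATHS << TOY_APP_DIR

puts ToyApp::Post.name  # ToyApp::Post, loaded on first reference
```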
Further Reading
These articles were a great help when writing this guide: