Querying in ActiveRecord: Why SQL is still important
Everyone knows when learning something new it is always important to learn the basics, then build up to higher level thinking. Nowhere is this more important than when learning how to code. If someone learns the basic concepts of one programming language, then they are able to translate those to other languages and pick them up very easily.
So let’s talk about the relationship between SQL and ActiveRecord.
What is SQL?
SQL is a language used to create tables in databases that store data and can be searched through using queries. These table creations and queries can be written in ruby, as long as they are put into a string. The query strings can then be executed by the ruby program, and will search the database and return the what the query was searching for.
What is ActiveRecord?
AcitveRecord is a class in ruby that is used to map classes and the attributes that these classes possess, to tables in a database. This is done by inheriting from either the ActiveRecord::Base class or the ActiveRecord::Migration class.
ActiveRecord::Base is used when creating the classes that will be stored in tables. In this class it is possible to define relationships and any methods that are needed to operate on an instance or a on the whole class.
ActiveRecord::Migration is used for creating, updating, and deleting tables. This is done by migrating any schema information to the database.
So how are these related?
When first learning SQL it may seem extremely difficult to learn the syntax of how to select just what you want, or which tables to join to do your query. ActiveRecord makes your life easier by bundling common query methods into nice simple class methods. The same SQL queries are run by ActiveRecord, but by abstracting them to methods such as find(), group(), select() we as programmers do not see what is happening.
While this may seem like a blessing, it actually can make finding the correct string of ActiveRecord query methods very difficult. I myself, felt like it was so much easier because it required less typing, and these methods were named intuitively. For example if you want to find a book by title and author you could write,
Book.all.find_by_title_and_name("sample_title", "sample_name")But by making abstracting away the SQL operations, it is making it more difficult to troubleshoot the query. While this example is very simple, when operating with much bigger databases with many more relationships, it would be harder to get your query to work on the first try. So how do we see what ActiveRecord is running in the background?
ActiveRecord Logger
To help programmers see what their queries are actually doing a Logger class was made. By putting this code into your rake console task,
ActiveRecord::Base.logger = Logger.new(STDOUT)it is possible to see all the SQL being run in the background by ActiveRecord, in your terminal. It is important to always try to think what our SQL query statement would be, because this helps to reinforce SQL skills and thought processes. It makes you think through the query step by step.
Practice Querying with Logger
Lets take a look at an example. Suppose we had classes Owner, Cat, and Toy. An owner has many cats, a cat belongs to an owner, a cat has many toys, and a toy belongs to a cat. If we set up these classes to inherit from ActiveRecord::Base, and we create migrations that inherit from ActiveRecord::Migration we can create some tables like this,
CREATE TABLE "cats" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
"name" varchar,
"age" integer,
"owner_id" integer);CREATE TABLE "owners"
("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
"name" varchar);CREATE TABLE "toys"
("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
"name" varchar,
"fun" integer,
"cat_id" integer);
With this schema we can then put in seed data that looks like this,
fluffy = Cat.new(name: "Fluffy", age: 7)
mittens = Cat.new(name: "Mittens", age: 2)
kit = Cat.new(name: "Kit", age: 1)
smokey = Cat.new(name: "Smokey", age: 5)joe = Owner.new(name: "Joe")
bob = Owner.new(name: "Bob")mouse = Toy.new(name: "Mouse Toy", fun: 8)
string = Toy.new(name:"String Toy", fun:4)
yarn = Toy.new(name: "Yarn", fun: 6)
ball =Toy.new(name: "Ball", fun: 5)
catnip = Toy.new(name: "Catnip", fun: 10)joe.cats << fluffy
joe.cats << kit
joe.cats << smokey
bob.cats << mittensfluffy.toys << mouse
fluffy.toys << ball
mittens.toys << string
kit.toys << yarn
smokey.toys << smokey
Lets try a practice query. How would I access all the toys that an Joe’s cats possess? You might think it would be,
joe.cats.toysThis seems like it would give the correct answer because we are getting the cats that Joe owns, then getting their toys. But when this query is run we get an error that looks like this,
NoMethodError: undefined method `toys’ for #<Cat::ActiveRecord_Associations_CollectionProxy:0x007fabac9ee6c0>Well this doesn’t make sense, we do have a toys method for the cat instance, but what is this ActiveRecord_Associations_CollectionProxy? I’m not going to go into too much detail on this, but this association collection proxy is a middleman between the cat and the toys it owns (more on this at the bottom). If we break this query down into pieces, and use the logger we can find out what went wrong.
When we do joe.cats, we see that we actually get an array of all the cats that Joe owns. So, when we called toys on that array there was no method for a cat array.
joe.cats
D, [2017-07-20T07:38:11.129725 #23037] DEBUG -- : Cat Load (0.1ms) SELECT "cats".* FROM "cats" WHERE "cats"."owner_id" = ? [["owner_id", 1]]=> [
#<Cat:0x007fabac555550 id: 1, name: "Fluffy", age: 7, owner_id: 1>,
#<Cat:0x007fabac554df8 id: 3, name: "Kit", age: 1, owner_id: 1>,
#<Cat:0x007fabac554650 id: 4, name: "Smokey", age: 5, owner_id: 1>]With the SQL logger on, we can verify that the first part of our query is working as expected and troubleshoot the second part. Since we have an array, we should try to iterate through and grab the toys of each cat.
pry(main)> joe.cats.map {|cat| cat.toys}
D, [2017-07-20T07:42:15.165142 #23037] DEBUG -- : Toy Load (0.2ms) SELECT "toys".* FROM "toys" WHERE "toys"."cat_id" = ? [["cat_id", 1]]
D, [2017-07-20T07:42:15.171300 #23037] DEBUG -- : Toy Load (0.1ms) SELECT "toys".* FROM "toys" WHERE "toys"."cat_id" = ? [["cat_id", 3]]
D, [2017-07-20T07:42:15.173964 #23037] DEBUG -- : Toy Load (0.1ms) SELECT "toys".* FROM "toys" WHERE "toys"."cat_id" = ? [["cat_id", 4]]=> [
[#<Toy:0x007fabac51d538 id: 1, name: "Mouse Toy", fun: 8, cat_id: 1>, #<Toy:0x007fabac4fccc0 id: 4, name: "Ball", fun: 5, cat_id: 1>],
[#<Toy:0x007fabac41cc10 id: 3, name: "Yarn", fun: 6, cat_id: 3>],
[#<Toy:0x007fabac4a7720 id: 5, name: "Catnip", fun: 10, cat_id: 4>]]Great! We got the result we wanted, but can we refactor this to make it cleaner? We actually can, and it’s as simple as doing joe.toys. When we do this we see that SQL is doing an inner join on the cats table and toys table, once it has found all the cats belonging to Joe.
joe.toys
D, [2017-07-20T08:01:02.301461 #23037] DEBUG -- : Toy Load (0.1ms) SELECT "toys".* FROM "toys" INNER JOIN "cats" ON "toys"."cat_id" = "cats"."id" WHERE "cats"."owner_id" = ? [["owner_id", 1]]=>
[#<Toy:0x007fabb01022b0 id: 1, name: "Mouse Toy", fun: 8, cat_id: 1>,
#<Toy:0x007fabb0102170 id: 3, name: "Yarn", fun: 6, cat_id: 3>,
#<Toy:0x007fabb0102030 id: 4, name: "Ball", fun: 5, cat_id: 1>,
#<Toy:0x007fabb0101ef0 id: 5, name: "Catnip", fun: 10, cat_id: 4>]By doing this we see that the inner join not only accomplished our task faster and easier, but gave us a more preferable result of a one dimensional array instead of having nested arrays for each cat’s toys.
Lets try a harder, oddly specific query. Say Joe wants to get the toy with a fun level of 5 from his cat that has an id of 1. How would we do this? My first guess would be this,
joe.cats.find(1).toys.where("toy.fun = 5")
returns #<Toy::ActiveRecord_AssociationRelation:0x3fd5d66e94fc>What does this mean? With the logger we see what we actually searched for:
pry(main)> joe.cats.find(1).toys.where("toy.fun = 5")=D, [2017-07-19T22:05:52.393643 #23037] DEBUG -- : Cat Load (0.1ms) SELECT "cats".* FROM "cats" WHERE "cats"."owner_id" = ? AND "cats"."id" = ? LIMIT ? [["owner_id", 1], ["id", 1], ["LIMIT", 1]]D, [2017-07-19T22:05:52.396797 #23037] DEBUG -- : Toy Load (0.4ms) SELECT "toys".* FROM "toys" WHERE "toys"."cat_id" = ? AND (toy.fun = 5) [["cat_id", 1]]=> #<Toy::ActiveRecord_AssociationRelation:0x3fd5d66e94fc>
The first part of this query looks right. ActiveRecord is grabbing the cat with an id of 1 and an owner_id of 1. So the second part must be wrong. If we break it down, we see ActiveRecord is selecting all from toys where the toy’s cat_id is equal to 1 and the toy.fun is equal to 5. But with a closer look, toy.fun does not exist. When looking through the database we have to refer to the table name of toys, and not the class name of toy. What we really want to do is this,
pry(main)> joe.cats.find(1).toys.where("toys.fun = 5")D, [2017-07-19T22:05:05.019866 #23037] DEBUG -- : Cat Load (0.1ms) SELECT "cats".* FROM "cats" WHERE "cats"."owner_id" = ? AND "cats"."id" = ? LIMIT ? [["owner_id", 1], ["id", 1], ["LIMIT", 1]]D, [2017-07-19T22:05:05.023417 #23037] DEBUG -- : Toy Load (0.3ms) SELECT "toys".* FROM "toys" WHERE "toys"."cat_id" = ? AND (toys.fun = 5) [["cat_id", 1]]
=> [#<Toy:0x007fabacdaab88 id: 4, name: "Ball", fun: 5, cat_id: 1>]
By breaking this down into pieces and using the ActiveRecord Logger, we are able to see the errors made no matter how small they are.
So what’s the point?
Why should we care about this SQL operating in the background if ActiveRecord takes care of it for us? Well the main reason is that this abstraction of querying can be good, and bad. By abstracting away the queries, there is a kind of disconnect between the programmer and the query. While in most instances it is possible to find the source of the error in a query, when it is not possible it is always best to go back to basics. It’s always best to approach a query problem by thinking in SQL first, then finding the correct methods and correct order of methods that effectively translate your ideas. Every ruby programmer using ActiveRecord should keep this logger tool in their debugging toolbox.
Sources:
