Using grep in Ruby

Most users of POSIX-based systems have probably heard of the grep command, or use it regularly. For those who don’t know, grep searches its input for lines that match one or more patterns. It’s a very useful and powerful tool, especially when you pair it with some advanced regular expressions.

One of the things I didn’t know until yesterday, and thought would be interesting to share, is that you can also use grep on any Ruby class that includes the Enumerable module.

The output of ri Enumerable#grep is as follows:

= Enumerable#grep
(from ruby core)
--------------------------------------------------------------------
enum.grep(pattern) -> array
enum.grep(pattern) { |obj| block } -> array
--------------------------------------------------------------------
Returns an array of every element in enum for which Pattern ===
element. If the optional block is supplied, each matching element is
passed to it, and the block's result is stored in the output array.

(1..100).grep 38..44   #=> [38, 39, 40, 41, 42, 43, 44]
c = IO.constants
c.grep(/SEEK/)         #=> [:SEEK_SET, :SEEK_CUR, :SEEK_END]
res = c.grep(/SEEK/) { |v| IO.const_get(v) }
res                    #=> [0, 1, 2]

This means that we can do some seriously powerful and specific searches for elements within any Enumerable using regular expressions.
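To make the "any class that includes Enumerable" part concrete, here is a minimal sketch. The Playlist class and its track titles are made up for illustration and are not from the ri documentation. The only assumption is that the class defines each, which is all Enumerable needs in order to provide grep.

```ruby
# Hypothetical example class: a playlist that exposes its tracks via #each.
class Playlist
  include Enumerable

  def initialize(*tracks)
    @tracks = tracks
  end

  # Enumerable builds grep, select, map and friends on top of #each.
  def each(&block)
    @tracks.each(&block)
  end
end

playlist = Playlist.new('Blue Monday', 'Monday Morning', 'Friday Night')

# Plain form: returns the matching elements.
monday_tracks = playlist.grep(/monday/i)
# => ["Blue Monday", "Monday Morning"]

# Block form: each match is passed to the block and the results are collected.
shouted = playlist.grep(/monday/i) { |title| title.upcase }
# => ["BLUE MONDAY", "MONDAY MORNING"]
```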

One of the things that isn’t mentioned explicitly in the ri documentation is that you can also grep on module and class types, too:

irb(main):001:0> array_of_diff_types = [{a: 1}, [1, 2, 3], 25, 'bob']
=> [{:a=>1}, [1, 2, 3], 25, "bob"]
irb(main):002:0> array_of_diff_types.grep(Hash)
=> [{:a=>1}]
irb(main):003:0> array_of_diff_types.grep(Array)
=> [[1, 2, 3]]
irb(main):006:0> array_of_diff_types.grep(Integer)
=> [25]
irb(main):008:0> array_of_diff_types.grep(String)
=> ["bob"]
irb(main):009:0> array_of_diff_types.grep(Symbol)
=> []

Whenever you grep for a specific class, it returns an array of the elements within the Enumerable that are instances of said class, or an empty array if there are no matches.

It also works across the class hierarchy, e.g.

irb(main):011:0> array_of_diff_types.grep(Object)
=> [{:a=>1}, [1, 2, 3], 25, "bob"]

Grepping against modules is also supported, and works how you would imagine, matching any element whose class includes the given module:

irb(main):012:0> array_of_diff_types.grep(Enumerable)
=> [{:a=>1}, [1, 2, 3]]
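The class and module matches above follow from the ri description: grep compares with ===, and Module#=== answers the same question as is_a?. A small sketch of that equivalence, reusing the array from the irb session under the shorter, made-up name mixed:

```ruby
mixed = [{ a: 1 }, [1, 2, 3], 25, 'bob']

# Module#=== is true when the element is an instance of the class, of a
# subclass, or of a class that includes the module (the same as is_a?).
by_grep   = mixed.grep(Enumerable)
by_select = mixed.select { |item| item.is_a?(Enumerable) }

by_grep == by_select # => true

# Integer and String both include Comparable; Hash and Array do not.
comparables = mixed.grep(Comparable)
# => [25, "bob"]
```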

If you would like to be more specific and find a certain element in an Enumerable, grep allows you to do that too, because === on most objects falls back to plain equality:

irb(main):017:0> array_of_diff_types.grep({a: 1})
=> [{:a=>1}]
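As far as I can tell, this works because Hash, Array, Integer and String don’t give === any special meaning, so the comparison behaves like ==. The sketch below reuses the same array (named mixed here for brevity) and contrasts that with a Range, which does define its own ===:

```ruby
mixed = [{ a: 1 }, [1, 2, 3], 25, 'bob']

# For these objects === behaves like ==, so grep finds equal elements.
hash_hits   = mixed.grep({ a: 1 })   # the one matching hash
array_hits  = mixed.grep([1, 2, 3])  # => [[1, 2, 3]]
number_hits = mixed.grep(25)         # => [25]
string_hits = mixed.grep('bob')      # => ["bob"]

# A Range defines === as a membership test, so it matches by inclusion.
range_hits = mixed.grep(20..30)      # => [25]

# Nothing equal to the pattern means an empty array.
no_hits = mixed.grep('alice')        # => []
```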

Interestingly, grep was only a smidge faster than select in the benchmark below, which I ran using ruby 2.4.1p111:

require 'benchmark'

test_array = %w(hamburger cheeseburger goatburger veggieburger lettuce tomato ketchup superburger)

Benchmark.bm do |x|
  # First row: grep with a regular expression
  x.report { 1000000.times { test_array.grep(/burger/) } }
  # Second row: select using the same === comparison
  x.report { 1000000.times { test_array.select { |item| /burger/ === item } } }
end

      user     system      total        real
  3.080000   0.010000   3.090000 (  3.122951)
  3.520000   0.030000   3.550000 (  3.612067)

Keep in mind that in the real world the speed difference most likely won’t matter. Just use the one that is easier to read for you and your team.
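For a side-by-side look at readability, here is a small sketch using the same test_array. It only illustrates that both spellings give the same result, and that the block form of grep does in one call what select followed by map does in two. The variable names are mine.

```ruby
test_array = %w(hamburger cheeseburger goatburger veggieburger lettuce tomato ketchup superburger)

# Filtering: both forms return the same elements.
with_grep   = test_array.grep(/burger/)
with_select = test_array.select { |item| /burger/ === item }
with_grep == with_select # => true

# Filtering and transforming: grep with a block versus select + map.
lengths_grep = test_array.grep(/burger/) { |item| item.length }
lengths_map  = test_array.select { |item| /burger/ === item }.map { |item| item.length }
# both => [9, 12, 10, 12, 11]
```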

Grepping, whether on a POSIX system’s command line or in Ruby, is a great and powerful tool. That power is supercharged by regular expressions and, in Ruby, grepping against the class or module types of an enumerable’s elements can be a great way to make potentially complex searches easier to read.

Thanks to Pericles Theodorou for the enlightenment.

Edited on 30th of June to update the benchmark to use a more accurate comparison, thanks to @sgrif.

