Ops Scripting w. Ruby: Frequency 3

Tracking Frequency in Ruby: Part III

Previously, we covered the basics of a frequency hash: building it as we iterate line by line through the file, and defaulting a value when referencing a key that does not exist.

This time we’ll first create a list of the shells used in the file, then build a hash by counting how often each one appears. This lets us use a functional programming style with the methods select() and map().

The Solutions

In previous solutions, we built the data structure, the counts hash, as we went through each line. This time we will work in two steps: first create a list of shells called shell_list, and then build our counts hash from that.

Solution 4: Conditional Shovel

Filter First then Append

This is the same conditional loop we have done before:

File.open('etc/passwd', 'r') do |file|
  while line = file.gets
    # process each line
  end
end

This time, however, we check that the line is valid by using the regular expression /:$/ with the not-match operator !~, which makes sure the line does not end with a colon:

line !~ /:$/

Then we slice off the 7th item (index of 6) and append this to the list with the shovel operator <<:

shell_list << line.chomp.split(/:/)[6]

From all of this, we get a list of shells, which we can now process. We first get a unique list of shells with shell_list.uniq(), and then for each one store its total, from shell_list.count(), in the counts hash:

shell_list.uniq.each do |uniq_shell|
  counts[uniq_shell] = shell_list.count(uniq_shell)
end
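
Putting the pieces of Solution 4 together, a minimal sketch might look like the following. To keep it runnable anywhere, it reads from a StringIO holding made-up sample data rather than the real file; the constant SAMPLE_PASSWD and the helper count_shells_with_shovel are hypothetical names used only for illustration.

```ruby
require 'stringio'

# Hypothetical sample data standing in for the passwd file.
# The "broken" entry ends with a colon, so it has no shell field.
SAMPLE_PASSWD = <<~PASSWD
  root:x:0:0:root:/root:/bin/bash
  daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
  alice:x:1000:1000:Alice:/home/alice:/bin/bash
  broken:x:1001:1001:Broken:/home/broken:
  bob:x:1002:1002:Bob:/home/bob:/bin/zsh
PASSWD

# Hypothetical helper: builds the frequency hash from any IO-like object.
def count_shells_with_shovel(io)
  shell_list = []
  counts = {}

  while line = io.gets
    # filter first: skip lines that end with a colon,
    # then append the 7th field (index 6) with the shovel operator
    shell_list << line.chomp.split(/:/)[6] if line !~ /:$/
  end

  # one pass over the unique shells, counting occurrences of each
  shell_list.uniq.each do |uniq_shell|
    counts[uniq_shell] = shell_list.count(uniq_shell)
  end

  counts
end

p count_shells_with_shovel(StringIO.new(SAMPLE_PASSWD))
```

With this sample data, /bin/bash is counted twice and the other two shells once each, while the line ending in a colon is skipped.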

Solution 5: Getting Functional with Map and Select

In this solution, we’ll use a functional approach using map() and select(). In Ruby, any enumerable list supports select() to filter items and map() to transform them. These methods accept a block, which is applied to each item in the list.

The logic of our pipeline approach is this:

list = lines | select valid_lines | map 7th_item_sliced

After getting a list of the lines, we run select( code_block ) on it:

lines_from_file.select { |line| line !~ /:$/ }

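Note that we could also have done a positive match and reversed the logic in the regex, looking for lines that do not end in a colon. Because lines read from a file still carry their trailing newline, which [^:] would happily match, we chomp the line first:

lines_from_file.select { |line| line.chomp =~ /[^:]$/ }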
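
As a rough illustration of the two filters, the sketch below runs both on a couple of made-up lines. The constant FILTER_SAMPLE and the two helper methods are hypothetical names. The positive form chomps each line first, since a negated character class such as [^:] also matches a trailing newline.

```ruby
# Hypothetical sample lines, each with its trailing newline,
# as they would come from gets or readlines.
FILTER_SAMPLE = [
  "root:x:0:0:root:/root:/bin/bash\n",
  "broken:x:1001:1001:Broken:/home/broken:\n"
].freeze

# Hypothetical helper: negative match, keep lines that do NOT end with a colon.
def select_by_negative_match(lines)
  lines.select { |line| line !~ /:$/ }
end

# Hypothetical helper: positive match, keep lines whose last character
# is something other than a colon (chomp removes the newline first).
def select_by_positive_match(lines)
  lines.select { |line| line.chomp =~ /[^:]$/ }
end

p select_by_negative_match(FILTER_SAMPLE)
p select_by_positive_match(FILTER_SAMPLE)
```

Both calls keep only the root line here. The two forms are not identical in every case: a blank line passes the negative match but fails the positive one.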

Ruby runs each item through the select block, and if the block returns a truthy value, the item is kept in the resulting list, on which we then run map:

list_of_valid_lines.map { |line| (line.chomp.split(/:/))[6] }

The map runs this block on every item, slicing off the 7th field (index 6) of each line, and the results form the final list, shell_list.

With the shell list, we can create a hash using map by returning a two-element [key, value] list from the block and converting the result to a hash with to_h. Here are some examples of how to do this:

squares = numbers.map { |num| [num, num**2] }.to_h
capitals = words.map { |word| [word, word.upcase()] }.to_h
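
To see these two examples run with concrete input, here is a small sketch. The helper names squares_of and capitals_of and the input values are assumptions made for illustration; the map and to_h calls are the same as above.

```ruby
# Hypothetical helper: maps each number to a [number, square] pair,
# then converts the list of pairs into a hash.
def squares_of(numbers)
  numbers.map { |num| [num, num**2] }.to_h
end

# Hypothetical helper: maps each word to a [word, upcased word] pair.
def capitals_of(words)
  words.map { |word| [word, word.upcase] }.to_h
end

p squares_of([1, 2, 3])
p capitals_of(%w[ruby ops])
```

Each block returns a two-element list, and to_h treats the first element as the key and the second as the value.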

We create our key and value pairs using this method:

shell_list.uniq.map {|s| [s, shell_list.count(s)] }.to_h
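
The whole of Solution 5 can be sketched as one short pipeline. This is only an illustration under assumptions: PIPELINE_SAMPLE is made-up data standing in for the lines of the file, and count_shells_functionally is a hypothetical helper name.

```ruby
# Hypothetical sample lines standing in for the contents of the passwd file.
PIPELINE_SAMPLE = [
  "root:x:0:0:root:/root:/bin/bash\n",
  "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
  "alice:x:1000:1000:Alice:/home/alice:/bin/bash\n",
  "broken:x:1001:1001:Broken:/home/broken:\n",
  "bob:x:1002:1002:Bob:/home/bob:/bin/zsh\n"
].freeze

# Hypothetical helper wrapping the select/map pipeline.
def count_shells_functionally(lines_from_file)
  shell_list = lines_from_file
    .select { |line| line !~ /:$/ }            # keep valid lines only
    .map { |line| line.chomp.split(/:/)[6] }   # slice off the 7th field

  # build [shell, count] pairs and convert them into a hash
  shell_list.uniq.map { |s| [s, shell_list.count(s)] }.to_h
end

p count_shells_functionally(PIPELINE_SAMPLE)
```

With a real file, the lines could come from something like File.readlines, which keeps the trailing newline on each line, hence the chomp inside map.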

About Iterable Blocks

For those unfamiliar with Ruby, you can use either do…end or braces {}:

iterable.each do |i|
  list << i
end
iterable.each { |i| list << i }

Typically, if written on a single line, braces {} are preferred, but if used in multiple lines, do…end is preferred.
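
As a quick check that the two block styles behave the same, here is a small sketch. The helper names copy_with_do_end and copy_with_braces are hypothetical and exist only to put the two forms side by side.

```ruby
# Hypothetical helper: appends every item using a multi-line do...end block.
def copy_with_do_end(iterable)
  list = []
  iterable.each do |i|
    list << i
  end
  list
end

# Hypothetical helper: the same loop written with a single-line brace block.
def copy_with_braces(iterable)
  list = []
  iterable.each { |i| list << i }
  list
end

p copy_with_do_end([1, 2, 3])
p copy_with_braces([1, 2, 3])
```

Both helpers return the same list for the same input.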

The Conclusion

Functional programming with select and map, along with blocks, is a powerful tool within Ruby, but it can be tricky to get your mind around the first time you are exposed to it.

I hope this helped teach these to newcomers, as well as show how they apply to a common use case: building a frequency hash, a popular data structure for analyzing logs and other data.

These are the takeaways for this article:

  • Regular expressions to filter lines with =~ or !~
  • Shovel operator << for append
  • Creating list of unique elements with uniq
  • Blocks as parameters with do…end or braces {}
  • map() and select() Functions
  • Creating a hash with map().to_h