Ops Scripting with Perl: Frequency

Tracking Frequency in Perl

Automation with scripting languages is a required core skill for operations-oriented roles. Ages ago, Unix (and later Linux) scripting involved shell programming and tools like awk, grep, sed, cut, and find. Later, Perl became the popular go-to language for systems administrators.

Perl is still popular today for systems automation and operating system configuration. It cannot be removed from a Linux distro, for example, as much of the underlying automation depends on Perl to function. For these examples, Perl still outperforms other languages: 1100% faster than Python, 800% faster than Ruby.

The Problem

The problem is to print a formatted summary of each shell and the number of users that use that shell. We’ll use a local copy of the /etc/passwd file as the data input. In Perl, we’ll create a frequency hash %counts to store our counts.

The Data

Here’s the local passwd file used for this exercise:

The Output

Given the data above, the report would have these counts:

Shell Summary Report:
==================================================
Shell             # of Users
----------------- ------------
/bin/bash           3 users
/bin/false          7 users
/bin/sync           1 users
/usr/sbin/nologin  17 users

The Starting Code

Here’s some starting code which, given a %counts hash, prints out a formatted report.

Output Report with %counts hash

For those familiar with printf from C, this should look familiar. Perl also has a repetition operator, x, which repeats a string a given number of times.
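As a rough illustration of printf-style formatting and the x operator, here is a minimal sketch that builds the report from a hand-filled hash. The report_lines helper, the %sample_counts hash, and the header layout are assumptions made for this example, not the original listing.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical counts, filled in by hand purely for illustration.
my %sample_counts = (
    '/bin/bash'         => 3,
    '/usr/sbin/nologin' => 17,
);

# Build the report as a list of lines so it is easy to print or inspect.
sub report_lines {
    my (%counts) = @_;
    my @lines = (
        "Shell Summary Report:\n",
        '=' x 50 . "\n",                          # x repeats '=' 50 times
        sprintf("%-17s %s\n", 'Shell', '# of Users'),
        '-' x 17 . ' ' . '-' x 12 . "\n",
    );
    # %-17s left-justifies the shell in 17 columns; %3d right-justifies the count in 3.
    push @lines, sprintf("%-17s %3d users\n", $_, $counts{$_})
        foreach sort keys %counts;                # one-line loop; $_ is each key
    return @lines;
}

print report_lines(%sample_counts);
```

sprintf takes the same format string as printf but returns the text instead of printing it, which makes the rows easy to check.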

What might surprise those unfamiliar with Perl is that a loop can be expressed in a single line (line 10), which can be unfolded into this:

foreach (sort keys %counts) {
    printf "%-17s %3d users\n", $_, $counts{$_}
}

Other language features include using $_ as the default loop variable when one is not named, and passing arguments without parentheses. The same loop could be rewritten as this:

foreach my $shell (sort keys %counts) {
    printf("%-17s %3d users\n", $shell, $counts{$shell})
}

The Solutions

Unlike the previous articles I wrote for Ruby and Python, this one covers all the solutions in a single article. I assume most of the audience is looking at this article out of curiosity to see how Perl compares to either Ruby or Python.

Solution 1: Slicing Shell First

Here’s a common method to open a file, check for an error, and then use the angle-bracket readline operator <> (often called the diamond operator) to feed lines one at a time to a while loop.

Each iteration of the loop will have the line available as the variable $_.

Slice out the Shell

On each pass through the loop, we chomp off the newline character, split the line into a list, and then slice out the 7th item (index 6), which is the shell.

With this single line (line 8):

$counts{$shell}++ if defined $shell

We do the following:

  1. Determine whether we actually have a shell (not an undefined value)
  2. Look up the hash entry for that shell, which is treated as 0 if it has not been set previously
  3. Increment its value by 1

Perl is kind enough to treat an uninitialized value as 0 when incrementing it. This saves unnecessary branching to check whether the key already exists and to initialize it to a default.
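The steps above can be sketched as a small, self-contained script. This is only a sketch under assumptions: the count_shells helper, the lexical filehandle, and the four sample lines are made up for illustration, and the data is read from an in-memory string rather than a real passwd file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count shells from any open filehandle of passwd-style lines.
sub count_shells {
    my ($fh) = @_;
    my %counts;
    while (<$fh>) {                     # each line arrives in $_
        chomp;                          # remove the trailing newline
        my $shell = (split /:/)[6];     # 7th field; undef when the line ends in ':'
        $counts{$shell}++ if defined $shell;
    }
    return %counts;
}

# Hypothetical sample data, not the article's passwd file.
my $sample = <<'END';
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:1001:Bob:/home/bob:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
noshell:x:1002:1002:No Shell:/home/noshell:
END

# Opening a reference to a string gives an in-memory filehandle.
open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";
my %counts = count_shells($fh);
close($fh);

printf "%-17s %3d users\n", $_, $counts{$_} foreach sort keys %counts;
```

The last sample line has an empty shell field. Because split drops trailing empty fields, index 6 is undefined for that line, and the defined check skips it.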

Solution 2: Regex Filter

This is similar to the previous solution, except that we first use a regex (regular expression) to filter out invalid lines. The default scratch variable $_ is used here to represent the line.

Regex to filter out invalid lines

Because we filter out invalid lines that do not have a shell specified, we can slice out the shell key, create the key-value pair, and increment it all in one line, essentially doing at least four operations at once:

$counts{(split /:/)[6]}++ if ($_ !~ /:$/);

This would be the same as this:

if ($line !~ /:$/) {
    my $shell = (split(/:/, $line))[6];
    $counts{$shell}++;
}
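For comparison, here is a minimal sketch of the regex-filter approach as a complete script. The count_shells_filtered helper and the sample lines are hypothetical, and the sketch assumes every line that does not end in a colon has all seven fields.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

sub count_shells_filtered {
    my ($fh) = @_;
    my %counts;
    while (<$fh>) {
        chomp;
        # Skip lines that end in ':' (empty shell field), then slice and count.
        $counts{ (split /:/)[6] }++ if $_ !~ /:$/;
    }
    return %counts;
}

# Hypothetical sample data, read through an in-memory filehandle.
my $sample = <<'END';
alice:x:1000:1000:Alice:/home/alice:/bin/bash
sync:x:4:65534:sync:/bin:/bin/sync
noshell:x:1002:1002:No Shell:/home/noshell:
END

open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";
my %filtered = count_shells_filtered($fh);
close($fh);

print "$_: $filtered{$_}\n" foreach sort keys %filtered;
```

A line with fewer than seven fields that does not end in a colon would slip past this filter and produce an undefined key, which the defined check in Solution 1 guards against.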

Solution 3: Functional Style

Instead of a conditional loop, this solution feeds the lines through a pipeline of grep and map.

Grep and Map

This will run the following pipeline logic:

lines_from_file | filter_valid_lines | extract_shell | build_hash

Alright, so this is something truly magical, and may need some explaining. We’ll dissect each link of this pipeline, as if it were written like this:

my @valid_lines = grep { $_ !~ /:$/ } (<PASSWD_FILE>);
my @shell_list = map { chomp; (split /:/)[6] } @valid_lines;
%counts = map { $_ => ++$counts{$_} } @shell_list;

The first link of the pipeline (reading the chained expression from back to front) filters out bad lines, that is, lines that have no 7th column and therefore end with a colon:

@valid_lines = grep { $_ !~ /:$/ } (<PASSWD_FILE>);

The next pipeline link transforms the incoming items to just the shell:

@shell_list = map { chomp; (split /:/)[6] } @valid_lines;

Here split operates on the default $_ that map passes into the code block, so it could be rewritten like this to be explicit:

@shell_list = map { chomp; (split(/:/,$_))[6] } @valid_lines;

In the last link, each item produces a key-value pair instead of a single element.

Strictly speaking, map still returns a flat list here, since => is just a comma that reads better between a key and its value. It is the assignment to a hash variable that turns that list of pairs into a hash. In other languages, you would need to explicitly coerce it to a hash (or dictionary).

Thus we are essentially doing this:

my %hash = map { $key => $value } @list;

Here are some examples of using this method:

my %squares = map { $_ => $_**2 } @numbers;
my %capitals = map { $_ => uc $_ } @words;
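To make these two examples runnable, here is a small sketch with made-up input lists (the values in @numbers and @words are assumptions for illustration only).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical inputs.
my @numbers = (1, 2, 3);
my @words   = qw(paris rome);

# map returns a flat list of key/value pairs; assigning it to a hash pairs them up.
my %squares  = map { $_ => $_ ** 2 } @numbers;
my %capitals = map { $_ => uc $_ } @words;

# <=> is the numeric comparison operator, used here to sort the keys numerically.
print "$_ squared is $squares{$_}\n" foreach sort { $a <=> $b } keys %squares;
print "$_ => $capitals{$_}\n" foreach sort keys %capitals;
```

If the same key appears more than once in the list, the last pair wins, which is worth keeping in mind when building counts this way.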

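So in our example, we produce $key => $value pairs and send them to our %counts hash:

%counts = map { $_ => ++$counts{$_} } @shell_list;

But wait, there’s something else interesting happening here. See if you can catch it.

Here, the map block updates %counts as a side effect while it runs, and the list of pairs it returns is then assigned back to %counts, overwriting it. Because a later pair replaces an earlier pair with the same key, each shell ends up with its final running total. This is also why the pre-increment form ++$counts{$_} is used: the post-increment form $counts{$_}++ returns the value before incrementing, which would leave every count one too low.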
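Putting the links together, the whole pipeline can be sketched as one chained expression. This is a variation under assumptions rather than the original listing: the count_shells_pipeline helper and the sample lines are hypothetical, and the running tally is kept in a separate %seen hash so the result hash is not modified while it is being built.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

sub count_shells_pipeline {
    my ($fh) = @_;
    my %seen;    # running tally, kept apart from the result hash
    my %counts =
        map  { $_ => ++$seen{$_} }        # build_hash: later pairs overwrite earlier ones
        map  { chomp; (split /:/)[6] }    # extract_shell
        grep { $_ !~ /:$/ }               # filter_valid_lines
        <$fh>;                            # lines_from_file (list context reads every line)
    return %counts;
}

# Hypothetical sample data, read through an in-memory filehandle.
my $sample = <<'END';
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:1001:Bob:/home/bob:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
noshell:x:1002:1002:No Shell:/home/noshell:
END

open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";
my %piped = count_shells_pipeline($fh);
close($fh);

print "$_: $piped{$_}\n" foreach sort keys %piped;
```

Unlike the while loop, this reads the whole file into memory before filtering, which is fine for a passwd file but worth reconsidering for large inputs.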

The Conclusion

I wanted to give people a sense of the richness of Perl and show how it may compare to Ruby and Python for similar problems, as well as offer an introduction to Perl.

These are some general takeaways for the language and this tutorial:

  • Variables in Perl are $scalar, @array, or %hash, and an uninitialized scalar behaves as 0 or the empty string depending on context (in case that was not obvious).
  • A scratch $_ scalar variable is used in looping constructs and is the default variable input for many functions or operators in Perl.
  • Formatted output with printf and repetition operator x
  • Opening a file with open and handling errors with die
  • Readline (diamond) operator <> to read lines from a file handle
  • Splitting strings into lists with split
  • Slicing off list elements
  • Regex match operators with =~ and !~
  • Filtering elements with grep
  • Transforming elements with map
  • Building hashes with map