Ops Scripting w. Bash: Frequency 2

Tracking Frequency in BASH (Bourne Again Shell): Part II

Like the other solutions in this series, I will divide this into two parts:

  • Procedural Way: show how to process each line, where we slice out the shell and then build the data structure.
  • Serial Pipeline: show how to build a pipeline, where a list of shells is piped into a loop that simply adds to the data structure.

In this article, I’ll show how to process colon-delimited files using the shell’s auto-splitting mechanism, controlled by the built-in $IFS environment variable.

In a follow-up article, I’ll show how to feed data in from a sub-shell to our loop construct, using either the built-in split mechanism or an external tool.

Previous Article

The Problem

The Solutions for the Procedural Way

Solution 1: Collection Loop

In our first solution, we’ll demonstrate the collection loop, which has an auto-splitting facility: the for loop will automatically split text into parts, with the field separator specified by the built-in $IFS environment variable.

The file needs to be split by newlines, so that we can process each line separately. With this, we construct our loop to be like:

IFS=$'\n'
for LINE in $(cat passwd); do
    process_each "$LINE"
done

For each line, we have a colon-separated string, so we need to split the line again into pieces, with $IFS set to a colon (:) this time. Bash has a way to create an array by splitting a string:

IFS=: ITEMS=($LINE)
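As a quick, self-contained illustration of this splitting (the sample passwd line here is made up for demonstration), note that IFS remains set to : after this assignment:

```shell
#!/usr/bin/env bash

# A made-up passwd-style line for demonstration.
LINE="nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin"

# Setting IFS before the array assignment makes the unquoted
# expansion of $LINE split on colons.
IFS=: ITEMS=($LINE)

echo "${ITEMS[0]}"   # → nobody
echo "${ITEMS[6]}"   # → /usr/sbin/nologin
```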

After this, we save the seventh column (indexed by 6) to a variable. This is strictly unnecessary, but it makes the code easier to read:

SHELL=${ITEMS[6]}

Now we need to check that a shell was actually specified, and not simply an empty string. If we matched an empty string, we shouldn’t process it further.

[[ -z "${SHELL}" ]] || create_shell_entry

This could also be written as:

if ! [[ -z "${SHELL}" ]]; then
    create_shell_entry
fi

Finally, we create our associative array entry, defaulting to 0 if the key does not yet exist:

(( COUNTS[${SHELL}]++ ))

The double parentheses are for arithmetic operations, (( arithmetic_operation )), such as the increment operator ++.
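Putting the pieces together, Solution 1 as a whole might look like this sketch. The sample passwd data and the final reporting loop are my additions for demonstration:

```shell
#!/usr/bin/env bash

declare -A COUNTS   # associative array: shell path -> count

# Made-up passwd-style data for demonstration.
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'sync:x:4:65534:sync:/bin:/bin/bash' \
  > passwd

IFS=$'\n'
for LINE in $(cat passwd); do
    IFS=: ITEMS=($LINE)          # re-split the line on colons
    SHELL=${ITEMS[6]}            # seventh field: the login shell
    [[ -z "${SHELL}" ]] || (( COUNTS[${SHELL}]++ ))
done

# Report the frequency table (key order is unspecified).
for KEY in "${!COUNTS[@]}"; do
    echo "${KEY}: ${COUNTS[${KEY}]}"
done
```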

Solution 2: Conditional Loop with Read

This is the most common approach, where read “will read a line from the standard input and split it into fields” (man page entry).

This is the basic construct on how to open a file and read into a variable:

while read -r LINE; do
    process_line "$LINE"
done < input_file
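As a minimal, runnable demonstration of this construct (the file name input_file and its contents are made up):

```shell
#!/usr/bin/env bash

# Create a small sample file for demonstration.
printf 'first line\nsecond line\n' > input_file

# read -r disables backslash escape processing, so each
# line arrives in $LINE exactly as written.
while read -r LINE; do
    echo "got: ${LINE}"
done < input_file
# prints:
#   got: first line
#   got: second line
```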

We can split the line up based on $IFS (the input field separator), such as a colon (:), and then extract the shell as one of the array elements:

while IFS=: read -ra LINE_ARRAY; do
    SHELL=${LINE_ARRAY[6]}
    process_shell_item "$SHELL"
done < passwd

And like before, we create a new shell entry only if we have a valid shell:

[[ -z "${SHELL}" ]] || create_shell_entry

And as before, we use the arithmetic operator (( arithmetic_expression )) to increment the count. If the key does not yet have an associated value, Bash treats the blank string as 0 and increments it.

(( COUNTS[${SHELL}]++ ))
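Assembled, the while read version of the whole solution might look like this sketch; again, the sample data and the reporting loop are additions for demonstration:

```shell
#!/usr/bin/env bash

declare -A COUNTS   # associative array: shell path -> count

# Made-up passwd-style data for demonstration.
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'sync:x:4:65534:sync:/bin:/bin/bash' \
  > passwd

# IFS=: applies only to this read, splitting each line on colons.
while IFS=: read -ra LINE_ARRAY; do
    SHELL=${LINE_ARRAY[6]}
    [[ -z "${SHELL}" ]] || (( COUNTS[${SHELL}]++ ))
done < passwd

# Report the frequency table (key order is unspecified).
for KEY in "${!COUNTS[@]}"; do
    echo "${KEY}: ${COUNTS[${KEY}]}"
done
```

Note one design advantage over Solution 1: the IFS=: prefix here is scoped to the read command alone, so the global $IFS is never modified.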

Which Solution is better?

For processing files and splitting strings, while read is typically the preferred combination, simply because it is less work.

Default Field Separator

Reviewing from above, compare these two:

############ for loop way ############
IFS=$'\n'
for LINE in $(cat input_file); do
    ### process_line ####
done

########### while loop way ###########
while read -r LINE; do
    ### process_line ####
done < input_file

Alternative Field Separator

And we have these two:

############ for loop way ############
IFS=$'\n'
for LINE in $(cat passwd); do
    IFS=: ITEMS=($LINE)
    ### process_fields ####
done

########### while loop way ###########
while IFS=: read -ra ITEMS; do
    ### process_fields ####
done < passwd

The Conclusion

So there you have it: two ways to process files in a procedural way, using the built-in input field separator to split a line of text.

In the next article, I’ll show how to use a more serial pipeline approach, where only a list of shells is sent to the main loop.

The takeaways from this include:

  • process a file through the while read -r VAR; do ...; done < file loop construction
  • process a file through the for VAR in $(cat file); do ...; done loop construction
  • split a string into a Bash array with read -a or with array notation ()
  • associative arrays: creating, referencing, enumerating values and keys
  • arithmetic with (( ))
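For the associative-array takeaway specifically, the create/reference/enumerate operations look like this (the keys and counts here are hypothetical):

```shell
#!/usr/bin/env bash

declare -A COUNTS               # must be declared associative first
COUNTS[/bin/bash]=2             # create an entry / assign by key
COUNTS[/usr/sbin/nologin]=1

echo "${COUNTS[/bin/bash]}"     # reference one value → 2
echo "${!COUNTS[@]}"            # enumerate keys (order unspecified)
echo "${COUNTS[@]}"             # enumerate values (order unspecified)
```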

Some more subtle takeaways:

  • The default input field separator, IFS, splits on spaces, tabs, and newlines.
  • You need to set IFS=$'\n' when the lines themselves will be further split, unless you are using the read command.
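That subtle point can be seen directly: with the default IFS, a for loop splits on every whitespace character, while IFS=$'\n' keeps each line intact (the data string here is made up):

```shell
#!/usr/bin/env bash

DATA=$'one two\nthree four'   # two lines, two words each

# Default IFS: splits on spaces AND newlines -> four words.
COUNT=0
for W in $DATA; do (( COUNT++ )); done
echo "$COUNT"   # → 4

# IFS set to newline only -> two "words" (the whole lines).
IFS=$'\n'
COUNT=0
for W in $DATA; do (( COUNT++ )); done
echo "$COUNT"   # → 2
```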