Ops Scripting w. Python: Frequency 2

Tracking Frequency in Python: Part II

Previously, I presented the problem of counting frequency using a frequency hash, or dict in Python.

In this article, I will present solutions and discuss the Python language features they use.

The Solutions

These solutions all use the for loop. I call it a collection loop because the loop construct iterates over a collection, which in our case is the collection of lines from the passwd file.

Solution 1: Basic Collection Loop

We open the file and iterate line by line in this example. To keep things simple, we will not handle errors:
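A minimal sketch of that loop follows. It uses an in-memory io.StringIO object with made-up sample data as a stand-in for the passwd file so that it runs anywhere. The sample entries and the helper name collect_lines are illustrative assumptions, not part of the original listing.

```python
import io

# Hypothetical sample data standing in for the real passwd file.
SAMPLE_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n"
    "alice:x:1000:1000:Alice:/home/alice:/bin/bash\n"
)

def collect_lines(passwd_file):
    """Iterate over an open file object, one line per pass of the loop."""
    lines = []
    for line in passwd_file:  # a file object is a collection of lines
        lines.append(line)
    return lines

# With a real file this would be something like:
#     with open('/etc/passwd') as passwd_file:   # path is an assumption
with io.StringIO(SAMPLE_PASSWD) as passwd_file:
    lines = collect_lines(passwd_file)

print(len(lines))  # 3
```

Note that each line still carries its trailing newline at this point, which is why the next step strips it.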

For every line, we only care about the shell, without a newline character polluting our string. So we do a few operations: strip off the newline, split the string into a list, and take the seventh item of that list. This can be broken up into these steps:

line = line.rstrip()          # strip newline
line_items = line.split(':')  # split up line by ':' divider
shell = line_items[6]         # take the 7th item (index 6)

This can all be done in a single line.

shell = line.rstrip().split(':')[6]

Now that we have the shell, we need to check whether we actually got one. Sometimes, though rarely, there may not be a shell defined for that user, in which case the field is an empty string.

if shell:
    # do stuff with that shell as a key

Each item in the counts dictionary will have a key that represents the shell, and a value that represents how often that shell appears in our data file, passwd.

We simply need to increment the value. Python does not auto-initialize a dict entry the first time a key is used (reading a missing key raises a KeyError), so we have to initialize it ourselves.

if shell in counts:
    counts[shell] += 1  # increment existing value
else:
    counts[shell] = 1   # initialize value first time

We could write the same branch logic in one line:

counts[shell] = 1 if shell not in counts else counts[shell] + 1
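Putting the pieces of Solution 1 together might look like the sketch below. It is a reconstruction under assumptions rather than the original listing: the function name count_shells and the sample lines are hypothetical, and a list of strings stands in for the open passwd file.

```python
# Hypothetical sample lines standing in for the passwd file.
SAMPLE_LINES = [
    "root:x:0:0:root:/root:/bin/bash\n",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
    "alice:x:1000:1000:Alice:/home/alice:/bin/bash\n",
    "nobody:x:65534:65534:nobody:/nonexistent:\n",  # no shell defined
]

def count_shells(lines):
    """Count how often each shell appears, using an explicit key test."""
    counts = {}
    for line in lines:
        shell = line.rstrip().split(':')[6]  # 7th field is the shell
        if shell:  # skip users with an empty shell field
            if shell in counts:
                counts[shell] += 1  # increment existing value
            else:
                counts[shell] = 1   # initialize value first time
    return counts

counts = count_shells(SAMPLE_LINES)
print(counts)  # {'/bin/bash': 2, '/usr/sbin/nologin': 1}
```

The entry without a shell is skipped by the if shell: test, because an empty string is falsy in Python.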

Solution 2: Dict get Method

Instead of conditionally setting the frequency count, we can use the get method that comes with the dict class. It returns the stored value if the key is found, or a default value that we supply (0 in our case) if it is not. Either way, we add one to the result and store it back to increase the count.
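A minimal sketch of this approach, assuming the same line format as before. The function name count_shells_with_get and the sample lines are hypothetical and used only for illustration.

```python
# Hypothetical sample lines standing in for the passwd file.
SAMPLE_LINES = [
    "root:x:0:0:root:/root:/bin/bash\n",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
    "alice:x:1000:1000:Alice:/home/alice:/bin/bash\n",
]

def count_shells_with_get(lines):
    """Count shells using dict.get with a default of 0."""
    counts = {}
    for line in lines:
        shell = line.rstrip().split(':')[6]
        if shell:
            # get returns 0 when the key is missing, else the stored count
            counts[shell] = counts.get(shell, 0) + 1
    return counts

counts = count_shells_with_get(SAMPLE_LINES)
print(counts)  # {'/bin/bash': 2, '/usr/sbin/nologin': 1}
```

Note that get only reads from the dict; it never adds the missing key itself, so the assignment is still needed.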

Solution 3: DefaultDict

Another method is to auto-initialize every key to 0 the first time it is referenced, using a dict subclass called defaultdict from the collections module.
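A minimal sketch of this approach, again with hypothetical sample lines and a hypothetical function name. Passing int as the default factory works because calling int() with no arguments returns 0.

```python
from collections import defaultdict

# Hypothetical sample lines standing in for the passwd file.
SAMPLE_LINES = [
    "root:x:0:0:root:/root:/bin/bash\n",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
    "alice:x:1000:1000:Alice:/home/alice:/bin/bash\n",
]

def count_shells_with_defaultdict(lines):
    """Count shells, letting defaultdict create missing keys as 0."""
    counts = defaultdict(int)  # int() returns 0 for a new key
    for line in lines:
        shell = line.rstrip().split(':')[6]
        if shell:
            counts[shell] += 1  # no explicit initialization needed
    return counts

counts = count_shells_with_defaultdict(SAMPLE_LINES)
print(dict(counts))  # {'/bin/bash': 2, '/usr/sbin/nologin': 1}
```

One thing to keep in mind: merely reading a missing key from a defaultdict also creates it, so lookups on the result can add entries with a count of 0.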

With this, Python behaves like languages that auto-initialize missing hash entries, but it is more powerful, as we can control what the default is with a custom lambda.
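As a small illustration of the custom default, the sketch below passes a lambda as the default factory. The variable names and the starting value of 10 are arbitrary assumptions chosen only to show that the default need not be 0.

```python
from collections import defaultdict

# Equivalent to defaultdict(int): every new key starts at 0.
zero_based = defaultdict(lambda: 0)
zero_based['/bin/bash'] += 1

# A custom default: every new key starts at 10 (arbitrary example value).
ten_based = defaultdict(lambda: 10)
ten_based['/bin/bash'] += 1

print(zero_based['/bin/bash'])  # 1
print(ten_based['/bin/bash'])   # 11
```

The factory is called with no arguments each time a missing key is accessed, and whatever it returns becomes that key's initial value.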

Conclusion

From these solutions, you should have picked up the following takeaways for Python:

  • Collection Loop (for)
  • Splitting a String
  • List Slicing (or indexing in this case)
  • Testing whether a value is set (an empty shell, a missing key)
  • Three ways to initialize a default value in a dict

In the next article, I will show how to use lambda and dict comprehensions to solve the same problem.