Ops Scripting w. Python: Frequency 3

Tracking Frequency in Python: Part III

This is a continuation of a series on how to build a frequency dictionary in Python. The original problem is presented in this article:

Previously, I also presented how to build the dict as we process each line of a file:

In this article, we’ll demonstrate how to do this in a more serial way: first extracting a list of shells, and then counting their frequency.

The Solutions

In the previous solutions, we built the data structure, the counts dictionary, as we went through each line. This time, we will do it more serially: we’ll first create a list of shells called shell_list, and then build our counts dictionary from that.

Solution 4: Two Collection Loops

Collection Loops

In this solution, we strip the newline off each line by using the splitlines() method. This returns a list of lines, without the newlines, to the collection loop.

Instead of splitting first, we’ll first check if we have a valid line using a regular expression. In this expression “.*[^:]$”, we’ll verify that the line does not end with a colon : divider. If it does, then it means there is no shell specified for the 7th column.
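As a small illustration of the check described above, here is a sketch that runs the expression against a few lines. The sample lines are hypothetical and only assume a colon-delimited, seven-column layout.

```python
import re

# Hypothetical sample lines (assumed seven-column, colon-delimited layout).
sample_lines = [
    "root:x:0:0:root:/root:/bin/bash",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin",
    "nobody:x:65534:65534:nobody:/nonexistent:",  # 7th column is empty
]

valid_lines = []
for line in sample_lines:
    # '.*[^:]$' only matches when the last character is not a colon.
    if re.match(r".*[^:]$", line):
        valid_lines.append(line)

print(len(valid_lines))  # 2
```

The expression also rejects empty lines, because [^:] needs at least one character to match.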

With a valid line, we can slice off the 7th column and append it to our shell_list:

shell_list.append(line.split(':')[6])

Now that we have a list of shells, we can build the counts dictionary. We first create a unique list of shells with a set, set(shell_list), and then, for each unique shell, count how many times it appears in the list with shell_list.count(shell). The result is used to build the counts dictionary:

for shell in set(shell_list):
    counts[shell] = shell_list.count(shell)
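Putting the two loops together, a minimal end-to-end sketch might look like the following. The file_text string is a hypothetical stand-in for the contents of the input file, which is not shown here.

```python
import re

# Hypothetical stand-in for the file contents, read as a single string.
file_text = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
nobody:x:65534:65534:nobody:/nonexistent:
"""

shell_list = []
counts = {}

# First collection loop: gather the 7th column of every valid line.
for line in file_text.splitlines():
    if re.match(r".*[^:]$", line):
        shell_list.append(line.split(":")[6])

# Second collection loop: count each unique shell.
for shell in set(shell_list):
    counts[shell] = shell_list.count(shell)

print(counts)
```

Because the second loop iterates over a set, the order of the keys in counts is not guaranteed.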

Solution 5: Comprehensions

For this solution, we’ll use list comprehensions (and dict comprehensions). See About Comprehensions below for more information.

List and Dict Comprehensions

In the initial list comprehension for shell_list, we feed in only the valid lines, and from each colon-delimited line we slice off the 7th column:

[line.split(':')[6] for line in lines_from_file if valid_line]

In our dict comprehension, we use the same logic that we used in Solution 4:

{ shell: count_of_shells for shell in unique_shell_list }
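The two pseudo-expressions above can be fleshed out as in this sketch. Again, file_text is a hypothetical stand-in for the file contents, and the regular expression is the one from Solution 4.

```python
import re

# Hypothetical stand-in for the file contents, read as a single string.
file_text = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
nobody:x:65534:65534:nobody:/nonexistent:
"""

# List comprehension: keep valid lines only, then slice off the 7th column.
shell_list = [
    line.split(":")[6]
    for line in file_text.splitlines()
    if re.match(r".*[^:]$", line)
]

# Dict comprehension: one entry per unique shell, valued by its count.
counts = {shell: shell_list.count(shell) for shell in set(shell_list)}

print(counts)
```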

Solution 6: Lambdas with Map and Filter

In this solution, we’ll use a functional approach using map() and filter(). We’ll pass our own lambda expression as a parameter to these functions. See the section below About Lambdas for more information on these.

Map and Filter with Lambdas

The first part is to create a list of shells, which we will do by using a map to transform each element:

map(lambda line: line.split(':')[6], list_of_valid_lines)

To create a list of valid lines, we will use a filter:

filter(lambda line: re.match('.*[^:]$',line), lines_from_file)

Now that we have crafted our shell list, we can feed this into another map to create a dict. We’ll use a little trick to coerce a list into a dict, essentially doing this:

dict(map(lambda shell: [shell, count_of_shells], unique_shell_list))

We create our key and value pairs the same way we did in Solution 4 and Solution 5:

map(lambda shell: [shell, shell_list.count(shell)], set(shell_list))
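Assembled into one piece, the functional approach could be sketched as below. The file_text string is a hypothetical stand-in for the file contents. Note that in Python 3 map() returns an iterator, so the shell list is wrapped in list() before count() is called on it.

```python
import re

# Hypothetical stand-in for the file contents, read as a single string.
file_text = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
nobody:x:65534:65534:nobody:/nonexistent:
"""

# filter(): keep only lines that do not end with a colon.
valid_lines = filter(
    lambda line: re.match(r".*[^:]$", line), file_text.splitlines()
)

# map(): transform each valid line into its 7th column.
shell_list = list(map(lambda line: line.split(":")[6], valid_lines))

# map() + dict(): each [shell, count] list becomes a key and value pair.
counts = dict(
    map(lambda shell: [shell, shell_list.count(shell)], set(shell_list))
)

print(counts)
```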

About Lambdas

For those unfamiliar with lambdas, a lambda is essentially a small anonymous function. Here’s an example of a function and an equivalent lambda:

def cube(y):
    return y*y*y

c = lambda x: x*x*x
print(c(5))     # 125
print(cube(5))  # 125

Lambdas can be passed as a value to a function. Python provides functions that use this to facilitate a functional programming approach: map(), filter(), and reduce() (in Python 3, reduce() lives in the functools module). You pass a lambda expression and a list to these functions.

Here are some examples:

squares = list(map(lambda num: num**2, numbers))
capitals = list(map(lambda word: word.capitalize(), words))
positives = list(filter(lambda x: x > 0, numbers))
evens = list(filter(lambda x: x % 2 == 0, numbers))

You can also create dictionaries through coercion, by returning a list from the lambda.

my_dict = dict(map(lambda f: [key,value], some_list))

The lambda returns an anonymous two-item list for each element, and dict() turns each of those lists into a key and value pair. Ultimately, this allows us to use map to create a dict.

Some examples:

squares = dict(map(lambda num: [num, num**2], numbers))
capitals = dict(map(lambda word: [word, word.capitalize()], words))
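To try these out, here is a runnable sketch with hypothetical sample data for numbers and words. In Python 3, map() and filter() return lazy iterators rather than lists, which is why the results are wrapped in list() or dict().

```python
# Hypothetical sample data.
numbers = [-2, -1, 0, 1, 2, 3]
words = ["alpha", "beta"]

squares = list(map(lambda num: num**2, numbers))
positives = list(filter(lambda x: x > 0, numbers))
evens = list(filter(lambda x: x % 2 == 0, numbers))

# Returning a two-item list from the lambda lets dict() build the pairs.
capitals = dict(map(lambda word: [word, word.capitalize()], words))

print(squares)    # [4, 1, 0, 1, 4, 9]
print(positives)  # [1, 2, 3]
print(evens)      # [-2, 0, 2]
print(capitals)   # {'alpha': 'Alpha', 'beta': 'Beta'}
```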

About Comprehensions

List comprehensions (and dict comprehensions) in Python are a way to create a list (or dict) inline at the time of declaration. The basic structure looks like this:

my_list = [ item for item in list ]
my_dict = { key: value for item in list }

You can also filter conditionally while building a list:

my_list = [ item for item in list if item == something ]

Here are some list comprehension examples:

squares = [ num**2 for num in numbers ]
capitals = [ word.capitalize() for word in words ]
positives = [ num for num in numbers if num > 0 ]
evens = [ num for num in numbers if num % 2 == 0 ]

We can create dictionaries using dict comprehension as well:

squares = { num: num**2 for num in numbers }
capitals = { word: word.capitalize() for word in words }
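One way to read a comprehension is as a compact form of a collection loop. The sketch below, using hypothetical sample data, builds the same list and dict both ways and checks that the results agree.

```python
# Hypothetical sample data.
numbers = [-2, -1, 0, 1, 2, 3]

# List comprehension with a filter condition.
positives = [num for num in numbers if num > 0]

# Equivalent explicit loop.
positives_loop = []
for num in numbers:
    if num > 0:
        positives_loop.append(num)

# Dict comprehension and its loop equivalent.
squares = {num: num**2 for num in positives}
squares_loop = {}
for num in positives:
    squares_loop[num] = num**2

print(positives == positives_loop)  # True
print(squares == squares_loop)      # True
print(squares)                      # {1: 1, 2: 4, 3: 9}
```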

The Conclusion

List comprehensions, lambdas, map, and filter are powerful tools within Python, though they may not be easy to master when you are first introduced to them. I hope I helped teach these to newcomers, as well as show how they are applied to a common use case of building a frequency dictionary, a popular data structure for analyzing logs and other data.

These are the takeaways for this article:

  • Regular expressions to filter lines
  • Creating list of unique elements with set()
  • list Comprehensions
  • dict Comprehensions
  • Creating and Using Lambdas
  • map() and filter() Functions
  • Creating dict with map()