What’s The Difference Between Lambda Functions And List Comprehensions?
One of the most time-consuming tasks for a data scientist (or, really, for anyone who handles data from the outset and wants to use it in a meaningful way, whatever their job title) is data preparation.
In the majority of cases, before doing any modeling, building training sets or taking a number of other steps, data must first be collected, cleaned and organized.
Take, for instance, the DataFrame below, which holds historical price information for Litecoin, one of the “oldest” cryptocurrencies, launched in 2011 by former Google engineer Charlie Lee as a “lite” version of Bitcoin:
Let’s say we wanted to clean up the entries in the “Time Period Start” column, which in this case means showing only the date, without all of the zeros and without the time information.
One way to do this is to create an entirely new column, loop through each entry, and pull only the date (i.e. in row 0 of the “Time Period Start” column it would be 2021-05-17).
Now, this could be achieved with a for loop, but instead I chose to use a list comprehension:
ltc_series_sorted['period_start'] = [i[0:10] for i in ltc_series_sorted['Time_Period_Start']]
List comprehensions are often faster than for loops and require less code.
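To make the comparison concrete, here is a minimal sketch of the same date extraction written both ways. The timestamp strings are hypothetical stand-ins for the “Time_Period_Start” values (the exact format in the original data is an assumption), and a plain list is used instead of a DataFrame column so the snippet runs on its own.

```python
# Hypothetical sample values standing in for the "Time_Period_Start" column.
timestamps = [
    "2021-05-17T00:00:00.0000000Z",
    "2021-05-10T00:00:00.0000000Z",
    "2021-05-03T00:00:00.0000000Z",
]

# Version 1: an explicit for loop that appends each sliced value to a list.
dates_loop = []
for ts in timestamps:
    dates_loop.append(ts[0:10])  # keep only the first ten characters (YYYY-MM-DD)

# Version 2: the same logic as a one-line list comprehension.
dates_comp = [ts[0:10] for ts in timestamps]

print(dates_loop == dates_comp)
print(dates_comp)
```

Both versions produce the same list; the comprehension simply folds the loop, the slice and the append into one expression.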
When completing a data science project, being able to do a task faster and with less code is significant.
Using the above code on the “Time Period Start” and “Time Period End” columns, and then removing all other unneeded columns led to this cleaned up DataFrame (you can also see in the “wk_price_change” column the result of some feature engineering I completed):
An alternative to using list comprehensions, however, is using Lambda functions, along with functions such as apply() or map().
Lambda functions, as far as I understand, can be thought of as anonymous, in-the-moment functions: each one is written as a single expression and is usually needed only where it has been created. This is in contrast to a traditional function defined with def, which we give a name so that we can call it again at a later point.
And whereas a list comprehension always produces a list, a Lambda function returns whatever its expression evaluates to, which may be a list or a single value. One can even call regular named functions (those defined with def) from within a Lambda function.
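As a rough illustration of these points, the sketch below puts a named function next to an equivalent Lambda function, then uses a Lambda inline with map(). The names first_ten, first_ten_lambda and the sample timestamps are hypothetical and chosen only for this example.

```python
def first_ten(text):
    """Named function: defined once, reusable anywhere later."""
    return text[:10]

# The same logic as a lambda. It is bound to a name here only so the two
# can be compared; normally a lambda is written inline where it is used.
first_ten_lambda = lambda text: text[:10]

# Hypothetical sample values.
timestamps = ["2021-05-17T00:00:00.0000000Z", "2021-05-10T00:00:00.0000000Z"]

# A lambda used "in the moment" with map(). map() is lazy, so list()
# is needed to materialise the results.
dates = list(map(lambda ts: ts[:10], timestamps))

# A lambda can return a single value per call, and can call a named function.
years = list(map(lambda ts: int(first_ten(ts)[:4]), timestamps))

print(dates)
print(years)
```

With a pandas Series, the same Lambda could be passed to apply() instead of the built-in map().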
Below is an example of a Lambda function used along with apply() when the task was to conduct some NLP pre-processing on crypto publication headlines by removing stopwords.
df_testr['step_one_process'] = df_testr['headline'].apply(lambda x: ' '.join([word.lower() for word in x.split()
if word.lower() not in stop_words and word.isalpha()]))
Stopwords are common words in a given language (such as “and” or “the”) that usually add little value when analyzing or modeling text.
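To see what the Lambda function does step by step, here is a self-contained sketch of the same filtering logic. The headlines and the tiny stop_words set are made up for illustration; the original code presumably used a fuller stopword list (for example from an NLP library), which is an assumption on my part.

```python
# Hypothetical headlines and a deliberately tiny stopword set.
headlines = [
    "Bitcoin and Litecoin rally as the market recovers",
    "Litecoin hits $300, and traders cheer",
    "Regulators and exchanges weigh in on crypto",
]
stop_words = {"and", "as", "the", "in", "on"}

# Same expression as in the article: lowercase each word, drop stopwords,
# and drop any token that is not purely alphabetic (e.g. "$300,").
remove_stopwords = lambda x: " ".join(
    [word.lower() for word in x.split()
     if word.lower() not in stop_words and word.isalpha()]
)

# Plain-list equivalent of df['headline'].apply(...).
cleaned = [remove_stopwords(h) for h in headlines]

for line in cleaned:
    print(line)
```

Note that word.isalpha() also discards tokens with attached punctuation or digits, so “$300,” disappears along with the stopwords.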
We can see the result of the Lambda function used along with apply() here:
Can you see one word that was removed on multiple occasions?
The word “and” has been removed from rows 0, 1 and 3, and other stopwords were removed from the DataFrame as well.
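From what I can tell, both list comprehensions and lambda functions can offer better performance than explicit for loops, although the size of the gain depends on the data and the task, so it is worth measuring.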
But what’s the difference between the two?
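Strictly speaking, a list comprehension is an expression that builds a list, while a lambda function is a small unnamed function. For a task like transforming a column, though, the choice between list comprehensions and lambda functions seems to be largely a matter of preference, really.
But when you want to use a function on the spot, without defining it first, lambda functions appear to be very useful.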
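A small sketch of both situations, using hypothetical (date, price) pairs rather than the real Litecoin data: first a transformation where either tool works, then a case where a throwaway function is exactly what is needed.

```python
# Hypothetical (date, closing price) rows; the values are invented.
rows = [("2021-05-17", 310.5), ("2021-05-03", 352.1), ("2021-05-10", 298.4)]

# Either tool works for a simple transformation, and both give the same list.
dates_comp = [row[0] for row in rows]
dates_map = list(map(lambda row: row[0], rows))

# Here a function is needed "in the moment": sorted() expects a key
# function, and a lambda supplies one without a separate def.
by_price = sorted(rows, key=lambda row: row[1])

print(dates_comp == dates_map)
print(by_price[0])
```

In the first case the comprehension is arguably easier to read; in the second, the lambda saves defining a one-off helper.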
Upon doing research on their differences, it seems many prefer list comprehensions because they are generally more readable.
I tend to agree.
However, lambda functions are similarly useful.