omkar katare

Pandas is a great tool for cleaning data, handling missing values, merging and joining, visualizing, and grouping datasets.

We often need to manipulate a whole column of a dataset in a particular way, for example by performing a mathematical operation on every value in that column.

# Suppose we want to convert the column 'title' to uppercase
>>> def to_uppercase(value):
...     return value.upper()
...
>>> df['Capitalized_title'] = df['title'].apply(to_uppercase)

In the above example we applied the function 'to_uppercase' to the column 'title'. Each element of the column is passed to the function as its argument, one at a time, and the returned values make up the new column 'Capitalized_title'.
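To make the element-by-element behaviour concrete, here is a minimal, self-contained sketch. The sample titles are made up for illustration, and the extra column name 'Capitalized_title_vec' is hypothetical. It also shows the built-in string accessor, which should give the same result here without a custom function.

```python
import pandas as pd

# Hypothetical sample data; only the 'title' column name comes from the example above.
df = pd.DataFrame({"title": ["the matrix", "inception", "up"]})

def to_uppercase(value):
    # Called once per element of the column, so `value` is a single string.
    return value.upper()

# Series.apply passes each element of 'title' through to_uppercase.
df["Capitalized_title"] = df["title"].apply(to_uppercase)

# Alternative: the vectorized string accessor does the same job for string columns.
df["Capitalized_title_vec"] = df["title"].str.upper()

print(df)
```

For simple string operations like this one, the `.str` accessor is usually the shorter option; `apply` is more useful when the per-element logic does not have a built-in equivalent.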

Defining a lambda function and applying it row by row, using values from more than one column:

# lambda function: row-wise operation
# (template: condition1 and some_variable are placeholders to fill in)

def func_name(a, b):
    if condition1:
        return some_variable

df['new_col'] = df.apply(lambda x: func_name(x['col1'], x['col2']), axis=1)

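Here, axis=1 makes apply call the lambda once per row, so 'func_name' receives that row's values from 'col1' and 'col2'. Whatever it returns is stored in the new column 'new_col'. Note that for rows where 'condition1' is false the function as written returns nothing, so those rows end up with a missing value (None).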
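As a concrete instance of the row-wise template, here is a small sketch under assumed data. The numbers, the function name 'larger_or_equal', and the comparison it performs are all hypothetical; only the column names 'col1', 'col2' and 'new_col' follow the template. Every branch returns a value, so no row is left without a result.

```python
import pandas as pd

# Hypothetical data; the column names follow the template above.
df = pd.DataFrame({"col1": [1, 5, 3], "col2": [4, 2, 3]})

def larger_or_equal(a, b):
    # Returns the name of the column holding the larger value.
    # Ties go to 'col1'. Both branches return, so the result is never None.
    if a >= b:
        return "col1"
    return "col2"

# axis=1 passes each row to the lambda as a Series indexed by column name.
df["new_col"] = df.apply(
    lambda row: larger_or_equal(row["col1"], row["col2"]), axis=1
)

print(df)
```

Row-wise apply runs a Python function call for every row, so it can be slow on large frames; it is most helpful when the logic depends on several columns at once and is hard to express with column arithmetic.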