import pandas as pd: Quick Tips to work with your Data

Published in Spectrum Labs

Dec 31, 2020

When I first started using Python for data analysis, I was introduced to the pandas library for performing Exploratory Data Analysis (EDA). pandas is an easy-to-use, extensive Python library that provides functions for most tasks involving data.

As a newcomer to the Data Science field, I initially learnt some of the common functions that pandas has to offer, at least enough to perform basic operations on my datasets. However, the need to perform more complex tasks prompted me to explore the library a bit more. That’s when I found some more useful ways of doing those tasks with pandas.

Most of the tips that I am sharing here are quite self-explanatory and should be helpful. Check them out! You never know, you might find something that you didn’t know yet!

General Tips

Dropping null/missing values from a DataFrame

>>> df.dropna(subset=['col1', 'col2'], inplace=True)
OR
>>> df = df[pd.notnull(df['col'])]
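As a minimal sketch of the two forms above, assuming a small made-up DataFrame (the data and the extra column `col3` are hypothetical), the following shows that `subset` only checks the listed columns:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names mirror the placeholders used above.
df = pd.DataFrame({
    "col1": [1.0, np.nan, 3.0, 4.0],
    "col2": ["a", "b", None, "d"],
    "col3": [np.nan, 1.0, 2.0, 3.0],
})

# Drop rows where col1 or col2 is missing; col3 is not checked.
cleaned = df.dropna(subset=["col1", "col2"])

# Boolean-mask form for a single column.
masked = df[df["col1"].notna()]

print(cleaned)
print(masked)
```

Returning a new DataFrame instead of using `inplace=True` keeps the original `df` around for comparison, which can be handy while exploring.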

Converting datatype of a column to numeric

>>> df['col'] = pd.to_numeric(df['col'], errors='coerce')
OR
>>> df['col'] = df['col'].astype('int32')
OR
>>> df = df.astype({'col1': 'int32', 'col2': 'int32'})
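The options above behave differently when a value cannot be parsed. Here is a rough sketch, assuming a made-up column with one bad entry; the final cast to the nullable `Int64` dtype is my own addition, not part of the original tip:

```python
import pandas as pd

# Hypothetical column containing one value that is not a number.
df = pd.DataFrame({"col": ["1", "2", "oops", "4"]})

# errors='coerce' turns unparseable entries into NaN, so the result is float64.
coerced = pd.to_numeric(df["col"], errors="coerce")

# astype raises on values it cannot convert.
try:
    df["col"].astype("int32")
    astype_failed = False
except ValueError:
    astype_failed = True

# One possible follow-up: cast to the nullable 'Int64' dtype to keep
# integers alongside missing values.
nullable = coerced.astype("Int64")

print(coerced)
print(astype_failed)
print(nullable)
```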

Unique values in a column and the count

>>> df['col'].nunique()        # number of unique values
>>> df['col'].value_counts()   # count of each unique value
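One detail worth knowing: both calls skip missing values by default. A small sketch with made-up data, using the `dropna` parameter to include them:

```python
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"col": ["a", "b", "a", None, "a"]})

n_unique = df["col"].nunique()                          # missing values are not counted
counts = df["col"].value_counts()                       # missing values are left out
counts_with_na = df["col"].value_counts(dropna=False)   # missing values get their own row

print(n_unique)
print(counts)
print(counts_with_na)
```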

Dropping duplicates

>>> df.drop_duplicates(subset=['col'], keep='first', inplace=True)
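The `keep` argument decides which of the duplicated rows survives. A quick sketch on made-up data (the `val` column is hypothetical and only there to show which rows remain):

```python
import pandas as pd

# Hypothetical data: 'x' and 'y' each appear twice in 'col'.
df = pd.DataFrame({"col": ["x", "y", "x", "z", "y"], "val": [1, 2, 3, 4, 5]})

first = df.drop_duplicates(subset=["col"], keep="first")   # keep the first occurrence
last = df.drop_duplicates(subset=["col"], keep="last")     # keep the last occurrence
no_dupes = df.drop_duplicates(subset=["col"], keep=False)  # drop every duplicated value

print(first)
print(last)
print(no_dupes)
```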

Ways to add new column to existing dataframe

1. By assigning a default value to all rows

>>> df['new_col'] = 'default_value'
OR
>>> df.insert(loc=2, column='new_col', value='default_value')

2. By performing computations on existing columns

>>> df['new_col'] = df['col1'] * df['col2']
>>> df = df.assign(new_col=lambda x: (x['col1'] / x['col2']) * 100)
>>> df['new_col'] = df.apply(lambda x: (x['col1'] / x['col2']) * 100, axis=1)
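To check that these approaches agree, here is a small sketch on made-up numbers. The direct column arithmetic and `assign` both work on whole columns at once, while `apply` with `axis=1` calls the lambda once per row, so it tends to be the slowest of the three on large data:

```python
import pandas as pd

# Hypothetical numeric columns.
df = pd.DataFrame({"col1": [10, 20, 30], "col2": [40, 50, 60]})

# Column arithmetic on whole Series.
vectorized = (df["col1"] / df["col2"]) * 100

# assign returns a new DataFrame and leaves df unchanged.
assigned = df.assign(new_col=lambda x: (x["col1"] / x["col2"]) * 100)

# Row-wise apply: the lambda receives one row at a time.
applied = df.apply(lambda x: (x["col1"] / x["col2"]) * 100, axis=1)

print(vectorized)
print(assigned)
print(applied)
```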

Working with text data using Pandas

To display complete text in a column: This option lets you display the entire contents of a column, which is helpful when you work with text data.

>>> pd.set_option('display.max_colwidth', None)  # use -1 on pandas versions before 1.0

To display a fixed number of rows: Using this option eliminates the need for extra code (for example, .head()) to limit the number of rows printed in the output.

>>> pd.set_option('display.max_rows', 10)
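`set_option` changes the setting for the rest of the session. If you only need it for one printout, `pd.option_context` is an alternative; the sketch below assumes a made-up DataFrame with long text cells and the default column width:

```python
import pandas as pd

# Hypothetical frame with one long text cell per row.
long_text = ("word " * 30).strip()
df = pd.DataFrame({"text_col": [long_text] * 3})

default_width = pd.get_option("display.max_colwidth")

# With default settings, long cells are cut off with "..." when printed.
short_repr = repr(df)

# Inside the block the column width is unlimited; the old value returns on exit.
with pd.option_context("display.max_colwidth", None):
    full_repr = repr(df)

print(short_repr)
print(full_repr)
```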

Matching a pattern in text column

>>> df['text_col'].str.contains('_pattern_', na=False)
>>> df['text_col'].str.match('_pattern_', na=False)
>>> df['text_col'].str.contains('_regex_pattern_', na=False, regex=True)
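The difference between the two methods is easy to miss: `str.contains` looks for the pattern anywhere in the string, while `str.match` only matches at the start. Also, `str.contains` treats the pattern as a regular expression by default, so pass `regex=False` for a literal search. A sketch with made-up strings:

```python
import pandas as pd

# Hypothetical text column with one missing value.
s = pd.Series(["error: disk full", "no error here", None, "Error: timeout"])

# Pattern found anywhere in the string (case-sensitive); missing values become False.
contains = s.str.contains("error", na=False)

# Pattern must match at the beginning of the string.
match = s.str.match("error", na=False)

# '.' is a regex wildcard unless regex=False is passed.
dots = pd.Series(["a.b", "axb"])
as_regex = dots.str.contains("a.b")
as_literal = dots.str.contains("a.b", regex=False)

print(contains.tolist())
print(match.tolist())
print(as_regex.tolist(), as_literal.tolist())
```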

Replace a substring in text with another value

>>> df['text_col2'] = df['text_col_1'].str.replace('value1', 'value2')
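Whether `str.replace` treats the first argument as a regular expression by default depends on the pandas version (the default changed to literal matching in pandas 2.0), so it can be safer to pass `regex` explicitly. A small sketch with made-up strings:

```python
import pandas as pd

# Hypothetical version-like strings.
s = pd.Series(["v1.0", "v1x0"])

# Literal replacement: only an actual '.' is replaced.
literal = s.str.replace(".", "-", regex=False)

# Regex replacement: every digit is replaced.
pattern = s.str.replace(r"\d", "#", regex=True)

print(literal.tolist())
print(pattern.tolist())
```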

Avoid using for loops!

Use apply

>>> df['words'] = df.apply(lambda x: x['text_col'].lower().split(), axis=1)
OR
>>> def text_to_words(x):
...     x = x.lower()
...     return x.split()
...
>>> df['words'] = df['text_col'].apply(text_to_words)
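For simple text operations like this one, the built-in `.str` methods are another option that avoids writing a function at all. This is an alternative I am suggesting, not part of the original tip; a sketch with made-up text:

```python
import pandas as pd

# Hypothetical text column.
df = pd.DataFrame({"text_col": ["Hello World", "Pandas Is Fun"]})

# apply with a plain Python function on each value.
via_apply = df["text_col"].apply(lambda s: s.lower().split())

# The same result by chaining .str methods.
via_str = df["text_col"].str.lower().str.split()

print(via_apply.tolist())
print(via_str.tolist())
```

Custom logic that has no `.str` equivalent still needs `apply`, as shown above.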

Use map

>>> df['label_num'] = df['label'].map({'absent': 0, 'present': 1}, na_action='ignore')
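Keep in mind that values missing from the dictionary come out as NaN, which also turns the column into floats. A sketch with made-up labels (the `'unknown'` label is hypothetical):

```python
import pandas as pd

# Hypothetical labels, including a missing value and a label not in the mapping.
df = pd.DataFrame({"label": ["absent", "present", None, "unknown"]})

mapping = {"absent": 0, "present": 1}
df["label_num"] = df["label"].map(mapping, na_action="ignore")

# Rows whose label was missing or not found in the mapping.
unmapped = df[df["label_num"].isna()]

print(df)
print(unmapped)
```

Checking `unmapped` after a `map` call is a cheap way to catch labels you forgot to include in the dictionary.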

Written by Devanshi Bhatt