import pandas as pd: Quick Tips to work with your Data

Devanshi Bhatt
Published in Spectrum Labs
2 min read · Dec 31, 2020

When I first started using Python for data analysis, I was exposed to the pandas library for performing Exploratory Data Analysis (EDA). Pandas is an easy-to-use and extensive Python library which provides a variety of functions for most tasks involving data.

As an amateur in the Data Science field, I initially learnt some common functions that pandas has to offer, at least enough to perform basic operations on my datasets. However, the need to perform complex tasks prompted me to explore the library a bit more. That’s when I found some useful ways of doing the same tasks using pandas.

Most of the tips that I am sharing here are quite self-explanatory and should be helpful. Check them out! You never know, you might find something that you didn’t know yet!

General Tips

  • Dropping null/missing values from a DataFrame
>>> df.dropna(subset=['col1', 'col2'], inplace=True)
OR
>>> df = df[pd.notnull(df['col'])]
  • Converting datatype of a column to numeric
>>> df['col'] = pd.to_numeric(df['col'], errors='coerce')
OR
>>> df['col'] = df['col'].astype('int32')
OR
>>> df = df.astype({'col1': 'int32', 'col2': 'int32'})
  • Unique values in a column and the count
>>> df['col'].nunique()        # number of distinct values
>>> df['col'].value_counts()   # number of occurrences of each unique value
  • Dropping duplicates
>>> df.drop_duplicates(subset=['col'], keep='first', inplace=True)
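
To see how these general tips fit together, here is a minimal sketch on a tiny made-up DataFrame. The data and the variable names (clean, n_unique, counts, deduped) are my own assumptions for illustration; only the pandas calls come from the tips above.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration.
df = pd.DataFrame({
    "col1": ["1", "2", "x", None, "2"],
    "col2": ["a", "b", "c", "d", "b"],
})

# Drop rows where col1 is missing (returns a new frame instead of using inplace).
clean = df.dropna(subset=["col1"])

# errors="coerce" turns values that cannot be parsed (here "x") into NaN.
clean = clean.assign(col1=pd.to_numeric(clean["col1"], errors="coerce"))

# Number of distinct non-null values, and how often each value occurs.
n_unique = clean["col1"].nunique()
counts = clean["col1"].value_counts()

# Keep only the first row for each repeated col2 value.
deduped = clean.drop_duplicates(subset=["col2"], keep="first")

print(n_unique)
print(counts)
print(deduped)
```

Note that errors='coerce' leaves a NaN behind for the unparseable value, so a later astype('int32') on that column would fail unless those rows are dropped or filled first.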

Ways to add a new column to an existing DataFrame

  1. By assigning a default value to all rows
>>> df['new_col'] = 'default_value'
OR
>>> df.insert(loc=2, column='new_col', value='default_value')

  2. By performing computations on existing columns
>>> df['new_col'] = df['col1'] * df['col2']
>>> df = df.assign(new_col=lambda x: (x['col1'] / x['col2']) * 100)
>>> df['new_col'] = df.apply(lambda x: (x['col1'] / x['col2']) * 100, axis=1)
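
The following sketch runs each of these column-adding approaches on a small made-up DataFrame. The numbers and the column names inserted_col, product, pct_assign and pct_apply are assumptions chosen so the results can be compared side by side.

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({"col1": [10, 20, 30], "col2": [40, 50, 60]})

# 1. Same default value in every row; the new column is appended at the end.
df["new_col"] = "default_value"

# insert() places the column at a chosen position (loc=1 is the second column).
df.insert(loc=1, column="inserted_col", value="default_value")

# 2. Computations on existing columns.
df["product"] = df["col1"] * df["col2"]                              # vectorized
df = df.assign(pct_assign=lambda x: (x["col1"] / x["col2"]) * 100)   # returns a new frame
df["pct_apply"] = df.apply(lambda x: (x["col1"] / x["col2"]) * 100, axis=1)  # row by row

print(df)
```

The assign and apply versions compute the same percentages here. apply with axis=1 calls the function once per row, so the vectorized or assign form is usually the quicker choice on large frames.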

Working with text data using Pandas

  • To display complete text in a column: This pandas option displays the entire contents of a column instead of truncating long values. It’s helpful when you work with text data.
>>> pd.set_option('display.max_colwidth', None)   # None removes the width limit; older pandas versions used -1
  • To display a fixed number of rows: Setting this option limits how many rows are printed, so you don’t need extra code (for example, .head()) to keep the output short.
>>> pd.set_option('display.max_rows', 10)
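  • Matching a pattern in a text column
>>> df['text_col'].str.contains('_pattern_', na=False)                      # pattern found anywhere in the string
>>> df['text_col'].str.match('_pattern_', na=False)                         # pattern must match at the start of the string
>>> df['text_col'].str.contains('_regex_pattern_', na=False, regex=True)    # explicit regular expression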
  • Replace a substring in text with another value
>>> df['text_col2'] = df['text_col1'].str.replace('value1', 'value2', regex=False)
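
Here is a short sketch of how the matching and replacing calls behave on a few made-up strings. The sample text and the result names (has_apple, starts_apple, ends_with_e, replaced) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical text data, including one missing value.
df = pd.DataFrame({"text_col": ["apple pie", "pineapple", None, "Apple tart"]})

# Literal, case-sensitive substring found anywhere; na=False maps missing values to False.
has_apple = df["text_col"].str.contains("apple", na=False, regex=False)

# match() only succeeds when the pattern matches at the start of the string.
starts_apple = df["text_col"].str.match(r"[Aa]pple", na=False)

# Regular expression: strings that end with the letter "e".
ends_with_e = df["text_col"].str.contains(r"e$", na=False, regex=True)

# Literal replacement; missing values stay missing.
replaced = df["text_col"].str.replace("apple", "pear", regex=False)

print(has_apple.tolist())     # [True, True, False, False]
print(starts_apple.tolist())  # [True, False, False, True]
print(ends_with_e.tolist())   # [True, True, False, False]
print(replaced.tolist())
```

Passing regex=False makes it explicit that the pattern is a plain substring, which avoids surprises when the text contains characters such as "." or "(" that have a special meaning in regular expressions.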

Avoid using for loops!

  • Use apply
>>> df['words'] = df.apply(lambda x: x['text_col'].lower().split(), axis=1)
OR
>>> def text_to_words(x):
...     x = x.lower()
...     return x.split()
...
>>> df['words'] = df['text_col'].apply(text_to_words)
  • Use map
>>> df['label_num'] = df['label'].map({'absent': 0, 'present': 1}, na_action='ignore')
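
As a rough illustration of both tips, the sketch below runs apply and map on a two-row made-up DataFrame. It also shows a variant using the .str accessor, which is not part of the tips above but is a common alternative for simple string operations. The data and the column names words_apply and words_str are assumptions.

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({
    "text_col": ["Hello World", "Pandas is fun"],
    "label": ["present", "absent"],
})


def text_to_words(text):
    """Lowercase a string and split it on whitespace."""
    return text.lower().split()


# apply on a single column calls the function once per value, with no explicit loop.
df["words_apply"] = df["text_col"].apply(text_to_words)

# Alternative (not from the tips above): chain the built-in .str methods.
df["words_str"] = df["text_col"].str.lower().str.split()

# map translates each value through the dictionary.
df["label_num"] = df["label"].map({"absent": 0, "present": 1})

print(df[["words_apply", "label_num"]])
```

Values that are missing from the dictionary passed to map come back as NaN, so it is worth checking the result for unexpected nulls.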
