Top 10 Python tips to make our lives easier in Data Analysis

Tricks for quickly summarizing and styling data for Data Analysis

Kashish Rastogi
Nerd For Tech
6 min read · Jun 13, 2021



Whatever looks good always fetches the highest price, even if content matters 80% and looks matter 20%. Tips and tricks are always worth collecting because they make work easier and more efficient, and minor shortcuts can act as a booster. Some well-known and some lesser-known tricks are shown below with code and examples.

Data

1. Crosstab

“Compute a simple cross-tabulation of two (or more) factors. By default computes a frequency table of the factors unless an array of values and an aggregation function are passed.” I personally find the crosstab function very useful.

Crosstab of multiple indexes and a column. Setting margins=True shows the total count for each row & column, and we can change the name of the totals row and column with margins_name.
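Here is a minimal sketch of that idea. The toy data and the column names (region, segment, shipped) are made up for illustration and are not from the original notebook.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions for illustration.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "segment": ["Retail", "Online", "Retail", "Retail", "Online"],
    "shipped": ["Yes", "No", "Yes", "Yes", "No"],
})

# Two index factors (region, segment) against one column factor (shipped).
# margins=True appends row and column totals; margins_name renames them.
table = pd.crosstab(
    index=[df["region"], df["segment"]],
    columns=df["shipped"],
    margins=True,
    margins_name="Total",
)
print(table)
```

The bottom-right cell of the printed table holds the grand total, which here equals the number of rows in df.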

crosstab provides a values parameter for passing a third, numerical column to aggregate on, together with an aggregation function (aggfunc). We can add some extra polish by replacing NaN values with 0 & rounding the values to 2 decimals.
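A small sketch of aggregating a numerical column inside a crosstab. The sales column and the choice of mean as the aggregation are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data; "sales" is the numerical column to aggregate.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "segment": ["Retail", "Online", "Retail", "Retail"],
    "sales": [100.456, 80.0, 120.123, 60.0],
})

# values and aggfunc must be passed together.
table = pd.crosstab(
    index=df["region"],
    columns=df["segment"],
    values=df["sales"],
    aggfunc="mean",
)

# Combinations with no rows (West/Online here) come out as NaN,
# so replace them with 0 and round everything to 2 decimals.
table = table.fillna(0).round(2)
print(table)
```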

The most popular parameter is normalize, which accepts these options.

a) If passed normalize=True or normalize='all', it normalizes over all values.

b) If passed normalize='index', it normalizes over each row.

c) If passed normalize='columns', it normalizes over each column.
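The three options can be compared side by side with a quick sketch like this one (toy data again, with made-up column names):

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "shipped": ["Yes", "No", "Yes", "Yes"],
})

# Fractions of the grand total: the whole table sums to 1.
overall = pd.crosstab(df["region"], df["shipped"], normalize="all")

# Fractions within each row: every row sums to 1.
by_row = pd.crosstab(df["region"], df["shipped"], normalize="index")

# Fractions within each column: every column sums to 1.
by_col = pd.crosstab(df["region"], df["shipped"], normalize="columns")

print(overall, by_row, by_col, sep="\n\n")
```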

2. Styling

a) Tables

Have you ever looked at a table and thought it is plain and simple, good but not great? We just need to spice things up, & our savior is CSS. For more examples, look at the attached link.

If you want to make the table interactive, then switching to the Plotly library is a great option.

b) Markdown

Markdown makes a Jupyter notebook more fun.

3. Working with text data

a) Extract

The str.extract function is used when we need to extract words or digits from text with a regular expression.
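A minimal sketch with made-up strings. The regular expressions are just examples; each one needs a capture group so that extract knows what to return.

```python
import pandas as pd

# Hypothetical text data for illustration.
s = pd.Series(["Order 123 shipped", "Order 45 pending", "no order"])

# First run of digits in each string; rows without a match become NaN.
digits = s.str.extract(r"(\d+)", expand=False)

# Last word of each string.
last_word = s.str.extract(r"(\w+)$", expand=False)

print(digits)
print(last_word)
```

With expand=False and a single capture group the result is a Series; the default expand=True returns a DataFrame with one column per group.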

b) lower(), upper()

pandas provides the str.lower & str.upper functions, so we can convert a whole column of text data to lowercase or uppercase in one call.
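For example, on a small made-up Series:

```python
import pandas as pd

# Hypothetical text data for illustration.
names = pd.Series(["Alice", "BOB", "cHaRlie"])

lower = names.str.lower()  # every value in lowercase
upper = names.str.upper()  # every value in uppercase

print(lower.tolist())
print(upper.tolist())
```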

4. Memory Usage

The memory_usage method shows how much memory each column in the DataFrame uses. It is most useful when working with Deep Learning & Machine Learning algorithms, where we need to train a model.
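A quick sketch on a made-up DataFrame. Passing deep=True is my own addition; it asks pandas to measure the actual contents of text columns instead of only the pointers to them.

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({
    "id": range(1000),
    "city": ["Delhi", "Mumbai"] * 500,
})

# Bytes used by the index and by each column.
per_column = df.memory_usage(deep=True)
total_bytes = per_column.sum()

print(per_column)
print(f"total: {total_bytes} bytes")
```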

5. Option

You have probably run into this issue: when there are many columns, some of the middle columns get omitted from the display.

Columns containing long text get truncated. This matters when working with text data that is too long to fit.

Columns with a float datatype also get truncated when they have too many digits after the decimal point.
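A sketch of the display options that address these three cases. I am assuming the options meant here are display.max_columns, display.max_colwidth and display.precision; the toy data is made up. option_context changes the options only inside the with block, while pd.set_option would change them for the whole session.

```python
import pandas as pd

# Hypothetical wide frame: 30 float columns plus one long text column.
long_note = "a very long free-text comment " * 4
data = {f"col_{i}": [i + 0.123456789] for i in range(30)}
data["note"] = [long_note]
df = pd.DataFrame(data)

with pd.option_context(
    "display.max_columns", None,   # show every column, none omitted
    "display.max_colwidth", None,  # do not truncate long text
    "display.precision", 9,        # show 9 digits after the decimal point
):
    full_view = repr(df)

print(full_view)
```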

6. Groupby

The groupby function involves a combination of splitting the data, applying a function, & combining the results. It is mainly used to group a large amount of data & perform functions on each group.

If you don’t like the way the grouped columns are aligned, there is always room for change. Using reset_index, we can turn the group keys back into regular columns.
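A minimal sketch of both steps, on made-up data with a sum as the aggregation:

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "segment": ["Retail", "Online", "Retail", "Retail"],
    "sales": [100, 80, 120, 60],
})

# Split by (region, segment), apply sum, combine into one Series.
# The group keys end up in a two-level index.
grouped = df.groupby(["region", "segment"])["sales"].sum()

# reset_index moves the group keys back into ordinary columns.
flat = grouped.reset_index()

print(grouped)
print(flat)
```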

7. Listing all unique values in a group

Getting the list of unique values within each group.
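One way to do this, sketched on made-up data, is to combine groupby with unique. The sorted-list variant is my own addition for readability.

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "segment": ["Retail", "Online", "Retail", "Retail"],
})

# One array of distinct segments per region.
unique_per_region = df.groupby("region")["segment"].unique()

# Same idea, but as sorted Python lists.
as_lists = df.groupby("region")["segment"].apply(lambda s: sorted(s.unique()))

print(unique_per_region)
print(as_lists)
```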

8. Other functions

a) cumulative sum

cumsum gives the running (cumulative) sum within each group.
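A short sketch with made-up data, where the running total restarts for each region:

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "sales": [10, 20, 30, 40, 50],
})

# Running total of sales, computed separately for each region.
# The result keeps the original row order.
df["running_sales"] = df.groupby("region")["sales"].cumsum()

print(df)
```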

b) squeeze

This method is most useful when you don’t know if your object is a Series or Data Frame, but you do know it has just a single column. In that case, you can safely call squeeze to ensure you have a Series.
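A small sketch of that behaviour on a made-up single-column DataFrame:

```python
import pandas as pd

# Hypothetical single-column DataFrame for illustration.
df = pd.DataFrame({"sales": [10, 20, 30]})

# A one-column DataFrame squeezes down to a Series.
s = df.squeeze()

# Calling squeeze on something that is already a Series is harmless.
still_series = s.squeeze()

print(type(s).__name__, type(still_series).__name__)
```

One thing to keep in mind: if the object has only a single value (one row and one column), squeeze goes all the way down to a scalar.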

c) Sample

The sample method allows you to select values randomly from a Series or DataFrame. It is useful when we want to select a random sample from a large dataset.
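A minimal sketch on made-up data. The random_state value is an arbitrary choice that makes the draw reproducible.

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({"id": range(100), "value": [i * i for i in range(100)]})

# A fixed number of random rows.
five_rows = df.sample(n=5, random_state=42)

# A fraction of the rows (10% here).
ten_percent = df.sample(frac=0.1, random_state=42)

print(five_rows)
print(len(ten_percent))
```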

9. Finding unique values

nunique counts the number of distinct values along columns or rows. It is mostly used with categorical features where the unique values are too many to count manually.
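For example, on a small made-up DataFrame:

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({
    "region": ["East", "East", "West"],
    "segment": ["Retail", "Online", "Retail"],
    "sales": [100, 100, 120],
})

# Number of distinct values in each column (the default, axis=0).
per_column = df.nunique()

# Number of distinct values in each row.
per_row = df.nunique(axis=1)

# The same method also works on a single column.
n_regions = df["region"].nunique()

print(per_column)
print(per_row)
print(n_regions)
```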

10. Pandas profiling

You can find the notebook here and play around.

You can contact me here

Linkedin | Kaggle | Blog

Data Analyst | Data Visualization | Storyteller | Tableau | Plotly