Data Preparation and Data Analysis with Pandas, NumPy and Matplotlib in Python

Saurabh Ghosh
Published in Predict
2 min read · Dec 14, 2022

Let’s learn data preparation and analysis techniques in Python using sample data about diets and recipes

Photo by S O C I A L . C U T on Unsplash

What to expect in this blog?

This blog walks through the following steps:

  1. Collecting data: There are many ways to collect data. In this blog, the data is read from a CSV file with Pandas.
  2. Identifying and treating missing data (if any): Before the collected data is used for reporting or modelling, it must be checked for missing values.
  3. Exploring the data to check for imbalances and discrepancies: The data must be checked for issues such as imbalances between categories, outliers, and deviations from what you assumed or understood about the data.
  4. Transforming the data to treat imbalances and discrepancies: The issues found in the data must be addressed before it is used for reporting or modelling.
  5. Creating new derived features: As part of data analysis, new features/labels can be created to give better insight or help with the analysis.
  6. Using visualization to assist the analysis: A visual representation often makes the data easier to explore and analyze.
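To make steps 1 to 5 concrete, here is a minimal sketch with Pandas. The column names (`diet`, `recipe`, `protein_g`, `carbs_g`, `fat_g`) and all the values are made up for illustration and are not taken from the blog's dataset. The CSV text is kept in memory so the snippet runs on its own, and filling missing values with the median is just one possible treatment.

```python
import io

import pandas as pd

# Hypothetical sample data: column names and values are invented for illustration.
CSV_TEXT = """diet,recipe,protein_g,carbs_g,fat_g
keto,Egg muffins,18,4,14
keto,Zucchini noodles,6,9,
vegan,Lentil soup,12,30,3
vegan,Tofu stir fry,,15,9
paleo,Grilled salmon,25,0,12
"""

# Step 1: collect the data (a real project would pass a file path to read_csv).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Step 2: identify missing values, counted per column.
missing_per_column = df.isnull().sum()

# Step 3: explore the balance between categories.
rows_per_diet = df["diet"].value_counts()

# Step 4: treat the missing values (assumption: the column median is acceptable).
nutrient_cols = ["protein_g", "carbs_g", "fat_g"]
df[nutrient_cols] = df[nutrient_cols].fillna(df[nutrient_cols].median())

# Step 5: create a new derived feature from existing columns.
df["total_macros_g"] = df[nutrient_cols].sum(axis=1)

print(missing_per_column)
print(rows_per_diet)
print(df)
```

In a notebook, each step would normally sit in its own cell so that you can inspect the result before deciding on the next step.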

Key programming points covered in this blog:

  1. Reading from a CSV file with Pandas
  2. Checking unique values
  3. Checking counts
  4. Checking null values
  5. Grouping data
  6. Plotting a histogram
  7. Assigning chart titles and axis labels
  8. Plotting multiple subplots
  9. Creating a new derived column
  10. Dropping a column
  11. Creating a pivot table
  12. Plotting a stacked bar chart
  13. Filtering rows based on a condition
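Several of the tabular points above (unique values, counts, grouping, dropping a column, a derived column, a pivot table, and filtering) can be sketched in a few lines. The DataFrame below is hypothetical: the columns `diet`, `cuisine`, `protein_g`, and `notes`, and the threshold of 20 grams, are assumptions chosen only to keep the example small.

```python
import pandas as pd

# Hypothetical data: column names and values are invented for illustration.
df = pd.DataFrame({
    "diet": ["keto", "keto", "vegan", "vegan", "paleo", "paleo"],
    "cuisine": ["american", "italian", "indian", "asian", "american", "italian"],
    "protein_g": [18.0, 6.0, 12.0, 20.0, 25.0, 30.0],
    "notes": ["", "", "", "", "", ""],
})

# Checking unique values in a column.
unique_diets = df["diet"].unique()

# Checking the count of non-null entries in a column.
row_count = df["diet"].count()

# Grouping data: mean protein per diet.
mean_protein = df.groupby("diet")["protein_g"].mean()

# Dropping a column that carries no information.
df = df.drop(columns=["notes"])

# Creating a new derived column (the 20 g threshold is an arbitrary assumption).
df["high_protein"] = df["protein_g"] >= 20

# Creating a pivot table: diets as rows, cuisines as columns, mean protein as values.
pivot = pd.pivot_table(
    df, index="diet", columns="cuisine", values="protein_g", aggfunc="mean"
)

# Filtering rows based on a condition.
high_protein_rows = df[df["high_protein"]]

print(unique_diets, row_count)
print(mean_protein)
print(pivot)
print(high_protein_rows)
```

Combinations that do not occur in the data (for example a keto recipe with Indian cuisine here) appear as `NaN` in the pivot table.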

Let’s code

Data preparation and analysis code is written sequentially, with each step building on the outcome of the previous one. That makes JupyterLab a good fit for this exercise, so you’ll use it here.
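As an illustration of the plotting points (a histogram, chart titles and axis labels, multiple subplots, and a stacked bar chart), here is a small sketch that can be pasted into a notebook cell. It is not the notebook from the repository. The data is randomly generated with NumPy, and the column names `diet`, `protein_g`, and `carbs_g` are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 20 randomly generated recipes for each of three diets.
rng = np.random.default_rng(0)
diets = ["keto", "vegan", "paleo"]
df = pd.DataFrame({
    "diet": np.repeat(diets, 20),
    "protein_g": rng.normal(loc=20, scale=5, size=60).round(1),
    "carbs_g": rng.normal(loc=30, scale=10, size=60).round(1),
})

# Plotting multiple subplots: one row, two columns.
fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Plotting a histogram, with a chart title and axis labels.
ax_hist.hist(df["protein_g"], bins=10)
ax_hist.set_title("Protein per recipe")
ax_hist.set_xlabel("Protein (g)")
ax_hist.set_ylabel("Number of recipes")

# Plotting a stacked bar chart of the mean nutrients per diet.
mean_by_diet = df.groupby("diet")[["protein_g", "carbs_g"]].mean()
mean_by_diet.plot(kind="bar", stacked=True, ax=ax_bar)
ax_bar.set_title("Mean nutrients by diet")
ax_bar.set_xlabel("Diet")
ax_bar.set_ylabel("Grams")

fig.tight_layout()
```

In JupyterLab the figure is rendered when the cell finishes. When running the code as a plain script, add `plt.show()` at the end.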

That’s all the analysis for now! The data looks ready.

Download

GitHub — https://github.com/SaurabhGhosh/diet_data_analysis.git

Conclusion

I hope this blog gave you some ideas about the following:

  1. Reading from a CSV file with Pandas
  2. Checking unique values
  3. Checking counts
  4. Checking null values
  5. Grouping data
  6. Plotting a histogram
  7. Assigning chart titles and axis labels
  8. Plotting multiple subplots
  9. Creating a new derived column
  10. Dropping a column
  11. Creating a pivot table
  12. Plotting a stacked bar chart
  13. Filtering rows based on a condition

In my next blog, I’ll explore another program and learn more concepts.

If you have any questions related to this program, please feel free to post your comments.

Please like, comment and follow me! Keep Learning!

Saurabh Ghosh, writer for Predict. Business Analyst, Machine Learning Enthusiast, Blogger.