Data Preparation and Data Analysis with Pandas, NumPy and Matplotlib in Python
Let’s learn data preparation and analysis techniques in Python using sample data about diets and recipes
What to expect in this blog?
You’ll work through the following steps in this blog:
- Collecting data: There are many ways of collecting data. In this blog, data is read from a CSV file with Pandas.
- Identifying and treating missing data (if any): Before the collected data is used for any reporting or modelling, it must be checked for missing values.
- Exploring the data to check for imbalances and discrepancies: Data must be checked for issues such as imbalances between categories, outliers, and deviations from what you assumed or understood about the data.
- Transforming data to treat the imbalances and discrepancies: The issues found in the data must be addressed before it is used for reporting or modelling.
- Creating new derived features: As part of data analysis, new features/labels can be created to give better insight or help with the analysis.
- Using visualization to assist in the analysis: A visual representation often makes patterns in the data easier to spot than a table of numbers.
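To make the first few steps concrete, here is a minimal sketch of reading a CSV with Pandas, checking for missing values and category imbalance, and treating a missing value. The sample rows and the column names (diet, recipe, protein, carbs, fat) are made up for illustration and are not the blog's actual dataset; filling with the median is just one possible treatment.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file; the column names
# and values are assumptions made for this illustration only.
csv_text = """diet,recipe,protein,carbs,fat
vegan,Lentil soup,18.0,40.0,5.0
vegan,Tofu bowl,22.0,30.0,12.0
keto,Bacon omelette,25.0,2.0,30.0
keto,Avocado salad,,8.0,28.0
paleo,Grilled salmon,34.0,0.0,20.0
"""

# Collect the data: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Identify missing data: count the null values in each column.
missing_per_column = df.isnull().sum()

# Explore categories: list the unique diets and count the rows for each,
# which reveals any imbalance between categories.
unique_diets = df["diet"].unique()
rows_per_diet = df["diet"].value_counts()

# Treat missing data: fill the missing protein value with the column median
# (one possible choice; dropping the row would be another).
df["protein"] = df["protein"].fillna(df["protein"].median())

print(missing_per_column)
print(unique_diets)
print(rows_per_diet)
```

With a real file you would pass its path to `pd.read_csv` instead of the in-memory text used here.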
Some key programming points you’ll learn in this blog:
- Read from CSV with Pandas
- Checking unique values
- Checking count
- Checking null values
- Grouping of data
- Plotting histogram chart
- Assigning chart titles and axis labels
- Plotting multiple subplots
- Creating a new derived column
- Dropping a column
- Creating pivot table
- Plotting stacked bar chart
- Filtering rows based on condition
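Several of the points above (grouping, a derived column, dropping a column, a pivot table, and filtering rows) can be sketched on a small DataFrame. The data, the column names, and the `source_id` column are hypothetical, and the calories formula (4, 4 and 9 kcal per gram of protein, carbs and fat) is used here only as an example of a derived feature.

```python
import pandas as pd

# Hypothetical data; the columns are assumptions for illustration only.
df = pd.DataFrame({
    "diet": ["vegan", "vegan", "keto", "keto", "paleo"],
    "cuisine": ["indian", "asian", "american", "american", "nordic"],
    "protein": [18.0, 22.0, 25.0, 6.0, 34.0],
    "carbs": [40.0, 30.0, 2.0, 8.0, 0.0],
    "fat": [5.0, 12.0, 30.0, 28.0, 20.0],
    "source_id": [101, 102, 103, 104, 105],
})

# Grouping of data: mean protein per diet.
mean_protein = df.groupby("diet")["protein"].mean()

# Creating a new derived column: approximate calories per recipe.
df["calories"] = 4 * df["protein"] + 4 * df["carbs"] + 9 * df["fat"]

# Dropping a column that is not needed for the analysis.
df = df.drop(columns=["source_id"])

# Creating a pivot table: mean calories for each diet/cuisine combination.
pivot = df.pivot_table(
    index="diet", columns="cuisine", values="calories", aggfunc="mean"
)

# Filtering rows based on a condition: keep recipes with over 20 g protein.
high_protein = df[df["protein"] > 20]

print(mean_protein)
print(pivot)
print(high_protein)
```

Combinations that never occur in the data (for example a vegan recipe with the nordic cuisine) show up as NaN in the pivot table.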
Let’s code
Data preparation and analysis code is written sequentially, with each step based on the outcome of the previous step. A notebook suits this way of working, so you’ll use JupyterLab for this coding.
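As an illustration of one step building on another, the sketch below first creates a pivot table and then uses it to draw a stacked bar chart next to a histogram, with titles and axis labels on both subplots. The data and column names are hypothetical, not the blog's dataset. The `matplotlib.use("Agg")` line is only there so the sketch also runs as a plain script; in JupyterLab you can leave it out and the figure is shown below the cell.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; not needed in JupyterLab

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; the columns are assumptions for illustration only.
df = pd.DataFrame({
    "diet": ["vegan", "vegan", "keto", "keto", "paleo"],
    "cuisine": ["indian", "asian", "american", "american", "nordic"],
    "protein": [18.0, 22.0, 25.0, 6.0, 34.0],
})

# Plotting multiple subplots: one row with two charts side by side.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# Plotting a histogram, then assigning the chart title and axis labels.
axes[0].hist(df["protein"], bins=5)
axes[0].set_title("Protein per recipe")
axes[0].set_xlabel("Protein (g)")
axes[0].set_ylabel("Number of recipes")

# Earlier step: a pivot table counting recipes per diet and cuisine.
counts = df.pivot_table(
    index="diet", columns="cuisine", values="protein",
    aggfunc="count", fill_value=0,
)

# Later step: a stacked bar chart built from that pivot table.
counts.plot(kind="bar", stacked=True, ax=axes[1])
axes[1].set_title("Recipes per diet by cuisine")
axes[1].set_xlabel("Diet")
axes[1].set_ylabel("Number of recipes")

fig.tight_layout()
```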
That’s all the analysis for now! The data looks ready.
Download
GitHub — https://github.com/SaurabhGhosh/diet_data_analysis.git
Conclusion
I hope this blog gave you some ideas about the following:
- Read from CSV with Pandas
- Checking unique values
- Checking count
- Checking null values
- Grouping of data
- Plotting histogram chart
- Assigning chart titles and axis labels
- Plotting multiple subplots
- Creating a new derived column
- Dropping a column
- Creating pivot table
- Plotting stacked bar chart
- Filtering rows based on condition
In my next blog, I’ll explore another program and learn more concepts.
If you have any questions related to this program, please feel free to post your comments.
Please like, comment and follow me! Keep Learning!