Exploratory Data Analysis (EDA) on Biological Data: A Hands-On Guide

Unraveling the Structural Data of Proteins, Part II — Exploratory Data Analysis

Shreya U. Patil



In a previous post, I covered the background of this protein structure resolution data set, including an explanation of key data terminology and details on how to acquire the data. You can find that post here: "Explore unique dataset for your upcoming data science project".

This article provides an intuitive guide to exploratory data analysis (EDA) on a real-world protein structure data set, aimed at beginners looking to get hands-on experience with a practical data analysis project. This type of data is used to predict the resolution of a protein structure in drug design workflows.

EDA is one of the most crucial first steps in any data science project. However, many beginners make the mistake of jumping directly into modeling and prediction without taking the time to deeply understand their data.

EDA provides invaluable insights that allow you to design your machine learning pipeline more effectively. By getting to know your data inside and out, you can make better decisions about data cleaning, feature engineering, model selection, and evaluation metrics.

For example, in a binary classification task…

