Visualizing and Analyzing Proteins in Python
Proteomics with Biopython
Human biology is incredibly complex. Even with our ever-growing understanding, our answers only uncover more and more questions. The completion of the Human Genome Project gave many scientists confidence that we could solve pressing issues in biology through genomics. However, as our understanding of biology has grown, we’ve recognized that other factors influence how an organism’s genome is utilized. Thus, new fields of study were born to address these interconnected and flexible domains, including transcriptomics (study of mRNA) and proteomics (study of proteins).
As I covered in my previous blog post, the Biopython package is quite powerful and makes it simple to visualize and analyze DNA and RNA sequences. And it has protein analysis capabilities, too! So let’s dive in.
The Protein Data Bank (PDB) is a one-stop shop for exploring and downloading protein structures. PDB developed its own file format for this purpose: the aptly named .pdb. But as larger, more complex proteins were analyzed, other formats were developed: CIF and mmCIF. CIF (Crystallographic Information File) was developed to archive small-molecule crystallographic experiments, which study the arrangement of atoms in crystalline solids. CIF was expanded to larger molecules…
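To make the format discussion a little more concrete, here is a minimal sketch of why the classic .pdb format struggles with very large structures. A .pdb file stores each atom on a fixed-width line, so every field has a hard size limit. The sketch uses only plain Python and slices one ATOM record by column position. The column ranges follow the wwPDB format description, but `ATOM_FIELDS`, `parse_atom_line`, and the example record are names and data I made up for illustration. In practice you would let Biopython's `Bio.PDB` parsers do this work rather than slicing lines yourself.

```python
# Column ranges (1-based, inclusive) for a few fields of a PDB ATOM record.
# Only a subset of the fields is listed; this is an illustration, not a parser.
ATOM_FIELDS = {
    "record": (1, 6),
    "serial": (7, 11),
    "atom_name": (13, 16),
    "res_name": (18, 20),
    "chain_id": (22, 22),
    "res_seq": (23, 26),
    "x": (31, 38),
    "y": (39, 46),
    "z": (47, 54),
}


def parse_atom_line(line):
    """Slice one ATOM line into fields purely by column position."""
    fields = {}
    for name, (start, end) in ATOM_FIELDS.items():
        fields[name] = line[start - 1:end].strip()
    # Convert the numeric fields from text.
    for key in ("serial", "res_seq"):
        fields[key] = int(fields[key])
    for key in ("x", "y", "z"):
        fields[key] = float(fields[key])
    return fields


# A hypothetical record for illustration (not copied from a real PDB entry).
example_line = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"

atom = parse_atom_line(example_line)
print(atom)

# The serial number field is only 5 columns wide, so a single classic .pdb
# file cannot number more than 99,999 atoms.
serial_start, serial_end = ATOM_FIELDS["serial"]
max_serial = 10 ** (serial_end - serial_start + 1) - 1
print(max_serial)
```

The chain identifier has the same problem: it is a single character wide. mmCIF avoids these limits because it stores values as named, whitespace-separated items instead of fixed columns, which is one reason it suits larger, more complex structures.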