TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Data Science

Visualizing and Analyzing Proteins in Python

Proteomics with Biopython

8 min read · Feb 27, 2021


Human biology is incredibly complex. Even as our understanding grows, each answer seems to uncover more questions. The completion of the Human Genome Project gave many scientists confidence that we could solve pressing issues in biology through genomics. However, as our understanding of biology has grown, we've recognized that other factors influence how an organism's genome is expressed and used. Thus, new fields of study emerged to address these interconnected and dynamic layers, including transcriptomics (the study of mRNA) and proteomics (the study of proteins).

GIF by Author

As I covered in my previous blog, the Biopython package is powerful and makes it simple to visualize and analyze DNA and RNA sequences. It has protein analysis capabilities, too! So let's dive in.
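To give a flavor of the kind of protein analysis involved, here is a minimal plain-Python sketch that computes amino-acid composition and aromaticity (the fraction of phenylalanine, tryptophan and tyrosine residues). It is a stand-in, not Biopython's own code: in Biopython, comparable calculations live on the `ProteinAnalysis` class in `Bio.SeqUtils.ProtParam` (for example, its `aromaticity()` method). The helper names `composition` and `aromaticity` and the example peptide are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical helpers for illustration (not part of Biopython).
# Phenylalanine (F), tryptophan (W) and tyrosine (Y) are the residues
# counted as aromatic, matching the definition Biopython's
# ProteinAnalysis.aromaticity() uses.
AROMATIC = "FWY"


def composition(sequence):
    """Count each one-letter amino-acid code in a protein sequence."""
    return Counter(sequence.upper())


def aromaticity(sequence):
    """Return the fraction of residues that are F, W or Y."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return sum(seq.count(aa) for aa in AROMATIC) / len(seq)


if __name__ == "__main__":
    # An arbitrary 18-residue example peptide.
    peptide = "MKWVTFISLLFLFSSAYS"
    print(composition(peptide).most_common(3))
    print(round(aromaticity(peptide), 3))
```

With Biopython installed, the library route would be `ProteinAnalysis(peptide).aromaticity()`, which also gives access to related properties such as molecular weight and isoelectric point.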

The Protein Data Bank (PDB) is a one-stop shop for exploring and downloading protein structures. The PDB developed its own file format for this purpose, the aptly named .pdb. But as larger, more complex proteins were analyzed, another format was developed: CIF and mmCIF. CIF (Crystallographic Information File) was developed to archive small-molecule crystallographic experiments, which study the arrangement of atoms in crystalline solids. CIF was expanded to larger molecules…
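The .pdb format is a fixed-width text format: each `ATOM`/`HETATM` line stores fields in set column ranges, which also explains why very large structures outgrow it (the atom serial number gets only 5 columns and the chain ID a single character). As a minimal sketch, assuming well-formed records, the stdlib-only parser below slices those columns directly. The `Atom` dataclass and the `parse_atom_line`, `parse_pdb` and `centroid` helpers are hypothetical names for illustration, not Biopython's API.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical container for the fields parsed from one record."""
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line):
    """Parse one ATOM/HETATM record using the fixed PDB column layout.

    1-based inclusive columns: serial 7-11, atom name 13-16,
    residue name 18-20, chain 22, residue number 23-26,
    x 31-38, y 39-46, z 47-54 (Python slices below are 0-based).
    """
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def parse_pdb(text):
    """Return an Atom for every ATOM/HETATM line in a .pdb file's text."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    ]


def centroid(atoms):
    """Geometric center (unweighted mean x, y, z) of a list of atoms."""
    n = len(atoms)
    return (
        sum(a.x for a in atoms) / n,
        sum(a.y for a in atoms) / n,
        sum(a.z for a in atoms) / n,
    )
```

For real files, Biopython's `Bio.PDB.PDBParser` and `Bio.PDB.MMCIFParser` (each via `get_structure(structure_id, filename)`) handle both formats and their many edge cases far more robustly than this sketch.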


Published in TDS Archive


Written by Aren Carpenter

Data. BI. Goldman Sachs. 150k readers
