Genome Assembly — The Holy Grail of Genome Analysis

Assembling the 2019 novel coronavirus genome

Vijini Mallawaarachchi
Towards Data Science
8 min read · Mar 4, 2020

The outbreak of the 2019 novel coronavirus, which causes coronavirus disease 2019 (COVID-19), currently threatens the entire world. Scientists are working day and night to understand the origin of the virus. You may have heard the recent news that its complete genome has been published. How did scientists figure out the complete genome of the virus? In this article, I will explain how we can do this.

Genome

A genome is all the genetic material of an organism, including all of its genes. It contains all the information required to build and maintain that organism.

Sequencing

How can we read the information present in the genome? This is where sequencing comes in. If you have read my previous article on DNA analysis, you know that sequencing is used to determine the sequence of individual genes, full chromosomes or entire genomes of an organism.

Fig 1. A PacBio sequencing machine. PacBio is a third-generation sequencing technology which produces long reads. Image by KENNETH RODRIGUES from Pixabay (CC0)

Special machines, known as sequencing machines, are used to extract short random sequences from the genome we are interested in. Current sequencing technologies cannot read the whole genome at once. They read small pieces with a mean length between 50 and 300 bases…
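The idea that a sequencing machine yields many short pieces from random positions can be sketched in a few lines of Python. This is only a toy simulation under stated assumptions: the function name `simulate_reads`, the made-up 1,000-base `toy_genome`, the uniform choice of position and length within the 50 to 300 base range, and the absence of sequencing errors are all simplifications introduced here for illustration, not a model of how any real machine works.

```python
import random


def simulate_reads(genome, n_reads, min_len=50, max_len=300, seed=0):
    """Sample `n_reads` substrings of random length from random positions.

    Hypothetical helper for illustration: real sequencers are not uniform
    samplers and also introduce base-calling errors.
    """
    if len(genome) < min_len:
        raise ValueError("genome is shorter than the minimum read length")
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        # Pick a read length, capped at the genome length.
        length = rng.randint(min_len, min(max_len, len(genome)))
        # Pick a start position so the read fits inside the genome.
        start = rng.randint(0, len(genome) - length)
        reads.append(genome[start:start + length])
    return reads


# A made-up 1,000-base genome (random bases, not real data).
_rng = random.Random(42)
toy_genome = "".join(_rng.choice("ACGT") for _ in range(1000))

reads = simulate_reads(toy_genome, n_reads=20)
print(f"{len(reads)} reads, lengths from "
      f"{min(map(len, reads))} to {max(map(len, reads))} bases")
```

Each simulated read is a fragment of the genome with no record of where it came from, which is why the original sequence has to be reconstructed from the reads themselves.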
