Genes, DNA, Chromosomes, and Genomes

Hear these words a lot but not quite sure what they are and how they relate to each other?

Michael Hall
PhD Files
7 min read · Oct 6, 2017

In order to write about my PhD for a broader audience I thought it best to write a series of posts that will act as references. It’s best to make sure everyone is on the same page before I charge off and start throwing around terms like prokaryote, genomics and sequence alignment.

This post in particular will focus on giving the reader a better understanding of what DNA is, how it relates to a gene, how it differs from RNA, and many other terms. Obviously, if you’re very comfortable with these concepts there is no need to read on. However, if you couldn’t explain to your grandmother the difference between a chromosome and a gene, I would advise you to stick around – if not for your sake, at least for your grandmother’s.

Let’s get into it then!

DNA is short for DeoxyriboNucleic Acid, whilst RNA is short for RiboNucleic Acid. We will get into the difference between these two a little later.

Whilst knowing what these stand for seems cool, it doesn’t really tell you anything about what their function is. To start to get at the role of DNA, take the most overused company slogan:

[Insert latest business buzz-word]. It’s in our DNA.

This hints at DNA being a core component of something. If company X says that innovation is in its DNA, we can assume that any training of employees focuses heavily on innovation. In this scenario DNA refers to the training manual. Not far from the mark!

At a very high level, DNA is like the construction manual for how to build a cell. Cells belonging to different organisms will have different instructions. It is a template or blueprint that gets copied and passed on to every cell within that organism. While this all sounds very grandiose, DNA in its simplest form is actually just a molecule – called a nucleic acid.

Nucleic acids are composed of single units called nucleotides. By connecting a lot of these nucleotides together in a big chain we start to see the magic happen. Nucleotides come in 4 flavours: Adenine, Guanine, Cytosine, and Thymine. However, most people refer to them simply as A, G, C, and T. So connecting 10 nucleotides together we could get a nucleic acid (DNA) sequence of ATGCGTACAA or even GGGGTGGGGG. Other terms you may come across to describe nucleotides are base-pairs or bases.

DNA nucleotides and how they pair up in the double-helix

DNA mostly exists in “double-stranded” form, famously referred to as the double helix. What this means is that two “chains” of nucleotides are bound to each other. This binding follows very specific rules though. A will always bind to T, whilst C will always bind to G – the reverse is also true. So if you have a strand of DNA with the sequence ATGG it will be bound to another strand of DNA with the sequence TACC. Two reasons for this are stability and information conservation. The double-stranded structure makes it much harder for other molecules to bind to the DNA, which could potentially disrupt its function and order. The information conservation ties in with the copying of DNA, which we will touch on later.
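To make the pairing rule concrete, here is a minimal Python sketch. The names `PAIRING` and `partner_strand` are made up for illustration, and the function simply applies the A–T and C–G rule position by position, exactly as in the ATGG/TACC example above.

```python
# Pairing rules from the text: A binds T, and C binds G (and vice versa).
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand: str) -> str:
    """Return the strand that would be bound to `strand`, base by base."""
    return "".join(PAIRING[base] for base in strand.upper())


print(partner_strand("ATGG"))  # TACC
```

Because each base has exactly one partner, applying the function twice gives back the original strand. That is the “information conservation” idea: either strand is enough to rebuild the other.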

By now we have established that DNA is a type of blueprint that stores information on how to build a cell. This is obviously a very high-level definition, so let’s start digging in a little further and see how this information is organised and used.

“Gene” is another term that gets thrown around in everyday life, yet I suspect many people don’t actually know what a gene is. A gene can be thought of as a chunk of information within a DNA strand. A single DNA strand – consisting of up to hundreds of millions of nucleotides linked together – might contain thousands of these genes. Let’s keep things simple for now though and imagine a single strand of DNA, containing one gene. This gene encodes (holds) the information to make a protein. (Proteins are a whole separate post, but for now, think of them as the functional building blocks that make up a cell). Let’s say the protein this gene encodes is a cell-surface receptor for insulin. This protein sits in the outer membrane of the cell and binds to insulin (a hormone that increases in concentration when your blood sugar level rises) when it passes by. Once this happens the receptor sets off a chain of communication within the cell telling it that it needs to start taking up glucose. Now, these details are not necessary to understand what a gene is, but they highlight the importance of proteins.

So taking a step back, a gene is a unit within a DNA strand that holds the instructions to make a protein. As you can probably imagine, we have lots of genes because we need lots of different proteins to make a cell – and eventually an organism such as an elephant. Say there is a gene whose sequence of nucleotides is GTGCCA (in reality genes are much longer than this). If there is a DNA sequence CGGATTGTGCCAACCTC we can see how this idea of genes being like units within DNA strands looks and how genes relate to DNA.
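If it helps to see that “unit within a strand” idea spelled out, the following Python sketch treats the DNA as a string and looks for the gene inside it. The helper names `locate_gene` and `highlight_gene` are hypothetical, and real gene finding is far more involved than a simple text search.

```python
from typing import Optional, Tuple


def locate_gene(dna: str, gene: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) positions of `gene` in `dna`, or None if absent."""
    start = dna.find(gene)
    if start == -1:
        return None
    return start, start + len(gene)


def highlight_gene(dna: str, gene: str) -> str:
    """Wrap the first occurrence of `gene` in square brackets."""
    span = locate_gene(dna, gene)
    if span is None:
        return dna
    start, end = span
    return dna[:start] + "[" + dna[start:end] + "]" + dna[end:]


print(highlight_gene("CGGATTGTGCCAACCTC", "GTGCCA"))  # CGGATT[GTGCCA]ACCTC
```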

A bird’s-eye view of how genes, DNA, chromosomes and cells relate to each other by size.

While we are on the topic of genes we should talk about RNA. Going from a gene to a protein is not a single step and RNA acts as a sort of middleman. When a cell needs to make a protein there are special proteins that come along and turn the DNA sequence of the gene into RNA (this is referred to as transcription). RNA, like DNA, has an alphabet of nucleotides – with one difference. RNA has an A, C, and G, but instead of T it has a nucleotide called Uracil. So a gene with the DNA sequence GTGCCA would become the RNA sequence GUGCCA.
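As a rough sketch of the letter swap described above, the Python snippet below turns a gene’s DNA sequence into its RNA equivalent by replacing every T with U. The function name `transcribe` is made up for illustration, and it mirrors the simplified picture in this post rather than the full molecular process.

```python
def transcribe(gene_dna: str) -> str:
    """Rewrite a gene's DNA letters as RNA letters: every T becomes U."""
    return gene_dna.upper().replace("T", "U")


print(transcribe("GTGCCA"))  # GUGCCA
```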

The resulting RNA strand from this transcription process is then translated into protein by some other protein machinery (ribosomes). Proteins are made of small units called amino acids. These are very different to nucleotides, and there are many more “letters” in the protein alphabet (20 amino acids versus 4 nucleotides), so a single nucleotide cannot stand for a single amino acid. The way that nature has solved this issue is by reading the RNA in groups of three. These groups of three – called codons – are how DNA/RNA store the information for making proteins. Each codon corresponds to an amino acid. For example, the codon AAG gets translated as the amino acid Lysine, whilst GUA gets interpreted as Valine. There are circumstances where this flow of DNA to RNA to Protein does not proceed in that order, but the majority of cases do play out that way.
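Reading in groups of three can be sketched in a few lines of Python. The names `PARTIAL_CODON_TABLE`, `split_codons`, and `translate` are made up for illustration. The table is deliberately incomplete: it holds only the two codons mentioned above, whereas the real genetic code has 64.

```python
from typing import List

# Deliberately partial: only the two codons mentioned in the text.
PARTIAL_CODON_TABLE = {"AAG": "Lysine", "GUA": "Valine"}


def split_codons(rna: str) -> List[str]:
    """Cut an RNA sequence into groups of three, dropping any leftover bases."""
    usable_length = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable_length, 3)]


def translate(rna: str) -> List[str]:
    """Look up each codon; codons missing from the partial table become '?'."""
    return [PARTIAL_CODON_TABLE.get(codon, "?") for codon in split_codons(rna)]


print(split_codons("AAGGUA"))  # ['AAG', 'GUA']
print(translate("AAGGUA"))     # ['Lysine', 'Valine']
```

Three letters from a four-letter alphabet give 4 × 4 × 4 = 64 possible codons, which is more than enough to cover 20 amino acids.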

DNA in use. Transcription and translation (in most circumstances).

Now we come to chromosomes. “Chromosome” is another word associated with DNA which you hear quite frequently in the media. Let’s clear it up so the next time you hear it, you will have an understanding of what it is and why it is important. Chromosomes are effectively just really long, single pieces of DNA. So, in a single chromosome you can have thousands of genes. For example, chromosome 1 in humans contains 1,961 genes and has a whopping 248,956,422 bases (nucleotides – the single units we learnt about earlier)! Chromosomes vary significantly in size within organisms and also across organisms. Humans normally have 46 chromosomes – 2 copies of chromosomes 1–22 plus 2 X-chromosomes if you’re a female, or 1 X-chromosome and 1 Y-chromosome if you’re a male. A chicken, however, has 78 chromosomes.

Chromosome structure is fascinating – you need to squeeze these huge bits of DNA into cells – but there isn’t time to discuss that here. If you are interested, the video below shows how chromosomes fold in order to fit into a cell.

The last bit of terminology I will cover, as it is quite fundamental to my PhD, is the genome. An area of medicine that is becoming increasingly present in the news and healthcare system is genomics. Genomics, as you can probably guess, is the study of genomes. And what are genomes? Essentially they are the collection of all genetic material in an organism/cell (i.e. all the chromosomes). Genomics is concerned with the holistic view of DNA in a cell, rather than focusing in on a single point. I will talk at length and in far more depth about this in the next post.

So there you have it. DNA.

There are so many more details and interesting bits and pieces I could talk about but I don’t want this post to be too long. In my future posts I will be exploring and elaborating on these concepts and I will introduce more as they are required. For now though, this will be sufficient for understanding some of the concepts that will be a part of my PhD thesis. My next posts will begin to get into some of the details on this and I hope this post will act as a good guide.

I hope this article was informative and gave a good conceptual understanding of how genes, chromosomes, DNA, RNA, and proteins all relate to each other. If you have any questions feel free to leave them in the comments or yell at me if you think I am wrong about something.

Michael Hall
PhD Files

PhD Student in Bioinformatics — University of Cambridge and EMBL-EBI.