Evolution of studying evolution

Paschalis Natsidis
Nov 3

How all life forms emerged has puzzled us for millennia. Many theories have been proposed to explain how today’s broad diversity of life came about. Aristotle entertained the idea that relationships among living beings could be represented as a ladder, a scala, at whose top the human gloriously sits. In the Middle Ages this hierarchy was extended to include divine beings at its top and non-living things at its bottom.

Explorers of life’s origins started to question Aristotle’s concepts during the 18th century. In Charles Darwin’s On the Origin of Species (1859), the only figure is a complex branching diagram (or ‘tree’) that describes how species can emerge through evolutionary processes. Since then, bifurcating* tree diagrams have been used extensively to understand the evolutionary history of different biological entities.

The use of a tree structure to describe inter-species relationships implies that any two organisms, and consequently all living things that have ever existed, share a common ancestor. Phylogenetics is the field of study that attempts to infer these relationships, taking advantage of any measurable, heritable feature that can link two (or more) species. A ‘phylogenetic tree’ describes how these entities evolved by specifying nested common-ancestry relationships.
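
To make "nested common-ancestry relationships" concrete, here is a minimal sketch that encodes a tree as nested Python tuples. The encoding, the function names and the four-species topology are illustrative assumptions, not a method from this article: each tuple is an ancestor, and its elements are the lineages that descend from it. Finding the smallest tuple that contains two species gives their most recent common ancestor, and checking that every tuple has exactly two children tests whether the tree is bifurcating.

```python
from typing import List, Union

# A tree is either a leaf (a species name) or a tuple of descendant subtrees.
Tree = Union[str, tuple]

def leaves(tree: Tree) -> List[str]:
    """Return the species at the tips of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]

def mrca(tree: Tree, a: str, b: str) -> Tree:
    """Return the smallest subtree (clade) that contains both species a and b."""
    if isinstance(tree, tuple):
        for child in tree:
            tips = leaves(child)
            if a in tips and b in tips:
                # Both species descend from this child, so look deeper.
                return mrca(child, a, b)
    return tree

def is_bifurcating(tree: Tree) -> bool:
    """True if every internal node splits into exactly two children."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_bifurcating(child) for child in tree)

# Illustrative topology: humans and chimpanzees share a more recent common
# ancestor with each other than either does with mice; chickens branch off first.
tree = ((("human", "chimpanzee"), "mouse"), "chicken")

print(leaves(tree))                  # ['human', 'chimpanzee', 'mouse', 'chicken']
print(mrca(tree, "human", "mouse"))  # (('human', 'chimpanzee'), 'mouse')
print(is_bifurcating(tree))          # True
```

Nested tuples are chosen only because they mirror the nesting of clades directly; real phylogenetics software typically reads and writes trees in text formats such as Newick.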

To build a phylogenetic tree, one must look at features that accurately reflect how (and perhaps also when) species emerged. These features must be homologous, meaning that they originated from a common ancestral characteristic and can potentially reflect the true evolutionary history of the species. Some features may be shared by a set of species without originating from a common ancestor, a phenomenon known as homoplasy. The wings of birds and insects are a classic example: a shared attribute that emerged independently and is therefore homoplastic.

Figure 1. Just as species change through time, so does the study of evolution. It’s been a long journey from Darwin’s first, hand-drawn phylogenetic tree (bottom, far left) to the current phylogenetic tree of life that includes all living beings (bottom, far right). New sources of information and methodological improvements will continue to shape our view of the history of life.

The field of phylogenetics has also been evolving since its birth (Figure 1). In the beginning, morphological features, carefully extracted and compared among organisms, were the only information used to infer their phylogenetic relationships. Distances between species were defined from the number of features they shared, and species separated by smaller distances were considered more closely related. This approach had drawbacks: morphological characters are laborious to collect, and determining their homology is not always straightforward.
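
The distance idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the historical procedure: the four species and five characters are simplified, hypothetical presence (1) / absence (0) scores, and the distance between two species is taken to be the number of characters whose states differ (a Hamming distance), which grows as the number of shared features shrinks.

```python
from itertools import combinations

# Hypothetical, simplified presence (1) / absence (0) scores for five characters:
# hair, feathers, wings, lays eggs, mammary glands.
CHARACTERS = {
    "bat":    (1, 0, 1, 0, 1),
    "mouse":  (1, 0, 0, 0, 1),
    "pigeon": (0, 1, 1, 1, 0),
    "lizard": (0, 0, 0, 1, 0),
}

def distance(a, b):
    """Number of characters whose states differ between two species."""
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(chars):
    """Distance for every unordered pair of species (names sorted alphabetically)."""
    return {
        (s1, s2): distance(chars[s1], chars[s2])
        for s1, s2 in combinations(sorted(chars), 2)
    }

matrix = distance_matrix(CHARACTERS)
# The pair with the smallest distance is considered the most closely related.
closest = min(matrix, key=matrix.get)
print(closest, matrix[closest])  # ('bat', 'mouse') 1
```

Note that bats and pigeons both score 1 for wings, a homoplastic character; here it is outweighed by the other characters, but with a poorer choice of characters such shared, independently evolved features can pull unrelated species together.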

Over the years, a new source of information became available to phylogeneticists. When we started ‘reading’ DNA sequences in the second half of the 20th century, some researchers thought that this new type of data, molecular data, could provide new insights into tracing evolutionary history. Statisticians developed new, more sophisticated ways of going from data to trees, but defining homology between DNA sequences was a problem that still needed solving. DNA is made of nucleotides, and the nucleotides of one species can have homologous counterparts in others. Sequence alignment (Figure 2) is the method used to detect homology between nucleotides, and each set of homologous nucleotides (one column of the alignment) is the informative unit from which a phylogenetic tree is inferred. Intuitively, the more homologous nucleotides we have at hand, the better the inference should be.
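
The article does not name a specific alignment algorithm. As one classic example, the sketch below implements Needleman–Wunsch global alignment for two sequences, with assumed scores (match +1, mismatch −1, gap −1). It fills a dynamic-programming table of best partial scores and traces back one optimal alignment; each column of the result pairs nucleotides that are hypothesised to be homologous, or pairs a nucleotide with a gap ("-").

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align sequences a and b; return (aligned_a, aligned_b, score)."""

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align the two
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

aligned_a, aligned_b, best = needleman_wunsch("GATTACA", "GATACA")
print(aligned_a)
print(aligned_b)
print(best)  # 5: six matched columns and one gap
```

Real analyses usually align many sequences at once and use empirically derived substitution scores; the pairwise version with fixed scores is shown only because it is the simplest instance of the idea.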

Figure 2. Above: Amino acid sequences can be used as an alternative to nucleotides. This is a snapshot of the alignment of histone H1 amino acid sequences from five mammals. Each column (position) represents a set of homologous amino acids among these species. Positions in dark grey are not informative because they are invariant. Below: the phylogenetic tree that best explains the relationships among these five mammals.
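
The distinction between invariant and variable positions in Figure 2 can be sketched as follows. The four short sequences are hypothetical toy data, not the real histone H1 alignment, and treating gaps as "no information" when deciding whether a column varies is an assumption of this sketch. Each column is read off as one tuple of residues; a column is kept only if it contains more than one residue type.

```python
# Toy amino-acid alignment (hypothetical sequences, not real histone H1 data).
# Rows are species, columns are aligned positions; "-" marks a gap.
ALIGNMENT = {
    "species_A": "MSETAPAA",
    "species_B": "MSETAPLA",
    "species_C": "MTETAPAA",
    "species_D": "MTE-APVA",
}

def columns(alignment):
    """Return each aligned position as a tuple of residues, one per species."""
    rows = list(alignment.values())
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return list(zip(*rows))

def variable_positions(alignment):
    """0-based indices of columns with more than one residue type (gaps ignored)."""
    return [
        i for i, col in enumerate(columns(alignment))
        if len({res for res in col if res != "-"}) > 1
    ]

print(variable_positions(ALIGNMENT))  # [1, 6]
```

Only the two variable columns can distinguish one grouping of species from another; the six invariant columns support every possible tree equally.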

This is why, when next-generation sequencing** began providing massive amounts of molecular data in the 2000s, phylogeneticists sought to include alignments as long as possible in their inferences. In the early years of molecular phylogenetics, the sequences of one or a few genes were combined (or concatenated) and then used to reconstruct the evolutionary history of the species in question. However, different sets of genes produced incongruent phylogenetic trees, and selecting the most reliable one was not straightforward. The boom in genome-scale information sent researchers on a hunt for more data, more complex statistics and, necessarily, more efficient computational tools to perform increasingly intensive analyses. The era of phylogenomics had arrived, together with hopes that the incongruence would end.
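
Concatenation, as described above, can be sketched as joining per-gene alignments end to end into one long "supermatrix" row per species. The gene names, species and sequences are hypothetical, and filling a species' missing gene with gaps is an assumption of this sketch (a common practical choice, but not the only one).

```python
# Hypothetical per-gene alignments: gene -> {species: aligned sequence}.
GENES = {
    "gene1": {"human": "ATGC", "mouse": "ATGA", "chicken": "TTGC"},
    "gene2": {"human": "GGA", "mouse": "GCA"},  # no chicken sequence available
}

def concatenate(genes):
    """Join per-gene alignments into one supermatrix, gap-filling missing genes."""
    species = sorted({sp for aln in genes.values() for sp in aln})
    parts = {sp: [] for sp in species}
    for name, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are not all the same length")
        width = lengths.pop()
        for sp in species:
            # A species without this gene contributes a run of gaps of equal width,
            # so the columns of every gene stay aligned across species.
            parts[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}

for sp, seq in concatenate(GENES).items():
    print(sp, seq)
# chicken TTGC---
# human ATGCGGA
# mouse ATGAGCA
```

Because every gene's columns are simply appended, a single tree is inferred from all genes together; this is also why conflicting signals among genes, the incongruence mentioned above, can be hidden inside one concatenated analysis.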

Even though phylogenetics research is evolving at a fast rate, two things have remained constant throughout its course: first, the characters used to reconstruct phylogenies must reflect the true evolutionary history of the species of interest; second, and equally important, the methods employed must be able to recover this history as accurately as possible, given the data at hand.

Modern-day phylogenomics is as fascinating as phylogenetics has always been. It is inherently multidisciplinary: advanced biological, statistical and computational knowledge must be combined to produce meaningful and realistic results. Scientists who practise phylogenetics these days must be sufficiently trained in all three fields and specialised in at least one. Understanding the flaws of current methods and the potential of newly proposed ones is the only way to close the remaining gaps on the path to a reliable (and, why not, true) Tree of Life.

— — — — — — — — — — — — — — — — — — — — — — — — — —
*A bifurcating (or binary) tree is a branching graph in which, at every splitting point (node), a branch (edge) splits into exactly two child branches. In a biological context, this means that exactly two species emerge from one common ancestor, an event known as speciation.
**Technological advances in DNA sequencing allowed new data to be acquired much faster and more cheaply than before; these methods became known as next-generation sequencing, or NGS.
