Cataloguing a vast gene ‘library’

The Long Room, Trinity College Dublin

Dr Barbara Kramarz from the BHF-funded Gene Ontology (GO) annotation team, led by Dr Ruth Lovering, based at the UCL Centre for Cardiovascular Genetics, explains what the job of a GO biocurator involves. Here she talks about what GO annotations are, and how they help scientists and clinicians to design their studies and interpret results with greater accuracy and precision.

In their work, scientists constantly rely on databases filled with biological information (such as DNA sequences or population data) and on tools that help them make sense of all of this data. These resources are used both when designing experiments and afterwards, when analysing the results of experimental studies.

But how does the biological data get into the databases? And how do we make sure that the data is usable and not just a jumbled mess?

The continuously growing amounts of biological data need to be not only accurately tagged and organised, but also easily accessible in public online databases in order to be useful to scientists and to enhance the progress of scientific research.

What are Gene Ontology Annotations?

Gene Ontology (GO) annotation aims to organise the ever-expanding amounts of biological data available to scientists using standardised criteria, and to make this data easily accessible to researchers as well as computers.

It is the job of biocurators to extract information from published scientific literature and to annotate it using universal and consistent criteria. Think of a librarian categorising and cataloguing books in a vast library. Just as these book categories enable users to search for the book they are interested in, the GO tags, or annotations, allow scientists to more easily identify their genes of interest. These GO tags subsequently form an invaluable resource for the analysis of large experimental datasets, for instance, lists of genes associated with heart disease risk.

GO serves to provide summaries describing the roles of genes and their products, which are mostly proteins. Other sets of biological terms, describing the types of cells and tissues in which the annotated proteins carry out their functions, are used to supplement the GO annotations and create more informative gene records. The tagged biological information is then accessible in a number of online scientific databases as well as Wikipedia.

The UCL Gene Ontology Biocuration Team 2016. From the left: Barbara Kramarz, Rebecca Foulger, Rachael Huntley, Paul Denny and Ruth Lovering (also Nancy Campbell, not pictured, currently on maternity leave).

There are two Gene Ontology annotation teams working at UCL: the BHF-funded team of biocurators, focusing on annotation of proteins and miRNAs involved in cardiovascular physiology, and the neurological biology team, currently focusing on Parkinson’s disease and projects about the junctions between the nerves in the brain, called synapses.

If researchers publish an article about the role of a certain protein, let’s call it ‘Protein A’, and its role is in heart enlargement, or hypertrophy, biocurators can annotate ‘Protein A’ using GO tags describing this role. For example, if the article demonstrated that ‘Protein A’ binds to DNA inside the nucleus, through which it regulates the expression of other genes, which in turn trigger an increase in the thickness of the heart walls, all of this information about ‘Protein A’ can be catalogued using the GO annotations.
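To make the ‘Protein A’ example more concrete, here is a minimal Python sketch of how such annotations might be recorded. It is an illustration only, not the format used by the GO annotation team: the Annotation class, its field names and the source label are all hypothetical, and the term labels are chosen simply to mirror the example above. The one fixed point is that GO groups its terms into three aspects: molecular function, biological process and cellular component.

```python
from dataclasses import dataclass

# The three aspects into which the Gene Ontology groups its terms.
ASPECTS = ("molecular_function", "biological_process", "cellular_component")


@dataclass(frozen=True)
class Annotation:
    """One GO-style tag attached to a gene product (hypothetical structure)."""
    gene_product: str  # e.g. the made-up 'Protein A'
    aspect: str        # one of ASPECTS
    term: str          # illustrative term label
    source: str        # placeholder for the article supporting the tag

    def __post_init__(self):
        # Reject tags that do not belong to one of the three aspects.
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect}")


SOURCE = "hypothetical article"  # stands in for a real publication reference

protein_a_annotations = [
    Annotation("Protein A", "molecular_function", "DNA binding", SOURCE),
    Annotation("Protein A", "cellular_component", "nucleus", SOURCE),
    Annotation("Protein A", "biological_process", "regulation of gene expression", SOURCE),
    Annotation("Protein A", "biological_process", "cardiac muscle hypertrophy", SOURCE),
]


def terms_by_aspect(annotations):
    """Group term labels under the aspect they belong to."""
    grouped = {aspect: [] for aspect in ASPECTS}
    for annotation in annotations:
        grouped[annotation.aspect].append(annotation.term)
    return grouped


summary = terms_by_aspect(protein_a_annotations)
```

Here summary collects what the protein does (binds DNA), where it does it (the nucleus) and which wider processes it takes part in, which is the kind of overview that a set of GO annotations aims to give for a gene product.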

There will likely be more proteins with similar roles and, therefore, the universal GO tags can be used by researchers and clinicians attempting to design therapies for heart hypertrophy to identify the proteins and specific cellular mechanisms that could be targeted in treatment of this heart condition. Hence, comprehensive GO annotations of genes involved in heart physiology are a key prerequisite for streamlining the identification of drug targets and subsequent development of treatments.

Why is GO annotation important?

GO annotations are most often used to analyse genes that are affected in different diseases, including diseases of the heart.

Say you needed to use a library to research a specific subject area, e.g. the history of heart transplantation. If the books in your library were not catalogued and organised, you’d have to pull each book off the shelves in turn to decide whether it was relevant to your research. You could be in the library some time.

If, however, the books were clearly categorised, you could easily select books labelled with ‘organ transplantation’, ‘heart surgery’, or ‘history of medicine’ to find the books that you needed. Similarly, GO annotations allow researchers to more efficiently identify the genes that should be investigated in further studies in the context of specific health conditions.
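The library comparison can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the proteins and their tags are invented, and the helper names build_index and genes_with_term are hypothetical. Real GO terms are also arranged in a hierarchy, which this flat lookup ignores.

```python
from collections import defaultdict

# Invented (gene product, term label) pairs, standing in for curated annotations.
annotation_pairs = [
    ("Protein A", "cardiac muscle hypertrophy"),
    ("Protein A", "DNA binding"),
    ("Protein B", "cardiac muscle hypertrophy"),
    ("Protein C", "synapse"),
]


def build_index(pairs):
    """Build a catalogue mapping each term label to the gene products tagged with it."""
    index = defaultdict(set)
    for gene_product, term in pairs:
        index[term].add(gene_product)
    return index


def genes_with_term(index, term):
    """Return the gene products tagged with a term, in alphabetical order."""
    return sorted(index.get(term, set()))


catalogue = build_index(annotation_pairs)
hypertrophy_genes = genes_with_term(catalogue, "cardiac muscle hypertrophy")
```

Without the catalogue, finding every protein linked to hypertrophy would mean checking each record in turn, much like pulling every book off the shelf. With it, one lookup returns the relevant set.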

Overall, the GO annotation initiative leads to the creation of more comprehensive knowledge bases, or libraries of highly specialised scientific and medical information, which in turn allow for enhanced analyses of large datasets and for better-informed design of further laboratory and clinical studies. Finally, all of this work will lead to improvements in disease counselling, diagnosis and prognosis as well as the development of disease therapies.