How to Analyse Your Own DNA: The Point of View of an Ordinary Customer, Part I

German Demidov
13 min read · Dec 26, 2019


Hello again,

Today I will start an overview of my experience with the Dante Labs 30x Whole Genome Sequencing kit. As I said before, I am a bioinformatician, but today I am prohibited from using my magic outside of Hogwarts, so I will just describe how ordinary customers see it, without using my computational skills and resources. If you are new to this tutorial, you may want to start from the first part.

M.D. Leonid Rogozov performing an appendectomy on himself during the sixth Soviet Antarctic Expedition.

Why is it important? Obviously, thousands, even millions, of people worldwide have had medical genetic testing, and many of them managed to provide their DNA samples and understand the results somehow. Why do I think this is worth your time to read?

Well, in my opinion, back when proper genomic tests were available only to rich people, those people had the opportunity to discuss incidental findings with genetic experts right away. Now that sequencing prices have dropped (psst, not sequencing, but the reading of DNA; I promised to be an ordinary customer today), a person who orders a genetic test is left more or less alone with their genomic variants. Can it be scary? Oh, yes, it can. In most cases there is no reason to worry; nevertheless, such results may heavily affect a person’s life and lead to unnecessary life choices. Here I’ll try to explain my logic when I check an entry in my genetic report, and why you should not panic before a real doctor (not Google, M.D.) checks your results. A reminder: a PhD is not a real doctor.

Before writing this chapter I had some concerns about the privacy of my data. What happens if I reveal that I have a mutation in gene X? Will my friends stop talking to me? Will my insurance cost 10x more? Maybe. However, there are many people who live with rare disorders from the very beginning of their lives. There are numerous people who suffer from embryonal developmental disorders. There are many people who experience cancer during their childhood. There are brave women who have to undergo mastectomy because genomic analysis revealed pathogenic, high-risk BRCA1/2 mutations. Am I less brave than them, hiding from my heritage? Can I call myself a scientist if I analyse many genomes of others, but hide my own genome since it may hurt? Do I agree that people with certain genomic variants should be stigmatized?

Naaaah. So, as a support to the above mentioned and not mentioned groups, I will discuss the real variants I have. Nevertheless, I do understand and do not actually judge other researchers who prefer to hide their genomic variants.

Ordering of the kit and sending the biomaterial to the Dante Labs facility

Again, I am not affiliated with Dante Labs, but I made a genetic test there, so I will describe the process specifically regarding this particular company.

It was Black Friday, and the Whole Genome sequencing price dropped from around 600 to 170 euros. How is it possible that the company offers such prices and remains profitable? It probably isn’t: this is most likely a promotional price, and the company probably loses money on each such analysis. That’s not our business, seriously.

It took me 2 hours to decide whether I wanted to do it right now. The reviews of Dante Labs were pretty negative on many websites; however, most of the complaints were about long delivery times. I was thinking: well, is it OK for me to wait a year for my results at such a price? The answer was yes. I ordered a kit, leaving my phone number and email for notifications.

First of all, I received barely a couple of notifications by email (and not a single one on my phone) during the whole process, while in theory there should have been dozens of them. Second…I received the kit around 2 days after ordering. What’s inside? A tube with several additional containers and caps, a short booklet describing how to collect your saliva, and a reference number (14 digits). You have to enter this number in the form on their website; this is how your kit is registered and how they know where to send the results.

You should not eat for 2 hours before collecting the sample. You fill the tube with saliva, attach a special container with chemicals that stabilise your DNA (some force has to be applied to break the walls of the container so that the blue gel drops into your saliva), and shake it gently for a while. Then you close it with the normal dead-end cap (included) and put it back into the box they provided.

Then I went to a Deutsche Post office, put the pre-paid shipping label on my box (attention: the box and the label do not fit each other perfectly) and simply handed it to a worker there. You are done. Here is the video of the sample collection.

Then the status of my kit online changed to “We received your kit”, then to “Awaiting QC inspection”…and it got stuck there for around one week. One day I opened the website for no particular reason and whoa, all the results were there. The notification arrived a couple of days later.

The turnaround time was 10 days from the day I ordered the kit.

The reports Dante Labs provides: an overview

By default, you get 3 reports: Nutrigenomics, Pharmacogenetics, and Wellness and Lifestyle (which is not actually about wellness and lifestyle, but a health conditions report). I purchased 2 additional ones: one for hereditary cancer (for the reasons described in the first part of this tutorial) and one for connective tissue, since my siblings and I have hyperplasticity of the joints, which is not a well-defined pathology but more like a variant of the norm (it does not affect everyday life much). Each additional report cost 20 euros.

My reports page after I purchased 2 additional reports looks like this

Additionally, they provide raw files. The first thing my friends who do not do biomedical research asked when I described these files to them was “why do we need these files, we understand nothing there”. Wait for the next parts of this tutorial: I’ll explain why they are extremely beneficial rather than an additional burden on your time. Of course, I am lucky and I have access to a super powerful medical computational cluster with all the required tools installed; however, I’ll make the tutorial about self-analysis as simple as possible.

RAW files to download — don’t look there, we will discuss them later

So…what’s inside? What do these reports tell you?

Wait a bit (or, if you already know how genomic variants work, skip the next several paragraphs), we are still not ready. First, let’s sort all the mutations that potentially affect your phenotype (maybe even causing a disease) using several classification systems.

Mutations may be classified by the way they affect the protein that is built from the gene. Our DNA is written with 4 letters (adenine, cytosine, guanine, thymine: ACGT). Three letters of DNA in a row code for one amino acid. There are 20 amino acids that form our proteins.

Deciphering our protein language using the DNA language. The name of the amino acid (such as phenylalanine or leucine) is written next to each 3-letter combination. Some amino acids may be coded by several combinations. Picture from this web site. (Here is a check of whether you are reading carefully: why do you think there is a different letter in our DNA alphabet here, U instead of T?)
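For readers who prefer code to pictures, here is a minimal Python sketch of the same lookup. It assumes the standard genetic code, written directly in DNA letters for simplicity; the function name `translate` and the example sequence are made up for illustration and are not part of any Dante Labs tooling.

```python
from itertools import product

# Standard genetic code in DNA letters. The 64 three-letter combinations are
# enumerated in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Read DNA three letters at a time and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy "gene": ATG (M), TTT (F, phenylalanine), CTG (L, leucine), TAA (stop).
print(translate("ATGTTTCTGTAA"))  # MFL
```

Each letter in the output is the one-letter abbreviation of an amino acid, so the 12 DNA letters above become a tiny 3-amino-acid "protein".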

And what matters for our organisms is proteins: they do the work inside our cells, not the DNA. DNA only encodes proteins. Say you have a computer game saved as 0s and 1s on your hard drive. Do you actually play by looking at the binary code? No, you run your game and the 0s and 1s become 3D models. DNA works in a similar way: it just provides the instructions for assembling your proteins.

DNA -> RNA (not shown) -> protein (from this web site)

And proteins are complex. We can say that each protein is somewhat similar to a piece of Ikea furniture. If one of the parts of our sofa is shorter than needed, the sofa will not work properly: it will be shaky, it will be uncomfortable, maybe it won’t be possible to use it at all. So, as with Ikea furniture, all the parts of our proteins have to match each other perfectly. And if there is a wrong letter in your DNA, your sofa’s legs may end up being different sizes. A serious error in the DNA means a serious error in your protein, and there is no way around it.

Haemoglobin protein complex. Everything should perfectly fit!

We also have symbols that mark the start and the end of a gene (start and stop codons). If a variant in the DNA changes the corresponding amino acid, it is a missense mutation. Some amino acids are similar to each other, and such a mutation will still allow us to use this “sofa”, but some are not. If the variant creates a premature stop codon (so only part of the protein will be read from the DNA), it is a nonsense mutation. So, instead of the whole sofa part, you get only a piece of it, and you cannot assemble a sofa using this stub. If one of the letters of DNA is removed, it can cause a frameshift mutation, so you start deciphering your protein from the wrong position. After such a mutation, instead of sofa legs that fit the other parts of the sofa, we get abracadabra.
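The three mutation types above can be told apart mechanically. The sketch below is a simplified illustration under stated assumptions: it uses the standard genetic code, compares a reference and a mutated coding sequence that both start in frame, and the function name `classify` and the toy sequences are invented for this example. It also reports a fourth outcome not discussed above, a synonymous change, where the letter changes but the amino acid does not.

```python
from itertools import product

# Standard genetic code in DNA letters, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(reference: str, mutated: str) -> str:
    """Name the effect of a mutation on a toy coding sequence (simplified)."""
    # Adding or removing a number of letters that is not a multiple of 3
    # shifts every following three-letter word.
    if (len(mutated) - len(reference)) % 3 != 0:
        return "frameshift"
    if len(mutated) != len(reference):
        return "in-frame insertion/deletion"
    # Same length: compare the two sequences codon by codon.
    for i in range(0, len(reference) - 2, 3):
        ref_aa = CODON_TABLE[reference[i:i + 3]]
        alt_aa = CODON_TABLE[mutated[i:i + 3]]
        if ref_aa == alt_aa:
            continue
        return "nonsense" if alt_aa == "*" else "missense"
    return "synonymous"


reference = "ATGTGGCTGTAA"                 # hypothetical toy gene: M, W, L, stop
print(classify(reference, "ATGTGTCTGTAA"))  # missense: W becomes C
print(classify(reference, "ATGTGACTGTAA"))  # nonsense: W becomes a stop
print(classify(reference, "ATGTGCTGTAA"))   # frameshift: one letter removed
print(classify(reference, "ATGTGGCTATAA"))  # synonymous: L stays L
```

Real variant annotation tools handle many more cases (splice sites, lost stop codons, multiple transcripts), so treat this only as a way to see the definitions in action.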

In a clinical sense, mutations can be pathogenic, likely pathogenic, of unknown significance, likely benign or benign. We consider only the first two categories. Everything else is material for the news agency “One Grandma Said” (OGS), which ironically denotes information that we cannot trust. How do we know whether a mutation is pathogenic? That’s tough. Ideally, we introduce this mutation into a model organism and see what happens. However, this is a long road, and the shorter ones are: frequency in the population (higher frequency = less likely to be pathogenic), effect on protein structure, strength of association with disease with some causal inference, etc. We don’t discuss it here; we blindly trust the existing databases.
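The "consider only the first two categories" rule is easy to express as a filter. This is a minimal sketch that assumes each report entry has been typed in by hand as a small dictionary; the field names and the placeholder genes are hypothetical and do not reflect the actual format of a Dante Labs report.

```python
# The only two clinical classes we act on, as described above.
REPORTABLE = {"pathogenic", "likely pathogenic"}


def keep_for_review(entries):
    """Keep only entries classified as pathogenic or likely pathogenic."""
    return [
        entry for entry in entries
        if entry["classification"].strip().lower() in REPORTABLE
    ]


# Hypothetical entries with placeholder gene names, not real findings.
report = [
    {"gene": "GENE_A", "classification": "Pathogenic"},
    {"gene": "GENE_B", "classification": "Uncertain significance"},
    {"gene": "GENE_C", "classification": "Likely benign"},
    {"gene": "GENE_D", "classification": "Likely pathogenic"},
]

for entry in keep_for_review(report):
    print(entry["gene"], entry["classification"])
```

Everything that does not pass this filter goes to the "One Grandma Said" pile.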

Mutations can be actionable (you can do something, e.g., change your lifestyle and affect your risk) or non-actionable. E.g., a person may have a mutation that will definitely cause a disease such as Alzheimer’s (let’s assume that such mutations exist). Do we actually want to tell this person that (s)he will have it? This is a huge ethical question. As a medical researcher, I realise that my views on ethics are really shifted by constant exposure to the many dreadful diagnoses that some [anonymized: bioinformaticians don’t know the names, only doctors do] people may have, so I cannot take a position here.

A mutation can be 1) present in only one of the two copies of your DNA (heterozygous) or 2) present in both copies (homozygous). How can the second case occur? The genome is so long, so how can the same mutation occur twice at the same position? We inherit most of our mutations, so if each of your parents carries a particular mutation in one copy, a child has a 2/4 chance of having it as heterozygous and a 1/4 chance of having it as homozygous. Mutations can also arise de novo; however, a human normally has fewer than 100 de novo mutations.

Pistils and pollens (a reminder from the middle school)
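The 2/4 and 1/4 chances can be checked by simply listing every combination a child can inherit. In this small sketch each parent is written as two letters, `N` for a normal copy and `m` for a mutated one; the notation and the function name are chosen only for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict:
    """Map the number of mutated copies in a child to its probability."""
    outcomes = Counter()
    # The child receives one copy from each parent, each with equal chance.
    for copy1, copy2 in product(parent1, parent2):
        mutated_copies = (copy1 + copy2).count("m")
        outcomes[mutated_copies] += 1
    total = sum(outcomes.values())
    return {copies: Fraction(n, total) for copies, n in outcomes.items()}


# Both parents are heterozygous: one normal copy (N) and one mutated copy (m).
chances = offspring_genotypes("Nm", "Nm")
print(chances[0])  # 1/4 chance of no mutated copy
print(chances[1])  # 1/2 chance of being heterozygous
print(chances[2])  # 1/4 chance of being homozygous
```

Changing the inputs, for example to `"Nm"` and `"NN"`, shows what happens when only one parent carries the mutation.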

Having these 3 categories (or maybe more? write in comments), we are ready to analyse our genomic reports. This process is somewhat similar to playing the “battleships” game.

The field for the Battleships game. The goal of the player is to guess where the enemy ships are located by checking different coordinates of this field.

Your report (entries which consist of genes with some variants there) is this game map, and entries are ships. In order to understand the status of each entry from your report, you need to check several important points for each of the genes. You check one point (e.g., trying to hit A1 on this picture) — it is a hit. You check another point — A2, and it is a miss. You discard this variant as non-important and switch to the next entry in your report. Only if you hit all the cells where the ship is located (your reported variant meets all the criteria) — then it may be useful.

What are these points? First, the quality of the variant: Dante Labs uses quite a good computational method to discover variants (the DRAGEN pipeline), but it is always worth checking “by hand”, and in the next part I’ll explain to you a bit how. Is the variant truly associated with something, and has science provided enough evidence? Next level: is the variant truly pathogenic? If it is, what is the type of inheritance? Do you need 2 copies of the variant to have the disease (recessive variant), or is 1 copy enough (dominant)? Next: is the disease complex (many variants contribute to the disease, each of them just increasing a person’s risk; schizophrenia is an example) or “simple”, Mendelian (one affected gene is enough, as in cystic fibrosis)? If your variant is Mendelian, what is the penetrance (how likely are you to develop the disease if you have this variant)? For complex diseases, what are the odds of developing the disease compared to individuals without this variant? Can you affect the disease by lifestyle changes? And other points.
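The battleships idea, that one miss is enough to discard an entry, can be sketched as a checklist where every box must be ticked. The class, its fields and the two example entries below are hypothetical simplifications: a real review also weighs penetrance, odds ratios and actionability, which do not reduce to yes/no flags.

```python
from dataclasses import dataclass


@dataclass
class ReportEntry:
    gene: str
    good_quality: bool   # does the variant call look reliable?
    well_studied: bool   # is there enough scientific evidence?
    pathogenic: bool     # pathogenic or likely pathogenic?
    dominant: bool       # True if 1 copy is enough, False if 2 are needed
    copies: int          # 1 = heterozygous, 2 = homozygous


def is_worth_discussing(entry: ReportEntry) -> bool:
    """Return True only if every 'cell of the ship' is hit."""
    copies_needed = 1 if entry.dominant else 2
    checks = [
        entry.good_quality,
        entry.well_studied,
        entry.pathogenic,
        entry.copies >= copies_needed,
    ]
    return all(checks)


# Placeholder entries, not real findings.
carrier = ReportEntry("GENE_A", True, True, True, dominant=False, copies=1)
affected = ReportEntry("GENE_B", True, True, True, dominant=True, copies=1)
print(is_worth_discussing(carrier))   # False: recessive, but only 1 copy
print(is_worth_discussing(affected))  # True: every check passes
```

An entry that passes all checks is still only something to take to a real doctor, not a diagnosis.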

The introduction, again, took too long. To the reports! We will go from the least interesting ones to the most interesting.

Connective Tissue Disorders report

Useless in my case. I have a clear phenotype and even 2 shoulder surgeries due to hyperplasticity, my siblings have a clear phenotype, still no associations were found.

The header of my report.

Why? First, the quality of sequencing in these genes could be low; we may check that later. And maybe the genetic reason for my particular type of connective tissue defect has not been discovered yet: Dante Labs reported 6 variants of unknown significance (VUS) in these genes, and maybe some of them are actually pathogenic but not well studied? The biggest progress made in this kind of genetic report has been in the number of VUS, which has decreased rapidly, but this is still only the beginning of humankind’s journey in genetic science. Nothing to cry about: it was 20 euros, and I once spent 10 times more in Las Vegas, which was a lot for a PhD student (I am joking, I have never been to Las Vegas). There were also 3 variants with “conflicting interpretations of pathogenicity”. It is good to see them there; it means the report was made to present all points of view, not only the “there is an effect” ones. But still, as you remember, we pay attention only to the variants that are pathogenic or likely pathogenic.

Nutrigenomic report

The next report, which is almost as uninteresting, covers how my genes affect my ability to absorb different substances that get into my body.

The header of the nutrigenomic report

How are these reports made? They are based on big GWAS (genome-wide association studies), in which we select a huge cohort of people, ask them if they like chilli sauce, and then look for genetic markers associated with a taste for chilli sauce. Of course, there are many confounding factors which we need to take into account. E.g., Mexicans in general like chilli sauce more than other nations, so we may mistake genomic variants common among Mexicans for chilli taste variants (was it racist? I hope you all understand that it was a joke and I had no intention to offend Mexicans; I have several friends from there and they are amazing people). Not only being Mexican but also smoking may increase your tolerance to spicy food, and so on. If you forget to put such factors into your variant discovery model, you find totally different and unrelated variants. So, in general, these studies are something that you show to your friends, not something you take seriously. Yes, there are some good markers, such as lactose intolerance variants, but the provided report is not only about well-studied markers (and yes, I turned out not to be lactose intolerant, which is very useful to know for a person who cures his bad mood with kilos of cheese).
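To see how a forgotten confounder creates a fake association, here is a toy calculation with entirely made-up counts. Two hypothetical populations, A and B, differ both in how common a variant is and in how many people like chilli, while inside each population the variant has no effect at all.

```python
# counts[(population, carries_variant, likes_chilli)] = number of people.
# All numbers are invented for illustration only.
counts = {
    ("A", True, True): 560, ("A", True, False): 240,
    ("A", False, True): 140, ("A", False, False): 60,
    ("B", True, True): 40, ("B", True, False): 160,
    ("B", False, True): 160, ("B", False, False): 640,
}


def chilli_rate(carrier: bool, population=None) -> float:
    """Share of chilli lovers among carriers (or non-carriers)."""
    likes = total = 0
    for (pop, is_carrier, likes_chilli), n in counts.items():
        if is_carrier != carrier:
            continue
        if population is not None and pop != population:
            continue
        total += n
        if likes_chilli:
            likes += n
    return likes / total


# Pooled together, the variant looks strongly linked to liking chilli...
print(chilli_rate(True), chilli_rate(False))            # 0.6 vs 0.3
# ...but within each population carriers and non-carriers do not differ.
print(chilli_rate(True, "A"), chilli_rate(False, "A"))  # 0.7 vs 0.7
print(chilli_rate(True, "B"), chilli_rate(False, "B"))  # 0.2 vs 0.2
```

The pooled difference comes only from the fact that population A has both more carriers and more chilli lovers, which is exactly the kind of effect a properly adjusted model has to remove.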

My main conclusions from this report were: I may have increased snacking behaviour (true, true), eating disinhibition (also true), enhanced caffeine metabolism (up to 6 cups per day, Madre mia), and I am not likely to experience alcohol flush (oh, thanks to my ancestry). I am not going to provide scientific links for all of these traits, only for one, chosen more or less at random: snacking behaviour. The variant with ID rs2025804 (do not be surprised, each variant that people have already seen has its own ID) was chosen as a marker, and here are the results from PubMed (the largest library of scientific papers) that mention this variant. What I want to say…I am not Pima Indian, with all due respect to this ethnic group. Again, “One Grandma Said that it may be connected…”.

More interesting reports will be investigated in the next parts — stay tuned =)
