Your Zenome:

Jeff Heizmann

Published in

Zenome

Apr 13, 2018

The first genetic test with constant feedback on blockchain technology from the Zenome team

Now available to pre-order for 1000 ZNA-tokens

Abstract

The Zenome team is pleased to announce that you can now get your personal genetic report on our platform. The test results will be available directly in the user’s personal account — wallet.zenome.io.

For 1000 ZNA tokens, we will turn your sample of saliva into the most complete and detailed genetic report to date.
In this article, we will disclose the technical characteristics of our product and tell you what benefits and values you’ll receive after getting your test.
It is not just a genetic report, but access to the entire infrastructure of Zenome, which includes the storage and exchange of personal and medical information. Also, there will be a constant feedback system to improve the quality of the analysis and to gather additional information for permanent improvements. Thus, our new product will include a genetic test, personalized services from partners, as well as the ability to earn money from your genetic and personal data.

Note: To describe our product as clearly and informatively as possible, we ran a genetic analysis of our data scientist, Vladimir Naumov, and we will cover the results in upcoming publications. In the meantime, here are some technical details and features of our genetic test.

Specifications: Exome +

● Gene-coding sequences
● UTRs (untranslated regions)
● Non-coding regions of known significance

Improved Exome Analysis: To obtain data, we use the high-throughput sequencing method. We have developed a kit for target enrichment: we have chosen genomic regions that include all known genes (the coding part), as well as additional SNPs of interest from the intergenic regions and UTRs of genes.

Note: To digitalize your DNA, we need your genetic material. To get the genomic report from Zenome, you need to collect only 2 ml of saliva into a special collection tube. It is non-invasive, safe, and absolutely painless!

What sections are included in the personal genetic report? Why do we need each section?

The report is divided into parts based on functional category and on how genetic risk is calculated:

Inherited diseases carrier status
In this part of the report, you will learn if you are the carrier of any hereditary (monogenic) diseases. This can be useful when planning for a child and screening for chronic diseases.

Health Conditions
Here you will learn your risks of developing the most socially significant common diseases: metabolic, cardiovascular, oncological, and neurodegenerative.

Pharmacogenomics
In this section, you’ll learn how your body responds to medicines (drug response). You will discover how your body metabolizes drugs, which helps determine the optimal dose and the likely degree of side effects for different medications.

Diet and Metabolism
This section will help you choose the optimal diet to lose or gain weight, as well as to minimize deficiencies of certain vitamins and nutrients.
To improve your metabolic status, the report may recommend excluding certain foods from your regular diet.

Cosmetology
Learn which cosmetic products are best suited to your hair and skin, based on features encoded in your genome.

Sports
Select the ideal activities for muscle development, health maintenance, and weight reduction.

Ancestry
Everyone is interested in learning about their ancestors. You can find out who they were and where on Earth people genetically close to you live.

Bioinformatics: algorithms and risk calculations

Depending on the section, different calculation methods are used.

A model describing monogenic inheritance
Monogenic diseases are usually associated with a serious malfunction of a protein and follow the Mendelian principle of inheritance: one gene, one disease. The human genome is diploid: each autosome (non-sex chromosome) comes as a pair, so each gene has at least two copies. By type of inheritance, monogenic diseases are divided into:
● dominant — damage of one copy of the gene is enough for the development of the disease
● recessive — both copies of the gene must be damaged for the disease, but it is possible to carry the mutation without harm to the carrier
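As a minimal sketch of the two inheritance modes listed above, the hypothetical helper below maps the number of damaged copies of an autosomal gene to a report status. The function name and the status labels are illustrative assumptions, not part of the Zenome pipeline, and penetrance is ignored here.

```python
def carrier_status(inheritance: str, damaged_copies: int) -> str:
    """Classify an autosomal genotype under a simple Mendelian model.

    inheritance: "dominant" or "recessive" (mode of the disease)
    damaged_copies: number of damaged copies of the gene (0, 1 or 2)
    The returned labels are hypothetical, chosen only for illustration.
    """
    if inheritance not in ("dominant", "recessive"):
        raise ValueError("inheritance must be 'dominant' or 'recessive'")
    if damaged_copies not in (0, 1, 2):
        raise ValueError("damaged_copies must be 0, 1 or 2")
    if damaged_copies == 0:
        return "not a carrier"
    # Dominant: one damaged copy is enough. Recessive: both must be damaged.
    if inheritance == "dominant" or damaged_copies == 2:
        return "at risk"
    # Recessive disease with a single damaged copy: healthy carrier.
    return "carrier"


print(carrier_status("recessive", 1))
print(carrier_status("dominant", 1))
```

The sketch treats the genotype as fully determining the outcome, which is a simplification.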
Penetrance is the probability that a person who has the pathogenic mutation in their genome develops the disease. For monogenic diseases, penetrance is typically high (more than 70%).
The main sources of information about monogenic diseases are the LOVD, HGMD, and ClinVar databases.
The report provides information on the carrier status of more than 50 hereditary monogenic diseases, such as cystic fibrosis, thalassemia, galactosemia, and others.
For monogenic diseases generally (but not always), the risk of disease is determined by the mutation with the greatest risk (in fact, by the mutation that causes the most damage).
R = max(Ri),
where i indexes the mutations detected in the tested patient,
Ri is the risk associated with mutation i.
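The rule R = max(Ri) can be sketched in a few lines. The variant labels and risk values below are made up for illustration, and returning 0.0 when no pathogenic mutation is detected is an assumption of this sketch.

```python
def monogenic_risk(mutation_risks: dict) -> float:
    """Return R = max(Ri) over the detected mutations.

    mutation_risks maps a mutation label to its risk Ri.
    An empty mapping (nothing detected) yields 0.0 by assumption.
    """
    return max(mutation_risks.values(), default=0.0)


# Hypothetical detected mutations with illustrative risk values.
detected = {"variant_a": 0.20, "variant_b": 0.75}
print(monogenic_risk(detected))
```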

Polygenic inheritance model
In the case where several genes are responsible for the manifestation of the trait, a polygenic model is used for prediction. It is based on the following data:
● genome-wide association studies (GWAS Catalog)
● population frequency of polymorphisms associated with the trait
● knowledge of the biochemistry of the process (metabolic pathways)
Each model for risk assessment of a polygenic disease is developed by analyzing genetic databases and scientific publications. The set of mutations/polymorphisms is collected into the Zenome internal database and used for risk estimation. The database is continually updated.

Here is an example of the risk calculation model for bronchial asthma.

The following variables are included:

- whether the individual has a risk allele, based on genotyping data
- the zygosity of the risk allele: heterozygous or homozygous (hom/het)
- the odds ratio for the given disease associated with the presence of a risk allele (r)
- the frequency of the risk allele in the population to which the individual belongs (p)
- the incidence of the disease according to statistics

q = 1 - p is the non-risk allele frequency.
The average risk for the population is calculated as:
R = p*p*r*r + 2*p*q*r + q*q
Relative risks for the homozygous risk (aa), heterozygous (ab), and homozygous non-risk (bb) genotypes:
RRaa = r*r/R
RRab = r/R
RRbb = 1/R

The relative risks of several SNPs are multiplied, which gives us Rmg — multiplied risk.
Then the resulting Rmg is multiplied by the statistical risk of disease in this population (based on epidemiological data).
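The model above can be sketched as follows. The function names and the tuple layout (p, r, risk allele count) are assumptions made for this illustration, and the numbers in the usage lines are arbitrary rather than real asthma data.

```python
from math import prod


def average_population_risk(p: float, r: float) -> float:
    """R = p*p*r*r + 2*p*q*r + q*q, with q = 1 - p."""
    q = 1.0 - p
    return p * p * r * r + 2 * p * q * r + q * q


def relative_risk(p: float, r: float, risk_allele_count: int) -> float:
    """Relative risk for 2 (hom), 1 (het) or 0 copies of the risk allele."""
    if risk_allele_count not in (0, 1, 2):
        raise ValueError("risk_allele_count must be 0, 1 or 2")
    return r ** risk_allele_count / average_population_risk(p, r)


def multiplied_risk(snps) -> float:
    """Rmg: product of per-SNP relative risks.

    snps is an iterable of (p, r, risk_allele_count) tuples.
    """
    return prod(relative_risk(p, r, n) for p, r, n in snps)


def absolute_risk(snps, incidence: float) -> float:
    """Rmg multiplied by the statistical risk of disease in the population."""
    return multiplied_risk(snps) * incidence


# Arbitrary illustrative inputs: two SNPs and a 10% population incidence.
example_snps = [(0.5, 2.0, 2), (0.3, 1.2, 1)]
print(multiplied_risk(example_snps))
print(absolute_risk(example_snps, 0.10))
```

By construction, the relative risks averaged over the genotype frequencies p*p, 2*p*q and q*q equal 1, so a value above 1 means a higher-than-average risk.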

Examples of risk calculations

Assume the person has the following genotype: rs7216389 TT and rs1420101 AT.
Begin with the polymorphism rs7216389.
1. Average risk in the population:

freq(TT)*OR*OR + freq(CT)*OR + freq(CC)*1 = 0.19*1.45*1.45 + 0.56*1.45 + 0.25 = 1.46

2. With the TT genotype, the relative risk is 1.45 * 1.45 / 1.46 = 1.44
For the second polymorphism, rs1420101:
average risk = 6.4 * 6.4 * 0.13 + 6.4 * 0.49 + 0.38 = 8.84
relative risk with genotype AT = 6.4 / 8.84 = 0.72

Taking into account all SNPs: 1.44 * 0.72 = 1.04, the relative risk for a person with this genotype.
We would then multiply this number by the statistical risk of developing the disease in this population to obtain a lifetime risk.
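The arithmetic of this example can be checked with a short script. It uses the genotype frequencies and odds ratios exactly as quoted in the example; the helper name is an assumption of this sketch. Multiplying the unrounded relative risks gives a combined value of about 1.04.

```python
def relative_risk_from_genotype_freqs(freq_hom_risk, freq_het, freq_hom_ref,
                                      odds_ratio, risk_allele_count):
    """Relative risk of a genotype against the population average.

    The average is freq_hom_risk*OR*OR + freq_het*OR + freq_hom_ref*1,
    and the genotype's own risk is OR raised to its risk allele count.
    """
    average = (freq_hom_risk * odds_ratio ** 2
               + freq_het * odds_ratio
               + freq_hom_ref)
    return odds_ratio ** risk_allele_count / average


# rs7216389, genotype TT (two risk alleles), OR = 1.45
rr_rs7216389 = relative_risk_from_genotype_freqs(0.19, 0.56, 0.25, 1.45, 2)
# rs1420101, genotype AT (one risk allele), OR = 6.4
rr_rs1420101 = relative_risk_from_genotype_freqs(0.13, 0.49, 0.38, 6.4, 1)

combined = rr_rs7216389 * rr_rs1420101
print(round(rr_rs7216389, 2), round(rr_rs1420101, 2), round(combined, 2))
# prints: 1.44 0.72 1.04
```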

The model for determination of ancestry
There are many studies of the genetic characteristics of different populations. Knowing an individual’s genome and the frequencies of genetic variants in different populations, it is possible to determine that person’s ancestry and population origin. There are three options:
● maternal DNA (from mitochondria — inherited by the female line)
● paternal DNA (from the Y-chromosome — inherited by the male line)
● AIMs (ancestry-informative SNPs): SNPs in autosomes (non-sex chromosomes) whose frequencies differ strongly between populations

The first two methods rely on specific features of the Y chromosome and mitochondrial DNA: both contain segments with a very low mutation rate. These segments are well catalogued, and the lineages they define are called haplogroups. From a person’s haplogroup, it is possible to trace the movement of their ancestors across the planet. We can also compare modern DNA samples with DNA samples found in ancient burial grounds around the world.

To assess how a person relates to a particular population, we analyze autosomal polymorphisms using a variant of principal component analysis (PCA), which represents the individual as a point in a multidimensional space. The position in this space is determined by the individual’s genotypes. The same analysis is performed for people from known populations. Measuring the distances from the individual’s point to the centers of the known population clusters lets us estimate the person’s ancestry.
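The idea can be illustrated with a toy example. This is a sketch under strong assumptions, not Zenome's actual method: it uses plain PCA via NumPy's SVD, a made-up reference panel of six people and four SNPs coded as allele counts (0, 1, 2), hypothetical population labels pop_A and pop_B, and a simple nearest-centroid rule.

```python
import numpy as np


def fit_pca(genotypes, n_components=2):
    """Return the per-SNP mean and the top principal axes of the panel."""
    mean = genotypes.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(genotypes - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(genotypes, mean, components):
    """Map genotypes (one row or many) into principal component space."""
    return (genotypes - mean) @ components.T


def nearest_population(point, centroids):
    """Name of the population whose cluster center is closest to the point."""
    return min(centroids, key=lambda name: np.linalg.norm(point - centroids[name]))


# Made-up reference panel: rows are people, columns are SNP allele counts.
reference = np.array([
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [0, 0, 1, 2],
], dtype=float)
labels = np.array(["pop_A", "pop_A", "pop_A", "pop_B", "pop_B", "pop_B"])

mean, components = fit_pca(reference)
coords = project(reference, mean, components)
centroids = {name: coords[labels == name].mean(axis=0)
             for name in sorted(set(labels.tolist()))}

# A hypothetical individual, projected with the same mean and axes.
individual = np.array([2, 2, 0, 1], dtype=float)
point = project(individual, mean, components)
print(nearest_population(point, centroids))
```

A real analysis would use many thousands of markers and a large reference panel, and could report distances to every cluster rather than only the nearest one.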

The figure is taken from https://www.ncbi.nlm.nih.gov/pubmed/27453128

Blockchain added value: feedback from the user and phenotypic records

Zenome for genomic data — it’s like the Internet for individual computers
Gavin Belson

Now — here’s the part for blockchain enthusiasts and people who want to make money from their genomic information.
Why is it a new generation report? You do not just get a genetic report. You get regular updates and a lot of personalized services based on your genetic data. Blockchain allows you to use your genetic data extensively to obtain services and at the same time ensures privacy. Constant feedback between users helps to build a very well-structured database for further machine learning application and big data analysis.

The genome gives us much information, but it’s just a drop in the ocean.
A structured, cross-linked database of genetic and phenotypic information will enable a phase transition: from standard mathematical models of risk assessment, which rest on a simplified understanding of the underlying biology and on simple statistical analysis, to neural networks and artificial intelligence as a more precise method for analyzing large genomic datasets.
The primary problem limiting the transition to the era of genomic neural networks is the lack of properly structured genomic data with the possibility of feedback to the owner of the genome in real-time. The use of the block-concept developed by the Zenome developers will make it possible to create such a structured database.