Degenics: A VPN for your DNA
The Privacy-First Platform for Personal Genetic Testing
This file provides the initial concept document for the Decentralized Genetics project.
It is a copy of the original README.md file, which was first committed to the Degenics GitHub repository on December 27, 2020.
Caveat: The following concept description is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), to Pandu Sastrowardoyo / Decentricity.
Elevator Pitch
Degenics: The Decentralized Genetics Initiative
We are building a decentralized platform for your personal genetics. Our concept allows synergy between labs of all scales while guaranteeing user anonymity and sovereignty at every step of the genomic data science workflow, from sample collection and data storage to report generation.
Vision
We aim to democratize direct-to-consumer genomics with a privacy-preserving, anonymous-first platform running on a fully decentralized, autonomous infrastructure.
MVP Vision
Degenics 1.0 (private deployment March 2021) will provide anonymous, decentralized sample collection, payment, data storage, and report distribution. TL;DR: a VPN for your DNA.
Who we are
Degenics is a group of IT consultants, blockchain developers, and biologists who are passionate about privacy, decentralized technologies, and genomic data science.
- Pandu Sastrowardoyo — Initiator
- Gilang Bhagaskara — Tech Lead
- Jean-Daniel Gauthier — Product Lead
- Aaron Ting — Marketing Lead
- Bobby Andika — Dev Lead
- Muhammad Arif — Back End and Blockchain
- Aloysius Dedy — Blockchain and Smart Contracts
- Kevin Janada — UI/UX
- Ibnu Gamal Alhadid — Advisor
Trustless/Fully Decentralized Model
Anonymous Physical-Digital Bridge
Without having to go through KYC, the user activates the dApp and generates her passphrase. The passphrase forms the basis of her private key, which is paired with a shareable public key.
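As a rough illustration of the passphrase step, the sketch below generates a random passphrase and stretches it into a 32-byte seed. The word list, salt, iteration count, and function names are all assumptions made for this example; the concept document does not specify how the dApp derives keys, and the public key would come from whatever signature scheme the platform adopts, which is not shown here.

```python
import hashlib
import secrets

# Hypothetical mini word list; a real dApp would use a much larger standard list.
WORDS = ["amber", "basil", "cedar", "delta", "ember", "fjord", "glade", "heron"]


def generate_passphrase(n_words: int = 12) -> str:
    """Pick words with a cryptographically secure random source."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))


def derive_seed(passphrase: str, salt: bytes = b"degenics-demo") -> bytes:
    """Stretch the passphrase into a 32-byte seed with PBKDF2-HMAC-SHA256.

    The salt and iteration count are illustrative assumptions.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, 100_000, dklen=32
    )


passphrase = generate_passphrase()
seed = derive_seed(passphrase)  # private-key material; never leaves the device
```

The same passphrase always yields the same seed, so the user can recover her key on a new device without any account or KYC record.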
The user takes one or two samples, depending on the tier she selects. The two-sample tier is fully trustless, and uses the results from one lab to check the result of the other lab (see next section). The one-sample tier will integrate the Kilt.io protocol to enable the selected lab to attach credentials to their test result.
Sampling does not require Degenics to provide a test kit to the user ahead of time. The user can assemble the sampling kit herself by buying a sample bottle and a pack of cotton buds, both available at pharmacies and online stores worldwide.
The user performs a cheek swab (buccal swab) by swabbing the inside of each cheek 10 times, then puts the cotton bud into a sample bottle. In the two-sample tier, she repeats the process for the second sample bottle. Each sample bottle goes into its own envelope.
The dApp creates an envelope label that the user can either copy by hand or print onto the sample envelopes. The label contains her public key and the 2 lab addresses to send the samples to, but no personally identifiable information and no return address.
The user mails the two envelopes from her local post office or post box.
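The envelope label described above can be pictured as a small record that carries only the public key and a destination. This is a minimal sketch; the class name, field names, and the choice of one label per envelope are assumptions for illustration, not the dApp's actual data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EnvelopeLabel:
    public_key: str   # the only identifier tying the sample to the user
    lab_address: str  # destination lab; no return address is stored

    def render(self) -> str:
        """Text the user can print or copy by hand onto the envelope."""
        return f"TO: {self.lab_address}\nSAMPLE ID: {self.public_key}"


def make_labels(public_key: str, lab_addresses: List[str]) -> List[EnvelopeLabel]:
    """One label per envelope: one lab (single tier) or two (two-sample tier)."""
    if len(lab_addresses) not in (1, 2):
        raise ValueError("expected one or two lab addresses")
    return [EnvelopeLabel(public_key, addr) for addr in lab_addresses]


# Placeholder key and addresses, purely for demonstration.
labels = make_labels("0xabc123", ["Lab A, 1 Example St", "Lab B, 2 Example Ave"])
```

Because the record has no field for a name or return address, personally identifiable information cannot end up on the label by accident.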
Decentralized Labs, Sovereign Data
The aforementioned labs analyze the samples and produce datasets (a VCF file and an analysis).
The platform compares the datasets from the two labs and checks the difference between select points in the data.
If the difference is more than 10%, the user is given the option to send in another sample. Sampling instructions are provided again.
If the difference between the two output datasets is less than 10%, both are encrypted with the user’s public key and stored on the platform (blockchain + decentralized storage).
The user is alerted, and can access her data at any time by decrypting it with her private key.
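The two-lab cross-check can be sketched as follows. The text does not say how "difference" is measured, so this example assumes it is the fraction of selected sites where the two labs report different genotypes. Counting a missing site as a disagreement, and treating exactly 10% as a pass, are further assumptions; all names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Site = Tuple[str, int]   # (chromosome, position)
Calls = Dict[Site, str]  # site -> genotype string, e.g. "0/1"

DIFF_THRESHOLD = 0.10  # the 10% figure from the text


def difference_ratio(lab_a: Calls, lab_b: Calls, select_sites: Sequence[Site]) -> float:
    """Fraction of selected sites where the two labs disagree.

    A site missing from either dataset counts as a disagreement (assumption).
    """
    if not select_sites:
        raise ValueError("no sites to compare")
    mismatches = sum(
        1
        for site in select_sites
        if lab_a.get(site) is None or lab_a.get(site) != lab_b.get(site)
    )
    return mismatches / len(select_sites)


def next_step(lab_a: Calls, lab_b: Calls, select_sites: Sequence[Site]) -> str:
    """Decide between offering a resample and encrypting/storing both datasets."""
    ratio = difference_ratio(lab_a, lab_b, select_sites)
    # Exactly 10% is not covered by the text; this sketch treats it as a pass.
    return "resample" if ratio > DIFF_THRESHOLD else "encrypt_and_store"
```

In practice the list of select sites and the comparison metric would need to be agreed on by the participating labs before results could be compared fairly.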
Enterprise/Consortium Model
Semi-Decentralized Deployment with “Lab Marketplace”
An alternative deployment model would be the private / consortium deployment where labs are directly selected by the users.
This simplifies the workflow while maintaining the anonymity of the users. The labs get access to a commercial market, and the users still get an anonymous physical-to-digital bridge for their genomic data.
Additionally, after the initial on-ramp, the labs are free to up-sell additional analytics products to the users, without resampling.
This is made workable by our stated intent to use the Kilt protocol, which allows labs to attach credentials to the documents (genome VCF and reports) that they send out.
Degenics Prototype Demo Video
(Alpha 02–2021/01/18)
FAQ
If things are happening anonymously, how does the user/customer pay for the service?
There are two options: the traditional option and the fully decentralized option. We prefer the fully decentralized option, but it might not be available in all locales. Traditional payment models may also work better for consortium or private deployments of Degenics.
Enterprise/Consortium Model
The traditional option: consumer funds are held in escrow by a local payment gateway or bank until the lab uploads valid data (report and genome) to decentralized storage. The smart contract then triggers fund disbursement into the lab’s account. Note that this still maintains the anonymity of the genomic data, since:
- The payment gateway or bank has access to consumer KYC, but not to genomic data or reports.
- Labs have access to anonymized genomic data, but not to consumer KYC.
Trustless/Decentralized Payments
All transactions happen via a blockchain token model. The consumer onboards with their preferred cryptocurrency token, or goes through a fiat-to-crypto bridge (example here) to pay. Smart contracts hold the consumer’s tokens in escrow until the labs provide valid data. The smart contract then triggers fund disbursement into the labs’ accounts.
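The escrow flow described above can be modelled as a tiny state machine. The sketch below is plain Python rather than real smart-contract code, and it assumes details the text leaves open: an equal split between labs, an amount that divides evenly, and data validation that happens elsewhere. All names are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class EscrowState(Enum):
    FUNDED = "funded"      # consumer's tokens are locked
    RELEASED = "released"  # tokens have been paid out to the labs


class Escrow:
    def __init__(self, amount: int, labs: List[str]) -> None:
        if not labs:
            raise ValueError("at least one lab is required")
        self.amount = amount
        self.labs = list(labs)
        self.pending = set(labs)  # labs that have not yet delivered valid data
        self.payouts: Dict[str, int] = {}
        self.state = EscrowState.FUNDED

    def submit_valid_data(self, lab: str) -> None:
        """Record that a lab delivered data that passed validation (done elsewhere)."""
        if self.state is not EscrowState.FUNDED or lab not in self.pending:
            raise ValueError(f"unexpected submission from {lab}")
        self.pending.discard(lab)
        if not self.pending:
            self._release()

    def _release(self) -> None:
        # Equal split is an assumption; remainder handling is omitted for brevity.
        share = self.amount // len(self.labs)
        self.payouts = {lab: share for lab in self.labs}
        self.state = EscrowState.RELEASED
```

Funds stay locked until every selected lab has delivered, which mirrors the "until labs provide valid data" condition in the text.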
Why 2 labs? Is it for the sake of comparing the result?
We designed Degenics with 2 labs per transaction because we want to solve the following global problem with personal genetic testing:
Unlike other categories of services, consumers can’t recheck the results of DNA analytics services unless they have access to a lab or PCR device of their own.
This, we feel, places consumers at a disadvantage and may lower the quality of future genetic testing results within the ecosystem.
Thus, we designed the platform with a semi-random, rating-based selection of two labs per transaction to ensure two things:
- All labs, from garage DIYBiolabs to large companies, can join the ecosystem and compete with each other.
- The quality of the results is maintained, even with the consumers being anonymous.
The two-lab model is optional (users can elect to send to just one lab), but Degenics believes that the future of a truly decentralized genetic testing ecosystem lies here.
In the long term, this system also allows the ecosystem to grow further — with smaller labs helping to check the larger labs, and vice versa. The rating system associated with the labs would also provide incentives for the ecosystem to increase in quality.
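One way to read "semi-random, rating-based selection" is weighted sampling without replacement, where a lab's rating is its weight. The sketch below illustrates that reading only; the actual selection rule is not specified in this document, and the function name, lab names, and ratings are made up.

```python
import random
from typing import Dict, List, Optional


def pick_labs(
    ratings: Dict[str, float], k: int = 2, rng: Optional[random.Random] = None
) -> List[str]:
    """Pick k distinct labs; higher-rated labs are more likely to be chosen.

    Ratings are assumed to be positive numbers.
    """
    if len(ratings) < k:
        raise ValueError("not enough labs to choose from")
    rng = rng or random.Random()
    pool = dict(ratings)
    chosen: List[str] = []
    for _ in range(k):
        names = list(pool)
        weights = [pool[name] for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]  # without replacement: a lab cannot check itself
    return chosen


# Hypothetical ratings, purely for demonstration.
example = pick_labs({"garage_lab": 3.9, "mid_lab": 4.4, "big_lab": 4.8})
```

Because every lab with a positive rating has a non-zero chance of being picked, small labs still get work, while better-rated labs are picked more often.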
Personal genome files are huge. How would you expect to send these files via Blockchain?
You’re right: genome files are quite large. Raw sequencing output can be up to 900 GB, and even VCF files hover around 10–100 MB (for a subset of sequences) or 1 GB (for Whole Genome Sequencing).
Here’s our strategy for enabling these files to be shared and owned by the user:
- First, we will focus on the VCF and WGS files exclusively. In our initial POC, only the smaller VCF files are included in the “result package” sent to each user, along with the report. In the live solution, WGS files will be included.
- Second, we will utilize an IPFS connection from Polkadot to act as the decentralized storage mechanism. The “result package” is encrypted off-chain (with the user’s public key) and put within this decentralized storage platform.
- Third, the main blockchain platform itself will link to the decentralized storage mechanism through a hash list. This means that the main blockchain platform only contains pointers to the data in decentralized storage, and not the actual genomic files.
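The pointer-based layout in the three steps above can be sketched with two in-memory dictionaries standing in for decentralized storage and on-chain state. Plain SHA-256 hex digests are used here as a stand-in for real IPFS content identifiers, encryption is assumed to have happened already, and all names are hypothetical.

```python
import hashlib
from typing import Dict, List

off_chain_store: Dict[str, bytes] = {}     # stand-in for decentralized storage
on_chain_index: Dict[str, List[str]] = {}  # stand-in for chain state: key -> hash list


def store_result_package(public_key: str, encrypted_package: bytes) -> str:
    """Store the (already encrypted) package off-chain and record only its hash on-chain."""
    digest = hashlib.sha256(encrypted_package).hexdigest()
    off_chain_store[digest] = encrypted_package
    on_chain_index.setdefault(public_key, []).append(digest)
    return digest


def fetch_result_packages(public_key: str) -> List[bytes]:
    """Follow the on-chain pointers and verify each blob against its hash."""
    packages = []
    for digest in on_chain_index.get(public_key, []):
        blob = off_chain_store[digest]
        if hashlib.sha256(blob).hexdigest() != digest:
            raise ValueError("stored package does not match its on-chain hash")
        packages.append(blob)
    return packages
```

The on-chain record grows by one 64-character digest per result package regardless of file size, which is what keeps large genomic files off the blockchain itself.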
In the enterprise/consortium strategy, the IPFS/torrent platform can be replaced with regular public or private cloud solutions.
Why Polkadot?
The main problem facing the blockchain space right now is scalability. Current Ethereum-based solutions are built to be general-purpose and one-size-fits-all, which makes adoption quick but scalability rather challenging for boutique solutions like Decentralized Genetics.
Polkadot provides transactional scalability by spreading transactions across multiple parallel blockchains (parachains), which are built using the Substrate framework. Polkadot has 1000 validators in the relay chain, and these are split up into small groups that each validate one parachain (minimum of 14). It aims to ensure that Substrate blockchains connect to Polkadot’s chain of parachains with high performance. Each parachain is also capable of processing up to around 1,500 transactions per second.
Comparing Polkadot blockchain performance with the Visa network:
- Visa network: 24,000 Tx/s
- Polkadot: 1,500 Tx/s * n parachains
Note: n is the number of parachains, which means that the larger the Polkadot ecosystem is, the higher the performance numbers.
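The comparison above is simple arithmetic, worked through below using only the two figures quoted in the text. The function names are made up for this example, and linear scaling with the number of parachains is the document's own simplifying assumption.

```python
import math

VISA_TPS = 24_000      # figure quoted in the text
PARACHAIN_TPS = 1_500  # per-parachain figure quoted in the text


def polkadot_tps(n_parachains: int) -> int:
    """Aggregate throughput, assuming it scales linearly with parachain count."""
    return PARACHAIN_TPS * n_parachains


def parachains_to_match(target_tps: int) -> int:
    """Smallest number of parachains whose combined throughput reaches the target."""
    return math.ceil(target_tps / PARACHAIN_TPS)


print(parachains_to_match(VISA_TPS))  # prints 16
print(polkadot_tps(16))               # prints 24000
```

Under these figures, 16 parachains would be needed to match the quoted Visa throughput.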
The Polkadot team published the following theoretical speed limit and comparison:
166,666 Tx/s (!)
The comparison graph can be seen 45 minutes in:
We believe that the scalability of this platform matches and meshes quite well with our concept of Decentralized Genetics. We require this kind of scalability because of the nature of genetic data (large datasets, massive analysis) and the kind of platform we want to create (fully decentralized, owned by the community).