Your Most Sensitive Data

Published in DataSeries, Dec 19, 2019

Nature writes code. It is a messy, long-term process, but the evolutionary algorithms that created the human genome, and millions of similar genomes in millions of very different species, are the work of billions of years, and utterly unintentional. Yet the 3 billion base pairs of DNA that comprise you also define you, in a sense, in ways that no other data does. Moreover, we still know almost nothing about how this works, and what that code says, despite decades of mapping the genome and ongoing research into the nature of genes, development, and disease. That is why, for the past ten years, massive experiments and data gathering have been underway to try to fill in the blanks, to get enough data to feed ever-advancing big data projects and research, and to understand how DNA makes you, you. Some of the most far-reaching research projects are undertaken under the guise of commerce, with test subjects being at best dimly aware that they are in fact subjects, not merely customers.

23andMe and Ancestry have cornered the market in direct-to-consumer genetic tests. Nearly 20 million people have now purchased insights into their heritage and other superficial knowledge about their general propensities and health. 23andMe (which Google practically cofounded) and Ancestry are private corporations. There is limited transparency about where they are making their money. A forensic analysis of the few deals that have been made public, coupled with information they have acknowledged and publicized, shows that together they have made nearly $500,000,000 apart from test kit sales. They sell genetic data they gather from ‘customers,’ who effectively become test subjects. It appears that many of the customers of these companies aren’t really aware of the value of their genes, nor even that this value is routinely being extracted for the profit of testing companies and big pharma. Because there is no legal protection for one’s genetic code, the marketplace for DNA is still a sort of gold rush. Companies are scrambling to stake claims and derive profits, with little regard for those whose data it really is: ours.

Genetic data is highly valuable. Even before it was, human tissues were a source of significant research data as well as wealth. One prominent example is that of Henrietta Lacks, whose immortal cancerous cell line helped thousands of studies and was licensed without her knowledge or consent. Her cells became the source of millions in revenue for various companies and researchers whose innovations were based on their study. The wealth generated was never shared with her. She died in relative poverty. Only recently did the NIH finally agree to pay her family an undisclosed settlement.

While Henrietta Lacks became famous posthumously, and her family was recompensed to some degree, millions of others may well wish to allow their tissues and data to be used for research but remain anonymous, preserving their privacy and helping science without any need for compensation. Still others may wish to be paid something for the use of data that generates not just wisdom, but wealth.

For basic science to succeed in driving innovation and improving our lives, there must be greater dissemination of knowledge, and yet, especially regarding the use of medical data, there is an ever-increasing risk of breaches of individual privacy. Numerous studies have shown that even de-identified genetic data can lead to re-identification of individuals when combined with other data. Regardless, thousands have allowed their data to be collected and even published in public databases out of their interest in promoting basic research. It may well be that those donors had insufficient information about the potential for re-identification based upon that data. More concerning is the degree to which commercial databases are not only accumulating potentially re-identifiable data from their customers, but also profiting from its resale. There are two core concerns at the intersection of science and commerce in genomic data: ownership and privacy. While we could wait for regulations to come to the rescue, all past experience suggests that when they do, they will favor corporations over individuals and prove inadequate to the task, potentially stifling both basic science and individual rights. A technological approach that embeds these values is a good alternative, and blockchain technology offers a use-case that embeds these values more or less automatically.

Genomic data needs to be available for all researchers, not just pharmaceutical companies; it needs to be hard to re-identify individuals who allow it to be used for science; and it needs to allow people to be paid for its use.

While blockchains offer us certain opportunities to improve privacy, they are neither perfect nor sufficient. An immutable record, for instance, is no way to avoid leaving tracks. Genomic information, when combined with things like geographical location or other identifying user data, could compromise our privacy when published permanently on a blockchain. But using a blockchain as a settlement and audit system affords greater privacy in genetic data transactions, and improved individual indicia of ownership, than the current models deployed both by testing companies and genomic data sharing platforms.

You should not put genomic data on public blockchains, but you should put transactions of such data on them, and enable those transactions through a blockchain, preferably with a token. If you put genomic data on a blockchain, there is too much risk that the data could allow re-identification of the individual, and it is arguably illegal under statutes like the GDPR that require that a person be able to delete their data. Rather, genomic data should be kept and analyzed within secure, distributed cloud architectures, ideally without the data being able to leave those environments. The data should be abstracted to as high a degree as possible from the individual, which means that any metadata associated with the genomic data, such as user-reported medical, demographic, or geographic data, should not travel with the genomic data. Using a blockchain, rather than fiat payment gateways, to complete and record data transactions adds a level of abstraction that helps ensure anonymity and privacy and provides an audit trail for tracing misappropriations.
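The separation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular platform works: `AuditLedger` is a toy hash-chained log standing in for a real blockchain, the off-chain store is a plain dictionary standing in for a secure cloud environment, and all names, pseudonyms, and token amounts are hypothetical. The point it shows is that the ledger records only a salted commitment, pseudonyms, and a token amount, while the genomic data and its salt never leave the store.

```python
import hashlib
import json
import secrets
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TransactionRecord:
    """What the ledger sees: no genomic data and no user metadata."""
    dataset_commitment: str  # salted hash; the salt stays off-chain
    seller_pseudonym: str
    buyer_pseudonym: str
    token_amount: int
    prev_hash: str           # links this record to the one before it


def commit(genome_bytes: bytes, salt: bytes) -> str:
    """Salted SHA-256 commitment to a dataset held off-chain."""
    return hashlib.sha256(salt + genome_bytes).hexdigest()


def record_hash(record: TransactionRecord) -> str:
    """Deterministic hash of a ledger record."""
    payload = json.dumps(asdict(record), sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


class AuditLedger:
    """Toy append-only, hash-chained log (an assumption, not a real chain)."""
    GENESIS = "0" * 64

    def __init__(self):
        self.records = []

    def append(self, commitment, seller, buyer, amount):
        prev = record_hash(self.records[-1]) if self.records else self.GENESIS
        record = TransactionRecord(commitment, seller, buyer, amount, prev)
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Check that each record points at the hash of its predecessor."""
        prev = self.GENESIS
        for record in self.records:
            if record.prev_hash != prev:
                return False
            prev = record_hash(record)
        return True


# Off-chain: the data and its salt stay inside the (hypothetical) store.
off_chain_store = {}
genome = b"ACGTACGTTAGC"  # placeholder bytes, not real sequence data
salt = secrets.token_bytes(16)
commitment = commit(genome, salt)
off_chain_store[commitment] = (genome, salt)

# On-chain: only the commitment, pseudonyms, and token amounts are recorded.
ledger = AuditLedger()
ledger.append(commitment, "seller-7f3a", "buyer-19c2", 50)
ledger.append(commitment, "seller-7f3a", "buyer-40be", 75)
```

Because the ledger holds only a salted commitment, the raw data can be removed from the store without touching the transaction history. Whether that is enough under a given statute is a legal question this sketch does not settle.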

A number of blockchain-genomics startups have begun to build systems to improve and encourage the spread of de-identified genomic data, incorporating blockchains in differing ways. Each has its merits and drawbacks, but all are engaging in the good fight to promote greater user control, profit, and privacy of data. At EncrypGen, we have set the highest current standard of ownership: our users keep 90% of the profit they make from sales of their data, which are opt-in on a study-by-study basis. Genomic data should be submitted and monetized anonymously. Token- and crypto-based payments should be used to eliminate payment gateways. EncrypGen includes these protections against re-identification.
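As a rough illustration of the 90% figure, the split of a single sale can be worked out with integer token arithmetic. The function name, the use of whole tokens, and the rounding rule (any remainder goes to the platform) are assumptions made for this sketch. It also treats the sale amount as the profit being split, which may not match how the platform actually calculates it.

```python
def split_sale(sale_tokens: int, seller_share_pct: int = 90) -> tuple[int, int]:
    """Return (seller_tokens, platform_tokens) for one sale.

    Integer arithmetic keeps the two parts summing exactly to the sale amount.
    """
    if sale_tokens < 0 or not 0 <= seller_share_pct <= 100:
        raise ValueError("sale must be non-negative and share within 0..100")
    seller_tokens = sale_tokens * seller_share_pct // 100
    return seller_tokens, sale_tokens - seller_tokens


seller_tokens, platform_tokens = split_sale(1_000)
# seller_tokens == 900, platform_tokens == 100
```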

Technology is not a fix-all. There is ever-increasing pressure to hand over our genetic data to law enforcement and insurance companies, which are not putting our interests first. Laws like GINA in the US, intended to protect us against misuse of our genetic data by employers and insurers, are under threat of being weakened or struck down in the face of such pressures. Technology to safeguard our genomic data ownership and privacy while enabling genetic research is important, and we must work to make it widely distributed, open sourced, and free. But if we want to protect ourselves against the possibility of government overreach, the regulatory environment must be changed for the common good, with increased transparency and accountability, to protect and expand our rights rather than undermine them.

Your genetic code is uniquely yours, yet it has absolutely no legal protection. For now, it’s the Genomic Wild West out there and you need to understand the stakes. While we strive to build technological firewalls against encroachment on our individuality and our most sensitive data, millions are paying to take DNA tests and giving corporations, insurers, and law enforcement increased power and leverage.

David Koepsell is an entrepreneur, author, philosopher, attorney, and educator whose recent research focuses on the nexus of science, technology, ethics, and public policy. He has provided commentary regarding ethics, society, religion, and technology on MSNBC, Fox News Channel, The Guardian, The Washington Times, NPR Radio, Radio Free Europe, Air America, The Atlanta Journal-Constitution, and the Associated Press, among others. He has been a tenured Associate Professor of Philosophy at the Delft University of Technology, Faculty of Technology, Policy, and Management in the Netherlands, Visiting Professor at UNAM, Instituto de Filosoficas and the Unidad Posgrado, Mexico, Director of Research and Strategic Initiatives at COMISION NACIONAL DE BIOETICA in Mexico, and Asesor de Rector at UAM Xochimilco. He is the co-founder of EncrypGen, Inc., the world’s first blockchain-based platform for genomic data exchange.

Written by David Koepsell, J.D./Ph.D.