WWC Discusses Ethics in Data Collection

NYC Data Science event coverage: Women Who Code

As data collection and data analysis have become entrenched in our daily lives, conversations about the ethical treatment and use of data are becoming increasingly important. Almost every interaction you have, whether it is a conversation with a friend on social media, information you give your doctor, or a purchase you make online, can be recorded and analyzed as data. Debates about ethics, in any field, are never purely philosophical: ethical debates in academia and private business often precede changes in the law. With the rapid advances taking place in data science, how do we ensure that our laws reflect the longstanding ethical standards that have been cornerstones of fields such as health and medicine?

Last Monday, Women Who Code, an organization of more than 50,000 members that hosts presentations and provides networking opportunities for women in the technology sector, gave a series of talks on the ethical treatment of data and the ethical use of data analysis. The talks were hosted at the offices of Medidata, a cloud-based software company that helps expedite clinical trials, and the evening was kicked off by Medidata’s Sr. Lead Software Engineer, Purnima Mavinkurve, with a talk titled “Evolving Landscape of Data Privacy in Clinical Research.”

Clinical trials are controlled experiments that test new medicines in human participants. If a scientist believes that a new drug can help patients with a certain condition, clinical trials are used to determine whether the drug is safe and effective in treating that condition.

Mavinkurve began her talk by distinguishing between the different types of personal data that can be collected. Data generated by an online shopping site is used to build a profile of an individual buyer; clinical data, by contrast, is used to group people who share a characteristic, such as a specific disease or medical condition. And while other types of data can offer enough clues to reveal a person’s identity, clinical data has to be anonymized to the point where a participant’s involvement is untraceable.

While this distinction is clear to most data scientists, domestic and international laws governing data transfer have yet to reflect it, and this is where clinical trials run into a major problem. Current data transfer laws were enacted to protect consumers, but because they apply to all kinds of collected data, it is currently illegal for many organizations in different countries to share data with one another. So if a pharmaceutical company in the United States is researching the same disease as a university in Europe, the two cannot transfer their data to each other.

A member of the audience asked, “Why is all the data treated the same?” There was no easy answer to this question. Ethical data sharing is important for finding cures for some of our deadliest diseases, but there seems to be a general lack of understanding that there are different types of personal data.

Throughout her talk, Mavinkurve stressed that data laws should reflect the ethics that have guided clinical trials for decades. One of those guiding principles is consent. Patients who want to take part in these trials are always informed about how their data will be used and distributed, and Mavinkurve noted that most patients are fine with their clinical trial data being shared as long as it is fully anonymized. Hopefully, data laws will soon reflect this attitude.
Originally published at cds.nyu.edu on June 23, 2016.