DNA Privacy: Most People Are Getting It Wrong
How to use DNA tools for their intended purposes without compromising your security
Privacy is the biggest concern that most people have when considering whether or not to get their DNA tested. However, most people who’ve had their DNA sequenced for ancestry or health reports are failing to take the most basic step for safeguarding their privacy. More specifically, when it comes to sharing one’s name, DNA information, and family tree, most people are doing it all backwards. And it’s pretty silly to get DNA privacy backwards if that’s your biggest concern.
How are people getting privacy wrong? On DNA matching platforms, people are generally using their real names, sharing their DNA information, and not including a well-populated family tree to which relatives can compare their own. I believe that people should never use real names on profiles in DNA relative databases, should probably share DNA information, and should definitely include the best family tree that they can put together. Otherwise, I’m not sure what the point of looking for DNA relatives is.
Names
You might say that only your own relatives will be seeing your information, and not the general public, so it’s ok to share your name. But a good number of your DNA relatives aren’t true matches. You can see this when one or both of your parents get their DNA tested, at which time many of your DNA relatives completely disappear.* Even if all of your DNA matches were bona fide relatives, the vast majority will be distant cousins — on the order of 4th cousins or higher. Do you trust 8,000 distant cousins not to misuse or share your DNA information?
Some people have exported their DNA Matches to text files or spreadsheets. I’ve additionally taken notes about my DNA segments, sorted first by chromosome number and then by segment starting point. This allows me to group my relatives by matching segment so I can figure out what ancestors my segments came from by noting similarities in family trees. People who’ve exported data or taken notes could guess some of the time which new pseudonyms belong to match names that have disappeared based on the number of centiMorgans (cM) that match on a particular segment. Perhaps now you’re seeing how hard it is to remove your associated name and DNA information from the Internet once it’s there. However, now’s the best time to start undoing that association. Please, don’t let me scare you into giving up on genetic genealogy — just change your profile name. I’ll admit, if many people change their profile names it’s going to throw off my notes. But I’d be willing to accept that inconvenience for the sake of everyone’s privacy.
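For anyone who wants to try the segment notes described above, here is a minimal sketch of the sorting and grouping step. The record layout (`match_name`, `chromosome`, `start`, `end`, `cm`) is my own assumption for illustration, not the export format of any particular company, and the sample rows are made up. Keep in mind that two matches overlapping at the same position may sit on different copies of the chromosome (one maternal, one paternal), so each group is only a candidate for a shared ancestor.

```python
from collections import namedtuple

# Hypothetical record layout; real exports differ by company.
# Chromosomes are assumed to be coded as integers here.
Segment = namedtuple("Segment", "match_name chromosome start end cm")


def group_by_overlap(segments):
    """Sort by chromosome, then start position, and cluster overlapping segments."""
    ordered = sorted(segments, key=lambda s: (s.chromosome, s.start))
    groups = []
    for seg in ordered:
        last = groups[-1] if groups else None
        if last and last["chromosome"] == seg.chromosome and seg.start < last["end"]:
            # Overlaps the current group: add it and extend the group's end.
            last["members"].append(seg)
            last["end"] = max(last["end"], seg.end)
        else:
            # New chromosome or a gap: start a new group.
            groups.append(
                {"chromosome": seg.chromosome, "end": seg.end, "members": [seg]}
            )
    return groups


# Made-up sample notes, using pseudonyms rather than real names.
notes = [
    Segment("Cousin A", 7, 50_000_000, 62_000_000, 14.2),
    Segment("Cousin B", 2, 10_000_000, 25_000_000, 18.0),
    Segment("Cousin C", 7, 55_000_000, 70_000_000, 16.5),
    Segment("Cousin D", 7, 90_000_000, 99_000_000, 9.1),
]

groups = group_by_overlap(notes)
for group in groups:
    names = ", ".join(s.match_name for s in group["members"])
    print(f"chr{group['chromosome']}: {names}")
```

Comparing the family trees of the people inside one group is where the genealogy actually happens; the code only does the bookkeeping.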
One thing I’m not saying is to never share your name along with your DNA information. I think it’s a great idea to connect with people and try to figure out how you’re related or to share information once you do figure it out. I just think your name shouldn’t be there for all of your “relatives” to see.
There’s an additional benefit to not sharing your real name. Unless the company you’re sharing your DNA with is willing to give your real name to law enforcement agencies, officers would likely have to ask you some questions via private message if they wanted to use your DNA to track down a third cousin of yours who was suspected of committing a crime. This leaves the control of what your DNA is used for in your own hands. If you want to help law enforcement agencies, you can. If you believe that agencies are starting to abuse DNA tools to convict people of victimless crimes, you can withhold your information.
Sharing DNA Information
What does it mean to share your DNA information? Specifically, I’m talking about allowing your DNA relatives to see on which chromosomes and segments they match with you. This is the norm for most DNA matching platforms. And it’s the only way (other than social network modelling, also known as clustering) to find out which of your ancestors a particular segment came from, so I don’t think you should stop sharing your DNA information.
A Well-Populated Family Tree
What does it mean to have a well-populated tree? The vast majority of people in DNA-matching databases have either no tree or only three or fewer generations of ancestors in their shared trees. MyHeritage is the company that does this best; however, most of the trees there still have only a few generations of ancestors. And many of the people in those trees have no birth dates, death dates, or places associated with them, which makes the trees not very useful.
While it’s important to include a family tree on your DNA matching profile, you shouldn’t include your parents’ names. If you do, you might as well be using your real name. When you create or upload a tree, most websites allow you to keep private the names of any individuals who are still living. Many people wisely choose that option. But, even if your parents are deceased, I would still recommend keeping their names private.
The way people use different DNA testing companies varies dramatically. For example, at 23andMe, a much lower percentage of people share their DNA information and almost none have a family tree available; but, like with most DNA platforms, almost everyone uses their real name. That’s the best example of doing it backwards. Without sharing their DNA information or family trees, people are fairly safe, but users are hardly getting any benefit from using the service. At MyHeritage, people are generally sharing everything. Almost everyone uses a real name, almost everyone is sharing DNA information, and a surprisingly large number of people have at least a small family tree. At GEDmatch, every single person is sharing DNA information, very few people have family trees, and more people use pseudonyms than on other platforms. But guess what? Everyone at GEDmatch has to share an email address, and a lot of those email addresses make people identifiable. Probably even worse is the fact that you can see all of your relatives’ relatives (even if they don’t match with you), and all of their relatives, and so on if you’re willing to go down one genealogically useless rabbit hole after another. Who would be willing to do that? Only someone interested in something other than genealogy. I don’t think you should stop using GEDmatch; just change your profile name.
I’m reluctant to publicly state my biggest reason not to include both your name and your DNA information because I don’t want to give anyone the idea of exploiting it. Most people who share DNA information have probably never thought of this particular risk. In fact, I’ve never heard anyone mention it. It isn’t about someone hacking a database and getting your data. It isn’t about your data being shared with third parties or insurance companies charging extra money for pre-existing health risks. Those are valid concerns, but people are already taking safeguards for those risks, or at least are voicing their concern. And, yet, there’s still one really good reason not to use your real name while sharing your DNA information.
Rather than stating that reason, I’ll instead give this advice: If you’ve tested your DNA, get really familiar with what you can do with it. That is, use the DNA tools for their intended purposes, but don’t divulge unnecessary information. I’m not saying you have to spend a lot of time on anything. Get a feel for grouping your DNA matches by shared segments. Feel free to analyze your DNA for health information. (If you haven’t tested with 23andMe or you want to see additional health reports, upload your raw DNA to Codegen.eu. This is the only free, wide-ranging, health-related DNA website that’s still recommended by GeneticLifehacks.com after de-listing less responsible sites.) And change your profile name!
One final thing: If people shouldn’t use their real names, what should they use? While using your initials would usually be fine for maintaining privacy, it would result in a lot of duplicate names. This is actually a problem. For example, try ranking your top relatives by total cM for all matching segments. Even if you did that right now, with most people using their full names, you’d have some names near the top of your list that are only there because they’re duplicates. Similarly, names like “<Private> L” or “Unknown Smith” would result in even more duplicates than using initials. Another option some people choose is to use a one-word name similar to those in Instagram profiles, like Dadjokeguyfromcanada. I’m personally not a fan of that, but I suppose it works. The best option I can think of is to use an ancestor’s name. Pick the weirdest one in the bunch, or the one who’s the biggest mystery. That way, it will stand out to people who may know something about your ancestor. It won’t look like you’re trying to fool anyone — everyone will know it isn’t your real name when they see it several ancestors back in your very well-populated tree. (;
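To make the duplicate-name problem concrete, here is a small sketch of ranking relatives by total cM. The field names (`kit_id`, `display_name`, `cm`) and the rows are assumptions I made up for illustration; not every export includes a per-person identifier. When the totals are keyed by display name, two different people who both go by "J S" are merged into one inflated entry.

```python
from collections import defaultdict


def rank_by_total_cm(segment_rows, key):
    """Sum cM per value of `key` and return (value, total) pairs, largest first."""
    totals = defaultdict(float)
    for row in segment_rows:
        totals[row[key]] += row["cm"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up rows: two different people (k1 and k2) share the initials "J S".
rows = [
    {"kit_id": "k1", "display_name": "J S", "cm": 40.0},
    {"kit_id": "k2", "display_name": "J S", "cm": 35.0},
    {"kit_id": "k3", "display_name": "Zebulon Q", "cm": 60.0},
    {"kit_id": "k1", "display_name": "J S", "cm": 12.0},
]

by_name = rank_by_total_cm(rows, "display_name")
by_kit = rank_by_total_cm(rows, "kit_id")

print(by_name)  # the two "J S" people are merged and land on top
print(by_kit)   # keyed by a unique identifier, "Zebulon Q" (k3) is the top match
```

A distinctive ancestor's name behaves like the unique identifier here: it keeps each person on a separate row even when all you have is the display name.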
*When just one parent is tested, removing the false matches is done by a process called phasing. When both parents are tested, your kit isn’t needed anymore. Many of your matches won’t show up as matches to either of their kits. This happens when one of your segments came from, say, your father, but it’s made up of a segment from your paternal grandfather and one from your paternal grandmother. During recombination in meiosis, chromosome crossover will lead to such an effect. Double crossovers and those of higher order can occur, but will be less likely to match with other DNA relatives as the order gets higher. It could also happen on your mom’s side, when segments from your maternal grandparents seamlessly line up together. The one from your maternal grandmother could itself contain segments from multiple ancestors of hers, again making it less likely for you to match with someone on that segment. However your genome came to contain this segment, if it’s greater than 6 centiMorgans (cM) for both you and another person in the same DNA database, the two of you will be considered a match. Their segment could be one that they inherited intact from an ancestor, or it could be from two different ancestors like your segment. Either way, the segments are identical by chance rather than by descent.
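If both of your parents have tested, the check described in this footnote can be sketched as a simple comparison of match lists. This is only an illustration under my own assumptions: the function name and the kit identifiers are hypothetical, and it presumes each match carries the same stable identifier in all three lists. It is not the phasing procedure that any testing company runs.

```python
def split_matches(child_matches, father_matches, mother_matches):
    """Separate a child's matches into those shared with a parent and those not."""
    parental = set(father_matches) | set(mother_matches)
    confirmed = sorted(m for m in child_matches if m in parental)
    unconfirmed = sorted(m for m in child_matches if m not in parental)
    return confirmed, unconfirmed


# Made-up identifiers for illustration only.
child = ["kit-A", "kit-B", "kit-C", "kit-D"]
father = ["kit-A", "kit-X"]
mother = ["kit-C"]

confirmed, unconfirmed = split_matches(child, father, mother)
print("Also match a parent:", confirmed)
print("Match neither parent (likely identical by chance):", unconfirmed)
```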
Feel free to ask me about modeling & simulation, genetic genealogy, or genealogical research. And make sure to check out these ranges of shared DNA percentages or shared centiMorgans, which are the only published values that match peer-reviewed standard deviations. That model was also used to make a very accurate relationship prediction tool. Or, try a calculator that lets you find the amount of an ancestor’s DNA you have when combining multiple kits.