DNA CARTOGRAPHERS, #5: Daniel Speyer
A new blog series shedding light on the behind-the-scenes aspects of DNA.Land
by Richard Aufrichtig
Since October, I have had the privilege of working for the Erlich lab as the DNA.Land User Engagement Coordinator and Technical Support leader. This has allowed me the exciting experience of communicating directly with users all over the world, fielding questions, and helping to calibrate the DNA.Land program for our users’ needs. Over the past nine months, I have received many questions about our team and what we do. Though we are an academic group from Columbia University and the New York Genome Center, many of our users are under the impression that we function similarly to large companies like 23andMe, Ancestry, and FamilyTreeDNA.
In an effort to help connect the scientists of DNA.Land more directly to our users, we have launched this blog series: DNA Cartographers. We hope that you enjoy learning more about our scientific team!
My fifth subject for the blog series is Daniel Speyer. Bringing the power of Google to DNA.Land, Daniel designed our most recent Ancestry Report. I interviewed him on June 27th, 2016 at the New York Genome Center.
Richard Aufrichtig: Thank you for being my fifth guest on DNA Cartographers!
Daniel Speyer: You’re welcome!
Richard: You’re the first person on our tech team that I’ve interviewed, and I think you’ll give our users a new perspective on what goes on behind the scenes at DNA.Land. You also have an interesting entry point, since you didn’t join the team until after DNA.Land launched.
Daniel: That’s right.
Richard: I was hoping we could start off by discussing your DNA.Land origin story. I’m wondering, in particular, how you met Yaniv and what brought you to the New York Genome Center.
Daniel: My origin story has lots of twists and turns, but my background is in computer science. Skipping ahead through various things, I ended up working at Google as a site reliability engineer for a while. That’s a very back-end position, very much a role focused on keeping things working and making sure they’ll continue to work. But after I left, I was a little uncertain about where I wanted to go next. I ultimately decided I wanted to move into bioinformatics, which I think is a much higher-impact field.
Daniel: I took some courses at Columbia, and one of them was Yaniv’s “Ubiquitous Genomics” course. There was a lot of interesting stuff there. And, I think it was on the last day of class that he said: “By the way, the New York Genome Center is hiring! If you know someone who’d like to work here, please connect us!”
Richard: That’s awesome.
Daniel: Once I finished everything for that semester, I e-mailed him and said, “Well actually, a person who might be interested in working here is me!” [Laughing]
Richard: [Laughing] Can you tell me a little more about how you first got turned on to bioinformatics and genetics? To me, as an outsider, that seems quite different from doing back-end work at Google.
Daniel: In a lot of ways it is. I suppose I took a look at the state of the tech field and it seemed like there was an awful lot of effort going into getting people to click ads.
Daniel: And, that just didn’t seem like what I wanted to dedicate my life to working on.
Daniel: Working on Google’s back-end does build up a very specific skill set to a very high level, though. And I started to think about which important things, by my own definition, I could contribute to. That’s how I ended up finding bioinformatics.
Richard: Ah, nice!
Daniel: In bioinformatics, we have entire classes of diseases that we don’t know how to treat yet. We’re just building the toolsets to start studying them. Some of those toolsets involve very large-scale data handling, which I know a lot about.
Richard: I’d be interested to hear you talk about the origin of the new Ancestry Report. I know you wrote the blog post that I’ve shared many, many times. But because this is going to reach a different audience, I’d love it if you could share what your thought process was, how it differs from our original Ancestry Report, and what dreams or aims you might have for future ancestry reports.
Daniel: The main thing I kept thinking about while building the new Ancestry Report was: “We have this information. It’s not all the information we’d like to have, but it is something, and it is what we have. How do we express to the user: ‘We know this much and not more’?” That’s what led to the “includes, does not include” section.
Daniel: For example, if somebody wants to know whether they have Portuguese ancestry, and all we can tell them is that their ancestry looks more like Spanish and Southern French than like Moroccan or English, I wanted that to be clear. At that point, the user might well look at the report and feel disappointed or frustrated. But at least we know they haven’t been misled. That same idea then turned into the new map, which draws everything based on the geography that we actually know.
Daniel: The previous ancestry report worked in terms of coloring in modern nation states. And, y’know, 1) people get touchy around anything involving modern nation states (and I can’t blame them), and 2) it’s just not a very accurate depiction of peoples — insofar as peoples are geographic.
Richard: Hmm. Something that I’ve personally found interesting, learning more about the whole genetics and genomics field, is really coming to understand how inaccurate nation states are when speaking about populations.
Richard: The more I learn about it, just by hearing from all these users and being part of the conversations, the more fascinating it becomes to think about.
Daniel: Well, it’s a good thing, really. I mean, think about how many times nation state borders have changed throughout history. And, imagine if they made the populations change with it. But, don’t imagine very hard — because, you don’t want to go there.
Richard: As the person who interacts with the users, I think there’s an interesting tension in that question. People want a certain answer, but the answer they want can sometimes be inaccurate, so they’re asking the question from a place of misunderstanding. I think the field of genetics and genomics could actually be a powerful tool for showing how poorly nation states represent peoples. It doesn’t seem to be going that way yet, but I think it could.
Daniel: Let’s not lose sight of which things matter to which people for which purposes. People-hood is often not about biological affinity. Just look at the United States. Or pick your favorite example from history.
Richard: I’d be interested to hear, because a lot of users have questions about it, how far back the Ancestry Report actually looks. My impression is that there’s not actually an exact answer for that, and I’d be interested in hearing you talk about why that is.
Daniel: Well, the important thing to remember is that we’re working with a reference panel; I put some details about this in the blog post. Everything is about comparing a user’s file to that reference panel. So, if we say your ancestry is North-West European, that roughly means your ancestors have been in North-West Europe as long as the ones in our reference panel were. And, y’know, you can look up the history of North-West Europe if you’d like to figure out what that would mean. I believe the reference panels excluded recent immigrants: the requirement for being in the reference panel was, I believe, that your grandparents were born in that area.
Richard: How does one go about getting a bigger reference panel?
Daniel: So, there are other studies where we may be able to go to the people who ran them and ask for the data. There’s been some very fine-grained work in Europe, because there are well-funded health services interested in doing that there.
Richard: Can you talk a little bit about the projects you’re currently working on at DNA.Land? What’s taking up your time here?
Daniel: Well, for the last couple of months I’ve been working on the NBCC partnership project.
Richard: Could you talk a little bit about that project? I know it relates to discovering the underlying cause of breast cancer recurrence.
Daniel: Sure. So, as a general rule, you can’t really do much with genetic information alone. You want to be able to compare the genetic information to something else. And in this case, that something else is breast cancer. While some of our users have actually been diagnosed with it, more of them have close relatives who’ve been diagnosed. To be clear, for our purposes “close” means sharing roughly half your DNA.
Daniel: That being said, both of those cases are useful, and we want to ask these users for as much relevant medical history as they can remember. That will allow us to start doing more interesting research with the data. But it turns out that designing a survey that asks users for this information in a way that they will actually fill out, and fill out correctly, is fairly tricky. You need to think about all the different ways that something could go wrong when filling out a form.
Daniel: We’ve spent a lot of time considering who will actually remember the pertinent information, and making sure there’s always a good way to say: “I don’t know.” I am trying to cover everything and make sure there is always a good and apparent way forward. On TV Tropes, there’s a page called “The Dev Team Thinks of Everything.” I’ve taken that as inspiration. I don’t know if I could actually wind up on that page, but it’s a nice goal.