How Google Fiber is changing the game for genomics

Genome interpretation in the terabase & gigabit era.


Hello Google Fiber gigabit speeds.

I’ve had Google Fiber at my home in Provo, Utah for over a month now and it is quite simply amazing. Check this out:

No commentary needed here. Just maybe a moment of silence.

And here’s a speed test I did via wifi on my macbook air, with Apple’s AirPort Extreme router:

600 Mbps ?!?!

Yup. Be jealous.

Google fiber, meet my collection of 200 GB genomes.

To give you an idea of what this has the potential to do for genomics, check this out:

This is me on regular internet, at the office, trying to download a little tiny 200 MB file. 736.9 KB/s = 5.9 Mbps.
And this.. This is me on Google Fiber, tackling 200 GB human genomes with ease. Note that 1 MB/s = 8 Mbps, so 16.5 MB/s = 132 Mbps. Super fast, and the bottleneck isn’t me.

These are screenshots of me downloading a couple genomes at home (on Google Fiber), compared to downloading from the same remote university server from my office downtown Provo (using a common small business internet package we have there). There is just NO comparison. What used to take days (or even longer), can now be done in hours (or less).

As a big data geneticist, I’m dealing with pretty large file sizes. The size of a human genome is usually about 200 GB per genome, in its raw data format as it comes off the sequencer. Using the speed from the screenshots (132 Mbps on Google Fiber vs. 5.9 Mbps on my old connection), it would take a little over 3 hours to download a whole genome on Google Fiber, but over 77 hours to download that same genome on my regular internet at the office. And… I’m not the bottleneck. I was before, but now that 132 Mbps I’m getting off of an FTP server in another state is nothing. Check out what happens when I open up a few connections at once to the University in another state, and download a 200 GB genome that way. Download speeds approach gigabit and I can move an ENTIRE human genome in easily less than an hour. (Note that there are multiple genomes in this screenhot, and I only opened up 4 connections at once, but could have pushed that up even higher).

Google fiber eliminates the pain & bottleneck of data transfer for genomic medicine.

Why data transfer speed matters in genomics.

These are genomes from some patients and family members with severe autism, developmental delay and seizures. Faster transfer to the analytics pipeline means quicker results. We can get important answers to patients and families who need them much quicker.

And this is a relatively tiny data set. A research collaborator asked me recently if we could help them move a couple thousands genomes from one state to another. A couple thousand??? Not with old internet speeds, but now we can. That is hundreds of terabytes of data. There was talk about mailing hundreds of hard drives, but that just seems done. The internet infrastructure and bandwidth that we’ve had access to (up until now) just couldn’t handle it. It would take weeks and weeks, and would probably time out a million times before even making a sizeable dent in the data transfer. Faster genome transfer means quicker results, more open collaboration, expedited scientific disocvery, which means improved understanding of how to diagnose AND treat human diseases.

The promise of fast, accurate genetic diagnoses in newborns.

Let’s look at a scenario where fast genome transfer speeds could be extremely useful, even life-saving: the newborn ICU. I was recently approached by a large hospital system because they had seen what researchers had implemented at Children’s Mercy Hospital in Kansas City, MO, and wanted to look into doing something similar. What Dr. Kingsmore and his colleagues had done in the NICU at Children’s Mercy was impressive. So impressive, in fact, that the US Goverment announced a $25 million program late last year to push this type of thing forward.

This is from last year: “One can imagine a day when every newborn will have their genome sequenced at birth, and it would become a part of the electronic health record that could be used throughout the rest of the child’s life both to think about better prevention but also to be more alert to early clinical manifestations of a disease,” says Alan Guttmacher, director of the US National Institute of Child Health and Human Development. It now costs $1,000 or less to examine the protein-encoding portion of the genome and about $5,000 to sequence an entire human genome, so that day may be approaching quickly. And studies released over the past year have found that genetic sequencing might find a genetic cause for illness in 15–50% of children with undiagnosed diseases.” And this year, the long-awaited $1000 whole genome finally arrived.

Here’s the basic idea behind newborn ICU genome sequencing:

Background

  • >4 million babies are born each year in the United States.
  • 1 in every 20 babies born is admitted to the newborn ICU.
  • Up to 1/3 of babies admitted to a newborn ICU have genetic diseases.
  • More than 3500 single-gene diseases have been characterized, but traditional genetic testing is only available for some of them. Even if they were available, how does a physician pick the right ones? Plus, most of them cost hundreds, even thousands of dollars each. At least 500 of these genetic diseases have a known treatment.
  • Every night in the newborn ICU costs >$10,000. “A hospital stay can cost a quarter of a million dollars quite easily,” points out Dr. Kingsmore.

Why sequence newborns?

  • By conducting rapid genome sequencing and interpretation, “physicians can make practical use of diagnostic results to tailor treatments to individual infants and children,” says Dr. Kingsmore.
  • For example, babies born with the rare genetic disorder phenylketonuria (PKU) are unable to break down a certain amino acid, which can lead to brain damage and seizures. If found early enough, however, PKU is easily treated and children can move on with their lives.
  • Another example: Muscle contractions due to mutations in the sepiapterin reductase gene respond to drugs that are ineffective against other movement disorders that may look the same, but have a different genetic underpinning.
  • Symptoms of many genetic conditions, such as Charcot-Marie-Tooth (CMT), sometimes do not present until adulthood so a genetic test early on could help to save the lives of older individuals.
  • “Overall, it can save time, it can save lives, and a lot of times, it can save suffering,” Kingsmore said.

Sequencing technology

  • The study at Mercy used Illumina’s HiSeq 2500 sequencer, which can generate 120 Gigabases in 27 hours. This was in 2012. The new Illumina HiSeq X Ten announced this year, aka the $1000 genome machine, can >1.6 Terabases in about the same time period (more than 10x the sequence data). That’s 16 human genomes every 3 days. Woah.
  • After adding on time for targeted, symptom-based data analysis, Kingsmore’s group was able to get the sequencing & interpretation time down under 50 hours.

Caveats

  • Mercy’s 50 hour turnaround time did NOT include transportation or data transfer time. And many hospitals, even large medical centers, do not own genome sequencers. Not surprising since the Illumina HiSeq X Ten costs $10 million to purchase. $10 million!
  • As you recall from my genome transfer experiment above, in many settings with traditional internet speeds, the time it takes to transfer a genome from sequencing facility to the person who ordered it (or in the case of Tute, to our cloud-based platform) it can take days or even longer depending on your internet speeds. In fact, it’s more common to mail hard drives full of genomes these days, than it is to transfer them electronically.

Personalized medicine: are we there yet?

Eventually, every newborn will get their genome sequenced, not just those in the newborn ICU. It just makes sense. Kingsmore’s landmark study was published in 2012. A year later, in late 2013, the US goverment announced $25 million to push newborn sequencing forward. Even where we are now, at the beginning of the genome revolution, with genomic medicine in its infancy, the benefits are clear. The challenge is the data, including both the bioinformatics analysis AND the data transfer.

It used to take me days to download a genome from the lab, before I could even start to analyze it. Now, with google fiber, I can download a whole human genome in less than half an hour. That is a HUGE difference when someone’s health is on the line.

Until the day arrives where every hospital has one (or many) little desktop (or even handheld) genome sequencers, data transfer will continue to be a major bottleneck. In order to realize the promise of rapid genome sequencing in the ICU, we need gigabit internet connections on both ends.