Leaving Google’s self-driving car project to make AI more accessible


For 10 years, I’ve been teaching machines to learn. As an undergraduate at Sharif University and a graduate student at UC Irvine, I toiled to make computers do what once seemed possible only for humans. Sure enough, I soon found myself in the Bay Area, working on Google X’s much-anticipated Waymo project. Waymo’s promise, the self-driving car, will be one of the most impactful contributions of our generation. Every year, 1.2 million people die in car crashes, and 94% of those crashes are caused by human error. I’m confident that our work will one day drastically reduce, if not eliminate, that number.

As much as I believed in Waymo’s mission and loved working with its incredibly talented slate of engineers, I would eventually hear my calling elsewhere. Through casual conversations with friends in the industry and a little extracurricular reading, I was captivated by the revolution happening in healthcare. Eric Topol, in his book The Creative Destruction of Medicine, characterizes this revolution as “a propitious convergence of a maturing Internet, ever-increasing bandwidth, near-ubiquitous connectivity, and remarkable miniature pocket computers in the form of mobile phones.” In a later book, Topol further explores the democratization of medical data — what he calls healthcare’s ‘Gutenberg Moment’ — and its significance for providers and patients alike. Just as the printing press transferred the power of knowledge from the priest to the everyday parishioner, so has digital technology put an end to medical paternalism and the institutions that perpetuate it.

Fascinated by all this, I left Google to study further. I got my hands on as many books as I could. I took chemistry and biology classes on Khan Academy and elsewhere. I spent hours in Stanford’s library poring over books on bioinformatics and biology. Bouncing from one coffee shop to another, I learned that the driving force behind this revolutionary moment in healthcare certainly doesn’t come from the industry itself. In fact, healthcare is notorious for the snail’s pace at which it adopts innovation. No, the way we do medicine is being turned upside down by data, and massive amounts of it. That’s why leading health insurer Humana, for example, now sees itself as “more of a data analytics company than anything else.”

Leading healthcare organizations’ new mission is to help doctors use data to provide the best treatment for their patients. But what is the ‘best’ treatment, and how do we go about finding it? Not too long ago, the best we could do was cut people open in order to let out “bad blood.” After 150 years of experimentation, however, we learned that diseases like tuberculosis come from bacteria, not imbalanced humors. But now, even the way we experiment is changing. As mountains of data become available to us, we do much more synthesizing than we do hypothesizing. It seems we no longer have to make big ‘guesses’ about what will work. Instead, we can just read off what we need from the data. Life science has become a hard science.

Nowhere does this show up more vividly than in genomics. It took 13 years to sequence the first human genome. Now, the University of Toronto plans to sequence 10,000 genomes in a single year. With access to such a library of genomic data, we can literally inspect the blueprint of human life. Instead of guessing where hearing loss comes from, we can now identify specific genes and target them with treatment to restore aural function. With advanced technologies like CRISPR, we can even snip out offending gene segments and replace them to prevent or treat diseases that, up to now, have been virtually untouchable.

Again, this all depends on processing inconceivable amounts of data in timely and manageable ways. Each human genome takes up 300–400 GB of hard disk storage. By 2025, an estimated 100 million to 2 billion genomes will have been sequenced, requiring up to 40 exabytes of storage. As unfathomable as those numbers are, the level of processing power needed to mine that data is all the more incredible. Thankfully, advances in next-generation sequencing (NGS) have drastically decreased the amount of time and resources needed to analyze the data. Bioinformatics hardware and software from companies like Illumina and Roche are helping us run, in only a few hours, experiments that used to take days.
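
To get a feel for the scale, here is a rough back-of-the-envelope sketch in Python. It simply multiplies the per-genome sizes and genome counts quoted above; the function name and the use of decimal units (1 EB = 10^9 GB) are my own assumptions, not part of any published estimate.

```python
GB_PER_EB = 10**9  # assumption: decimal units, 1 exabyte = 10^9 gigabytes


def raw_storage_eb(genomes: int, gb_per_genome: int) -> float:
    """Raw storage in exabytes for `genomes` genomes at `gb_per_genome` GB each."""
    return genomes * gb_per_genome / GB_PER_EB


# Genome counts and per-genome sizes are the figures quoted in the text above.
for genomes in (100_000_000, 2_000_000_000):
    for gb_per_genome in (300, 400):
        total = raw_storage_eb(genomes, gb_per_genome)
        print(f"{genomes:>13,} genomes x {gb_per_genome} GB = {total:,.0f} EB")
```

At 100 million genomes and 400 GB each, the raw total comes to 40 exabytes. At 2 billion genomes the same arithmetic gives hundreds of exabytes, so the 40-exabyte figure presumably assumes compression or a smaller stored footprint per genome at the high end. That reading is my own guess, not something the estimate spells out.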

Despite exciting developments in NGS, we have a long way to go in understanding this data. In addition to genomic data, we also have rapidly growing sets of microbiomic, physiologic, anatomic, biologic, demographic, and environmental data covering both individuals and populations. In each of these domains, the future of life science research rests on our ability to analyze, store, and retrieve the sequencing work of others. If we’re going to keep up with the current pace of innovation, we need a heavy dose of computational infrastructure to help us do so. That’s why I started Clusterone — to provide technological solutions to the problems of scale that keep scientists from making further advances in their respective fields.

At this moment in history, we have a deeper and more intimate knowledge of the human body than any generation before us. There’s an endless array of data out there, waiting to be sequenced, analyzed, and translated into meaningful action that will change people’s lives for the better. Nature — human or otherwise — is an open book. We have the methods at our disposal to not only read it but to write an exciting new chapter in human history. Clusterone wants to make sure the experts involved in that work have enough ink in their pens to do so.