Genetic Sudoku is here, and it vastly speeds genomic analysis

Published in ExtremeTech Access

Nov 15, 2016

by Graham Templeton

Right now, the problems with DNA have nothing to do with DNA. The molecule itself, deoxyribonucleic acid, is extremely well characterized at this point. We can read it, write it, and manipulate it. We can edit it in living cells, and create alternate versions of it with the special properties we need. We can use it to do computing, or even fold it up to make simple little robots. But understanding the fully detailed role of DNA in the cell is still far beyond our reach, hidden behind barriers not of chemical sophistication but of practical and computational understanding. Now, researchers from Princeton and Harvard have a new method to try to fix this problem, and it turns out that much of its impressive speed derives from a classic approach to the game of Sudoku.

The innovation has to do with one of the most powerful techniques in genetic analysis: whole-genome knockout collections. These collections aspire to have a separate, labelled colony of mutant bacteria for every viable knockout mutant. That is, the scientists inactivate every gene one by one, and save a sample of each single-gene knockout mutant that manages to survive. Then, by subjecting this entire collection of single-gene knockout bacteria to some chemical agent or analysis, it should be easy to see which, if any, of the genes is responsible for a particular interaction. If just a few colonies don’t react as expected, then the gene that’s been knocked out in each of those mutants is a good candidate for further study.

Genome sequencing costs have fallen dramatically over time — but have they fallen far enough to allow this?

Now, it should be obvious in this setup that if you don’t know which genes are “knocked out” in which mutants, the whole thing is pretty useless. That is why, historically, a quick and easy system for inactivating genes at random has not been a good way to speed up this knockout-each-gene-one-by-one process. Expose enough bacteria and you can be assured that you’ve got a mutation in every gene in an organism. But if you can’t plan which colonies will lose which specific gene, then you’ll just end up having to sequence the genome of every interesting mutant and then search that sequence to see which gene has been disrupted. That’s neither fast nor cheap. When you have to do it hundreds or thousands of times across an entire collection, it more than wipes out the efficiency gained by using random mutations in the first place.

That’s where Sudoku comes in, and Sudoku is played on a grid. Before they could begin their analysis, the team had to transfer their mutant colonies to plates full of equally spaced “wells” for colony storage. After an abortive attempt to have research assistants transfer mutants by hand, the team turned to a stock colony-transfer robot, which quickly turned one jumbled population of randomized mutants into almost 40,000 little colonies, each carrying its own uniquely mutated genome. These plates present the same column-and-row layout you’d find in Sudoku; they’re totally stock lab equipment, but this approach casts their design in an all-new light.

The researchers took several hundred plates and treated them as a single giant grid, with Column 1 on Plate 1 leading into Column 1 on Plate 2, and so on. Dividing the whole into rows and meta-columns, the team applied a unique DNA “barcode” to each one, so each well carries a unique combination of barcodes indicating its specific location. Now, when the researchers spot a genome in a big mix of genomes, they can easily identify its location by using Sudoku-like deduction to narrow in on the only possible well.
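To make the row-and-column deduction concrete, here is a minimal sketch in Python. It assumes a hypothetical layout (a 4 x 5 grid of standard 96-well plates, pooled by plate row, plate column, well row, and well column); the article does not give the study's actual dimensions or pooling scheme, and the names `pool_hits` and `candidate_wells` are illustrative only. A mutant seen in exactly one pool per axis has exactly one possible well; a mutant present in two wells produces a cloud of candidates.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical layout: plates arranged as a 4 x 5 "super-grid", each plate a
# standard 96-well plate (8 rows x 12 columns). The real study may differ.
PLATE_ROWS, PLATE_COLS, WELL_ROWS, WELL_COLS = 4, 5, 8, 12
AXES = ("plate_row", "plate_col", "row", "col")

Well = Tuple[int, int, int, int]  # (plate_row, plate_col, well_row, well_col)


def pool_hits(layout: Dict[str, List[Well]]) -> Dict[str, Dict[str, Set[int]]]:
    """Simulate sequencing the barcoded pools: for each mutant, record which
    pool on each axis its DNA turns up in."""
    hits = {}
    for mutant, wells in layout.items():
        axis_hits: Dict[str, Set[int]] = {axis: set() for axis in AXES}
        for well in wells:
            for axis, index in zip(AXES, well):
                axis_hits[axis].add(index)
        hits[mutant] = axis_hits
    return hits


def candidate_wells(axis_hits: Dict[str, Set[int]]) -> List[Well]:
    """Every well consistent with the pools a mutant was seen in: the
    intersection of its plate row, plate column, well row, and well column."""
    return list(product(*(sorted(axis_hits[axis]) for axis in AXES)))


# One barcode per pool, versus the number of wells that must be located.
n_pools = PLATE_ROWS + PLATE_COLS + WELL_ROWS + WELL_COLS  # 29
n_wells = PLATE_ROWS * PLATE_COLS * WELL_ROWS * WELL_COLS  # 1920

layout = {
    "mutant_A": [(0, 2, 3, 7)],                 # present in a single well
    "mutant_B": [(1, 0, 0, 0), (3, 4, 7, 11)],  # same mutant in two wells
}
hits = pool_hits(layout)
print(candidate_wells(hits["mutant_A"]))       # [(0, 2, 3, 7)]
print(len(candidate_wells(hits["mutant_B"])))  # 16
```

The design point is that the number of barcodes grows with the sum of the grid's dimensions rather than their product, which is what makes a single pooled sequencing run feasible. The ambiguous second case also shows why having only one copy of each knockout matters.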

The overall effect is impressive: the team can sequence their meticulously barcoded collection just once, as a single enormous metagenomic sample. Their DNA-barcode Sudoku lets them sift through this mass of genetic information and assign each piece to a specific well, thus identifying which gene has been knocked out in every well. The researchers say their approach allows them to create a complete, annotated knockout collection in just a single day, which could represent as much as a 100-fold reduction in time and a 20-fold reduction in cost.

Now, there are other problems with randomization. For instance, this study was conducted in a bacterial species called Shewanella oneidensis, and if you make 3,600 random knockouts, you would have to be the world’s luckiest researcher to hit each and every one of its 3,600 genes exactly once. In reality, you’re going to get some genes with multiple knockouts, and some with none at all. So to ensure that every gene is affected at least once, the overall number of mutants has to be far greater than the number of genes you’re trying to mutate. Performing tens of thousands of knockouts and then sifting through them can wash out the efficiency gains we’re after here.
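The arithmetic behind this is a classic occupancy (“coupon collector”) problem. The sketch below assumes, as an idealization, that every random mutation is equally likely to land in any of 3,600 genes; real insertions are not uniform, and essential genes cannot be knocked out at all, so actual coverage is worse. The function names are hypothetical.

```python
import math
import random


def expected_missed_genes(n_genes: int, n_mutants: int) -> float:
    """Expected number of genes that receive no knockout at all, assuming
    each random mutation is equally likely to land in any gene."""
    return n_genes * (1 - 1 / n_genes) ** n_mutants


def log10_prob_perfect(n_genes: int) -> float:
    """log10 of the chance that n_genes random knockouts hit every gene
    exactly once: n! / n**n under the same uniform assumption."""
    return (math.lgamma(n_genes + 1) - n_genes * math.log(n_genes)) / math.log(10)


def simulate_missed_genes(n_genes: int, n_mutants: int, seed: int = 0) -> int:
    """Monte Carlo check: scatter n_mutants random knockouts over n_genes."""
    rng = random.Random(seed)
    hit = {rng.randrange(n_genes) for _ in range(n_mutants)}
    return n_genes - len(hit)


print(round(log10_prob_perfect(3600)))                # -1561
print(round(expected_missed_genes(3600, 3600)))       # 1324 genes never hit
print(round(expected_missed_genes(3600, 40_000), 3))  # 0.054
```

With as many mutants as genes, the odds of a perfect one-per-gene collection are around 1 in 10^1561, and roughly 37% of genes are missed. Only at around ten times as many mutants as genes does the expected number of missed genes drop below one.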

This colony-picker robot can grab and transfer bacterial samples quickly.

To get around this, the researchers added another layer of statistical analysis, using codebreaking-like Bayesian inference to pick out the most likely profile of a single-copy gene knockout within the whole genomic sample. This weeding process helps ensure they end up with one, and only one, copy of every knockout, and keeps the Sudoku problem from becoming catastrophically complex.
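The article does not spell out the researchers’ statistical model, so the following is only a simplified stand-in showing what Bayesian location inference can look like. It assumes a knockout truly sits in one well, that sequencing read counts per pool are Poisson-distributed (about `lam_true` reads from the true pool, `lam_noise` from contaminated ones), a uniform prior over observed pools, and independence between axes. All names and parameter values are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Tuple


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(K = k) for a Poisson-distributed read count with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def axis_posterior(counts: Dict[int, int], lam_true: float,
                   lam_noise: float) -> Dict[int, float]:
    """Posterior that each observed pool is the knockout's true pool on one
    axis, assuming exactly one true pool and a uniform prior."""
    log_post = {
        j: sum(poisson_logpmf(k, lam_true if i == j else lam_noise)
               for i, k in counts.items())
        for j in counts
    }
    top = max(log_post.values())  # subtract the max for numerical stability
    weights = {j: math.exp(v - top) for j, v in log_post.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def most_likely_well(axis_counts: List[Dict[int, int]], lam_true: float = 50.0,
                     lam_noise: float = 2.0) -> Tuple[Tuple[int, ...], float]:
    """Combine per-axis posteriors (axes treated as independent) and return
    the most probable well with its posterior probability."""
    posts = [axis_posterior(c, lam_true, lam_noise) for c in axis_counts]
    best, best_p = (), -1.0
    for well in product(*(sorted(p) for p in posts)):
        p = math.prod(post[idx] for post, idx in zip(posts, well))
        if p > best_p:
            best, best_p = well, p
    return best, best_p


# Reads for one knockout turn up in two row pools and two column pools;
# the low counts look like contamination rather than a second copy.
well, prob = most_likely_well([{2: 48, 5: 3}, {7: 55, 1: 1}])
print(well)  # (2, 7)
```

When read counts are lopsided, the posterior collapses onto a single well; when two pools have similar counts, the posterior splits between them, flagging a likely duplicate that can be weeded out.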

Right now, studying DNA is a bit like studying an alien race. We could abduct a few isolated individuals and spend many years studying their anatomy: the hips, the hands, the brains, everything. We could understand how they move through the world, how they manipulate that world, and how they perceive it. We could clone new individuals and perform surgeries to make some of them different and useful in unique ways. We could learn just about anything about their physical bodies, but with only that level of understanding, we would still be pretty bad at predicting their actions in a complex society. Knowing most or even all of the attributes of each individual wouldn’t necessarily let us predict much about the interactions between those individuals. To learn that, we’ll simply have to watch.

Genomics, these days, is about learning how to do that. How do you track a gene’s action through the dizzying complexity of the cell, with all its interactions and false positives? The answer is research like this, which lets scientists annotate genomes quickly and affordably, in the detail this observational mission requires. This is just one of many such innovations that will be needed to let genomics dig into the true secrets of DNA, and the cell.


Originally published at www.extremetech.com on November 15, 2016.
