Feds at Work: Using genome sequences to combat infectious disease
Built and headed the world’s largest and most influential repository of genetic sequence data
After nine people in four states fell sick with listeriosis, a serious foodborne infection caused by Listeria bacteria, federal investigators went looking for a link among the victims. They found the likely culprit to be bacteria in frozen peas, carrots and other vegetables packaged by a company in Pasco, Washington, according to the Food and Drug Administration.

Investigators figured out the connection among the cases, even though individuals who became ill lived as far away as Connecticut and Maryland, by examining the genome of the bacteria — akin to a DNA fingerprint. They used it to identify those patients around the country infected by the same strain.
And the frozen food company recalled more than 800 types of products possibly linked to the outbreak.
Federal investigators got help with their sleuthing from an enormous and growing storehouse of hundreds of millions of DNA sequences compiled by the National Institutes of Health. GenBank, the world’s largest repository of genetic sequence data, was managed for 30 years by Dr. David Lipman and his team at the National Center for Biotechnology Information in NIH’s National Library of Medicine, until Lipman left in June for a private sector position.
“They are the world’s genomic Google,” said Dr. Alex Greninger, a resident in laboratory medicine at the University of Washington. Greninger said he is “constantly in awe” of the work done by the center’s team of computer scientists, molecular biologists, mathematicians, biochemists, research physicians and biologists.
Scientists studying virtually all types of infectious diseases, from influenza, hepatitis, Zika and Ebola to bacterial pneumonia, tuberculosis and malaria, use the DNA sequence database. So do those tracking antibiotic resistance, a serious worldwide problem. Researchers also use the database to study chronic diseases, including cardiovascular disease, rheumatoid arthritis and lupus.
Comparing pieces of a genome — a complete set of chromosomes of a human or other organism — is similar to taking a DNA sample from a crime scene and seeing if it matches that of a suspect. Matching the unique sequence of the bacteria’s DNA found in listeria patients, for example, indicated a high likelihood the illness came from the same source.
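To make the fingerprint comparison concrete, here is a minimal sketch of the idea in Python. It is an illustration only, not the method used by investigators or by GenBank: real outbreak work compares whole bacterial genomes with far more elaborate tools. The sequences, sample names and the one-difference threshold below are all invented for the example.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def likely_same_source(isolates: dict[str, str], max_snps: int = 1) -> list[tuple[str, str]]:
    """Return pairs of sample names whose sequences differ by at most max_snps.

    max_snps is a hypothetical cutoff chosen for this toy example.
    """
    return [
        (n1, n2)
        for n1, n2 in combinations(sorted(isolates), 2)
        if snp_distance(isolates[n1], isolates[n2]) <= max_snps
    ]


# Made-up toy sequences; real bacterial genomes run to millions of letters.
ISOLATES = {
    "patient_CT": "ATGCGTACGTTAGC",
    "patient_MD": "ATGCGTACGTTAGC",
    "frozen_peas": "ATGCGTACGTTAGA",
    "unrelated": "ATGAGTTCGATAGC",
}

if __name__ == "__main__":
    for pair in likely_same_source(ISOLATES):
        print(pair)
```

In this toy data the two patient samples and the food sample differ by at most one letter, so they are paired together, while the unrelated sample differs in several positions and is left out.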
Lipman had the foresight to understand the value of trustworthy and available data on genome sequences for scientists and researchers throughout the world, according to Dr. Steven Musser, FDA’s deputy director for scientific operations.
“He was a visionary in his ability to look at how important this information would be,” Musser said. “It’s one thing to have books in the library and another to have people look at them, read them and make use of them.”
Researchers not only make use of the genomic “books” that the biotechnology center provides, they also contribute to the collection by sending raw data for the center to annotate and add to the database.
GenBank is one of several enormous repositories developed under Lipman’s leadership that the public can use online for free anywhere around the world — from PubMed Central, an archive of articles from thousands of the world’s leading biomedical journals, to PubChem, a resource that connects chemical information with biological studies.
For GenBank, Lipman and his team process hundreds of thousands of genome sequences daily, using high-performance computing. They manage incoming data, clean it up, remove mistakes, annotate it and release it in a form that is easier for scientists, researchers and others to use.
“The government does an amazing job for the world,” handling this “firehose of data” and making sense of it, Greninger said.
Lipman, the center’s first director, managed the organization but also made “the single largest contribution to the development of these essential algorithms of the software, contributing many highly original and unique ideas and principles,” said Eugene Koonin, a senior investigator at the center.
A side benefit of the work is that federal agencies, states and local governments now collaborate in ways they didn’t before, according to Musser. In the past, FDA and the Centers for Disease Control and Prevention “kind of fumbled around,” he said. “Now, we work together seamlessly.”
The Department of Agriculture, state public health labs, and scientists and researchers in Europe and Canada also collaborate now.
“This is bringing people together who would not be brought together any other way,” Musser said.
GenBank has made great advancements since its start as a small and simple database, according to Lipman.
“We’re really at the cusp of a transformation of how we do infectious disease surveillance and public health worldwide,” he said. “This is no longer just a research tool on infectious disease, but a tool to combat it.”
David Lipman is a finalist for a 2017 Samuel J. Heyman Service to America Medal, one of the awards known as the Sammies. Each year, the Partnership for Public Service honors federal employees whose remarkable accomplishments make our government and our nation stronger.
For the third year, we will also present the annual “People’s Choice” award. Please vote for the person or team you find most inspiring. (Voting closes at 11:59 p.m. EST on September 15, 2017.)
