AI could be used to reduce bacterial food poisoning
Novel sequencing technologies and AI classifiers may bring disease surveillance to a new level
“Food poisoning” is a common condition that almost everyone experiences at least once in their lives. The symptoms are nausea, fever, abdominal pain, and diarrhea, which are unpleasant but usually not life-threatening. However, fatal cases are reported every year. Most such cases are classified as bacterial gastroenteritis. Although many species of bacteria can cause food poisoning, two species, Campylobacter jejuni and Campylobacter coli, are among the most common and the most troubling in terms of public health, leading to more than nine million cases a year in the European Union alone.
These bacteria are harmless commensals in the guts of many birds and other animals, but they become pathogenic when ingested by humans. The sources of infection are animal feces in drinking water and raw or undercooked meat. Surprisingly, it can be very hard to find and localize the source of an infection.
To trace the reservoir of the bacteria, researchers compare bacterial genomes isolated from infected humans with those found in the guts of different animal hosts. However, such analysis is limited by the rather small and unreliable genomic differences between bacterial strains and by technical difficulties in sequencing. Indeed, the most widely used technique is multi-locus sequence typing (MLST), which reveals DNA sequence variation across only seven essential genes that are common to all strains of Campylobacter. Together, these genes cover less than 0.2% of the whole genome.
Whole-genome sequencing (WGS), which is now relatively cheap and accessible, comes to the rescue by allowing complete genomes to be compared. In principle, this should greatly increase the reliability of attributing genomes to different host animals, but in practice the method is hampered by the overwhelming amount of data that WGS produces.
As is often the case, machine learning techniques help to tame the chaos in the data. A recent paper published in PLOS Genetics proposes an AI model that traces the sources of infections caused by Campylobacter.
The authors used 5,799 C. jejuni and C. coli genomes isolated from various sources and host species, including chicken, cattle, sheep, birds and the environment. To prevent bias, these data must be carefully sorted into training and testing datasets. If different genomes from the same bacterial strain are present in both datasets, which is common with random sampling, the model's performance will be severely overestimated. The authors therefore used phylogeny-aware sorting and put all genomes of the same strain into either the training or the testing dataset, never both.
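The idea behind such a split can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `strain_aware_split`, the genome identifiers and the strain labels below are all hypothetical, and a real analysis would derive the groups from the phylogeny rather than from hand-written labels. The sketch only shows the key property, namely that whole strains are assigned to one side of the split.

```python
import random
from collections import defaultdict


def strain_aware_split(genomes, test_fraction=0.2, seed=0):
    """Split (genome_id, strain_id) pairs so that no strain is shared
    between the training and the testing set.

    Illustrative only: strains are shuffled and assigned whole to the
    test set until it holds roughly `test_fraction` of all genomes.
    """
    by_strain = defaultdict(list)
    for genome_id, strain_id in genomes:
        by_strain[strain_id].append(genome_id)

    strains = sorted(by_strain)              # deterministic starting order
    random.Random(seed).shuffle(strains)     # reproducible shuffle

    target = test_fraction * len(genomes)
    train, test = [], []
    for strain in strains:
        # Fill the test set first; every later strain goes to training.
        bucket = test if len(test) < target else train
        bucket.extend(by_strain[strain])
    return train, test


# Hypothetical toy data: ten genomes belonging to four made-up strains.
genomes = [
    ("g01", "strain_A"), ("g02", "strain_A"), ("g03", "strain_A"),
    ("g04", "strain_B"), ("g05", "strain_B"), ("g06", "strain_B"),
    ("g07", "strain_C"), ("g08", "strain_C"),
    ("g09", "strain_D"), ("g10", "strain_D"),
]

train_ids, test_ids = strain_aware_split(genomes, test_fraction=0.2)
print("train:", sorted(train_ids))
print("test: ", sorted(test_ids))
```

With a plain random split, genomes g01 and g02 could land on opposite sides, and the model would effectively be tested on a strain it had already seen. Grouping by strain removes that leak, at the cost of a test set whose size only approximates the requested fraction.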
Several models were built, using a recurrent neural network, a one-dimensional convolutional network, a long short-term memory (LSTM) network, a shallow dense network and a deep dense network.
The authors concluded that ML-based classifiers outperform conventional MLST-based methods and identify the source of infection much more reliably. Moreover, they help reveal the limitations of host attribution algorithms. The most frequent misclassification was between sheep and cattle, which is explained by overlapping genetic features of the bacterial strains found in these animals.
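A misclassification pattern of this kind is usually read off a confusion matrix, which counts how often each true host is predicted as each possible host. The sketch below shows the bookkeeping only. The labels are invented toy data, chosen so that sheep and cattle are confused most often, and the helper names `confusion_counts` and `most_confused_pair` are hypothetical; none of the numbers come from the paper.

```python
from collections import Counter


def confusion_counts(true_hosts, predicted_hosts):
    """Count (true host, predicted host) pairs."""
    return Counter(zip(true_hosts, predicted_hosts))


def most_confused_pair(counts):
    """Return the unordered pair of hosts mistaken for each other most
    often, together with the number of errors, or None if no errors."""
    errors = Counter()
    for (true_host, predicted), n in counts.items():
        if true_host != predicted:
            # frozenset ignores direction: sheep->cattle and cattle->sheep
            # are counted as the same confusion.
            errors[frozenset((true_host, predicted))] += n
    if not errors:
        return None
    pair, n = errors.most_common(1)[0]
    return tuple(sorted(pair)), n


# Invented toy labels, NOT results from the study.
true_hosts = ["chicken", "chicken", "chicken", "cattle", "cattle",
              "cattle", "sheep", "sheep", "sheep", "bird"]
predicted = ["chicken", "chicken", "cattle", "sheep", "cattle",
             "sheep", "cattle", "sheep", "cattle", "bird"]

counts = confusion_counts(true_hosts, predicted)
worst = most_confused_pair(counts)
print(worst)  # (('cattle', 'sheep'), 4)
```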
An even more important advantage of AI models is their ability to identify fine-grained evolutionary features of bacterial genomes that facilitate human disease. The authors found previously unknown evolutionary links in the bacterial clonal complex CC-21, which is the most abundant in humans.
AI systems based on the full-genome sequences of pathogenic bacteria could eventually enable automated, continuous or even real-time disease surveillance and monitoring. This has great potential for reducing, if not eliminating, the risk of food-borne infections caused by common bacterial pathogens.