AI for Science in 2023: A Community Primer

AI for Science
24 min read · Jan 17, 2024


Introduction

2023 was an amazing year for the AI for Science community, with progress, challenges, and new promises. In August, a review paper from the community was published in Nature, summarizing the progress the AI for Science community has made in the past several years, following the workflow of scientific discovery, and shedding light on the future development of AI for Science.

In this blog, we aim to continue our growth as a community, summarizing experience and progress and new lessons and inspiration (especially those transferable across fields) for the future. Specifically, we cover various areas from Chemistry, Biology, Computer Science/Mathematical Science, Physics, Earth Science, and Neuroscience to Medicine.

Disclaimer: This blog is not exhaustive of all advances the AI for Science community made in 2023. Instead, our primary goal is to highlight some notable developments or occurrences within AI for Science over the past year. These selections, made by our team members, are admittedly subjective. Additionally, it’s important to note that our selection of featured articles does not imply our agreement with every statement within them.

AI for Science in 2023, imagined by DALL⋅E (the image is imperfect, misspelling many words).

Table of Contents

  1. Why AI for Science?
  2. Takeaways
  3. Chemistry
  4. Biology
  5. Computer and Mathematical Science
  6. Physics
  7. Earth Science
  8. Neuroscience
  9. Medicine
  10. Future Outlooks and Challenges
  11. Stay Updated
  12. Acknowledgements

Why AI for Science?

Before diving into the AI for Science book of 2023, we aim to address a question that many may ponder: Why establish a broad-based community under the umbrella of “AI for Science” instead of focusing on narrower domains like “AI for Drug Discovery” or “AI for Structural Biology”? In response, we underscore several key reasons:

  • Synergy across fields: The AI for Science initiative fosters synergistic relationships not only between AI and various scientific disciplines but also among different subfields within AI and science. This cross-disciplinary interaction catalyzes innovative solutions and accelerates scientific discovery.
  • Hierarchical organization of knowledge: Analogous to a hierarchy of academic subjects, AI for Science represents a higher-order domain that encompasses and transcends specialized fields. It provides a macroscopic view, connecting and contextualizing the more focused applications of AI in specific scientific areas.
  • The value in studying science: Beyond addressing particular problems, there is intrinsic value in understanding and exploring science as a whole. AI for Science champions this holistic approach, promoting a deeper comprehension of nature and the universal principles that govern it.
  • Addressing community challenges: AI for Science is uniquely positioned to tackle broad, community-wide challenges that transcend individual disciplines. By uniting a diverse range of perspectives and expertise, the community can confront not only scientific problems, but also systematic challenges in the community such as diversity, resources, ethics, education, etc.
  • Collaborative opportunities: This inclusive platform offers unique opportunities for collaboration. It brings together diverse experts facing shared challenges and methodologies, fostering a community where collective wisdom can address complex issues more effectively than isolated efforts.

Takeaways

If you do not have time to read the entire blog, here are some overarching trends across fields:

  • LLMs are impacting all fields. LLMs are changing how humans interact with machines and are demonstrating impact across fields, from planning experiments in chemistry and biology and searching for better algorithms in computer science to acting as generalist AI agents in medicine (1, 2).
  • Autonomous laboratories for data generation and experiment. Over the past year, the integration of AI for experimental planning and the use of robotics for automated execution has marked a substantial advancement (1, 2, 3), bridging experimental synthesis and validation, traditionally a major bottleneck. Though still in the early stages of development, these initiatives demonstrate the potential to not only test AI planning algorithms but also to significantly improve data generation quality and quantity. This, in turn, accelerates experimental validation and helps to complete the cycle of discovery.
  • Geometric deep learning research continues to grow. Geometric deep learning is rapidly evolving in science, where it addresses the widespread presence of structured data. Its development includes designing equivariant neural networks that adhere to the complex symmetries found in the quantum physics of molecular systems (1, 2), and deciphering multi-scale, multi-modal biological data characterized by diverse and intricate relationships (1, 2, 3, 4). This approach is enhancing our understanding of data across various scientific domains.
  • Generative models for design. Generative models, including but not limited to diffusion models, have achieved success in tasks ranging from designing new functional proteins (1) and capturing transition structures in chemical reactions (1) to reconstructing images from brain activity (1) and sampling field configurations in quantum chromodynamics (1).
  • Development of foundation models. General-purpose pre-training followed by fine-tuning on downstream tasks, i.e., the foundation model approach, is gaining popularity in science domains. Researchers have been developing “foundation models” specialized for atomic force fields (1, 2, 3) and biological systems (1, 2, 3, 4, 5).
  • Decision making attracts more attention. From discovering new combinatorial algorithms to proposing new experiments, decision making has become another popular paradigm, widely studied and adopted in AI for Science (1, 2).
  • Focused discovery areas. The advancements in AI for science are predominantly observed in the fields of Chemistry, Biology, and Medicine. This trend might be partly due to the high stakes these areas represent, such as in drug discovery, which often garner more public attention and publicity.
  • Big tech is pushing the boundary. In the past year, there has been a notable surge in interest from major technology companies such as Microsoft, Google Research and DeepMind, Meta, and Nvidia in the area of AI for Science. Their sophisticated organizational structures and superior computational capabilities are increasingly influential in driving advancements across scientific fields that leverage AI.
  • Open vs. closed science. Unfortunately, fewer and fewer works in the area of AI for Science are being open-sourced, even in academia.
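To make the symmetry point above concrete, here is a minimal sketch of the invariance idea behind equivariant networks: pairwise interatomic distances do not change under rotation, so models built on them automatically respect rotational symmetry. The point cloud, rotation, and feature choice are toy illustrations, not taken from any particular paper.

```python
import numpy as np

# Toy illustration of the symmetry idea behind geometric deep learning:
# pairwise distances are invariant under any 3D rotation, so a model
# built on them automatically respects rotational symmetry.
rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))          # 5 "atoms" in 3D

# Build a random proper rotation matrix via QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:                  # ensure det = +1
    Q[:, 0] *= -1

def pairwise_distances(x):
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)

d_before = pairwise_distances(points)
d_after = pairwise_distances(points @ Q.T)  # rotate, then recompute
```

Equivariant architectures generalize this idea: instead of discarding orientation entirely, they let directional features transform predictably with the input.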

Chemistry

Automated chemical and materials synthesis

The amount and noise level of data required for training and sampling in AI models, and the need to validate the abundant candidates generated by AI algorithms, necessitate automated robotic execution of experiments, eventually closing the discovery loop. In 2023, we witnessed exciting progress toward bridging this “execution gap” between AI and the physical world. Koscher et al. demonstrated this idea on dye molecules, combining predictive and generative models for the design and synthesis of small organic molecules; over 3,000 reactions were executed, and 303 unreported dye-like molecules were discovered. Szymanski et al. explored similar ideas in inorganic materials synthesis and built an autonomous lab called A-Lab. While it may be an overstatement to claim that it discovered new materials, the project was notably successful in synthesizing 41 compounds out of 58 attempts, guided by synthesis plans generated by a language model. Strieth-Kalthoff et al. demonstrated the idea in another way: instead of building an autonomous lab to conduct everything locally, they presented a cloud-based approach that enabled a delocalized and asynchronous design campaign, which led to the discovery of 21 new laser-emitting materials. Additionally, Zhu et al. reported an interesting application of autonomous experiments to discover oxygen-producing catalysts from Martian meteorites. We note that there are still debates and criticisms of A-Lab (e.g., here) from materials scientists on whether it has discovered any material that had not been synthesized before. While these initiatives are still at the proof-of-concept stage with relatively straightforward integrations, they represent significant progress in harmonizing AI capabilities with practical scientific discovery.

Autonomous molecular discovery framework via iterative design-make-test-analyze (DMTA) cycles that merges several ML tools to iteratively plan experiments to learn a structure-property space defined by molecular dye scaffolds. This figure is obtained from the cover figure of Koscher et al.

LLM agents for chemical research

With ChatGPT popularized, large language models (LLMs) were at the center of machine learning research in 2023. Beyond their great potential in human-bot interaction, LLMs are also promising for automating scientific discovery through their interactions with the internet, models, and experimental equipment. The essence of this idea is to maximize the benefit of using natural language as the unified I/O, riding on the increasing capability of LLMs. Two concurrent works built around this idea are Coscientist and ChemCrow. In both studies, a central LLM “planner” is tasked with planning experiments toward a given objective (e.g., synthesizing ibuprofen), and is capable of calling other LLM modules for many functionalities, such as searching the internet, writing and executing Python code, searching the documentation of experimental equipment, and running experiments either automatically or manually. Though relatively modest and proof-of-concept in nature, these demonstrations of LLM agents highlighted their versatility: a capacity to synthesize known compounds, search and navigate hardware documentation, execute high-level commands in a robotic lab, control liquid-handling instruments, and solve optimization problems by analyzing previously collected data (e.g., Coscientist). As this approach matures, human scientists may be liberated from routine, labor-intensive tasks and instead focus on planning and reasoning. For example, ChemCrow discovered a novel chromophore, though this would not have been possible without human feedback as an important ingredient. Both the automation of experiments and the development of LLM agents advanced quickly in 2023, inspiring dreams of unleashing the potential of integrating automated labs and AI to discover new chemistry not yet known to humans.
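The planner-plus-tools pattern behind these systems can be sketched in a few lines. Everything here is an invented stand-in: in Coscientist or ChemCrow the planner and the tool inputs are produced by an LLM, not by the hard-coded rules below, and real systems sandbox any code execution.

```python
# Minimal sketch of the planner-plus-tools pattern behind LLM agents
# such as Coscientist and ChemCrow. The tools and the rule-based
# "planner" are hypothetical stand-ins for LLM-generated calls.

def search_literature(query):
    return f"[stub] top papers for: {query}"

def run_python(code):
    scope = {}
    exec(code, scope)                 # real agents sandbox this step
    return scope.get("result")

TOOLS = {"SEARCH": search_literature, "PYTHON": run_python}

def planner(objective):
    # Stand-in for the LLM planner: emit (tool, input) steps for a goal.
    return [("SEARCH", f"synthesis routes for {objective}"),
            ("PYTHON", "result = 42.0  # e.g., compute a reagent amount")]

def run_agent(objective):
    log = []
    for tool_name, tool_input in planner(objective):
        observation = TOOLS[tool_name](tool_input)
        log.append((tool_name, observation))
    return log

log = run_agent("ibuprofen")
```

The design point is that every tool shares the same natural-language-friendly interface, so adding a new capability (a robot, a database, a simulator) only requires registering another entry in the tool table.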

Architecture of Coscientist including different types of modules for information exchange. The colors for corresponding modules are as follows. Input prompt: Red. LLM modules: Blue. Planner module: Green. Modules not using LLMs: White. This figure is obtained from Figure 1a and 1b in Boiko et al.

Machine learning interatomic potential aided materials discovery

Efficiently conducting large-scale simulations at the atomistic level while accurately capturing electron interactions is a crucial yet challenging task in materials discovery. Classical force fields, while instrumental in advancing these studies, often fall short in accuracy. There is therefore a pressing need for accurate force fields to improve the fidelity and reliability of ab initio molecular dynamics (AIMD) for materials.
Powered by deep learning tools, a novel approach to atomistic modeling named the Crystal Hamiltonian Graph Neural Network (CHGNet) was introduced by Deng et al. in 2023. Pretrained on 1.5 million inorganic structures from the Materials Project Trajectory Dataset, CHGNet can predict the energies, forces, stresses, and magnetic moments of a crystal structure with unknown atomic charges.
More encouraging results for machine-learned interatomic potentials (MLIPs) boosting materials discovery come from the DeepMind GNoME team. By pretraining an accurate and general MLIP for bulk solids, GNoME scaled materials prediction up to 2.2 million structures, including 380,000 promising candidates for experimental synthesis.

The capabilities of MLIPs have been impressively extended to biomolecular applications. Musaelian et al. advanced the scale of AIMD-quality simulation to a 44-million-atom structure of a fully detailed, all-atom, explicitly solvated HIV capsid at quantum-level fidelity by employing a deep equivariant neural network named Allegro. Beyond models designed and trained on a particular domain of chemistry or materials, efforts emerged in 2023 to develop large and general atomic models, pretrained on atomic datasets covering as many domains of molecules and materials as possible, with the goal of efficient fine-tuning and distillation to downstream applications, especially data-scarce scenarios (e.g., DPA-2, Meta-LAM, MACEFoundation).
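At their core, all of these interatomic potentials map atomic positions to a total energy whose negative gradient gives the forces. A toy sketch, with a classical Lennard-Jones pair potential standing in for the learned network (the real models replace this function with a deep neural network):

```python
import numpy as np

# What an interatomic potential provides: E(positions) and the forces
# F = -dE/dpositions. A toy Lennard-Jones pair potential stands in for
# the learned network; the finite-difference force check mirrors how
# energy/force consistency is validated for real MLIPs.

def energy(pos, eps=1.0, sigma=1.0):
    """Total pair energy of an N x 3 configuration."""
    e = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(pos[i] - pos[j])
            e += 4 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)
    return e

def forces(pos, h=1e-5):
    """Forces = -dE/dpos, via central finite differences."""
    f = np.zeros_like(pos)
    for idx in np.ndindex(pos.shape):
        dp = np.zeros_like(pos)
        dp[idx] = h
        f[idx] = -(energy(pos + dp) - energy(pos - dp)) / (2 * h)
    return f

pos = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 1.2, 0.0]])
E, F = energy(pos), forces(pos)
```

Because the potential depends only on interatomic distances, the forces sum to (numerically) zero, as Newton's third law requires; deep MLIPs enforce the same physics by construction.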

Diffusion models for generating transition states in chemical reactions

Chemistry is all about understanding and controlling chemical reactions, i.e., how substances interact and change from one to another.
Unlike reactants and products, which are long-lived species observable in experiments, transition states are fleeting structures lying at the highest energy point on reaction pathways, and are thus extremely difficult to isolate and characterize experimentally. Conventionally, quantum chemistry has been used as a complement to search for transition state structures. However, those calculations are expensive (hours or days) and suffer from low convergence rates.

A group of researchers from MIT and Cornell developed OA-ReactDiff, a diffusion model that describes 3D reactants, transition states, and products as a joint distribution. OA-ReactDiff satisfies all the symmetries in chemical reactions, leading to superior performance compared to prior machine learning models. With OA-ReactDiff, an accurate transition state structure can be generated in six seconds, a 1,000-fold acceleration compared to established optimization-based methods (such as the string method and the nudged elastic band) in computational workflows for reaction exploration. Conventionally, people rely on human intuition when exploring possible chemical reactions, even at the stage of high-throughput computation. OA-ReactDiff, thanks to the stochastic nature of its diffusion model, provides a complementary solution by exploring unintended reaction pathways that would otherwise be neglected during intuition-based reaction exploration.
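The denoising mechanics underlying such models can be sketched with a generic diffusion sampler. This is not OA-ReactDiff itself: the real model is an equivariant network over 3D reactant/transition-state/product coordinates, whereas here the "structure" is a 1-D toy point mass at x = 3, for which the optimal noise predictor has a closed form we can hand-code in place of a trained network.

```python
import numpy as np

# Generic denoising-diffusion sketch. A hand-coded noise predictor for
# toy point-mass "data" at DATA_MEAN stands in for the trained network;
# the reverse loop is the standard DDPM update.
rng = np.random.default_rng(0)
T = 200
betas = np.linspace(1e-4, 0.02, T)      # noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

DATA_MEAN = 3.0                          # toy "structure"

def predict_noise(x_t, t):
    # Exact noise prediction when all data sits at DATA_MEAN:
    # x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps  =>  solve for eps.
    return (x_t - np.sqrt(alpha_bar[t]) * DATA_MEAN) / np.sqrt(1.0 - alpha_bar[t])

def sample(n=200):
    x = rng.normal(size=n)               # start from pure noise
    for t in range(T - 1, -1, -1):       # reverse diffusion
        eps_hat = predict_noise(x, t)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.normal(size=n)
    return x

samples = sample()                       # concentrates near DATA_MEAN
```

The stochasticity mentioned above lives in the per-step noise injection: different random draws yield different valid samples, which is what lets a reaction-generation model stumble onto unintended pathways.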

Some future applications of generative modeling for chemical reactions include modeling complex combustion reactions, exploring reactions that may have occurred during the early evolution of life on Earth, and designing new catalysts. Besides transition state generation, some exciting applications of diffusion models were released as preprint at the end of 2023, such as generating solid-state materials with property guidance (e.g. MatterGen), metal-organic frameworks (e.g. MOFDiff), and thermal distributions of structures (e.g. DiG).

Overview of equivariant diffusion models for generative molecular system sampling for (a) single molecule and (b) a chemical reaction containing three structures (i.e., reactant, transition state, and product). This figure is obtained from Duan et al.

Biology

Protein modeling and design

Since AlphaFold2 opened up the research avenue of studying protein structures in 2021, increasing efforts have been made to understand protein interactions. Among them, RosettaFold-AA and AlphaFold-latest expand beyond protein structure prediction to predicting protein interactions with other biomolecules such as small molecules, proteins, and nucleic acids. Another line of work studies the conformational space of proteins: Hannah et al. discovered that clustering the multiple sequence alignment (MSA) by sequence similarity enables AlphaFold2 to sample alternative protein conformations.

Overview of RoseTTAFold All-Atom, which extends protein structure prediction to a variety of interaction prediction tasks. Figure is taken from the paper (RoseTTAFold All-Atom).

Complementary to understanding the structure and function of proteins, protein design focuses on creating new proteins, or modifying existing ones, to achieve specific structures and functions. In 2023, two major advances were made in improving protein design with AI. Following progress in geometric deep learning and generative AI, especially diffusion models, RFDiffusion and Chroma devise diffusion models respecting the symmetries of Euclidean space (rotation, translation, and reflection) to generate new proteins. In addition to de novo design, they also propose techniques for flexible design and optimization of proteins, such as conditioning on a binding target or functional motif, and optimizing structures or functions based on a model that provides heuristics (i.e., gradients).

Overview of the RFDiffusion model, which learns a denoising process from Gaussian noise to protein structures. Figure is taken from the paper (RFDiffusion).

Foundation models for biology

The “foundation model” paradigm has demonstrated its effectiveness in natural language processing and vision. The natural question to ask is whether this paradigm can be effectively realized in biology. Indeed, in the past, most machine learning methods in biology were task-specific, and models were trained from scratch. Is there foundational, transferable information that generalizes across tasks? How do we develop models to capture it?

In the past year, there have been many efforts to build foundation models for various biological modalities. In 2023, we saw updates to protein language models and their numerous applications, including protein folding and cross-species cell embedding (UCE), just to name a few. Beyond proteins, similar self-supervised learning ideas have been applied to DNA (1, 2) and shown to be effective on standard DNA sequence modeling benchmarks. The special challenge of DNA is the massive sequence length, which easily exceeds the capacity of the transformer model. HyenaDNA inherits the Hyena framework and extends it to DNA sequences for long-context foundation models. In addition to DNA, RNA foundation models have also emerged. The idea of ATOM-1 is to train a seq2seq model on chemical mapping data of RNA sequences; it has demonstrated initial utility in structure prediction.
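The self-supervised recipe behind most of these sequence models is masked prediction: hide a fraction of tokens and train the network to recover them from context. A minimal sketch on a DNA string; the 5-token vocabulary and 15% mask rate are simplifying assumptions, not the tokenization any particular model uses.

```python
import numpy as np

# Sketch of the masked-prediction pretraining objective used by DNA
# language models: hide a fraction of bases, keep the originals as
# labels, and (in the real model) train a network to predict them.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "[MASK]": 4}

def mask_sequence(seq, mask_frac=0.15, seed=0):
    rng = np.random.default_rng(seed)
    tokens = np.array([VOCAB[b] for b in seq])
    n_mask = max(1, int(mask_frac * len(tokens)))
    positions = rng.choice(len(tokens), size=n_mask, replace=False)
    labels = tokens[positions].copy()     # the model must predict these
    tokens[positions] = VOCAB["[MASK]"]   # corrupt the input
    return tokens, positions, labels

tokens, positions, labels = mask_sequence("ACGTACGTACGTACGTACGT")
```

The long-context challenge mentioned above enters here: a genome-scale `seq` makes the attention cost of a vanilla transformer prohibitive, which is what architectures like Hyena are designed to avoid.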

Moving to cell biology, in 2023 we saw a large number of single-cell foundation models. Cui et al., Hao et al., and Theodoris et al. extend LLM pre-training objectives to single-cell gene expression. They are benchmarked on standard single-cell analysis tasks such as batch effect correction and integration. Biology-inspired pre-training objectives have shown a much stronger ability to capture biological signals; Rosen et al. report zero-shot performance on par with fine-tuned models on integration tasks.

Lastly, foundation models in biology have predominantly been unimodal (focused on proteins, molecules, diseases, etc.), primarily due to the scarcity of paired data. Bridging modalities to answer multi-modal queries is an exciting frontier. BioBridge leverages biological knowledge graphs to learn transformations across unimodal foundation models, enabling multi-modal behaviors.

While there is excitement around these models, we have not yet observed a foundation model that consistently outperforms across tasks.

Illustration of foundation model for single-cell genomics and how it enables various downstream tasks. Figure is taken from scGPT.
UCE (Yanay et al.) is a single cell foundation model with zero-shot ability. Figure is taken from the paper (UCE).

LLMs for biology

Biology, in its natural format, is written in biochemical entities such as DNA, RNA, and proteins, which have different inductive biases than human language. However, the exploration and understanding of biology heavily rely on human language: scientists write, communicate, and come up with ideas described in human language, and experiments are conducted following text descriptions. In 2023, with the advent of powerful LLMs, we saw the potential for LLMs to revolutionize biological science by introducing novel abilities to study biology and even discover new biology.

The Microsoft Research AI4Science team conducted a comprehensive study of LLMs in biology. It first benchmarked LLMs across numerous biological tasks, such as sequence annotation, identifying functional domains in proteins, signaling pathways, and designing sequences, although the results remain suboptimal compared with task-specific models. They also show LLMs’ ability to help with experiment design by processing data and producing code for liquid-handling robots.
Another example of accessing an LLM’s internal knowledge is MedPaLM-2, which utilizes the knowledge the LLM contains about each gene to prioritize genes. Another use case of MedPaLM-2, in clinical medicine, is detailed in the Medicine section.

Another exciting thread is the LLM agent framework for autonomous discovery. WikiCrow from Future House is the first of its kind to go through vast public literature and synthesize cited, Wikipedia-style summaries, generating draft articles for the 15,616 human protein-coding genes that currently lack Wikipedia articles. LLM agents can serve as powerful information extractors from sources at a scale surpassing human manual curation. We expect to see more of agents’ abilities in biology, as research assistants, as reasoning machines, and more.

The WikiCrow LLM agent framework for question answering in biology. Figure is taken from the introduction page.

Graph AI for biology

Biology is an interconnected, multi-scale, and multi-modal system. Effective modeling of this system can not only unravel fundamental biological questions but also significantly impact therapeutic discovery. The most natural data format for encapsulating this system is a relational database or a heterogeneous graph. This graph stores data from decades of wet lab experiments across various biological modalities, scaling up to billions of data points.

In 2023, we witnessed a range of innovative applications using GNNs on these biological system graphs. These applications have unlocked new biomedical capabilities and answered critical biological queries.

One particularly exciting field is perturbative biology. Understanding the outcomes of perturbations can lead to advancements in cell reprogramming, target discovery, and synthetic lethality, among others. In 2023, GEARS applied GNNs to gene perturbation relational graphs, predicting the outcomes of genetic perturbations that have not been observed before.
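The core operation such models apply to a biological system graph is message passing: each node's embedding is updated by aggregating its neighbors'. A minimal sketch with a toy gene graph and random weights standing in for the learned parameters (this is the generic mechanism, not GEARS's exact architecture):

```python
import numpy as np

# One round of graph message passing on a toy gene-gene graph. The
# adjacency, features, and weight matrix are stand-ins for the learned
# model; real systems stack several such layers.
rng = np.random.default_rng(0)
n_genes, dim = 5, 4
A = np.array([[0, 1, 1, 0, 0],           # adjacency of a small gene graph
              [1, 0, 0, 1, 0],
              [1, 0, 0, 0, 1],
              [0, 1, 0, 0, 0],
              [0, 0, 1, 0, 0]], dtype=float)
H = rng.normal(size=(n_genes, dim))       # initial gene embeddings
W = rng.normal(size=(dim, dim))           # learnable weights (random here)

def gnn_layer(A, H, W):
    deg = A.sum(axis=1, keepdims=True)
    H_agg = (A @ H) / np.maximum(deg, 1)  # mean over neighbours
    return np.maximum(H_agg @ W, 0)       # linear transform + ReLU

H1 = gnn_layer(A, H, W)                   # updated gene embeddings
```

Because information flows along edges, a perturbation to one gene propagates to related genes over successive layers, which is what lets the model generalize to unseen perturbations of connected genes.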

Another notable application is to contextualize protein representation. While current protein representations are fixed and static, we recognize that the same protein can exhibit different functions in varying cellular contexts. PINNACLE uses GNN on protein interaction networks to contextualize protein embeddings. This approach has been shown to enhance 3D structure-based protein representations and outperform existing context-free models in identifying therapeutic targets.

Moving beyond predictions, understanding the underlying mechanisms of biological phenomena is crucial. Graph XAI applied to system graphs is a natural fit for identifying mechanistic pathways. TxGNN, for example, grounds drug-disease relation predictions in the biological system graph, generating multi-hop interpretable paths. These paths rationalize the potential of a drug in treating a specific disease. TxGNN designed visualizations for these interpretations and conducted user studies, demonstrating their effectiveness in supporting decision-making by clinicians and biomedical scientists.

GEARS is a graph neural network model to predict perturbation transcriptional outcomes. Figure is taken from the paper (GEARS).

Computer and Mathematical Science

LLMs for discovering new algorithms

With the advent of LLMs, one promising direction proposed by the research community is to utilize them to discover new theories and mathematics. A recent study (FunSearch) from DeepMind demonstrates the potential of LLMs to discover new programs that solve hard combinatorial problems. The main goal of FunSearch is to find better programs (as judged by an evaluator) for a hard problem. Specifically, it runs an iterative, evolutionary process between a pre-trained LLM and an evaluator: given a pool of programs, the evolutionary algorithm selects the best candidates from the pool to be prompted to the LLM for improvement; the revised programs are then evaluated, scored, and put back into the pool. Through this evolutionary process, better and new programs are proposed. They validate the effectiveness of FunSearch on two combinatorial problems, the cap set problem and online bin packing, where FunSearch finds better solutions than the best known ones.

FunSearch uses an evolutionary process to iteratively improve the proposed programs with pre-trained LLMs. Figure is taken from the paper (FunSearch).
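The select-mutate-evaluate loop described above can be sketched in a few lines. Note the heavy simplifications: FunSearch evolves programs and uses an LLM as the mutator, whereas this toy replaces both with a single number and a Gaussian perturbation, purely to show the loop structure.

```python
import random

# Skeleton of the FunSearch-style loop: keep a pool of candidates,
# score them with an evaluator, and ask a "mutator" to improve the
# best ones. The pre-trained LLM is replaced by a random numeric
# mutation on a toy problem (maximise f(x) = -(x - 7)^2).
random.seed(0)

def evaluate(candidate):
    return -(candidate - 7.0) ** 2          # evaluator: higher is better

def mutate(candidate):
    return candidate + random.gauss(0, 1)   # stand-in for the LLM's edit

def funsearch_loop(pool, n_iters=200):
    pool = list(pool)
    for _ in range(n_iters):
        best_so_far = max(pool, key=evaluate)    # select a strong candidate
        child = mutate(best_so_far)              # "LLM" proposes a revision
        pool.append(child)                       # score and return to pool
        pool = sorted(pool, key=evaluate)[-10:]  # keep the top 10
    return max(pool, key=evaluate)

best = funsearch_loop([0.0])
```

The key design choice carried over from the real system is that the evaluator, not the generator, is the source of ground truth: the mutator can propose anything, and only candidates that score well survive in the pool.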

Machine learning for new algorithm discovery

Designing new fundamental algorithms, such as sorting or hashing, has long been a challenging and significant problem in computer science.
In 2023, DeepMind presented AlphaDev, a significant advancement in AI’s capability to optimize computational algorithms.

Utilizing deep reinforcement learning, AlphaDev successfully discovered novel, more efficient sorting algorithms that outperform methods developed by humans, by starting from scratch rather than refining existing algorithms.

AlphaDev uncovered novel methods that exhibit a 70% increase in speed for shorter sequences and around a 1.7% improvement for sequences of over 250,000 elements, compared to the techniques used in the LLVM libc++ sorting library. Although AlphaDev has drawn a lot of attention, it is worth mentioning the ongoing skepticism about its significance, and the possibility that researchers could guide GPT-4 to “discover” the same sorting algorithms through effective prompt engineering, without reinforcement learning. Despite these arguments, AlphaDev suggests potential paths toward more efficient algorithms and a more robust and enduring computing ecosystem.

Physics

Machine learning for neutrino physics discovery

A paper authored by the IceCube Collaboration represents a landmark study in astrophysics, harnessing the power of machine learning to analyze data from the IceCube Neutrino Observatory. Machine learning algorithms are employed to discriminate between signal and background data, enabling the detection of high-energy neutrino emission from the Galactic plane with unprecedented precision.

The study utilizes convolutional neural networks (CNNs) for event selection, whose high inference speed (milliseconds per event) enables a more complex filtering strategy in the event selection pipeline. To determine the properties of the physical events, a novel hybrid reconstruction method is used, combining maximum likelihood estimation with neural networks that exploit multiple symmetries. The machine learning models are refined on a decade of observational data, learning to pinpoint the characteristics of neutrinos amidst a backdrop of cosmic noise. The findings, which reveal neutrino emission at 4.5-sigma significance, point to potential sources within the Milky Way. The innovative use of machine learning in this context not only enhances the observatory’s detection capabilities but also offers a model for future astrophysical explorations, showcasing the evolving role of machine learning as an indispensable tool in the quest to unravel the mysteries of high-energy cosmic phenomena.

Generative models for lattice QCD sampling

Lattice Quantum Chromodynamics (QCD) is a vital theoretical framework in physics, used to study the strong interactions that bind quarks and gluons into protons, neutrons, and other hadrons. While powerful, lattice QCD faces significant computational challenges due to the complexity of quantum field calculations on the lattice. Conventional methods such as Monte Carlo simulation, although widely used, require substantial computational resources and can be inefficient for certain calculations due to critical slowing down and topological freezing. Over the past few years, machine learning-based sampling schemes have been proposed as a potentially more efficient alternative to traditional methods. In 2023, the community reached the point of constructing generative models with complex non-abelian gauge symmetries and fermions, paving the way for large-scale QCD simulations. This work also provides a roadmap for future development in the field, suggesting that ML techniques could significantly advance lattice field theory studies, with broader implications in physics and computational science.
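The essential bookkeeping behind flow-based sampling is the change-of-variables formula: push prior samples through an invertible map and track the Jacobian so exact model densities (and hence reweighting to the target theory) are available. A deliberately tiny sketch; real lattice-QCD flows use deep gauge-equivariant layers, while this 1-D affine map with hand-set parameters only illustrates the mechanism.

```python
import numpy as np

# Toy sketch of flow-based sampling: sample a simple prior, apply an
# invertible map, and keep the log-Jacobian so exact densities are
# known. A 1-D affine map stands in for a deep gauge-equivariant flow.
rng = np.random.default_rng(0)

scale, shift = 2.0, 1.0                   # "trained" flow parameters

def flow(z):
    return scale * z + shift              # invertible map z -> phi

def log_prob(phi):
    # Change of variables: log q(phi) = log p(z) - log|det dphi/dz|
    z = (phi - shift) / scale
    log_prior = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
    return log_prior - np.log(scale)

z = rng.normal(size=100_000)              # prior samples
phi = flow(z)                             # model samples ~ N(1, 2^2)
```

Having `log_prob` in closed form is what distinguishes flows from black-box samplers: it allows exact reweighting or Metropolis corrections toward the target lattice action, avoiding the critical slowing down of purely local updates.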

Comparison between Flow-based sampling and conventional Hybrid Monte Carlo. This figure is obtained from Figure 4 in Cranmer et al.

Earth Science

AI-Powered weather forecasting

Weather forecasting has traditionally been a challenging task due to the non-linear dynamics and complexity of atmospheric conditions. In 2023, significant advancements were made in short-range (several hours) and medium-range (several days to weeks) weather forecasting using AI. These advancements include ClimaX, GraphCast, Pangu-Weather, MetNet-3, and PreDiff, all aiming for high-resolution spatio-temporal predictions using decades of historical weather data or the outputs of numerical physical simulations.

GraphCast implements a graph neural network with an “encoder-processor-decoder” configuration to process spatially structured weather data. PreDiff develops a conditional latent diffusion model for probabilistic forecasts. ClimaX stands out by providing both global-scale and regional-scale models using transformers, as well as a foundation model trained to predict an arbitrary set of input variables at any future time, thereby generalizing to other tasks.

ClimaX is built as a foundation model for any weather and climate modeling task. This figure is obtained from Nguyen et al.

Most of this AI-powered weather forecasting work has been conducted within major tech companies. Training these large-scale, high-resolution models requires high-performance computing, but the inference process is comparatively lightweight. The datasets used, such as the ERA5 reanalysis archive from the European Centre for Medium-Range Weather Forecasts (ECMWF) and CMIP6 from the Coupled Model Intercomparison Project (CMIP), an international collaboration, required comprehensive data cleaning and curation by multiple teams before they could be used for AI.

With the increasing prevalence of extreme weather and ongoing climate change, understanding the Earth system’s response to climatic disturbances is crucial. This knowledge will provide the foundation for many areas of earth sciences, including hydrology (specifically hydrometeorology), oceanography, and, more practically, the development of response plans and early warning systems for natural hazards.

Differentiable models make it possible to break the model into portions to narrow the scope of the relationships (blue) to be learned (potentially with fewer data than training a pure ML model) for better interpretability. The green blocks are physical models. This figure is obtained from Shen et al.

AI-Powered surrogate models with physical constraints

The significant advancements in weather forecasting can be largely attributed to the availability of rich datasets and substantial computational resources. However, in many areas of Earth Sciences, extensive observation across various scales is not always feasible, particularly for subsurface structures, the cryosphere, geobiology, and areas like volcanology where data are limited. In these fields, predictions often rely on numerical physical simulations or process-based models, which are computationally intensive. Recently, there has been a trend towards replacing these with lightweight surrogate models, which offer faster predictions and support more efficient decision-making. The Fourier Neural Operator, proposed by a group of researchers from Caltech, learns neural network weights and biases in the frequency domain; it was popular in Earth Sciences applications in 2023, including carbon capture and storage by Wen et al., groundwater flow and contaminant transport by Meray et al., and ocean circulation by Chattopadhyay et al. These applications are trained on data from numerical physical models; once the AI-based surrogate is trained, the computational burden decreases significantly.
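The core operation of a Fourier Neural Operator layer can be sketched in a few lines: transform the input field to the frequency domain, apply learned weights to a truncated set of low-frequency modes, and transform back. The 1-D signal, mode count, and random complex weights below are toy assumptions for illustration; real FNOs stack several such layers with pointwise nonlinearities in between.

```python
import numpy as np

def spectral_conv_1d(u, weights, n_modes):
    """One Fourier layer: FFT, act on low-frequency modes, inverse FFT.

    u:       real input signal sampled on a uniform grid, shape (n,)
    weights: complex multipliers for the lowest n_modes frequencies
    """
    u_hat = np.fft.rfft(u)                         # to frequency domain
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = u_hat[:n_modes] * weights  # learned mixing of low modes
    return np.fft.irfft(out_hat, n=len(u))         # back to physical domain

n, n_modes = 64, 8
rng = np.random.default_rng(1)
weights = rng.normal(size=n_modes) + 1j * rng.normal(size=n_modes)
u = np.sin(2 * np.pi * np.arange(n) / n)  # a smooth toy input field
v = spectral_conv_1d(u, weights, n_modes)  # filtered, re-weighted output field
```

Because the learned weights act on frequencies rather than grid points, the same trained layer can be evaluated on finer or coarser grids, which is part of why the operator-learning approach suits surrogate modeling of physical simulations.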

Additionally, hybrid models, which combine simulation data and physical constraints, have become more popular in Earth Sciences. One approach incorporates physical constraints, including initial and boundary conditions, domain knowledge, and partial differential equations, into the loss function, as in physics-informed neural networks. Applications include ice shelf hardness inversion by Iwasaki, Y. and Lai, C.Y., 2023 and hydrologic modeling by Meray et al. Another solution is to mix physical models with pure machine learning models, such as the differentiable model in Shen et al., which leverages both physical laws and machine learning models trained on existing data to learn a better Earth system model through automatic differentiation.
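The physics-informed loss idea can be illustrated with a minimal numpy sketch for the 1-D heat equation: the total loss is a data misfit plus a penalty on the PDE residual. The finite-difference residual, toy grid, and weighting are simplifying assumptions; real physics-informed neural networks evaluate the residual with automatic differentiation through the network itself.

```python
import numpy as np

def pinn_style_loss(u_pred, u_obs, x, t_step, kappa, lam=1.0):
    """Composite loss: data misfit plus a physics-residual penalty.

    u_pred: model prediction on a 1-D grid at two time levels, shape (2, n)
    u_obs:  observations at the later time level, shape (n,)
    Penalizes violation of the heat equation u_t = kappa * u_xx, with the
    derivatives approximated here by finite differences for simplicity.
    """
    dx = x[1] - x[0]
    data_loss = np.mean((u_pred[1] - u_obs) ** 2)
    u_t = (u_pred[1] - u_pred[0]) / t_step
    u_xx = (u_pred[1, 2:] - 2 * u_pred[1, 1:-1] + u_pred[1, :-2]) / dx ** 2
    physics_loss = np.mean((u_t[1:-1] - kappa * u_xx) ** 2)
    return data_loss + lam * physics_loss

# Toy example: exact decay of a sine mode under the heat equation (kappa = 1)
x = np.linspace(0.0, 1.0, 21)
u0 = np.sin(np.pi * x)
u1 = np.exp(-np.pi ** 2 * 0.01) * u0
loss = pinn_style_loss(np.stack([u0, u1]), u1, x, t_step=0.01, kappa=1.0)
```

Here the data term is zero by construction and the remaining loss comes from discretization error in the residual; in training, minimizing both terms steers the model toward solutions that fit observations while respecting the governing equation.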

Neuroscience

Using AI to reconstruct images from brain waves

Reconstructing visual experiences from human brain activity offers great opportunities for understanding how our brain represents the world. However, the task has remained extremely challenging, since the mapping from brain activity to images is complex and image generation is a hard problem in itself. Thanks to advances in generative models, especially latent diffusion models (Stable Diffusion), one can generate realistic-looking images conditioned on text in a latent space. If we interpret brain activity data as the “text” conditioning signal, then we are able to generate images conditioned on brain activity. Remarkably, the task requires no training of a complicated neural network: one can use the pre-trained Stable Diffusion model and train only linear mappings from brain data to the latent vector and context vector.

Presented images (red box, top row) and images reconstructed from fMRI signals (gray box, bottom row) for one subject (subj01). The figure is obtained from Takagi, Y. and Nishimoto, S., 2023.

Representation learning for joint behavioral and neural analysis

Understanding behavior from neural activity (and vice versa) is a fundamental goal of neuroscience. As our capability to record large-scale neural and behavioral data grows, the key question is how to identify useful latent embeddings for both. Popular methods in neuroscience, including UMAP and t-SNE, are unable to incorporate temporal information and are sensitive to small changes inherent to animals or to how data were collected. A Nature study addressed this problem with CEBRA, an algorithm that learns consistent and informative embeddings via contrastive learning. The method adapts to both hypothesis-driven and discovery-driven analyses, and the authors demonstrated the consistency of embeddings across many sessions, animals, and modalities. CEBRA may become a complement to (or replacement for) previous methods so that, at minimum, the temporal structure of the neural code is leveraged and robustness is prioritized.
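At the heart of such contrastive methods is an InfoNCE-style objective: embeddings of an anchor and its designated positive (e.g., a temporally adjacent sample, or one sharing a behavioral label) are pulled together while other samples in the batch act as negatives. The numpy sketch below shows that objective on random data; the batch construction and temperature are illustrative assumptions, not CEBRA's exact configuration.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss on L2-normalized embeddings.

    Row i of `positives` is the positive for row i of `anchors`; all other
    rows serve as negatives. Contrastive learners build such pairs from
    temporal proximity or auxiliary (e.g., behavioral) labels.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (n, n) cosine-similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))  # positives sit on the diagonal

rng = np.random.default_rng(3)
emb = rng.normal(size=(32, 8))
# Positives close to their anchors give a low loss; unrelated positives do not.
loss_aligned = info_nce(emb, emb + 0.01 * rng.normal(size=(32, 8)))
loss_random = info_nce(emb, rng.normal(size=(32, 8)))
```

Because the pairing scheme, rather than a reconstruction target, defines what counts as "similar," swapping in time-based versus label-based pairs is what lets the same objective serve both discovery-driven and hypothesis-driven analyses.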

The pipeline of CEBRA. The figure is obtained from Schneider, S., Lee, J.H. and Mathis, M.W., 2023.

Medicine

Multi-modal foundation models for medicine

Following the progress of large language models, Moor et al. proposed a generalist medical AI (GMAI) that can interpret multi-modal data such as imaging, electronic health records, laboratory results, genomics, graphs, and medical text. GMAI is pre-trained on large, diverse, multimodal data in a self-supervised manner and can support diverse medical applications.

Also, Singhal et al. curated a large-scale question-answering dataset in the medical domain and proposed a medical-domain large language model based on PaLM (Google's family of large language models), known as Med-PaLM, the first AI model to exceed the pass threshold (>60%) on the U.S. Medical Licensing Examination (USMLE). A few months later, the same group of authors proposed a second version, Med-PaLM 2. As shown in the figure below, Med-PaLM 2 achieved a noteworthy milestone (86.5% versus Med-PaLM's 67.2%), becoming the first model to attain a level of proficiency comparable to that of human experts on USMLE-style questions. Physicians also noted a significant enhancement in the model's long-form answers to consumer medical queries.

Labelled image-text pairs also present an exciting opportunity for multimodal learning in medical imaging. For example, Huang et al. developed a pathology language-image foundation model with contrastive learning on image-text pairs shared by clinicians on public forums such as medical Twitter.

Left: Performance of various models on MedQA: Med-PaLM is the first AI model to pass the threshold, and Med-PaLM 2 achieves a significant leap. Right: Med-PaLM 2 outperforms human physicians in 8 of 9 aspects when answering 1066 consumer medical questions. The figure is copied from Singhal et al.

Public Health: predicting viral escape from prepandemic data

AI for medicine has also been developed for applications in public health, with the development of predictive viral evolution models. Current predictive viral evolution models lack effectiveness early in a pandemic due to two factors: (1) experimental approaches require host polyclonal antibodies for testing, and (2) existing computational methods rely heavily on prevailing strain prevalence to make reliable predictions of concerning variants. To address this challenge, Thadani et al. introduced EVEscape, a deep generative model (a variational autoencoder, VAE) trained on historical viral sequences with biophysical and structural constraints. EVEscape assesses the viral escape potential of mutations at scale and offers the advantage of applicability before surveillance sequencing, experimental scans, or three-dimensional structures of antibody complexes become available. Simulations show that EVEscape, trained on sequences predating 2020, exhibits accuracy comparable to high-throughput experimental scans in predicting pandemic variants of SARS-CoV-2, and it generalizes to other viruses, including influenza, HIV, and less-studied viruses with pandemic potential, such as Lassa and Nipah.
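Conceptually, EVEscape combines three per-mutation components: the likelihood of preserving viral fitness, of lying in an antibody-accessible region, and of disrupting antibody binding. The toy sketch below combines such components in log space to rank hypothetical mutations; the additive combination, the scores, and the mutation names used here are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np

def escape_score(log_fitness, log_accessibility, log_dissimilarity):
    """Toy combination of three escape components in log space.

    Each array holds a log-probability-like score per candidate mutation;
    treating the joint escape score as their sum is a simplifying assumption
    standing in for the published model's probabilistic combination.
    """
    return log_fitness + log_accessibility + log_dissimilarity

# Hypothetical scores for three candidate mutations (higher = more escape-prone)
mutations = ["E484K", "N501Y", "A222V"]
scores = escape_score(
    np.array([-1.0, -0.5, -3.0]),  # maintains fitness
    np.array([-0.2, -1.5, -0.8]),  # antibody-accessible
    np.array([-0.3, -0.9, -2.0]),  # disrupts binding
)
ranked = [m for _, m in sorted(zip(scores, mutations), reverse=True)]
```

The point of the decomposition is that every component can be estimated from pre-pandemic data (sequence models for fitness, structure-derived features for accessibility and binding), which is what makes early-pandemic ranking possible.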

EVEscape (figure taken from the paper) evaluates the probability of a mutation evading the immune response by considering the likelihood of the mutation preserving viral fitness, occurring within an antibody epitope, and disrupting antibody binding. Panel (b): EVEscape relies only on information accessible in the early stages of a pandemic, before the widespread availability of surveillance sequencing, antibody-antigen structures, or experimental mutational scans. This allows for additional time crucial for vaccine development.

Image models in medicine

The development of classifiers for medical images was popularised by benchmarks for predicting the presence of tumors in radiological images. However, these classifiers have been largely ineffective in clinical application, as analyzed by Varoquaux, G. and Cheplygina, V., 2022, due to the shortcut learning in neural networks observed by DeGrave, A.J., Janizek, J.D. and Lee, S.I., 2021. Beyond the development of classifiers, new model capabilities are being explored for medical images:

Generating counterfactuals for explainable medical image classifiers

In dermatology, DeGrave et al. aimed to address the shortcut-learning failures of previous classifier models by leveraging generative models to create counterfactual images that alter the prediction of a medical AI classifier. Analysis of the generated counterfactuals by dermatologists gave insight into the classifier's reasoning process, contributing to efforts to make the powerful inference processes of machine-learning models more medically understandable.

Segment-Anything Model (SAM) in medical image segmentation

One noteworthy breakthrough in general computer vision is the Segment Anything Model (SAM), which is built on the largest segmentation dataset to date, with over 1 billion masks on 11M licensed and privacy-respecting images, and exhibits superior zero-shot image segmentation. Several follow-up models build upon SAM (e.g., by fine-tuning SAM or transfer learning based on SAM) and achieve state-of-the-art performance on medical image segmentation (1, 2).

Practical guidelines in regulation of medicine AI

Government regulation plays a crucial role in the development of AI in medicine. Regulations ensure that AI applications in medicine meet strict safety and efficacy standards. This is essential to protect patients from potential harm and to guarantee that medical AI tools provide accurate and reliable results. The rapid advancement of AI technology poses challenges to its regulation, prompting policymakers worldwide to respond with careful consideration. In 2023, the U.S. Food and Drug Administration (FDA) released the “Content of Premarket Submissions for Device Software Functions”, which provides guidance on recommended pre-market submission documents for evaluating the safety and effectiveness of software device functions, including those utilizing machine learning models. The FDA’s regulatory policies have thus evolved from initial proposals into practical guidelines.

Future Outlooks and Challenges

  • Open science is key to accelerating scientific discovery through reproducible and accessible research. Having witnessed a trend toward more closed research in 2023, we advocate for open science in the AI for Science community going forward.
  • Many AI for Science fields are transitioning from proof of concept to real-world deployment. Transforming theoretical knowledge into reliable tools for scientific discovery is a major challenge for the community and requires not only conceptual but also engineering and educational efforts. This may take longer than it did for LLMs to affect our daily lives, but it is a necessary step to deepen and broaden scientific discovery.
  • Solving grand challenges in science requires knowledge across multiple dimensions, and building a collaborative environment is critical to advance both AI and Science research.
  • Since the advent of AI for Science, there have been numerous debates between physics-driven and data-driven approaches. Striking the right balance between the two mindsets is key to making AI shine in scientific discovery.
  • As the community expands, the risk of misuse of AI for Science tools continues to grow. While developing new algorithmic advances, the community should keep ethics and safety in mind.
  • The impacts of AI on sustainability and the social sciences take longer to materialize in real life, so progress and conclusions there arrive relatively slowly.

Stay Updated

Acknowledgements

This blog was mainly co-authored by several organizers of the AI for Science workshop series supported by senior members including Sherry Lixue Cheng, Yuanqi Du, Chenru Duan, Ada Fang, Tianfan Fu, Wenhao Gao, Kexin Huang, Ziming Liu, Di Luo, Lijing Wang (ordered alphabetically).
