Foundation models (FM)

Parag Mahajani
3 min read · Feb 12, 2024


Sci-Tech Neologisms-5.

Based on the epic paper “On the Opportunities and Risks of Foundation Models” by the Center for Research on Foundation Models (CRFM) and the Institute for Human-Centered AI (HAI) at Stanford University, and other resources.

Image courtesy: Bard


“Technology is neither good nor bad, nor is it neutral.” — Prof. Melvin Kranzberg (1986)

What are foundation models?

  • A general, domain-agnostic class of models based on deep neural networks and self-supervised learning.
  • The models are trained on huge amounts of raw, largely unlabeled data.
  • The models belong to a general paradigm of AI: they can apply information learned in one situation to a completely different situation (for humans, it is like driving cars of different makes and sizes).
  • The models are typically based on the transformer neural network architecture.
  • The models are enabled by transfer learning; within deep learning, the dominant form of transfer learning is pretraining a model on a surrogate task and then adapting (fine-tuning) it to the downstream task of interest (a minimal fine-tuning sketch follows this list).
  • The models’ significance lies in emergence and homogenization. (Emergence is behavior that arises implicitly rather than being explicitly constructed; homogenization is the consolidation of methodologies for building machine learning systems across a wide range of applications.)
  • The models power generative AI applications.
  • The foundation model market segment may generate $11.4 billion in revenue by 2028 (S&P Global).
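To make the pretraining-and-adaptation idea above concrete, here is a minimal fine-tuning sketch. It assumes the Hugging Face transformers library and PyTorch, and the example texts and labels are hypothetical placeholders; it loads a pretrained BERT backbone, freezes it, and trains only the small task-specific classification head.

```python
# Minimal transfer-learning sketch: adapt a pretrained foundation model to a
# downstream task by training only a small task-specific head.
# Assumes Hugging Face "transformers" and PyTorch; the texts and labels below
# are hypothetical placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g., positive / negative sentiment
)

# Freeze the pretrained backbone; only the classification head stays trainable.
for param in model.base_model.parameters():
    param.requires_grad = False

texts = ["great product", "terrible service"]  # hypothetical labeled examples
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

model.train()
for _ in range(3):  # a few adaptation steps on the downstream task
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice, the whole model is often unfrozen and fine-tuned end to end, or adapted without any gradient updates via prompting; the frozen-backbone variant is simply the cheapest illustration of reusing what was learned during pretraining.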

Why are they used?

  • Broad, innate capability due to the massive scale of training data.
  • Excellent performers in natural language processing (NLP).
  • Can perform a wide variety of tasks, including text generation, text translation, image analysis, and image, video, and audio creation from text.
  • Can apply techniques like retrieval-augmented generation (RAG), which improves accuracy and reliability by grounding responses in external resources such as corporate data (a minimal retrieval sketch follows this list).
  • Used in major cloud services for various service augmentations, such as enhancing SEO, browsing, and chatbot experiences.
  • Efficient at creating conversational interfaces like chatbots and powerful personal assistants.
  • Can automate code generation and help create digital twins.
  • Can be used in the life sciences for applications such as generating new proteins, deploying customized biomolecular models, and R&D.
  • Can help build the 3D evolution of the internet.
  • Many open-source models are available for use (e.g., BLOOM).
  • Evolving potential applications across all domains.
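As a concrete illustration of the RAG point above, here is a minimal sketch. It assumes scikit-learn for TF-IDF retrieval; the knowledge-base snippets and the call_foundation_model stub are hypothetical stand-ins for a real corpus and a real model API. Relevant passages are retrieved first and prepended to the prompt so the model’s answer is grounded in them.

```python
# Minimal RAG sketch: retrieve the most relevant passages, then ground the
# prompt in them. Assumes scikit-learn; the documents and the model call are
# hypothetical stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [  # hypothetical corporate knowledge base
    "Refunds are processed within 14 days of a return request.",
    "Premium support is available 24/7 for enterprise customers.",
    "Our data centers are located in Frankfurt and Singapore.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query, k=2):
    """Return the k documents most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def call_foundation_model(prompt):
    # Hypothetical stand-in for a real foundation-model API call.
    return f"[model response to a {len(prompt)}-character prompt]"

def answer(query):
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_foundation_model(prompt)

print(answer("How long do refunds take?"))
```

Production systems typically replace the TF-IDF step with dense vector embeddings and a vector database, but the grounding pattern is the same: retrieve first, then answer from the retrieved context.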

What are today’s popular foundation models?

GPT-3, GPT-4, Bard, DALL·E 2, LLaMA, Imagen, Florence, Muse, MedLM, Codey, Falcon, BLOOM, and Chirp.

What are their current challenges?

  • Many of the models’ characteristics are poorly understood.
  • Hallucinations: the models can produce inaccurate or fictional responses, so businesses must provide more accurate, well-curated training data.
  • Lack of transparency.
  • Black-box vulnerability due to the “closed” design of large-scale neural networks.
  • Substantial computational resources are required to process colossal amounts of data.
  • Environmental hazards from consumption of very large amounts of power.
  • Output may violate the intellectual property rights of existing works.
  • Data privacy issues.
  • Reliance on human judgment to identify biases in the training data.
  • Prompt engineering is required to improve accuracy, exercise some degree of control over the output, and mitigate bias (a small prompt-template sketch follows this list).
  • More complex and larger than other AI models.
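To illustrate the prompt-engineering point above, here is a small sketch in plain Python (the review text is a hypothetical placeholder): giving the model a role, explicit constraints, and a required output format exerts more control over the output than a bare question would.

```python
# Minimal prompt-engineering sketch: a structured template gives the model a
# role, constraints, and an output format. The review text is hypothetical.
def build_prompt(review):
    return (
        "You are a careful customer-support analyst.\n"              # role
        "Classify the sentiment of the review as exactly one of: "
        "positive, negative, neutral.\n"                              # task + constraint
        "If the review is ambiguous, answer 'neutral' rather than guessing.\n"
        "Respond with only the label, no explanation.\n\n"            # output format
        f"Review: {review}"
    )

print(build_prompt("The delivery was late, but the product works fine."))
```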

Disclaimer: The statements and opinions expressed in this blog are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.


Parag Mahajani

Sci-tech communicator, author, technical writer, and public speaker on science and technology, working with multinational corporations for more than 30 years.