Foundation models (FM)

Parag Mahajani
3 min read · Feb 12, 2024


Sci-Tech Neologisms-5.

Based on the epic paper “On the Opportunities and Risks of Foundation Models” by the Center for Research on Foundation Models (CRFM) and the Institute for Human-Centered AI (HAI) at Stanford University, and other resources.

Image courtesy: Bard


“Technology is neither good nor bad, nor is it neutral.” — Prof. Melvin Kranzberg (1986)

What are foundation models?

  • A general, domain-agnostic class of models based on deep neural networks and self-supervised learning.
  • The models are trained on huge amounts of raw, largely unlabeled data.
  • The models belong to a general paradigm of AI: they can apply information learned in one situation to a completely different situation (for humans, it is like driving cars of different makes and sizes).
  • The models are typically based on the transformer neural network architecture.
  • The models are enabled by transfer learning; within deep learning, the dominant form of transfer learning is pretraining a model on a surrogate task and then adapting (fine-tuning) it to the downstream task of interest (a minimal fine-tuning sketch follows this list).
  • The models’ significance lies in emergence and homogenization. (Emergence is behavior that arises implicitly rather than being explicitly constructed; homogenization is the consolidation of methodologies for building machine learning systems across a wide range of applications.)
  • The models power generative AI applications.
  • The foundation model market segment may generate $11.4 billion in revenue by 2028 (S&P Global).
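To make the pretraining-and-adaptation idea above concrete, here is a minimal fine-tuning sketch. It assumes the Hugging Face transformers library and PyTorch, and the example texts and labels are hypothetical placeholders; it loads a pretrained BERT backbone, freezes it, and trains only the small task-specific classification head.

```python
# Minimal transfer-learning sketch: adapt a pretrained foundation model to a
# downstream task by training only a small task-specific head.
# Assumes Hugging Face "transformers" and PyTorch; the texts and labels below
# are hypothetical placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g., positive / negative sentiment
)

# Freeze the pretrained backbone; only the classification head stays trainable.
for param in model.base_model.parameters():
    param.requires_grad = False

texts = ["great product", "terrible service"]  # hypothetical labeled examples
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

model.train()
for _ in range(3):  # a few adaptation steps on the downstream task
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice, the whole model is often unfrozen and fine-tuned end to end, or adapted without any gradient updates via prompting; the frozen-backbone variant is simply the cheapest illustration of reusing what was learned during pretraining.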

Why are they used?

  • Broad, innate capability due to the massive scale of training data.
  • Excellent performers in natural language processing (NLP).
  • Can perform a wide variety of tasks, including text generation, text translation, image analysis, and image, video, and audio creation from text.
  • Can apply techniques like retrieval-augmented generation (RAG), which improves accuracy and reliability by grounding responses in external resources such as corporate data (a minimal retrieval sketch follows this list).
  • Used in major cloud services for various service augmentations, such as enhancing SEO, browsing, and chatbot experiences.
  • Efficient at creating conversational interfaces like chatbots and powerful personal assistants.
  • Can automate code generation and help create digital twins.
  • Can be used in the life sciences for applications such as generating new proteins, deploying customized biomolecular models, and R&D.
  • Can help build the 3D evolution of the internet.
  • Many open-source models are available for use (e.g., BLOOM).
  • Evolving potential applications across all domains.
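As a concrete illustration of the RAG point above, here is a minimal sketch. It assumes scikit-learn for TF-IDF retrieval; the knowledge-base snippets and the call_foundation_model stub are hypothetical stand-ins for a real corpus and a real model API. Relevant passages are retrieved first and prepended to the prompt so the model’s answer is grounded in them.

```python
# Minimal RAG sketch: retrieve the most relevant passages, then ground the
# prompt in them. Assumes scikit-learn; the documents and the model call are
# hypothetical stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [  # hypothetical corporate knowledge base
    "Refunds are processed within 14 days of a return request.",
    "Premium support is available 24/7 for enterprise customers.",
    "Our data centers are located in Frankfurt and Singapore.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query, k=2):
    """Return the k documents most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def call_foundation_model(prompt):
    # Hypothetical stand-in for a real foundation-model API call.
    return f"[model response to a {len(prompt)}-character prompt]"

def answer(query):
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_foundation_model(prompt)

print(answer("How long do refunds take?"))
```

Production systems typically replace the TF-IDF step with dense vector embeddings and a vector database, but the grounding pattern is the same: retrieve first, then answer from the retrieved context.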

What are today’s popular foundation models?

GPT-3, GPT-4, Bard, DALL·E 2, LLaMA, Imagen, Florence, Muse, MedLM, Codey, Falcon, BLOOM, and Chirp.

What are their current challenges?

  • Many of the models’ characteristics are poorly understood.
  • Hallucinations: the models can produce inaccurate or fictional responses, so businesses must provide more accurate, well-curated training data.
  • Lack of transparency.
  • Black-box vulnerability due to the “closed” design of large-scale neural networks.
  • Substantial computational resources are required to process colossal amounts of data.
  • Environmental hazards from consumption of very large amounts of power.
  • Output may violate the intellectual property rights of existing works.
  • Data privacy issues.
  • Reliance on human judgment to identify biases in the training data.
  • Prompt engineering is required to improve accuracy, exercise some degree of control over the output, and mitigate bias (a small prompt-template sketch follows this list).
  • More complex and larger than other AI models.
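To illustrate the prompt-engineering point above, here is a small sketch in plain Python (the review text is a hypothetical placeholder): giving the model a role, explicit constraints, and a required output format exerts more control over the output than a bare question would.

```python
# Minimal prompt-engineering sketch: a structured template gives the model a
# role, constraints, and an output format. The review text is hypothetical.
def build_prompt(review):
    return (
        "You are a careful customer-support analyst.\n"              # role
        "Classify the sentiment of the review as exactly one of: "
        "positive, negative, neutral.\n"                              # task + constraint
        "If the review is ambiguous, answer 'neutral' rather than guessing.\n"
        "Respond with only the label, no explanation.\n\n"            # output format
        f"Review: {review}"
    )

print(build_prompt("The delivery was late, but the product works fine."))
```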

Disclaimer: The statements and opinions expressed in this blog are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.


Parag Mahajani

Sci-tech communicator, author, technical writer, and public speaker on science and technology, working with multinational corporations for more than 30 years.