Historical background and development of LLMs

An infographic representation

Imteyaz Ahmad
2 min read · Nov 14, 2023
[Image generated using AI]

Introduction

Definition of large language models (LLMs)

Large language models (LLMs) are a type of machine learning model trained on massive amounts of text data to understand human language and generate text in response to user input. These models are a subset of natural language processing (NLP) models and are designed to perform a variety of language-related tasks, such as text generation, translation, sentiment analysis, question answering, and more.
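As a rough illustration of two of these tasks (text generation and sentiment analysis), here is a minimal sketch using the open-source Hugging Face transformers library. The small gpt2 checkpoint and the pipeline's default sentiment classifier are assumptions chosen for brevity; they stand in for the much larger LLMs discussed in this article.

```python
# Minimal sketch (not from the article): small pretrained models standing in
# for LLMs on two of the tasks listed above. Assumes the Hugging Face
# `transformers` library and a backend such as PyTorch are installed
# (pip install transformers torch).
from transformers import pipeline

# Text generation: continue a prompt. The "gpt2" checkpoint is an assumption
# here, chosen only because it is small enough to run locally.
generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models are", max_new_tokens=20)
print(result[0]["generated_text"])

# Sentiment analysis: classify the tone of a piece of user input
# (uses the pipeline's default sentiment model).
classifier = pipeline("sentiment-analysis")
print(classifier("I really enjoyed reading about the history of NLP."))
```

Running this prints a short continuation of the prompt and a sentiment label with a confidence score.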

Historical background and development of LLMs

[Infographic: Historical background and development of LLMs]

The historical background and development of large language models reflect the ever-evolving nature of NLP. From early machine translation proposals and rule-based systems (Weaver, 1955; Winograd, 1971), through statistical language models (Jelinek & Mercer, 1980), to the resurgence of neural networks with recurrent language models and word embeddings (Mikolov et al., 2010, 2013), the Transformer architecture (Vaswani et al., 2017), and large pretrained models such as GPT (Radford et al., 2018), BERT (Devlin et al., 2018), GPT-3 (Brown et al., 2020), and PaLM (Chowdhery et al., 2022), the journey has been marked by transformative breakthroughs. While LLMs hold great promise for enhancing language understanding and generation, they also raise important ethical considerations (Bender et al., 2021) that continue to shape the field. Moving forward, it is essential to strike a balance between harnessing the power of LLMs and addressing the ethical challenges they present in an increasingly AI-driven world.

References

Weaver, W. (1955). “Translation.” Machine translation of languages: Fourteen essays, 15–23.

Winograd, T. (1971). “Procedures as a Representation for Data in a Computer Program for Understanding Natural Language.” MIT Project MAC Technical Report MAC-TR-84.

Jelinek, F., & Mercer, R. L. (1980). “Interpolated estimation of Markov source parameters from sparse data.” Proceedings of the Workshop on Pattern Recognition in Practice, 381–397.

Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). “Recurrent neural network based language model.” Proceedings of Interspeech, 1045–1048.

Mikolov, T., et al. (2013). “Efficient estimation of word representations in vector space.” arXiv preprint arXiv:1301.3781.

Vaswani, A., et al. (2017). “Attention is all you need.” Advances in neural information processing systems, 30.

Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). “Improving language understanding by generative pretraining.” OpenAI Blog.

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). “BERT: Pre-training of deep bidirectional transformers for language understanding.” arXiv preprint arXiv:1810.04805.

OpenAI. (2019). “OpenAI’s approach to AI safety.”

Brundage, M., et al. (2018). “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation.” arXiv preprint arXiv:1802.07228.

Liu, Y., et al. (2019). “RoBERTa: A robustly optimized BERT pretraining approach.” arXiv preprint arXiv:1907.11692.

Brown, T. B., et al. (2020). “Language models are few-shot learners.” arXiv preprint arXiv:2005.14165.

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). “On the dangers of stochastic parrots: Can language models be too big?” Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21).

Chowdhery, A., et al. (2022). “PaLM: Scaling language modeling with pathways.” arXiv preprint arXiv:2204.02311.
