Demystifying ChatGPT for you — Part 1

Sharad Varshney
4 min read · Jun 2, 2023


In retrospect, November 2022 stands out as a clear inflection point: there is where AI was before that month, and there is how it has been seen ever since. Everything, including the world's perception of AI, has changed since Nov. 2022. So what happened? OpenAI released ChatGPT, which, ironically, is neither that open nor open source, I may add.

Here is OpenAI's own description of ChatGPT: it interacts in a conversational way, and the dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. In short, it is a conversational bot that builds up context about whoever is talking to it. But using OpenAI's chatbot comes at a cost, and the bigger issue is sending sensitive, personal (PI/PII), and confidential company information to OpenAI, which can be a big no-no for most organizations, if not all.

So here we are, but this time we are not integrating with the OpenAI APIs to enable a ChatGPT-like interface. Instead, I would like to take you on a journey of creating our own open source conversational chatbot: an open source Large Language Model (LLM) based on the GPT-2 transformer architecture, fine-tuned so it can answer any and all questions related to a specific domain, and all of this without ever having to send any data outside our firewalls. Wait, you have got to ask the question: what does GPT really mean? Generative Pre-trained Transformer.

The most basic building block of GPT-based LLMs is the transformer. What is a transformer? How does the transformer architecture build encoder and decoder structures without using any recurrence or convolutions? The answer lies in a paper published in 2017, "Attention Is All You Need." I will not go into too much detail here, but this paper introduced a new, simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
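To make "based solely on attention mechanisms" concrete, here is a minimal NumPy sketch of the paper's scaled dot-product attention, softmax(QKᵀ/√d_k)·V. The function names and toy dimensions are my own for illustration, not from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # each row is a distribution over keys
    return weights @ V, weights          # output is a weighted mix of values

# Toy example: a sequence of 3 positions, with d_k = 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)         # (3, 4): one output vector per position
print(w.sum(axis=-1))    # each row of attention weights sums to 1
```

Every position's output is simply a weighted average of the value vectors, with weights determined by query-key similarity; no recurrence or convolution is involved.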

Figure: the Transformer architecture (from "Attention Is All You Need")

So what do we mean by encoder, and which section of this model architecture is the encoder? The encoder part of the network translates raw input features into latent vector representations in a lower-dimensional space, which can later be decoded for various use cases such as text translation or summarization. The left part of the architecture diagram is the encoder network: a stack of 6 identical layers, where each layer consists of 2 sub-layers, a) a multi-head self-attention layer and b) a position-wise feed-forward network.
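A single encoder layer can be sketched in a few lines of NumPy. This is a simplified, single-head version (the paper uses multi-head attention, plus residual connections and layer normalization around each sub-layer, which I do include); the weight names and sizes are illustrative assumptions:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's vector to zero mean and unit variance.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def encoder_layer(x, Wq, Wk, Wv, W1, W2):
    # Sub-layer a) self-attention (single head here for brevity),
    # with a residual connection and layer normalization.
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V
    x = layer_norm(x + attn)
    # Sub-layer b) position-wise feed-forward network (ReLU),
    # again with residual connection and layer norm.
    ff = np.maximum(0, x @ W1) @ W2
    return layer_norm(x + ff)

d_model, d_ff, seq_len = 8, 32, 5          # toy sizes (the paper uses 512/2048)
rng = np.random.default_rng(1)
p = lambda *s: rng.normal(scale=0.1, size=s)  # random illustrative weights
x = p(seq_len, d_model)
y = encoder_layer(x, p(d_model, d_model), p(d_model, d_model),
                  p(d_model, d_model), p(d_model, d_ff), p(d_ff, d_model))
print(y.shape)  # (5, 8): same shape as the input
```

Because the output has the same shape as the input, 6 such layers can be stacked one on top of another, exactly as the paper's encoder does.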

Now let's get into the decoder part of the network. It works by decoding the latent vector representations, or embeddings, into the final sequence, generating one element of the output sequence at a time. The decoder network also consists of 6 identical layers, but each of those layers consists of 3 sub-layers. In addition to the 2 sub-layers found in the encoder, there is c) a masked multi-head attention sub-layer, which receives the previous output of the decoder stack, augments it with positional information, and applies multi-head self-attention while masking some of the information from later sub-layers. While the encoder is designed to attend to all words in the input sequence regardless of their position, the decoder is modified to attend only to the preceding words.

Hence, the prediction for a word at any position can only depend on the known outputs for the words that come before it in the sequence. If you want to understand the math behind the encoder and decoder layers and the masking in the self-attention layer, I would recommend reading this article.
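The masking described above can be sketched directly: before the softmax, every "future" position gets a score of effectively minus infinity, so it receives zero attention weight. Again a NumPy illustration with made-up toy inputs:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def causal_attention_weights(Q, K):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Mask out future positions: position i may only attend to j <= i.
    # np.triu with k=1 selects everything strictly above the diagonal.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -1e9, scores)  # effectively -infinity
    return softmax(scores)

rng = np.random.default_rng(2)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
w = causal_attention_weights(Q, K)
print(np.round(w, 2))
# The upper triangle is zero: row i puts weight only on positions 0..i,
# and the first row attends entirely to position 0.
```

This triangular mask is exactly why the prediction at each position can only depend on earlier positions, which is what makes autoregressive, left-to-right text generation possible.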

Now that we know about the Transformer architecture, we can explain what GPT is really doing: it is a pre-trained, transformer-based large language model that acquires the capability to generate text or output it was never shown during training, and that, in my understanding, is a defining characteristic of a truly artificially intelligent system, 'AI'. For the first time in the field's history, from the early '70s until Nov. 2022, the true definition of AI has been demonstrated through the actions of trained GPT-based LLMs. We will look at how LLMs get trained in Part II of this article, which I will be writing very soon. When the time comes to fine-tune and use transfer learning, we will know how to prepare our own domain dataset and kick off the fine-tuning.

If you liked this article or learned something new, please do leave a comment or a like.
