Unraveling ChatGPT for Developers and Engineers

Mostafa Elganainy
Engineering@Incorta
10 min read · Apr 27, 2023

As a developer or software engineer, you’ve probably heard of ChatGPT, the cutting-edge AI language model from OpenAI. With its wide range of applications and impressive capabilities, it’s no wonder you’re intrigued. In this article, we’ll guide you through the essentials of ChatGPT, its underlying mechanisms, and how you can integrate it into your projects. By the end, you’ll have a solid understanding of ChatGPT, why it works, and how to get started.

Why should I care?

Well, you might have come across heated debates online about how AI will replace humans in the near future. That’s not the target of this article; we’re not going to discuss whether or not AI will replace us. Instead, we’ll focus on how to stay competitive in the current wave of LLM-based AI, like ChatGPT, and learn more about the possible opportunities and limitations.

Here are some examples of how you can benefit from the new wave of LLM-based AI:

  1. Automation: AI streamlines repetitive tasks, enabling focus on creativity and strategic decision-making.
  2. Continuous learning: Adapt to AI advancements and continuously learn to remain valuable and competitive in the job market.
  3. Specialization: AI has limitations; specializing in a domain or technology helps maintain in-demand skills.
  4. New job opportunities: AI drives new job opportunities for skilled professionals in research, development, and implementation.
  5. Innovation: AI opens the door to innovative new ways to solve current problems and re-imagine traditional solutions.

It’s better to arrive early at the game to stay competitive in a rapidly changing tech scene.

What is ChatGPT?

Before we explain what ChatGPT “IS”, let’s first discuss what ChatGPT “IS NOT”. ChatGPT’s limitations:

  • Lack of real-time understanding
  • Inability to handle ambiguous queries
  • Reliance on high-quality training data
  • Potential biases
  • Lack of human intuition, creativity, and context-awareness

ChatGPT in Layman’s terms:

It’s an assistant that has read and understood virtually all accessible online human knowledge. It possesses a remarkable ability to make connections and deduce conclusions from this vast pool of information. However, you still need to take the driver’s seat.

A typical misconception people may have is to view ChatGPT as a genie to whom they can ask questions without providing context, expecting it to solve all their problems. Once you learn how to guide it effectively, you can begin to extract valuable insights from it.

ChatGPT in Technical terms:

ChatGPT is an AI language model based on the GPT-4 architecture, which is designed to understand context and generate human-like text in response to input prompts. It’s built on layers of foundations developed over decades that have only now come to fruition.

  • The basic building block is the Transformer model with self-attention: essentially a deep neural network of a particular topology. At its core, the model simply predicts the next word, given a prompt of a few words (more details further down).
  • The Transformer model is then trained on a massive amount of data, with an enormous number of parameters. The size of the data, the parameter count, and the architecture give the model a superior ability to predict the next meaningful, accurate sequence of words.
  • Training is done on massive hardware over a long period of time.
  • The model is fine-tuned with question/answer pairs, essentially teaching it to hold human-like conversations and reasoning.
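The self-attention building block mentioned above can be sketched in a few lines of NumPy. This is a minimal single-head version only (real models stack many heads and many layers), and the weight matrices here are random placeholders rather than trained values:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (Vaswani et al., 2017)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to the others
    weights = softmax(scores, axis=-1)       # each row is a probability distribution
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, embedding dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

The attention weights are what let the model decide which earlier words matter for predicting the next one.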

In the following image, ChatGPT explains how it works:

Now let’s see its summarization and paraphrasing skills:

Why does it work?

You might find it astonishing and inexplicable that such a straightforward concept, generating a word based on the previous words, could lead to this immense transformation and human-like text production. Despite its seemingly simple foundation, it is far more complex than it appears. The technique is grounded in years of research and progress in neural networks and deep learning, taking advantage of significant advancements in hardware capabilities. It also benefits from the vast amount of data on the internet, produced by humans over the past 20 years.

But why now? What is the key shift that has made the current state so astounding? Let’s dissect the essential components one by one:

Generative Model

The Transformer model is capable of generating a meaningful next word. That’s not new: it came from the 2017 research paper “Attention Is All You Need.” It changed everything, broke barriers in the previous state of the art, and paved the way for what we’re seeing now. The model is capable of identifying the important parts of the text (attention and self-attention). For example, if we ask the model to continue the sentence:

“car is ….”

Notice how it generated different words each time, since there is a probabilistic approach to selecting the next word.

Now what if the sentence was longer, and contained more clues:

See how it started to generate colors that are all bluish.

And here we can clearly see that it linked my previous statement about the colors I love to the generated word, unlike the first attempt, when it generated any random word that could describe a car.

And that’s a major difference between modern Transformer-based LLMs and earlier predecessors like RNNs: the ability to learn and apply self-attention makes them capable of handling long sequences of words effectively and efficiently.
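The probabilistic word selection described above can be illustrated with a toy sketch. The vocabulary and scores here are made up for illustration; a real model would produce scores over tens of thousands of tokens:

```python
import numpy as np

def sample_next_word(logits, vocab, temperature=1.0, rng=None):
    """Sample the next word from the softmax of the model's scores.
    Higher temperature flattens the distribution (more varied picks);
    lower temperature concentrates it on the top-scoring word."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(vocab, p=probs)

# Hypothetical scores a model might assign to completions of "car is ...":
vocab = ["fast", "red", "blue", "expensive"]
logits = [2.0, 1.5, 1.4, 0.5]
word = sample_next_word(logits, vocab, temperature=1.0)
```

Running this repeatedly gives different words, which is exactly why ChatGPT’s answers vary between attempts.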

Interestingly enough, you can also control how creative the model will be. The spectrum has two ends: strictly consistent and highly creative.

Check how it responds when I ask it to be more consistent:
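In the API, this consistency/creativity dial is the `temperature` parameter. A minimal sketch using the `openai` Python package as it looked around the time of writing (the v0.27-era `ChatCompletion.create` call); the prompt is illustrative, and an `OPENAI_API_KEY` is needed to actually run the request:

```python
import os

def build_request(prompt, temperature):
    """Assemble a chat-completion payload; temperature=0 keeps answers
    highly consistent, values closer to 1-2 make them more creative."""
    return {
        "model": "gpt-3.5-turbo",
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_request("Complete the sentence: car is", temperature=0.0)

if os.environ.get("OPENAI_API_KEY"):  # only call out when a key is available
    import openai
    response = openai.ChatCompletion.create(**request)
    print(response["choices"][0]["message"]["content"])
```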

That was the first piece of the puzzle: attention.

The second aspect of this conundrum is the parallel training and hardware capabilities. The model’s precision and capacity are heavily influenced by the number of internal parameters it possesses (in other words, the size of the Network). The more expansive the network, the more accurate the outcomes.

However, training an enormous network using earlier methods would have taken multiple years, considering we’re dealing with hundreds of billions of parameters. This is where the capacity for parallel training comes into play.

Up to this point, we had GPT-3, a potent model but not an exceptionally impressive communicator. Following this came the next wave of transformation: GPT-3.5 Turbo.

GPT-3.5 Turbo

In this version, the significant change was not only the sizable GPT-3 model, but also the fine-tuning process. This refinement allowed the model to captivate the world with its performance. Thanks to its impressive ability to mimic how humans communicate, ChatGPT became the fastest app ever to reach 100 million users.

The secret word is fine-tuning.

Fine-tuning

The model possesses the capability to undergo pre-training and then be fine-tuned with question/answer pairs, enabling it to excel at specific tasks. For GPT-3.5, the focus was on conversation: OpenAI trained it to become a human-like conversationalist by utilizing an extensive collection of human-curated questions and answers.

Subsequently, the next evolution emerged: GPT-4.

GPT-4

While the exact number of parameters used to train the model has not been officially disclosed, one thing is certain: it is remarkable. Its capacity to emulate human conversation and thought processes is astonishing. People believe that it is not only fine-tuned, but also trained to engage in self-initiated brainstorming to analyze questions and select the most suitable answers.

Where could I start?

First and foremost, try out ChatGPT by using the app to converse with the model. If you’re in a location without access to ChatGPT, consider utilizing Poe as an alternative. Although chatting with the model won’t allow you to create apps, it’s still an excellent method for comprehending its functionality and limitations. Additionally, take a look at this article for helpful tips on interacting with ChatGPT.

Once you’ve spent some time engaging with the model, it’s time to move on to the next stage: unleashing the full potential of this powerful tool.

Following are some directions and use cases:

  • Create a chatbot for your website or mobile app.
  • Use AI for natural language processing tasks, like sentiment analysis, summarization, entity recognition, etc.
  • Innovate new product ideas using the power of AI. For instance, you can think of ChatGPT as a way for humans to talk with computers: it could serve as a co-pilot for any product, letting users describe what they need in plain text rather than clicking buttons and searching through menus.
  • …

There is no limit to what could be done; the question is how you could do it.

You can use the OpenAI APIs themselves, or any open-source model. Further down, we’ll explain the pros and cons of each approach, but let’s set that aside for a moment and go to the next step.

Generally speaking, there are two ways of building your AI-based product idea:

  • In-prompt training: your product treats the ChatGPT APIs the same way a user treats the ChatGPT UI, supplying a well-crafted context that explains to ChatGPT how to execute the task the user requires. The following articles contain very good info on how to do it: Intro Article & More details.
  • Model fine-tuning: give the model examples of question/answer pairs. The benefit of this approach is that it reduces the size of each prompt API call, which means lower cost. The downside is the need for a large set of questions and answers. Check the OpenAI API documentation on how to fine-tune your model.
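The in-prompt approach can be sketched concretely: the task instructions, and even a few worked examples, live inside the prompt itself, with no fine-tuning involved. The task, wording, and few-shot examples below are illustrative assumptions:

```python
def build_messages(review):
    """'In-prompt training': a system message teaches the task, and a
    few-shot example shows the expected output format."""
    return [
        {"role": "system",
         "content": "You classify product reviews. Answer with exactly one "
                    "word: positive, negative, or neutral."},
        # A worked example included in the prompt itself:
        {"role": "user", "content": "Great battery life, love it!"},
        {"role": "assistant", "content": "positive"},
        # The actual review to classify:
        {"role": "user", "content": review},
    ]

messages = build_messages("The screen cracked within a week.")
# Sent with e.g. openai.ChatCompletion.create(model="gpt-3.5-turbo",
#                                             messages=messages)
```

The trade-off versus fine-tuning is exactly as described above: every call carries the instructions and examples, so prompts are larger, but no training data collection is needed.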

For beginners, I would definitely recommend in-prompt training. It’s much simpler and more intuitive, especially with GPT-4’s amazing abilities.

To make things even more interesting, there are currently a few approaches based on in-prompt training that you can use to create really interesting applications:

  • ChatGPT Plugins: check this article on how to use plugins to create very interesting applications.
  • LlamaIndex: a similar approach to plugins, but more open. Check this link.
  • Auto-GPT: a really massive shift in how ChatGPT can take over the steering wheel and work out the information it missed during pre-training. Check this article.

Once you’re here, it might be worth checking out some alternative open-source LLMs:

  • Dolly 2.0: an open-source model that allows commercial use. Check this link.
  • LLaMA: a model from Meta (Facebook), but for research purposes only.

OpenAI vs. Open Source Models:

When comparing the use of OpenAI models versus open-source Large Language Models (LLMs), several factors come into play:

  1. Performance and accuracy: OpenAI models, such as GPT-3 and GPT-4, are renowned for their advanced capabilities, state-of-the-art performance, and impressive accuracy in generating human-like text. Open-source LLMs, while useful, may not always reach the same level of performance as their commercial counterparts.
  2. Cost: OpenAI models typically require access to their API, which comes at a cost. In contrast, open-source LLMs are often freely available, making them a more budget-friendly option for developers and researchers.
  3. Customizability: Open-source LLMs offer more flexibility in terms of customization, as developers can access the source code and modify the model to better suit their needs. OpenAI models, on the other hand, are generally offered as-is through their API, with limited customization options.
  4. Community and support: Open-source LLMs often benefit from a strong community of developers and researchers who contribute to the ongoing development and improvement of the models. This can lead to more rapid innovation and the availability of diverse resources. OpenAI models, while supported by the company, may not have the same level of community engagement.
  5. Ease of development: OpenAI models are backed by simple APIs and a strong organization that will likely keep pushing the models in more advanced directions. It’s definitely easier to start with OpenAI.
  6. Legal and ethical considerations: OpenAI models are subject to specific terms of service and usage policies that may restrict certain applications or impose ethical guidelines. Open-source LLMs may have more relaxed licensing terms, granting developers more freedom in how they use and deploy the models.

Ultimately, the choice between OpenAI models and open-source LLMs depends on your specific requirements, budget, and desired level of customization.

Where to go next?

If you’ve made it all the way down here, you’re probably really interested in digging deeper. So here are some links to interesting talks and articles discussing the subject in more detail:

Credits

Last but not least, I would like to thank my new friend, ChatGPT 4.0, for helping me brainstorm, review and paraphrase major sections of this article.



Incorta Senior Software Architect; former full-stack developer, now focused on performance optimization and machine learning.