Day 45 of 100DaysofML

Charan Soneji · Published in 100DaysofMLcode · 4 min read · Jul 31, 2020

GPT-3. One of the hottest topics right now, and I thought I’d share a bit of my own opinion on it.

For those who aren’t aware of what GPT-3 is: it is claimed to be the most powerful language model released up until now. It was created by OpenAI, one of the most well-known labs in the industry. They recently released a research paper describing the effort and work put into the model, and honestly, there have been a lot of mixed opinions.

I think one of the main reasons GPT-3 is considered to be extremely huge is that it has over 175 BILLION PARAMETERS. This basically means that the model has read a majority of the internet. I’m actually amazed at the amount of data it has been trained on, but I’m worried that it could be skewed towards only a few segments. One of the reasons I’m saying this is that not all data is accurate, and not all of it is labelled based on its intent or classes, so the correctly identified or classified data could be specific to certain domains.

But if you think about it: 175 billion parameters. That's actually mad :p

If any of y’all think about it, larger models follow a clear trend: larger models make increasingly efficient use of in-context information.

Broadly, on NLP tasks GPT-3 achieves promising results in the zero-shot and one-shot settings, and in the few-shot setting is sometimes competitive with or even occasionally surpasses state-of-the-art (despite state-of-the-art being held by fine-tuned models). Zero-shot learning (ZSL) is a problem setup in machine learning where, at test time, a learner observes samples from classes that were not observed during training and needs to predict the category they belong to. In the few-shot setting, the model is instead shown a handful of worked examples of the task directly in its prompt, with no weight updates at all.
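To make that distinction concrete, here is a minimal sketch of what a few-shot prompt looks like. GPT-3 itself is only reachable through OpenAI’s API, so the sketch uses GPT-2 via the Hugging Face pipeline purely as a local stand-in; the translation examples and generation settings are placeholders of my own, not anything from the paper.

```python
# A few-shot prompt: a few worked examples followed by the query.
# The model simply continues the text; no weights are updated.
from transformers import pipeline  # GPT-2 used as a local stand-in for GPT-3

few_shot_prompt = (
    "English: cheese -> French: fromage\n"
    "English: house -> French: maison\n"
    "English: book -> French: livre\n"
    "English: cat -> French:"
)

generator = pipeline("text-generation", model="gpt2")
# The model's continuation of the prompt is its "answer" to the last line.
result = generator(few_shot_prompt, max_new_tokens=3, do_sample=False)
print(result[0]["generated_text"])
```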

I think one of the main approaches discussed in the research paper is fine-tuning. Like I mentioned earlier, it’s one of the commonly used strategies, whereby the weights of a pre-trained model are adjusted on task-specific data in order to get the desired output. The main advantage of fine-tuning is strong performance on many benchmarks. The main disadvantage is the need for a new large labelled dataset for every task.
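As a rough illustration of what fine-tuning means in practice, here is a minimal sketch that nudges the weights of a pre-trained GPT-2 (GPT-3 itself isn’t available for this) on a couple of task-specific strings. The tiny dataset, learning rate and epoch count are placeholder assumptions, not values from the paper.

```python
# Minimal fine-tuning sketch: adjust a pre-trained model's weights on task data.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # small LR: only nudge the weights

# Placeholder task data: in reality this would be a large labelled dataset.
examples = [
    "Review: great movie. Sentiment: positive.",
    "Review: waste of time. Sentiment: negative.",
]

model.train()
for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt")
        # For a language model, the labels are just the input tokens (shifted internally).
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```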

Have a look at the below table for specs on the architecture.

GPT architecture evolution

So if you notice in the table above, the architecture used for GPT-3 is essentially the same as GPT-2; the paper trains a whole range of model sizes, and the largest one, with 175 billion parameters, is what is termed GPT-3. Notice the learning rate and how it is gradually decreased as the models get larger.

Here n_params is the total number of trainable parameters, n_layers is the total number of layers, d_model is the number of units in each bottleneck layer (the feedforward layer is always four times the size of the bottleneck layer, d_ff = 4 × d_model), and d_head is the dimension of each attention head. All models use a context window of n_ctx = 2048 tokens.
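To make those symbols concrete, here is a small config sketch relating them in code. The specific numbers are the ones the paper quotes for the largest (175B) model; the 12 · n_layers · d_model² expression is a standard back-of-the-envelope estimate for transformer parameter counts, not a formula from the paper itself.

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    n_layers: int      # total number of layers
    d_model: int       # width of each bottleneck layer
    n_heads: int       # number of attention heads per layer
    n_ctx: int = 2048  # context window, fixed at 2048 tokens for all GPT-3 sizes

    @property
    def d_ff(self) -> int:
        # Feed-forward layer is always four times the bottleneck width.
        return 4 * self.d_model

    @property
    def d_head(self) -> int:
        # Each attention head gets an equal slice of the model width.
        return self.d_model // self.n_heads

    @property
    def approx_params(self) -> int:
        # Rough estimate: ~12 * n_layers * d_model^2 (attention + feed-forward
        # weights, ignoring embeddings and biases).
        return 12 * self.n_layers * self.d_model ** 2

# The largest model in the paper ("GPT-3"): 96 layers, d_model = 12288, 96 heads.
gpt3 = TransformerConfig(n_layers=96, d_model=12288, n_heads=96)
print(f"d_ff = {gpt3.d_ff}, d_head = {gpt3.d_head}, ~{gpt3.approx_params / 1e9:.0f}B params")
```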

Okay, now let me talk a bit about the functionality. At its core, GPT-3 is an extremely sophisticated text predictor. A human gives it a chunk of text as input, and the model generates its best guess as to what the next chunk of text should be. It can then repeat this process — taking the original input together with the newly generated chunk, treating that as a new input, and generating a subsequent chunk — until it reaches a length limit. Well, it’s able to do this because it has basically read the internet.
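Here is a minimal sketch of that predict-append-repeat loop, again using GPT-2 locally as a stand-in since GPT-3 sits behind an API. Greedy argmax decoding is an assumption to keep the example short; the real model samples from the predicted distribution.

```python
# Autoregressive text prediction: predict the next token, append it, repeat.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("GPT-3 is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                            # stop at a fixed length limit
        logits = model(input_ids).logits           # scores over the vocabulary
        next_id = logits[0, -1].argmax()           # greedy: most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)  # feed it back in

print(tokenizer.decode(input_ids[0]))
```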

There is no question that GPT-3 is an impressive technical achievement. It has significantly advanced the state of the art in natural language processing. Yet a realistic view of GPT-3’s limitations is important if we want to make the most of the model. GPT-3 is ultimately a correlative tool. It cannot reason; it does not understand the language it generates.

Close inspection of the program’s outputs reveals errors no human would ever make, as well as nonsensical and plain sloppy writing. GPT-3 MAKES SIMPLE ERRORS NO HUMAN EVER WOULD. For example, while GPT-3 can certainly write code, it’s hard to judge its overall utility. Is it messy code? Is it code that will create more problems for human developers further down the line? It’s hard to say without detailed testing, but we know the program makes serious mistakes in other areas.

I mean, it’s an amazing achievement in the field at the end of the day (obviously), but we shouldn’t get ahead of ourselves about it. It’s a start to something big, and it’s definitely gonna help the AI revolution. Y’all can test out the commercial API, which would require y’all to apply for it. The link is given below.

Watch the video below just to get a fair insight into what I’m actually saying.

I have also attached the link to the research paper released by OpenAI. Have a look at it.

That’s it for today. Thanks for reading. Keep Learning.

Cheers.
