Implications of Meta's challenge to GPT-3 and OpenAI

This has huge implications for the future of Machine Learning

Devansh
Dialogue & Discourse
9 min read · May 12, 2022


Join 31K+ AI professionals keeping up with the most important ideas in Machine Learning through my free newsletter over here

Meta AI recently released the Open Pretrained Transformer (OPT-175B), “a language model with 175 billion parameters trained on publicly available data sets”. While this might seem like just another big company joining the LLM wars, the way they did it shocked the Machine Learning community. In their post, Democratizing access to large-scale language models with OPT-175B, Meta had the following to say:

For the first time for a language technology system of this size, the release includes both the pretrained models and the code needed to train and use them.
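
To get a sense of what that openness means in practice, here is a minimal sketch of querying one of the released checkpoints. I'm assuming the smaller Hugging Face mirrors of the OPT family (the checkpoint name below is illustrative); the full 175B weights themselves were gated behind a request form.

```python
# Minimal sketch: load one of the smaller OPT checkpoints and generate text.
# Assumes the Hugging Face `transformers` library and the facebook/opt-* mirrors;
# the 175B weights required a separate access request.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # illustrative; pick a size your hardware can fit
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```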

This is quite exciting for a lot of reasons. For one, most people have no hope of understanding what the details of working with problems at this scale entail. From a purely educational perspective, then, this will be an exceptional learning experience for anybody who digs through it (more details later in the article). However, this also has profound implications for the Deep Learning industry, ones that many people haven't thought about.

Seems like all big tech companies are trying to eat into each other’s domains to establish dominance

The assumed monetization strategy for these complex LLMs was simple. They could seriously boost productivity, so they could be sold as APIs, either directly or embedded in another service. This article will cover how Meta releasing their model seriously changes the landscape of the industry. To fully understand the implications, let's first understand the context around this.
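
To make that business model concrete, here is roughly what "selling an LLM as an API" looked like at the time, sketched with OpenAI's 2022-era Python client. The engine name and prompt are illustrative; the key point is that you pay per token, which is the whole monetization angle.

```python
# Illustrative sketch of the pay-per-token API model, using OpenAI's
# 2022-era Python client. Engine name and prompt are placeholders.
import openai

openai.api_key = "sk-..."  # a paid API key

response = openai.Completion.create(
    engine="text-davinci-002",  # a GPT-3 family model
    prompt="Write a product description for noise-cancelling headphones:",
    max_tokens=100,
)
print(response["choices"][0]["text"])
```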

Background: LLMs and Machine Learning

The introduction of LLMs (Large Language Models) has been a game-changer. Large language models — natural language processing (NLP) systems with more than 100 billion parameters — have produced insane results in NLP and AI research over the last few years. I have covered some of them in my content. The video Machine Learning News you must know- April 2022 and my article, Google AI sparks a revolution in Machine Learning are the most recent examples.

This is Google’s amazing PaLM model, showing an understanding of language never thought possible

Most popular among these is the legendary GPT-3 model by OpenAI. A trend-setter, GPT-3 really showed us the potential of using Transformers and large datasets to achieve strong performance on a variety of complex tasks. It caused shockwaves in the mainstream when GitHub debuted Copilot, an AI code-completion service powered by a descendant of GPT-3.

To learn more about how this was part of Microsoft’s larger strategy, read this article

Since then, GPT-3 has gone on to add tons of new abilities, including editing texts (including code) in particular styles and error-correcting. Around late March/early April, OpenAI made waves when they released DALL-E 2, a deep learning model that can generate images from text descriptions. Here is an example:

DALL-E 2 is an improvement over the already impressive DALL·E: Creating Images from Text
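
Back on the text side, the editing ability mentioned above is exposed through a dedicated endpoint. Here is a rough sketch with the 2022-era OpenAI client; treat the model name and fields as illustrative:

```python
# Sketch of GPT-3's text/code editing capability: supply an input plus an
# instruction, get back an edited version. Uses the 2022-era OpenAI client.
import openai

result = openai.Edit.create(
    model="text-davinci-edit-001",
    input="The quick brown fox jumpd over teh lazy dog.",
    instruction="Fix the spelling mistakes",
)
print(result["choices"][0]["text"])
```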

A lot of the discussion in the last month has centered around DALL-E's growing capabilities across a variety of tasks, compared to the extremely deep understanding shown by Google AI's Pathways Language Model (PaLM). This is another demonstration of PaLM's insane abilities in NLP.

Seriously, read Google AI sparks a revolution in Machine Learning if you haven’t. It covers the platform behind PaLM

However, Meta AI’s decision to completely open up its models has hijacked the discourse recently. As promised, I will go into some of the interesting aspects of the resources they shared and why this is a game-changer.

Important Talking Points

This decision affects several stakeholders in different ways. Here are a few:

  1. Researchers and other people looking to learn from this
  2. Meta itself
  3. OpenAI
  4. The ML/Software Development industry

Educational/Research Impact

This is a huge win for researchers and anyone looking to learn about Machine Learning. Most notably, it is an antidote to the replication crisis in Machine Learning. I have covered AI's replication crisis in this article. In a nutshell, though, much of Machine Learning research is impossible or impractical to reproduce and verify. When it comes to the big companies (like Facebook, Google, and Microsoft), much of this occurs because they are able to train models at a scale that no one else can replicate.

Excerpt from the aforementioned article

This becomes a problem since it makes it impossible for outside people to break down their findings and find flaws in their methodology. It also severely limits the amount of meaningful discussion we can have surrounding a paper/finding when we can't dig into the nuances of the setup behind it.

Source: Even at the top levels of Machine Learning, regexing is a mainstay. Subscribe to my newsletter to master regexes

However, that is not all that makes this a big win for Machine Learning education. When Meta released their code, they also released a lot of other resources detailing the various facets of their large-scale system. My personal recommendation is to read through their Chronicles of OPT-175B training, which detail a lot of the challenges the team faced while training at this insane scale. Take a look at the following section:

It’s been really rough for the team since the November 17th update. Since then, we’ve had 40+ restarts in the 175B experiment for a variety of hardware, infrastructure, or experimental stability issues.

The vast majority of restarts have been due to hardware failures and the lack of ability to provision a sufficient number of “buffer” nodes to replace a bad node with once it goes down with ECC errors. Replacement through the cloud interface can take hours for a single machine, and we started finding that more often than not we would end up getting the same bad machine again. Nodes would also come up with NCCL/IB issues, or the same ECC errors, forcing us to start instrumenting a slew of automated testing and infrastructure tooling ourselves

Taken from their log, Update on 175B Training Run: 27% through
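
To appreciate what 40+ restarts implies, it helps to see the basic checkpoint-and-resume pattern that runs like this depend on. Below is a deliberately toy sketch (not Meta's actual metaseq tooling): every launch first looks for the latest checkpoint, so a hardware failure only costs the steps since the last save.

```python
# Toy sketch of checkpoint-and-resume training (NOT Meta's actual tooling).
# On every (re)launch, resume from the last checkpoint if one exists, so a
# node failure only costs the steps since the most recent save.
import os
import torch

CKPT_PATH = "checkpoint_last.pt"

def train(model, optimizer, get_batch, loss_fn, total_steps, save_every=1000):
    start_step = 0
    if os.path.exists(CKPT_PATH):  # this launch is a restart, not a fresh run
        state = torch.load(CKPT_PATH)
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])
        start_step = state["step"] + 1

    for step in range(start_step, total_steps):
        batch = get_batch(step)        # assume deterministic data order by step
        loss = loss_fn(model, batch)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if step % save_every == 0:     # checkpoint frequently; failures are routine
            torch.save({"model": model.state_dict(),
                        "optimizer": optimizer.state_dict(),
                        "step": step},
                       CKPT_PATH)
```

At 175B parameters, even this "simple" pattern becomes hard: saving and restoring state is spread across hundreds of GPUs, which is exactly why their logs are worth reading.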

This was an amazing decision taken by the Meta AI team. Reading through these has been interesting, and for anybody who wants to get into Large Scale Deep Learning, understanding their challenges is a must. From a research/education perspective, this publication is a huge win.

Impact on Meta

The impact of this on Meta is going to be harder to evaluate. Releasing the model in this way allowed them to gain a lot of positive publicity. And since the model was released for free, people are now much less likely to pay for models from Meta's competitors. That is an edge by itself.

We have already seen people testing OPT, contributing enhancements, filing bugs, etc. Source

This process also has two other notable advantages. Firstly, since the model is open, people can find flaws and discover areas for improvement. This facet of open-source culture is responsible for much of the explosive growth of tech over the last two decades. It gives Meta access to potentially millions of hours of free debugging/testing done by the community, along with a lot of insight into which facets the ML community finds most important and engages with most.

The second advantage is familiarity with Meta's tech and tools. This is something that a lot of people overlook. Take the example of TensorFlow, by Google. Most serious ML practitioners are proficient with it, which makes it easy for Google to hire ML engineers, since most developers will already be familiar with the tech. The amount of resources Google needs to spend training new engineers thus goes down drastically.

To gain a deep understanding of the foundations required for Machine Learning or Software Engineering, check out my newsletter. You can get a 30-Day free trial, by using this link.

All of these are huge positives. However, they are offset by a huge problem: training such a model was extremely costly, and giving the whole thing away for free will have serious financial implications. While it puts a damper on OpenAI and their monetization of GPT-3, it will also make it harder for Meta to monetize such a model in the future. Then again, Meta was wildly profitable last year, so perhaps the pros outweigh the cons.

Impact on OpenAI

This is a huge L for OpenAI. We have already covered how it will take away a large chunk of their potential customers. It seems like Meta AI has recently decided to pick a fight with OpenAI across its product line. Between Make-A-Scene, their work modernizing CNNs to match Vision Transformers, and OPT, a lot of Meta's recent releases are direct competitors to OpenAI products.

We developed OPT-175B with energy efficiency in mind by successfully training a model of this size using only 1/7th the carbon footprint as that of GPT-3

Impact on the Industry

The Machine Learning industry is definitely licking its lips at this development. For the reasons already mentioned, this is a huge win for AI researchers and developers. This is indirectly a win for the industry.

There are two ways that this situation can play out:

  1. Other tech companies join this trend and they start undercutting each other to gain an edge in the market. Economics tells us that this is amazing for consumers (us).
  2. Business as usual in the industry. The other big companies don’t take the bait. For all the reasons mentioned earlier, this is already a huge win for consumers.

What is often lost when we cover important Machine Learning research is that most of the industry consists of small to medium-sized companies/groups solving very specific problems. While this development puts pressure on the big tech companies, it is overwhelmingly a win for the smaller companies, since they get to learn from and use the insights generated by these massive companies without having to burn through the resources themselves. Therefore, this is a huge win for the industry as a whole.

That's it for this article. As you can see, this move by Facebook will impact the stakeholders in the ML industry in different ways. If you have anything to add, let me know in the comments. I would love to learn how you think this will impact the industry. To set yourself up to take advantage of this massive development, mastery over Machine Learning is crucial. This article gives you a step-by-step plan to develop proficiency in Machine Learning using FREE resources. Unlike other boot camps/courses, this plan will help you develop your foundational skills and set yourself up for long-term success in the field.

Don’t take shortcuts in ML. You’re just setting yourself up for failure.

For Machine Learning, a base in Software Engineering, Math, and Computer Science is crucial. It will help you conceptualize, build, and optimize your ML systems. My daily newsletter, Coding Interviews Made Simple, covers topics in Algorithm Design, Math, Recent Events in Tech, Software Engineering, and much more to make you a better developer. I am currently running a 20% discount for a WHOLE YEAR, so make sure to check it out.

I created Coding Interviews Made Simple using new techniques discovered through tutoring multiple people into top tech firms. The newsletter is designed to help you succeed, saving you from hours wasted on the Leetcode grind. You can read the FAQs and find out more here

Feel free to reach out if you have any interesting jobs/projects/ideas for me as well. Always happy to hear you out.

If you'd like to support my work financially, my Venmo and PayPal are below. Any amount is appreciated and helps a lot. Donations unlock exclusive content such as paper analysis, special code, consultations, and specific coaching:

Venmo: https://account.venmo.com/u/FNU-Devansh

Paypal: paypal.me/ISeeThings

Reach out to me

Use the links below to check out my other content, learn more about tutoring, or just to say hi. Also, check out the free Robinhood referral link. We both get a free stock (you don't have to put in any money), and there is no risk to you, so not using it is just losing free money.

Check out my other articles on Medium: https://rb.gy/zn1aiu

My YouTube: https://rb.gy/88iwdd

Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y

My Instagram: https://rb.gy/gmvuy9

My Twitter: https://twitter.com/Machine01776819

If you’re preparing for coding/technical interviews: https://codinginterviewsmadesimple.substack.com/

Get a free stock on Robinhood: https://join.robinhood.com/fnud75
