Downfall of individual researchers?

Smit Shah · Published in Code Dementia · 4 min read · Jun 14, 2020

Before starting the main article, let me provide a little bit of context: OpenAI recently talked about releasing their GPT 3 model. To those of you who are not aware of the GPT models, they are freakishly good at text generation — better than a lot of the humans you might know.

While not normally known for his musical talent, Elon Musk is releasing a debut album on Thursday, one that combines the most distinctive sounds from his music with the range of techniques and techniques that have made him one of the most talked-about and successful entrepreneurs of our time. And for all the media attention the 73-year-old has drawn in recent months, the younger generations of his fellow Silicon Valley supporters might say he’s done exactly what they expected — he’s taken the auto-racing world by storm. “There’s definitely a crossover [point between Tesla and music],” says Chris Harman, who’s in charge of major labels and labels at Sony Music Entertainment.

Don’t worry, the above text was not taken out of context; rather, it was generated by the GPT 2 model. One might seriously wonder at the full scope of this model, and as a research enthusiast, my mouth just waters looking at the code and model they provided. And this model had “just” 1.5 billion parameters, which, when you compare it with prior SOTA architectures like BERT (340M parameters), seems like a lot. This might seem mildly over the top, right? Why in the hell would you need 1.5B parameters when research is showing it’s possible to achieve similar results with significantly fewer parameters (DistilBERT)? To this question, OpenAI gave a very strong reply: by releasing (well, not exactly releasing, as I’ll explain further) the successor and third model in their GPT series, GPT 3, with one hundred seventy-five billion parameters. Damn. I cannot imagine the computational resources and time required to train that many parameters.

The model is super awesome. It literally blows past (or at least competes with) the SOTA on many different benchmarks: closed-book question answering, text generation (obviously), machine translation, even goddamn arithmetic; I could go on, but the point is that it’s good enough. But, and that is a big but, the training time is astronomical. The fastest GPU currently available is the Tesla V100. Trivia: guess the amount of time it would take to train the model on a single one of those. Done guessing? Unless you were guessing randomly to prove me wrong, the number you guessed is well below the mark. 355 years. That’s right: 355 years is how long it would take to train the entire model on a single Tesla V100. (The V100’s maximum FP16 performance is 28 TFLOPS; I’ll let you do the math for the total FLOPs required.) And the cost of training this model using Lambda’s cloud GPUs? (Guess?) 4.6 million dollars. (No, nobody is actually renting a GPU for 355 years; the writer is just way too into dramatic effects and stupid parentheses.)
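For the curious, those figures can be sanity-checked with a few lines of arithmetic. The total-FLOPs estimate (~3.14×10²³) and the per-hour V100 price are assumptions based on Lambda Labs' analysis, and the math assumes the GPU runs at its theoretical FP16 peak the whole time, which real training never achieves:

```python
# Back-of-the-envelope check of the "355 years" and "$4.6M" figures.
# Assumptions (taken from Lambda Labs' estimate, not measured here):
#   - total training compute for GPT-3 is roughly 3.14e23 FLOPs
#   - the V100 sustains its theoretical FP16 peak throughout
#   - cloud V100 time costs about $1.50 per GPU-hour
TOTAL_FLOPS = 3.14e23
V100_FP16_FLOPS = 28e12            # 28 TFLOPS peak FP16
COST_PER_GPU_HOUR = 1.50           # assumed cloud price, USD

seconds = TOTAL_FLOPS / V100_FP16_FLOPS
years = seconds / (365 * 24 * 3600)
cost = (seconds / 3600) * COST_PER_GPU_HOUR

print(f"{years:.0f} years, ${cost / 1e6:.1f}M")  # roughly 355 years, ~$4.7M
```

The small gap between this estimate and the quoted $4.6M comes entirely from the assumed hourly price; the order of magnitude is the point.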

Wow. Now imagine being a non-funded researcher (by non-funded, I mean someone with just a couple of Tesla V100s. Lol. Not even considering poor people like me) and trying to train your own architecture to compare with GPT 3, BERT, etc. It would literally be impossible to build a general model that outperforms these gorillas, and even if you focused on some niche task, a model with this many parameters could easily replicate your metric scores just by being fine-tuned on that task.

However, there’s some “good” news. If you are an individual researcher, awesome: get ready to shell out some money, because the model is not open-sourced. Yes, you read that right (again he goes with the dramatic effects). An organization founded with the sole purpose of furthering research has decided to provide a commercial API for the model. Basically, you’ll never have the code to experiment with, and all your tasks will be monitored via the API. Now, OpenAI does have a really good case for commercializing this model. GPT 3 performs scarily well at everyday tasks and could easily be used for online harassment, cyberbullying, convincing spam, etc. If you were confused by the Elon Musk part and thought it might have been written by a human, it has already successfully deceived you. And that was the GPT 2 model, one literally two orders of magnitude smaller than its successor. So, restricting access might make sense for general users; in fact, many might deem it necessary.
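API-gated access means every interaction looks something like the sketch below: you send a prompt over HTTP with a billed key and get text back, never touching the weights. Everything here, the endpoint URL, payload fields, and auth scheme, is a hypothetical illustration, not OpenAI's documented interface:

```python
import json
import urllib.request

# Hypothetical sketch of querying a hosted completion API.
# The URL, payload fields, and auth header are illustrative assumptions;
# the provider issues a key and meters usage, and the model stays remote.
API_URL = "https://api.example.com/v1/completions"
API_KEY = "sk-..."  # issued (and billed) by the provider

payload = {
    "prompt": "While not normally known for his musical talent, Elon Musk",
    "max_tokens": 64,
}
req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# response = urllib.request.urlopen(req)  # requires a real key and endpoint
```

Note what is missing: no checkpoint to download, no code to read, no way to inspect or fine-tune the model outside whatever the API exposes.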

I know I’ve been salty over a good portion of this article, and there’s a somewhat good reason why. As a guy really into NLP, I’ll never be able to look at the code, nor get the model to explore, which leaves me super disappointed. And that’s the entire point of this article: it’s a rant, because I am frustrated at not being able to explore this model. And you know what bugs me even more? GPT 3 could probably have written a better article than this one, with better vocabulary and grammar, but we’ll never know, because the model is commercialized.

References:

- OpenAI API announcement: https://openai.com/blog/openai-api/
- DistilBERT paper: https://arxiv.org/abs/1910.01108
- Lambda Labs, “Demystifying GPT-3”: https://lambdalabs.com/blog/demystifying-gpt-3/
