How OpenAI CLIP And GitHub Copilot Could Change Digital Jobs

Daniel Leivas · Published in Geek Culture · 9 min read · Jul 7, 2021

Photo by Yancy Min on Unsplash

Imagine for a moment a world where every digital job is accessible through natural, human language. A world where the computer understands the operations humans need it to do, and carries them out effortlessly.

Imagine that a developer writes three words and the computer guesses exactly what they meant. Or better, imagine that tomorrow a developer could dictate their intentions in a few sentences and, with just a little bit of context, the machine could build the application, say an e-commerce site or a mobile web app.

Photo editors supervise the imagery that appears in a magazine. They have years of experience with photo editing and with photo editing applications. Photo editors are highly imaginative people with a keen eye for detail and an impressive collection.

Imagine a future where the job of editing an image boils down to giving some vocal instructions like:

— “I would like to see her with curly hair...”

— “Ok, now, crop the photo automatically, set the aspect ratio to use the horizontal alignment and the perceived distance from the subject.”

— “Then apply 90° rotation and apply 3D perspective. Ok, done”.

This sounds like a technical utopia, but that future is upon us. Jobs that look like they are straight out of a sci-fi movie will emerge as well.

OpenAI has recently released two AI technologies that will complement and expand human skills in that way: GitHub Copilot, built in collaboration with Microsoft, and CLIP (Contrastive Language-Image Pre-training).

Bridging The Gap With GitHub Copilot

Computer languages are designed to bridge the gap between natural language and binary. In human language, words and phrases usually have multiple meanings. In a computer language, by contrast, there is no ambiguity: commands with correct grammar are precise. GitHub Copilot is an AI pair programmer that helps you build that bridge faster and with less work, using minimal context.

GitHub Copilot is trained on billions of lines of public code. The suggestions it makes are adapted to your code, but code written by others ultimately informs the processing behind them.

First, we have to understand that this is not a mere copy-paste of every source of information on GitHub. The training data is not simply regurgitated. As with GPT-3, the system can understand the context of the information it has seen.

The generated result is plausible given what we have supplied as input. This is why Copilot reads part of your source code to understand the context of what you want to generate. That context can be a description in a comment, the name of the function you are writing, or the names of the input variables. With all this information, Copilot can get an idea of exactly what you want and guess what you are asking it to generate.
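
To make this concrete, here is a hypothetical illustration in Python (not an actual Copilot output): the function name, the parameter name and the docstring are the kind of context Copilot reads, and the body below the signature is the kind of completion it might suggest.

def average_order_value(orders: list[dict]) -> float:
    """Return the average 'total' of a list of order dicts, or 0.0 if the list is empty."""
    # The kind of body an assistant could infer from the name, the
    # parameter and the docstring alone.
    if not orders:
        return 0.0
    return sum(order["total"] for order in orders) / len(orders)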

GitHub Copilot can save many hours of development. In addition, Copilot can make recommendations in almost any programming language, although it works best with popular languages such as JavaScript, Python and TypeScript.

GitHub Copilot draws on the public repositories on GitHub, and it is powerful. But it is not perfect…

One problem is the full context. Copilot understands the context of your current working file. For example, if you have a long code file, it will understand the variables inside this file, and those variables can be used (or reused) by other functions.

It is like an ultra-powerful autocomplete, but it only has the context of your file, not the full context of your whole project, so it will not understand the code from other files. IntelliSense in VS Code and other software development tools are still more powerful in that respect. Even so, it is impressive.

Another problem is that GitHub Copilot does not always follow best practices. Its suggestions might target a different version of the language or library than your codebase uses, which can lead to version conflicts, and they can introduce security issues.

GitHub Copilot has created a pair programming method in which your partner continuously presents questionable code.
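
As a made-up illustration of the kind of snippet you might have to push back on (the table, the function names and the whole scenario are invented here, not a recorded Copilot suggestion): the first version builds an SQL query with string formatting, a classic injection risk, while the reviewed version uses a parameterized query.

import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Questionable: user input is pasted straight into the SQL text,
    # which opens the door to SQL injection.
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Reviewed version: a parameterized query keeps user input out of
    # the SQL text entirely.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (username,)
    ).fetchall()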

Even if it never reaches perfection, Copilot or its successors could completely change the way programmers work; they would only “test, review and verify” the AI’s code. Will we still call them “programmers”?

Vision And Language With CLIP

OpenAI CLIP (Contrastive Language-Image Pre-training) is a research model trained on images and their captions.

Because CLIP’s training set contained conceptual metadata (textual labels), it may respond to concepts presented physically, symbolically, or graphically.

To understand the importance of a model like OpenAI CLIP, we must first understand the weak points of many computer vision models.

Many of the models we use today are powerful and help us solve many tasks. They have also served as a spearhead for discovering other techniques that later ended up impacting other areas of deep learning.

But that does not mean these models are free of serious deficiencies, deficiencies we must understand and analyse in order to improve them. Again, an example: think of a convolutional neural network that we train to classify hotdogs well. But what if we want to classify a pizza?

Silicon Valley — Season 4 Episode 4

Our network only knows how to classify hotdogs, and its output is prepared to give us one of two answers: hotdog or not hotdog. That is why, when we pass it a photo of a pizza, it will do the best it can, but in the end it will still answer in terms of hotdogs. You see the problem: we can design and train networks to learn and solve a certain task well, but once they are trained, we can hardly change which task they are solving. This lack of flexibility is well known.
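
A minimal toy sketch of that limitation in Python (untrained, random weights, an assumed 64×64 RGB input, not a real hotdog model): the head has exactly two outputs, so whatever image we feed it, softmax forces the answer into one of those two classes.

import torch
import torch.nn as nn

# A tiny classifier head with exactly two outputs: "hotdog" and "not hotdog".
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 2),
)

pizza_image = torch.rand(1, 3, 64, 64)           # stand-in for a pizza photo
probs = classifier(pizza_image).softmax(dim=-1)  # always exactly two probabilities
label = ["hotdog", "not hotdog"][probs.argmax().item()]
print(label)  # the network has to pick one of its two known classes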

One of the most important trends of recent years is working with models pre-trained on a large and varied data set of images. We know the starting network will have learned many patterns that could be reused to solve very different tasks.

The first problem is that we still have to modify the network architecture, if only minimally, to add the last layer in charge of doing the new classification, and then train that layer; a typical sketch of this recipe follows.
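
This is the usual transfer-learning recipe (an assumed, generic setup rather than any specific codebase): load a network pre-trained on ImageNet, freeze its feature extractor, and replace the final layer with a new head for our own classes.

import torch.nn as nn
from torchvision import models

# Load a ResNet pre-trained on ImageNet and freeze its feature extractor.
model = models.resnet18(weights="IMAGENET1K_V1")  # pretrained=True on older torchvision
for param in model.parameters():
    param.requires_grad = False

# Swap the final layer for a new head with our own classes.
num_classes = 2  # e.g. "hotdog" and "pizza"
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new head is trainable, and it still needs labelled examples of
# every class; adding a third class later means editing it and retraining.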

The second problem is managing the data these models are pre-trained on: images labelled by humans who had to view each image and decide its label, data sets made up of millions and millions of manually tagged images. As we need to train ever-larger models, these data sets also have to grow larger and larger, and even more profound problems open up.

One solution is OpenAI CLIP. CLIP is a proposal that solves many of the problems we have raised. It does so by combining the power of two models: one of vision (analysing images) and one of language (processing the input label). CLIP introduces an alternative form of training to what we have traditionally used in computer vision: it learns the association between descriptions and the content of images.

This seems like a simple idea, and it is, but it has significant implications. CLIP no longer needs to be trained with images carefully labelled by a human so that each label represents exactly what the image contains.

Now we can provide descriptions in natural language. The system simply needs to learn from a large number of images whose descriptions are associated with them in different ways, and such pairs are abundant online. CLIP has been trained on more than 400 million images and descriptions taken from the Internet.

The power of CLIP resides not only in Internet data (images + descriptions) but also in its way of training, because CLIP understands context. The model learns which description most closely matches the content of a specific image, but it also learns why the other descriptions do not belong to that image. A network that learns not only the relationship between an image and its label but also the differences with the rest of the images: this is what is known as contrastive learning.

The main idea of contrastive learning is to learn representations such that similar samples stay close to each other, while dissimilar ones are far apart. Contrastive learning can be applied to both supervised and unsupervised data and has been shown to achieve good performance on a variety of vision and language tasks. — Lilian Weng, AI Research Manager at OpenAI
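
Below is a simplified sketch of that objective, loosely following the training pseudocode published in the CLIP paper; the real image and text encoders are replaced by random embeddings, and only the loss structure is shown. Each image’s matching caption sits on the diagonal of the similarity matrix, and every other pairing acts as a negative.

import torch
import torch.nn.functional as F

batch_size, dim = 8, 512
# Placeholders for what the image and text encoders would produce.
image_emb = F.normalize(torch.randn(batch_size, dim), dim=-1)
text_emb = F.normalize(torch.randn(batch_size, dim), dim=-1)

temperature = 0.07
logits = image_emb @ text_emb.t() / temperature  # pairwise similarities

# The i-th image belongs with the i-th caption, so the correct "class"
# for each row (and each column) is its own index.
targets = torch.arange(batch_size)
loss = (F.cross_entropy(logits, targets) +          # image -> correct caption
        F.cross_entropy(logits.t(), targets)) / 2   # caption -> correct image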

This model is compelling because it enables a lot of exciting tools. But, first of all, what we have is a technology that has learned very well the task of joining images with descriptions of their content, a tool that finally delivers the idea of a real semantic search engine: one that understands the content of our images not only through the tags and metadata we feed the search engine, but also through a description of the content of the image.

It is a new training paradigm. No need to modify the last layers of a pre-trained architecture. No need to retrain the network to classify hotdogs and pizzas anymore. It is no longer necessary to be a deep learning engineer who understands the underlying technology; any user could create their own classifier. The fusion of vision with natural language is suddenly leading us toward the semantic web. And this is cool.
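
For instance, a hotdog-versus-pizza classifier now takes a few lines and no training at all. The sketch below is adapted from the usage example in the openai/CLIP repository; the image path is a placeholder, and the text prompts are the only “labels” we write.

import torch
import clip  # the openai/CLIP package
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("food.jpg")).unsqueeze(0).to(device)  # placeholder path
prompts = ["a photo of a hotdog", "a photo of a pizza"]
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)            # image-text similarity scores
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(dict(zip(prompts, probs[0])))  # which prompt best matches the photo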

You can try CLIP here.

Final thoughts

These AIs are not going to destroy any profession; AI tools are helpers.

Obviously, GitHub Copilot can cut hours of programming work. The goal of these AIs is productivity. A tool like Copilot can remove many hours of work from the programming labour market. But many questions remain about how this kind of AI will evolve, and about how we can build a symbiotic relationship with it.

How can we estimate the way this labour market will evolve in the coming years? How will software development change? The evolution of these AI tools is already hard to predict; we can no longer know whether the effect will be negative or positive.

The pace of AI development seems likely to continue, and we can expect regular breakthroughs that blow our minds. In this way, we will instruct the computer directly with natural language, and a new public will be encouraged to enter this labour market.

Summary

  • OpenAI has recently released two AI technologies, CLIP and Copilot, which will complement and expand human skills.
  • Even if it never reaches perfection, Copilot or its successors could completely change the way programmers work.
  • CLIP (Contrastive Language-Image Pre-training) is a research model trained on images and captions. CLIP combines the power of two models: one of vision (analysing images) and one of language (processing the input label).
  • A tool like Copilot can remove many hours of work from the programming labour market. But many questions remain about how this kind of AI will evolve, and about how we can build a symbiotic relationship with it.
  • The pace of AI development seems likely to continue. Therefore, we can expect to see regular breakthroughs that blow our minds. In this way, we will instruct the computer directly with natural language.

Thanks for reading.

Follow me right here on Medium, so you don’t miss the next articles.

Daniel Leivas

Curious man in a curious world | Entrepreneur | Lifelong Learner | Lecturer | Coach | Trainer | Adviser | Web lover and consultant