GPT-4 Revolutionizing Natural Language and Image Recognition

Jorge Alcántara Barroso
4 min readMar 14, 2023

OpenAI has just released its latest milestone: GPT-4

It is a large multimodal model that take in image & text, and replies with text. GPT-4 has been made available for ChatGPT Plus users at a limit of 100 messages per 4 hours, and there is a waitlist for the API that interested parties can add their name to. Bear in mind this information may change at any moment.

How it does compared to GPT-3.5 (ChatGPT)

In short: it’s better at most things, although its knowledge cut-off date is still Sept 2021.

According to OpenAI, GPT-4 exhibits human-level performance on various professional and academic benchmarks. The example quoted on the release had GPT-4 take a simulated bar exam and score around the top 10% (GPT-3.5’s score was ~bottom 10%).

OpenAI tested GPT-4 on a variety of exam simulations that were originally designed for humans. They found that the difference between GPT-4 and GPT-3.5 becomes more apparent when the complexity of the task reaches a sufficient threshold. GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5. You can read more about it on their results table included in OpenAI’s announcement.

Image Credit: OpenAI

How GPT-4 came to be

OpenAI spent 6 months iteratively aligning GPT-4 using lessons from its adversarial testing program as well as ChatGPT, resulting in their best-ever results (though far from perfect) on factuality, “steerability”, and refusing to go outside of guardrails.

OpenAI has been working on scaling its deep learning stack and co-designed a supercomputer with Azure from the ground up for their workload. A year ago, they trained GPT-3.5 as a first “test run” of the system. OpenAI found and fixed some bugs and improved their theoretical foundations, resulting in GPT-4’s training run being (for them at least) unprecedentedly stable. OpenAI aims to hone their methodology to help them predict and prepare for future capabilities increasingly far in advance — something they view as critical for safety.

What is it good for?

Examples of GPT-4’s capabilities are evidence of its advanced language processing and image recognition abilities. For instance, one example showcases GPT-4’s capacity to convert a simple napkin sketch into a functional website through its understanding of HTML, CSS, and JavaScript.

Another noteworthy application of GPT-4 technology is its integration with the Be My Eyes app, which is primarily utilized by blind and visually impaired individuals. In this partnership, GPT-4 assists volunteers in describing images by recognizing and providing relevant information. In the app’s demonstration, GPT-4 accurately identifies a dress pattern, recognizes a plant, offers gym directions, translates labels and recipes, reads maps, and performs various other tasks based on the image presented.

Image Credit: Be My Eyes

It can provide a detailed description of an object or situation, yet its comprehension may be limited to the scope of the given question. However, these examples demonstrate the versatility and sophistication of GPT-4’s image recognition capabilities.

If ChatGPT was a splash, then GPT-4 is set to make a tidal wave in the market. As the latest iteration of OpenAI’s language model, GPT-4 boasts impressive capabilities that could revolutionize the way we interact with technology. From turning napkin sketches into functional websites to describing images for the visually impaired, GPT-4’s potential use cases are endless.

It’s currently available via ChatGPT Plus and the API (waitlist), while the image input capability will be prepared for wider availability in collaboration with a single partner (Be My Eyes). OpenAI is also open-sourcing OpenAI Evals, their framework for automated evaluation of AI model performance, to allow anyone to report shortcomings in their models to help guide further improvements.

As researchers and developers eagerly await access to the API, the possibilities of what can be achieved with GPT-4 continue to grow, we can expect to see even more innovative and exciting use cases emerge. So let’s keep our fingers crossed and hope to get to the top of the API waitlist soon.

Happy prompting!

--

--

Jorge Alcántara Barroso

Spaniard in California. Engineer, tinkerer. AI Builder since 2014. https://integrait.solutions/ AI Consulting & Development. Previously: Directly, Inbenta.