A Deep Dive into AI Art — Recent AI Breakthroughs
Although the role of AI in art creation and the relationship between artists and AI are still evolving, some very significant leaps have been made during the past decade. Remarkable feats have already been achieved: computers beating humans at image recognition and at complex games like chess and Go, whose number of possible positions and outcomes is astronomically large, and AI models generating text and artistic images. It is extremely intriguing how well AI can mimic, and in some cases surpass, human capabilities in these activities.
Deep Learning
Neural Networks
Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another.
Artificial neural networks (ANNs) are composed of node layers: an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to others and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated and sends data to the next layer of the network. Otherwise, no data is passed along to the next layer.
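To make this concrete, here is a minimal sketch in Python of a single artificial neuron that only passes data on when its weighted sum exceeds the threshold. The inputs, weights and threshold are made up purely for illustration and are not from any particular model:

```python
import numpy as np

def neuron(inputs, weights, threshold):
    """Fire only if the weighted sum of inputs exceeds the threshold."""
    total = np.dot(inputs, weights)          # weighted sum of incoming signals
    return total if total > threshold else 0.0  # otherwise pass nothing along

x = np.array([0.5, 0.8, 0.2])   # outputs coming from the previous layer (hypothetical)
w = np.array([0.9, -0.4, 0.3])  # connection weights (hypothetical)

print(neuron(x, w, threshold=0.1))  # value sent to the next layer, or 0.0
```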
Neural networks rely on training data to learn and improve their accuracy over time, and once these learning algorithms are fine-tuned for accuracy, they can be used to classify and cluster data at high velocity. Tasks in speech recognition or image recognition that would take human experts hours can take such neural networks mere minutes.
A neural network that consists of more than three layers, counting the input and output layers, can be considered a deep learning algorithm.
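As a rough illustration of that definition, the following sketch in PyTorch stacks an input layer, two hidden layers and an output layer, making it "deep" in the sense above. The layer sizes are arbitrary placeholders, not tied to any model discussed in this article:

```python
import torch.nn as nn

# A toy network with an input layer, two hidden layers and an output layer:
# more than three layers in total, so it qualifies as "deep".
deep_net = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),  # input layer -> first hidden layer
    nn.Linear(128, 64),  nn.ReLU(),  # first hidden layer -> second hidden layer
    nn.Linear(64, 10),               # second hidden layer -> output layer (e.g. 10 classes)
)
```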
Deep Learning
Deep learning is a subset of machine learning inspired by the brain’s network of neurons. And while neural networks have been around for decades, one of the most well-known advancements in deep learning came in 2012 at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where a deep convolutional neural network decisively outperformed earlier approaches. It was a pivotal moment for the use of deep neural networks in image recognition.
Another famous model is Google’s BERT, a language model based on transformers that helps the search engine get better context around its users’ searches. It was released in 2018 with 12 layers, 12 attention heads and 110 million parameters. Language models based on RNNs faced issues with parallelization and with retaining contextual connections. Since context is key in NLP, BERT easily bypassed these problems, and its results were far better than those of the state-of-the-art RNN models. But this was just the beginning of transformer-based neural networks, as OpenAI’s GPT-3 turned out to be shockingly good, even better than BERT.
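As a small illustration, the 12-layer BERT base model can be loaded with the open-source Hugging Face transformers library (which is separate from BERT itself) to produce contextual representations in which every token attends to the whole input at once, rather than being processed word by word as in an RNN. The example query below is hypothetical:

```python
from transformers import AutoTokenizer, AutoModel

# Load the 12-layer, 110M-parameter "base" variant of BERT.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Encode a sample search query and run it through the transformer layers.
inputs = tokenizer("how to tie a bowline knot", return_tensors="pt")
outputs = model(**inputs)

# One contextual vector per token, each informed by the entire query.
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, num_tokens, 768])
```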
GPT-3: AI gets language
On 11 June 2020, OpenAI released the biggest language model known at the time, GPT-3, along with its API documentation. With 175 billion parameters, 116 times more than its predecessor GPT-2, the language prediction model can serve a wide range of purposes: creating blog posts, advertisements, even poetry that mimics the style of famous poets like Shakespeare and Edgar Allan Poe so realistically that it can easily pass for something written by an actual human being. It can also generate text summaries and code. GPT-3 is trained to perform such tasks from small amounts of input text while maintaining context and producing large amounts of output.
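For a sense of how this looked in practice, here is a hedged sketch using the completion endpoint of the openai Python package as it existed around GPT-3’s release. The prompt, engine choice and sampling settings are illustrative assumptions, and the library’s interface has evolved since:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; requires API access

# Ask GPT-3 to continue a short prompt in a particular poetic style.
response = openai.Completion.create(
    engine="davinci",  # the largest GPT-3 engine available at the time
    prompt="Write a short poem about the sea in the style of Edgar Allan Poe:\n",
    max_tokens=100,
    temperature=0.7,
)

print(response["choices"][0]["text"])
```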
In September 2020, The Guardian used GPT-3 to write an article arguing that AI is harmless to human beings. The article really shows the model’s proficiency, and is a spectacle to witness in the world of language generation.
DALL-E: AI gets artsy
DALL-E is a multimodal implementation of GPT-3 with 12 billion parameters, which uses zero-shot learning to generate images and art from text prompts describing the desired content and style. Trained on text-image pairs from the internet, DALL-E creates multiple candidate images for any text input, and CLIP, an image recognition system, understands and ranks these candidates so that the most appropriate images are associated with the caption (text).
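This re-ranking step can be sketched with OpenAI’s open-source CLIP package (github.com/openai/CLIP). The caption and candidate filenames below are hypothetical stand-ins for a prompt and the images a generator might produce:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

caption = "an armchair in the shape of an avocado"
candidates = ["candidate_0.png", "candidate_1.png", "candidate_2.png"]  # hypothetical files

with torch.no_grad():
    # Embed the caption and each candidate image in CLIP's shared space.
    text_emb = model.encode_text(clip.tokenize([caption]).to(device))
    images = torch.stack([preprocess(Image.open(p)) for p in candidates]).to(device)
    img_emb = model.encode_image(images)

    # Cosine similarity between the caption and each candidate image.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    scores = (img_emb @ text_emb.T).squeeze(1)

# The highest-scoring candidate is the one CLIP ranks as the best match.
print("Best match for the caption:", candidates[scores.argmax().item()])
```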
Its successor, DALL-E 2, shows even better results in terms of photorealism and caption matching, unlocking a vast potential for the generation of art with AI systems.
Here’s the link to the next article: A Deep Dive into AI Art — Generative Art (Part 1)