Synthetic Abstractions

My previous post describes the process and methodology behind my recent series of ink prints. This post is an update on the project that also examines more closely the outputs themselves and what they might represent.

Perception Engine Series

After the Treachery of ImageNet series, I completed a subsequent series of ten similar Riso prints called Perception Engines.

Perception Engines: cello, cabbage, hammerhead shark, iron, tick, starfish, binoculars, measuring cup, blow dryer, and jack-o-lantern

The main change in this series is that I wanted all prints to be in exactly the same style. So all ten in this series are from the same version of the codebase and with nearly identical hyper-parameters. This means that the only differences in the resulting prints are the result of the “creative objective” — in this case different ImageNet label targets.

By keeping the style constant, it is more evident how the Perception Engines settle on distinct targets from similar starting conditions. In fact, it is possible to interpolate between drawings such as the hammerhead shark and the iron:

Interpolation between two finished drawings with responses on six tested neural net architectures

This visualization makes it clear that the system is expressing its knowledge of global structure independent of surface features or textures.
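One way to picture such an interpolation is as a linear blend between two sets of drawing parameters. The sketch below is only an illustration under that assumption: the post does not say how the drawings are parameterized, so the flat list of floats and the names `lerp_drawings` and `interpolation_frames` are hypothetical.

```python
from typing import List, Sequence


def lerp_drawings(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Linearly blend two equal-length parameter vectors (t=0 gives a, t=1 gives b)."""
    if len(a) != len(b):
        raise ValueError("drawings must have the same number of parameters")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_frames(a: Sequence[float], b: Sequence[float], steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced drawings from a to b, endpoints included."""
    if steps < 2:
        raise ValueError("need at least two frames")
    return [lerp_drawings(a, b, i / (steps - 1)) for i in range(steps)]


# Hypothetical stroke parameters, not data from the actual prints.
shark = [0.0, 0.2, 0.8, 1.0]
iron = [1.0, 0.6, 0.4, 0.0]
frames = interpolation_frames(shark, iron, steps=5)
```

Each frame could then be rendered and scored by every classifier to produce response curves like the ones in the figure.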

Generalization and Abstraction

A goal of this work is to investigate visual representations independent of any particular neural architecture. Research on adversarial images uses the related term “transferability” for cases where the goal is for the conclusion of one neural network to “transfer” to another specific target. This work extends that idea to “generalization”: the result should transfer as broadly as possible to unknown architectures and weights.

As an example, let’s consider the tick print. It was created using inputs from six neural network architectures: InceptionV3, MobileNet, NASNet, ResNet50, VGG16, and VGG19. However, it has since been shown to generalize to almost every other neural architecture (usually with ridiculously high scores) including those like DenseNet, which did not exist when the print was made.

A photo of the tick print yields very high scores across neural architectures. The graphs with yellow background are from the six models used to create this print, and the other seven results suggest that this result generalizes well across architectures.
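The comparison in this figure can be summarized by splitting the scores into models that were used to make the print and models that were not. The following is a minimal sketch of that bookkeeping. The function name `summarize_generalization` is hypothetical, and the scores are illustrative placeholders, not measurements from the actual print.

```python
from statistics import mean
from typing import Dict, Iterable


def summarize_generalization(scores: Dict[str, float],
                             training_models: Iterable[str]) -> Dict[str, float]:
    """Compare target-class scores on models used for creation vs. held-out models."""
    train = set(training_models)
    seen = [s for name, s in scores.items() if name in train]
    unseen = [s for name, s in scores.items() if name not in train]
    if not seen or not unseen:
        raise ValueError("need at least one seen and one unseen model")
    return {
        "seen_mean": mean(seen),
        "unseen_mean": mean(unseen),
        "unseen_min": min(unseen),  # worst case on an architecture never optimized against
    }


# The six architectures named in the post as inputs for the tick print.
TRAINING_MODELS = ["InceptionV3", "MobileNet", "NASNet", "ResNet50", "VGG16", "VGG19"]

# Placeholder scores for illustration only.
example_scores = {
    "InceptionV3": 0.99, "MobileNet": 0.97, "NASNet": 0.98,
    "ResNet50": 0.96, "VGG16": 0.95, "VGG19": 0.94,
    "DenseNet": 0.90, "InceptionResNetV2": 0.92,
}
summary = summarize_generalization(example_scores, TRAINING_MODELS)
```

A small gap between `seen_mean` and `unseen_mean` would be one rough indication that the print is not simply overfitting to the models it was optimized against.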

Another way to quantify how well this print generalizes to new neural architectures is to compare the result with the ImageNet validation data. ImageNet validation images are similar to those used in training (drawn from the same distribution) but are held out from it, so they serve as the best benchmark of how strongly a network would be expected to respond after training. When using trained weights from InceptionResNetV2 (a model not used to make the print), a photo of this print scores higher than all fifty official ImageNet tick validation images.

Tick print with all fifty tick ImageNet validation images in order of response on InceptionResNetV2
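The ordering in this figure amounts to ranking one score against a reference set. Here is a small sketch of that comparison, assuming each image has already been reduced to a single target-class score. The helper names and the numbers are hypothetical placeholders, not the scores reported by InceptionResNetV2.

```python
from typing import Sequence


def rank_among(candidate: float, reference: Sequence[float]) -> int:
    """1-based rank of `candidate` when placed among `reference`, highest score first."""
    return 1 + sum(1 for r in reference if r > candidate)


def beats_all(candidate: float, reference: Sequence[float]) -> bool:
    """True if `candidate` scores strictly higher than every reference image."""
    return all(candidate > r for r in reference)


# Fifty placeholder validation scores and one placeholder print score.
validation_scores = [0.5 + 0.009 * i for i in range(50)]
print_score = 0.99

print_rank = rank_among(print_score, validation_scores)
print_wins = beats_all(print_score, validation_scores)
```

A rank of 1 out of 51 corresponds to the situation described here, where the print outscores every validation image.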

I was floored by this result: This ink print more strongly elicits a “tick response” than any of the real images the network is expected to encounter. To describe this effect I’ve coined the term “Synthetic Abstraction”. My interpretation is that it is possible for neural networks to create a visual abstraction that serves as a more idealized representation than any specific instance. Just as one is able to imagine the Platonic ideal of a perfect circle from seeing only imperfect examples, this process can similarly succeed in creating a visual abstraction that represents the character of the target class more strongly than any particular instance.

Screen Printing

Riso prints have limitations in available ink colors and size of print. However, the same technique of layering ink used in Riso printing is also used in screen printing. I’ve recently adapted my technique to include screen printing, starting with some larger reprints from my Perception Engines series.

Screen-printing process

There is much more overhead in creating screen prints: each layer needs its own dedicated screen, which must be burned in, cleaned, and dried. I’ve faithfully recreated larger and crisper versions of four prints, and thus far the results have been worth the effort. Testing has shown these prints to elicit responses in neural networks as strong as the original smaller prints.

Large format (60x60cm) screen prints of cello, hammerhead shark, tick, and binoculars

These large limited edition prints are currently being shown at Nature Morte gallery in New Delhi as part of their innovative gallery show Gradient Descent, which highlights art created with artificial intelligence techniques.

Transferring outside of ImageNet

ImageNet has been the perfect laboratory to conduct this first set of experiments, as the ontology (set of labelled classes) is fixed and there are many trained models to test how well generalization is working. Similar to physics experiments performed assuming a frictionless table, perfectly aligned ontologies are an idealized case useful as a starting point to gain intuitions. But how well might the system work if we relax the constraint that the systems have to share training data and labels?

Google, Amazon, and other companies provide online AI services that serve as a useful reference point. For example, Google Cloud considers the cello screen print to be a cello (perhaps even with cellist). This seems to suggest that results generalize well outside of the training set.

Labels assigned to the cello print by Google’s Cloud AI service

However, Amazon Web Services (Rekognition) registers the same image only as an abstract form — either Art/Modern Art or perhaps a letterform like an ampersand.

Labels assigned to the cello print by Amazon’s Rekognition AI service

This result is not entirely disappointing since both are reasonable interpretations, and Amazon’s failure to recognize this print as a cello suggests that the sets of labels, training data, and distributions across the services might be strikingly different.

There was a similar result with two “Hotdog” prints commissioned by the THOTCON 0x9 conference. “Hotdog” is an ImageNet category, but here I also did testing against the popular Not Hotdog mobile application. (This targeted approach is much more similar to the Adversarial Image technique of transferability.) Results were promising: the Not Hotdog app more often than not reported photos of the print to be a hot dog, even with various croppings.

Hotdog print using ImageNet category that transfers well to the Not Hotdog application

With my next series I plan to look at how to get these prints to transfer to these publicly available models trained on other datasets.

Shifting Ontology

Most recently I have decided to shift away from ImageNet and instead focus more on online services such as those offered by Google and Amazon. One important market thus far for online AI services is filtering “inappropriate content.” Google Cloud exposes a version of its “SafeSearch” filter through their Vision API and offers multiple sub-categories such as Adult, Violence, and Racy. Amazon similarly offers an “Image Moderation” service through their Rekognition API, which can label images as Nudity or Suggestive (e.g. swimwear). And Yahoo has open-sourced their NSFW classifier that can report when images are “Not Safe For Work.”

The Perception Engines pipeline was modified to target these three services. I’ve also updated my screen printing-based drawing systems to use several layers of different colored ink. Here are some of the first results:

Two visual abstractions of what online filters consider “inappropriate content”

Photos of each of these prints score strongly as:

  • “Explicit Nudity” according to Amazon Rekognition
  • “Racy” according to Google SafeSearch
  • “Not Safe for Work” according to Yahoo NSFW

This can be verified through the web interfaces for these products.

Amazon’s AI service reports this photo as “Explicit Nudity” with high confidence
Google’s SafeSearch reports this photo as “Racy” at its highest confidence level

Note that for all three of these image filters the ontologies are very much misaligned, the criteria are somewhat vague, and there is no access to the training data for any of these classifiers.
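Because the three filters report results in different forms, comparing them requires putting their outputs on a common scale. The sketch below shows one possible way to do that, under stated assumptions: Google's SafeSearch likelihood buckets are mapped to evenly spaced numbers (an arbitrary choice of mine), Rekognition's percentage confidence is divided by 100, and the NSFW classifier's probability is used as is. The function names, the threshold, and the two numeric example values are hypothetical placeholders.

```python
from typing import Dict

# SafeSearch reports likelihood buckets rather than probabilities. These numeric
# values are an assumption made only so the buckets can be compared with scores.
LIKELIHOOD_TO_SCORE: Dict[str, float] = {
    "VERY_UNLIKELY": 0.0,
    "UNLIKELY": 0.25,
    "POSSIBLE": 0.5,
    "LIKELY": 0.75,
    "VERY_LIKELY": 1.0,
}


def safesearch_score(likelihood: str) -> float:
    """Map a SafeSearch likelihood bucket onto the 0..1 range."""
    if likelihood not in LIKELIHOOD_TO_SCORE:
        raise ValueError(f"no numeric mapping for likelihood {likelihood!r}")
    return LIKELIHOOD_TO_SCORE[likelihood]


def rekognition_score(confidence_percent: float) -> float:
    """Rescale a 0..100 moderation-label confidence to 0..1."""
    return confidence_percent / 100.0


def all_filters_triggered(scores: Dict[str, float], threshold: float = 0.75) -> bool:
    """True if every service's normalized score meets the (hypothetical) threshold."""
    return all(s >= threshold for s in scores.values())


# "VERY_LIKELY" matches the highest SafeSearch level mentioned in the post;
# the other two numbers are placeholders, not measured values.
normalized = {
    "google_racy": safesearch_score("VERY_LIKELY"),
    "amazon_explicit_nudity": rekognition_score(90.0),
    "yahoo_nsfw": 0.9,
}
triggered = all_filters_triggered(normalized)
```

This kind of normalization only makes the numbers comparable. It does nothing to align what each service actually means by its labels.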

Presumably these models use different training data, so it’s not clear how much overlap these models are expected to have. However, the fact that one image can trigger similar reactions across three models suggests that they may have some shared understanding of this print. I don’t have an intuition of why this particular arrangement of bright shapes elicits this response. If “Interpretable Machine Learning” were more mature, one might be able to get the system to provide a succinct explanation. But in this instance I prefer the mystery of not knowing exactly what subgenre of racy NSFW nudity this print might be eliciting in the models.

Unlike the ImageNet-based prints, I’m not yet sure how strongly these results generalize. I plan on investigating further by continuing this series of “Inappropriate Content” images and hope to use this as an opportunity to try out more variations in style.