Perception Engines

A visual overview examining the ability of neural networks to create abstract representations from collections of real world objects. An architecture called perception engines is introduced that is able to construct representational physical objects and is powered primarily by computational perception. An initial implementation was used to create several ink prints based on ImageNet categories, and permutations of this technique are the basis of ongoing work.

Ink prints: Forklift, Ruler, and Sewing Machine


Can neural networks create abstract objects from nothing other than collections of labelled example images? Neural networks excel at perception and categorization, so it is plausible that with the right feedback loop perception is all you need to drive a constructive creative process. Human perception is an often under-appreciated component of the creative process, so it is an interesting exercise to devise a computational creative process that puts perception front and center. In this work, the creative process involves the production of real-world, non-virtual objects.

Given an image, a neural network can assign it to a category such as fan, baseball, or ski mask. This machine learning task is known as classification. But to teach a neural network to classify images, it must first be trained using many example images. The perception abilities of the classifier are grounded in the dataset of example images used to define a particular concept.

In this work, the only source of ground truth for any drawing is this unfiltered collection of training images. For example, here are the first few dozen training images (from over a thousand) in the electric fan category:

Random samples from the “electric fan” category

Abstract representational prints are then constructed which are able to elicit strong classifier responses in neural networks. From the point of view of trained neural network classifiers, images of these ink-on-paper prints strongly trigger the abstract concepts within the constraints of a given drawing system. The process is called perception engines because it uses the perception ability of trained neural networks to guide its construction process. When successful, the technique is found to generalize broadly across neural network architectures. It is also interesting to consider when these outputs do (or don’t) appear meaningful to humans. Ultimately, the collection of input training images is transformed with no human intervention into an abstract visual representation of the category represented.

Abstract ink print generated from the category “electric fan”

First Systems

The first perception engine implementations were not concerned with physical embodiment. These pixel-based systems were inspired by, and repurposed, the techniques of adversarial examples. Adversarial examples are a body of research that probes machine learning systems with small perturbations in order to cause a classifier to assign an incorrect label.

Early perception engine outputs: birdhouse, traffic light, school bus

Adversarial examples are usually constrained to making small changes to existing images. Perception engines, however, allow arbitrary changes within the constraints of a drawing system. Adversarial techniques also often target specific neural networks, but in this work we hope to create images that generalize across all neural networks and, hopefully, humans as well. So perception engines use ensembles of trained networks with different well-known architectures and also include testing for generalization.
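One way to picture an ensemble objective is as a single score that several classifiers contribute to. The sketch below is only an illustration under stated assumptions: the function names, the toy "networks", and the choice of averaging softmax probabilities are all hypothetical, since the text does not say how the individual network responses are combined.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_score(image, classifiers, target_index):
    """Mean target-class probability over every classifier (assumed combination rule)."""
    probs = [softmax(clf(image))[target_index] for clf in classifiers]
    return sum(probs) / len(probs)

# Toy stand-ins for trained networks: each ignores the image and returns three logits.
def confident_net(image):
    return [0.0, 4.0, 0.0]

def unsure_net(image):
    return [1.0, 1.0, 1.0]

score = ensemble_score(None, [confident_net, unsure_net], target_index=1)
print(round(score, 3))
```

Because the score is an average, a design that only excites one network is penalized relative to one that every member of the ensemble responds to.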


Perception Engines Architecture

As the architecture of these early systems settled, the operation could be cleanly divided into three different submodules:

  • Drawing system — The system of constraints involved in creating the image. The early systems used lines or rectangles on a virtual canvas, but later systems achieved the same result for marks on a page under various production tolerances and lighting conditions.
  • Creative objective — What is the expressive goal? Thus far the focus has been on using neural networks pre-trained on ImageNet with an objective of maximizing response to a single ImageNet class. This is also consistent with most adversarial example literature.
  • Planning system — How is the objective maximized? Currently random search is used, which is a type of blackbox optimization (meaning no gradient information is used). Though not particularly efficient, it is otherwise a simple technique and works well in practice over hundreds to thousands of iterations. It also finds a “local maximum”, which in practice means it will converge to a different solution each run.
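The three modules above can be sketched as three plain functions. This is a minimal illustration, assuming a keep-if-better variant of random search; the text does not specify the proposal or acceptance rule, and `mutate_lines` and `toy_objective` are hypothetical stand-ins for a real drawing system and a real classifier-based objective.

```python
import random

def random_search(initial, mutate, score, iterations, rng):
    """Planning system: propose random changes, keep one only when the score improves."""
    best, best_score = initial, score(initial)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # no gradient information is used
            best, best_score = candidate, candidate_score
    return best, best_score

def mutate_lines(lines, rng):
    """Drawing-system stand-in: nudge one coordinate of one line, clamped to the canvas."""
    edited = [list(line) for line in lines]
    i = rng.randrange(len(edited))
    j = rng.randrange(4)
    edited[i][j] = min(1.0, max(0.0, edited[i][j] + rng.uniform(-0.1, 0.1)))
    return [tuple(line) for line in edited]

def toy_objective(lines):
    """Objective stand-in (NOT a classifier): rewards endpoints near the canvas center."""
    return -sum((c - 0.5) ** 2 for line in lines for c in line)

# Each line is (x1, y1, x2, y2) on a unit canvas.
start = [(0.0, 0.0, 1.0, 1.0), (0.0, 1.0, 1.0, 0.0)]
best, best_score = random_search(start, mutate_lines, toy_objective, 200, random.Random(0))
print(best_score > toy_objective(start))
```

Swapping the seed passed to `random.Random` changes the sequence of proposals, which is one way a run can settle on a different design each time.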

Growing a fan

The perception engine architecture uses the random search of the planning module to gradually achieve the objective through iterative refinement. When the objective is to maximize the perception of an electric fan, the system will incrementally draw or refine a proposed design for a fan. Combining these systems is a bit like creating a computational ouija board: several neural networks simultaneously nudge and push a drawing toward the objective.

Early steps in planning the electric fan print

Though this is effective when optimizing for digital outputs, additional work is necessary when planning physical objects which are subject to production tolerances and a range of viewing conditions.

Modeling physical artifacts

After the proof of concept I was ready to target a physical drawing system. The Riso printer was chosen as a first target; it employs a physical ink process similar to screen printing. This meant all outputs are subject to a number of production constraints, such as a limited number of ink colors (I used about 6) and unpredictable alignment between layers of different colors.

Left: Loading Purple Ink drum into Riso printer
Right: “Electric Fan” print before adding second black layer

At this point in my development I was awarded a grant from Google’s Artists and Machine Intelligence group (AMI). With their support, I was able to print a series of test prints and iteratively improve my software system to model the physical printing process. Each source of uncertainty that could cause a physical object to have variations in appearance is modeled as a distribution of possible outcomes.

Issue #1: Layer Alignment

It is common for Riso prints to have a small amount of mis-alignment between layers because the paper must be inserted separately for each different color. This possibility was handled by applying a small amount of jitter manually between colors.

Example of layer jitter being applied to produce a distribution of possible alignment outcomes.

In practice this jitter keeps the final design from being overly dependent on the relative placement of elements across different layers.
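A rough sketch of layer jitter, assuming each ink layer is a list of (x, y) marks on a unit canvas and that misalignment is a small uniform shift applied to a whole layer at once. The representation, the uniform distribution, and the function names are assumptions made for illustration only.

```python
import random

def sample_layer_offsets(layer_names, max_shift, rng):
    """Draw one independent (dx, dy) shift per ink layer."""
    return {
        name: (rng.uniform(-max_shift, max_shift), rng.uniform(-max_shift, max_shift))
        for name in layer_names
    }

def shift_marks(marks, offset):
    """Move every mark in a layer by the same offset, as a misfed sheet would."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in marks]

def jittered_variants(layers, max_shift, count, rng):
    """Produce `count` plausible alignment outcomes for the same design."""
    variants = []
    for _ in range(count):
        offsets = sample_layer_offsets(sorted(layers), max_shift, rng)
        variants.append(
            {name: shift_marks(marks, offsets[name]) for name, marks in layers.items()}
        )
    return variants

layers = {"purple": [(0.2, 0.2), (0.8, 0.8)], "black": [(0.5, 0.5)]}
variants = jittered_variants(layers, max_shift=0.02, count=5, rng=random.Random(1))
print(len(variants))
```

Scoring a design against several such variants, rather than against one perfectly aligned rendering, is what discourages solutions that rely on exact registration between colors.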

Issue #2: Lighting

The colors of a digital image can be given exactly. But a physical object will be perceived with slightly different colors depending on the ambient lighting conditions. To allow the final print to be effective in a variety of environments, the paper and ink colors were photographed under multiple conditions and then simulated as various possibilities.
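The lighting simulation can be pictured as swapping palettes: the same grid of ink names is rendered with whichever set of photographed colors was sampled. The sketch below assumes that representation, and the RGB values and condition names in it are invented placeholders, not measurements from the actual prints.

```python
import random

# Hypothetical RGB readings of the paper and inks under two lighting conditions.
MEASURED_PALETTES = {
    "daylight": {"paper": (245, 243, 235), "purple": (98, 62, 152), "black": (32, 32, 34)},
    "warm_indoor": {"paper": (250, 236, 210), "purple": (110, 60, 130), "black": (40, 34, 30)},
}

def render(ink_grid, palette):
    """Turn a grid of ink names into a grid of RGB pixels for one lighting condition."""
    return [[palette[ink] for ink in row] for row in ink_grid]

def sample_lighting(ink_grid, palettes, rng):
    """Pick one measured condition at random and render the design under it."""
    condition = rng.choice(sorted(palettes))
    return condition, render(ink_grid, palettes[condition])

grid = [["paper", "purple"], ["black", "paper"]]
condition, pixels = sample_lighting(grid, MEASURED_PALETTES, random.Random(0))
print(condition, pixels[0][1])
```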

Variations from applying different lighting conditions.

The lighting and layer adjustments were independent and could be applied concurrently.

Combining the jitter and lighting variations into a larger distribution of outcomes.
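Since the adjustments are independent, they can be chained and the design scored over the resulting distribution of outcomes. The sketch below assumes the scores are simply averaged over random samples, which the text does not state; the toy design, the perturbation functions, and `toy_score` are hypothetical stand-ins.

```python
import random

def robust_score(design, score, perturbations, samples, rng):
    """Average the score over sampled combinations of independent perturbations."""
    total = 0.0
    for _ in range(samples):
        variant = design
        for perturb in perturbations:  # each perturbation is applied in turn
            variant = perturb(variant, rng)
        total += score(variant)
    return total / samples

# Toy design: (position, brightness). Real designs would be rendered images.
def jitter(design, rng):
    position, brightness = design
    return (position + rng.uniform(-0.05, 0.05), brightness)

def relight(design, rng):
    position, brightness = design
    return (position, brightness * rng.choice([0.8, 1.0, 1.2]))

def toy_score(design):
    """Stand-in objective (NOT a classifier): best when centered and bright."""
    position, brightness = design
    return brightness * max(0.0, 1.0 - abs(position - 0.5))

result = robust_score((0.5, 1.0), toy_score, [jitter, relight], samples=50, rng=random.Random(0))
print(0.7 < result < 1.25)
```

Passing an empty list of perturbations recovers the plain, single-rendering score, which makes it easy to compare a design's ideal response with its response under simulated production variation.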

Issue #3: Perspective

In a physical setting, the exact location of the viewer is not known. To keep the print from being dependent on a particular viewing angle, a set of perspective transformations was also applied. These are generally added during a final refinement stage and are done in addition to the alignment and lighting adjustments.

Examples of perspective transform being added to model a range of viewing angles.
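A perspective change can be expressed as a 3x3 homography applied to each point of the page. The following is a minimal sketch, assuming a unit-square page and a simple tilt sampled from a uniform range; the parameterization and the `max_tilt` value are illustrative assumptions rather than the settings used for the prints.

```python
import random

def apply_homography(h, point):
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )

def sample_tilt(max_tilt, rng):
    """Random mild tilt: only the projective row differs from the identity matrix."""
    return [
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [rng.uniform(-max_tilt, max_tilt), rng.uniform(-max_tilt, max_tilt), 1.0],
    ]

CORNERS = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

rng = random.Random(2)
tilt = sample_tilt(0.2, rng)
warped_corners = [apply_homography(tilt, corner) for corner in CORNERS]
print(warped_corners)
```

With `max_tilt` at 0.2 the divisor `w` stays positive everywhere on the unit square, so the page never folds through the horizon.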

Final Print

This system typically runs for many hours on a deep learning workstation in order to generate hundreds to thousands of iterations on a single design. Once the system has produced a candidate, a set of master pages is made. Importantly, the perspective and jitter transforms are disabled to produce these masters in their canonical form. For the fan print, two layers were produced: one for the purple ink and one for black.

Final aligned layers of Electric Fan as they are sent to the printer

These masters are used to print ink versions on paper.

An ink print from the master above (no two are exactly alike)


After printing, a photo is used to test for generalization. This is done by querying neural networks that were not involved in the original pipeline to see if they agree the objective has been met — an analogue of a train / test split across several networks with different architectures. In this case, the electric fan image was produced with the influence of 4 trained networks, but generalizes well to 5 others.

This electric fan design was “trained” with input from inceptionv3, resnet50, vgg16 and vgg19 — and after printing scores well when evaluated on all four of those networks (circled in red). This result also generalizes well to other networks as seen by the strong top-1 scores on four other networks tested (and a lower top-3 score on nasnetmobile).
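The generalization check amounts to asking networks that took no part in the search where the target class ranks in their predictions. The sketch below assumes each network is a function returning one score per class; the network names and score values are made-up placeholders, not the results reported above.

```python
def top_k_indices(scores, k):
    """Indices of the k highest-scoring classes, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def generalization_report(photo, held_out, target_index, k=3):
    """For each held-out network, report the target's rank (1 = top-1) or None if outside top-k."""
    report = {}
    for name, classifier in held_out.items():
        ranked = top_k_indices(classifier(photo), k)
        report[name] = ranked.index(target_index) + 1 if target_index in ranked else None
    return report

# Hypothetical held-out networks that ignore the photo and return fixed class scores.
held_out = {
    "net_a": lambda photo: [0.10, 0.70, 0.15, 0.05],
    "net_b": lambda photo: [0.50, 0.10, 0.30, 0.05],
    "net_c": lambda photo: [0.50, 0.05, 0.30, 0.15],
}

report = generalization_report(None, held_out, target_index=1, k=3)
print(report)
```

Keeping the held-out networks entirely separate from the ensemble used during the search is what makes this comparable to a train / test split.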

Constraint System as Creativity

A philosophical note on creativity and intent: Using perception engines inverts the stereotypical creative relationship employed in human computer interaction. Instead of using the computer as a tool, the Drawing System module can be thought of as a special tool that the neural network itself drives to make its own creative outputs. As the human artist, my main creative contribution is the design of a programming design system that allows the neural network to express itself effectively and with a distinct style. I’ve designed the constraint system that defines the form, but the neural networks are the ultimate arbiter of the content.

Treachery of ImageNet

In my initial set of perception engine objects I decided to explicitly caption each image with the intended target concept. Riffing off of Magritte’s The Treachery of Images (and not being able to pass on a pun), these first prints were called The Treachery of ImageNet.

All 12 prints in the Treachery of ImageNet series

The conceit was that many of these prints would strongly evoke their target concepts in neural networks in the same way people find Magritte’s painting evocative of an actual, non-representational pipe. The name also emphasizes ImageNet’s role in establishing the somewhat arbitrary ontology of concepts used to train these networks (the canonical ILSVRC subset), which I also tried to highlight by choosing an eclectic set of labels across the series.

Ongoing work

Additional print work is in various stages of production using the same core architecture. Currently these use the same objective and planner but vary the drawing system, such as using multiple ink layers or a more generic screen printing technique. Some more recent experiments also use very different creative objectives and more radical departures from the current types of drawing systems and embodiments. As these are completed I’ll share incremental results on Twitter with occasional write-ups here. Additional photos of all completed prints can be found in my online store, which is also used to fund future work.