Perception Engines

Tom White
Apr 4, 2018

A visual overview examining the ability of neural networks to create abstract representations from collections of real-world objects. An architecture called perception engines is introduced that constructs representational physical objects, guided primarily by computational perception. An initial implementation was used to create several ink prints based on ImageNet categories, and permutations of this technique are the basis of ongoing work.

Ink prints: Forklift, Ruler, and Sewing Machine

Introduction

Given an image, a neural network can assign it to a category such as fan, baseball, or ski mask. This machine learning task is known as classification. But to teach a neural network to classify images, it must first be trained using many example images. The perception abilities of the classifier are grounded in the dataset of example images used to define a particular concept.
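For readers unfamiliar with this step, here is a minimal sketch of ImageNet classification with an off-the-shelf pretrained network. The library (torchvision) and the file name are illustrative assumptions, not the tooling used for this project.

```python
# Minimal ImageNet classification sketch (torchvision and the file name are
# illustrative assumptions, not this project's actual tooling).
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

model = models.resnet50(pretrained=True).eval()
image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    probs = torch.softmax(model(image), dim=1)

confidence, class_index = probs.max(dim=1)
# top-1 ImageNet class index and the probability the network assigns to it
print(class_index.item(), confidence.item())
```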

In this work, the only source of ground truth for any drawing is this unfiltered collection of training images. For example, here are the first few dozen training images (from over a thousand) in the electric fan category:

Random samples from the “electric fan” category

Abstract representational prints are then constructed which elicit strong classifier responses in neural networks. From the point of view of trained neural network classifiers, images of these ink-on-paper prints strongly evoke the target concepts within the constraints of a given drawing system. The process is called perception engines because it uses the perception abilities of trained neural networks to guide its construction process. When successful, the technique generalizes broadly across neural network architectures. It is also interesting to consider when these outputs do (or don’t) appear meaningful to humans. Ultimately, the collection of input training images is transformed, with no human intervention, into an abstract visual representation of the category it represents.

Abstract ink print generated from the category “electric fan”

First Systems

Early perception engine outputs: birdhouse, traffic light, school bus

Adversarial examples are usually constrained to making small changes to existing images. Perception engines, however, allow arbitrary changes within the constraints of a drawing system. Adversarial techniques also often target specific neural networks, whereas here the goal is to create images that generalize across all neural networks, and hopefully humans as well. So perception engines use ensembles of trained networks with different well-known architectures and also include testing for generalization.
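One plausible way to implement such an ensemble objective is to average the target-class probability across several pretrained ImageNet models. The sketch below uses the four architectures named later in this post, but the exact combination rule and per-network resizing are assumptions rather than the project’s actual code.

```python
# Sketch of an ensemble objective: mean target-class probability across
# several ImageNet classifiers. The combination rule is an assumption.
import torch
import torch.nn.functional as F
from torchvision import models

# ILSVRC-2012 index usually listed for "electric fan, blower";
# verify against your own label file.
TARGET_CLASS = 545

ensemble = {
    "inception_v3": (models.inception_v3(pretrained=True).eval(), 299),
    "resnet50":     (models.resnet50(pretrained=True).eval(), 224),
    "vgg16":        (models.vgg16(pretrained=True).eval(), 224),
    "vgg19":        (models.vgg19(pretrained=True).eval(), 224),
}

def ensemble_score(image_batch):
    """Mean target-class probability across all networks.

    image_batch: normalized NCHW float tensor of rendered candidate images.
    """
    scores = []
    with torch.no_grad():
        for net, size in ensemble.values():
            # Each architecture has its own nominal input resolution.
            resized = F.interpolate(image_batch, size=(size, size),
                                    mode="bilinear", align_corners=False)
            probs = torch.softmax(net(resized), dim=1)
            scores.append(probs[:, TARGET_CLASS].mean())
    return torch.stack(scores).mean().item()
```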

Architecture

Perception Engines Architecture

As the architecture of these early systems settled, the operation could be cleanly divided into three different submodules:

  • Drawing system — The system of constraints involved in creating the image. The early systems used lines or rectangles on a virtual canvas, but later systems achieved the same result for marks on a page under various production tolerances and lighting conditions.
  • Creative objective — What is the expressive goal? Thus far the focus has been on using neural networks pre-trained on ImageNet with an objective of maximizing response to a single ImageNet class. This is also consistent with most adversarial example literature.
  • Planning system — How is the objective maximized? Currently random search is used, which is a type of black-box optimization (meaning no gradient information is used). Though not particularly efficient, it is otherwise a simple technique and works well in practice over hundreds to thousands of iterations. It also finds a “local maximum”, which in practice means it will converge to a different solution on each run (a sketch of this loop follows the list).
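The planning loop can be as simple as the sketch below, where `render` rasterizes the drawing and `score` is the ensemble objective. These helpers and the drawing representation are hypothetical placeholders, not the project’s actual code.

```python
# Greedy random search over drawing parameters (a black-box, gradient-free
# planner). `render`, `score`, and the drawing object are hypothetical.
import copy
import random

def random_search(initial_drawing, render, score, n_iterations=2000):
    """Propose a small random change; keep it only if the score improves."""
    best = copy.deepcopy(initial_drawing)
    best_score = score(render(best))
    for _ in range(n_iterations):
        candidate = mutate(best)          # nudge one stroke / rectangle
        candidate_score = score(render(candidate))
        if candidate_score > best_score:  # greedy hill climbing, no gradients
            best, best_score = candidate, candidate_score
    return best, best_score

def mutate(drawing):
    """Hypothetical mutation: perturb one randomly chosen drawing element
    within the constraints of the drawing system."""
    candidate = copy.deepcopy(drawing)
    element = random.choice(candidate.elements)  # hypothetical attribute
    element.jitter()                             # hypothetical small random change
    return candidate
```

Because each run starts from a different random initialization and takes different random steps, the converged designs differ from run to run, which is where much of the visual variety across prints comes from.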

Growing a fan

Early steps in planning the electric fan print

Though this is effective when optimizing for digital outputs, additional work is necessary when planning physical objects, which are subject to production tolerances and a range of viewing conditions.

Modeling physical artifacts

Left: Loading Purple Ink drum into Riso printer
Right: “Electric Fan” print before adding second black layer

At this point in my development I was awarded a grant from Google’s Artist and Machine Intelligence group (AMI). With their support, I was able to print a series of test prints and iteratively improve my software system to model the physical printing process. Each source of uncertainty that could cause a physical object to have variations in appearance is modeled as a distribution of possible outcomes.
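In optimization terms, this amounts to maximizing the expected classifier score over sampled production and viewing conditions rather than the score of a single idealized rendering. A minimal sketch, with hypothetical `render`, `score`, and `sample_conditions` helpers standing in for the real system:

```python
# Expected score over a distribution of possible physical outcomes.
# All three helper functions are hypothetical placeholders.
def robust_score(drawing, render, score, sample_conditions, n_samples=16):
    """Average the classifier score over randomly sampled variations
    (layer jitter, lighting, perspective) of the same drawing."""
    total = 0.0
    for _ in range(n_samples):
        conditions = sample_conditions()          # e.g. {"jitter": ..., "light": ...}
        total += score(render(drawing, conditions))
    return total / n_samples
```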

Issue #1: Layer Alignment

Example of layer jitter being applied to produce a distribution of possible alignment outcomes.

In practice this jitter keeps the final design from being overly dependent on the relative placement of elements across different layers.
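One straightforward way to simulate this misregistration is to composite the ink layers after shifting each by a small random offset. The sketch below uses Pillow, and the offset range is an illustrative value rather than the measured tolerance of the Riso printer.

```python
# Simulate imperfect registration between print passes by jittering each layer.
import random
from PIL import Image, ImageChops

def jitter_layers(layers, max_offset=6):
    """Composite ink layers after shifting each by a random (dx, dy) offset.

    layers: RGB images of equal size, one per ink color (dark ink on white).
    max_offset: illustrative jitter range in pixels.
    """
    composite = Image.new("RGB", layers[0].size, "white")
    for layer in layers:
        dx = random.randint(-max_offset, max_offset)
        dy = random.randint(-max_offset, max_offset)
        # ImageChops.offset wraps pixels around the edge; for small offsets
        # on a white margin this is a harmless approximation.
        shifted = ImageChops.offset(layer, dx, dy)
        # Multiplicative blending roughly models overlapping inks on white paper.
        composite = ImageChops.multiply(composite, shifted)
    return composite
```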

Issue #2: Lighting

Variations from applying different lighting conditions.

The lighting and layer adjustments were independent and could be applied concurrently.
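A lighting variation can be modeled as a random global brightness and contrast adjustment on the composited render; the ranges below are illustrative assumptions.

```python
# Rough model of different lighting conditions via global brightness/contrast.
import random
from PIL import ImageEnhance

def random_lighting(image):
    """Apply a random brightness and contrast change to a rendered print.
    The ranges are illustrative, not taken from the original system."""
    image = ImageEnhance.Brightness(image).enhance(random.uniform(0.7, 1.3))
    image = ImageEnhance.Contrast(image).enhance(random.uniform(0.8, 1.2))
    return image
```

Because the two adjustments are independent, one sampled outcome can simply apply both, e.g. `random_lighting(jitter_layers(layers))`.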

Combining the jitter and lighting variations into a larger distribution of outcomes.

Issue #3: Perspective

Examples of perspective transform being added to model a range of viewing angles.
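A range of viewing angles can be approximated by warping the render with a random perspective transform. The sketch below nudges each corner of the image by a small random amount and solves for the eight coefficients Pillow’s perspective transform expects; the shift range is an illustrative assumption.

```python
# Random perspective warp to model slightly off-axis viewing angles.
import random
import numpy as np
from PIL import Image

def find_perspective_coeffs(src_corners, dst_corners):
    """Solve for the 8 coefficients PIL uses to map output points back to
    input points (so src_corners end up at dst_corners in the output)."""
    matrix = []
    for (x, y), (u, v) in zip(dst_corners, src_corners):
        matrix.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        matrix.append([0, 0, 0, x, y, 1, -v * x, -v * y])
    A = np.array(matrix, dtype=np.float64)
    b = np.array(src_corners, dtype=np.float64).reshape(8)
    return np.linalg.solve(A, b)

def random_perspective(image, max_shift=0.05):
    """Nudge each corner by up to max_shift of the image size (illustrative)."""
    w, h = image.size
    src = [(0, 0), (w, 0), (w, h), (0, h)]
    dst = [(x + random.uniform(-max_shift, max_shift) * w,
            y + random.uniform(-max_shift, max_shift) * h) for x, y in src]
    coeffs = find_perspective_coeffs(src, dst)
    return image.transform((w, h), Image.PERSPECTIVE, tuple(coeffs), Image.BICUBIC)
```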

Final Print

Final aligned layers of Electric Fan as they are sent to the printer

These masters are used to print ink versions on paper.

An ink print from the master above (no two are exactly alike)

Evaluating

This electric fan design was “trained” with input from inceptionv3, resnet50, vgg16, and vgg19, and after printing it scores well when evaluated on all four of those networks (yellow background). The result also generalizes well to other networks, as seen by the strong top-1 scores on nine other networks tested.
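A generalization check of this kind can be reproduced by photographing the print and asking architectures that were not part of the training ensemble for their top-1 class. The model list and file name below are illustrative, and the original evaluation covered more networks than shown here.

```python
# Check whether held-out architectures also assign the target class top-1.
import torch
from torchvision import models, transforms
from PIL import Image

TARGET_CLASS = 545  # "electric fan, blower"; verify against your label file

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# A few architectures the design was *not* trained against (illustrative list).
held_out = {
    "densenet121":  models.densenet121(pretrained=True).eval(),
    "mobilenet_v2": models.mobilenet_v2(pretrained=True).eval(),
    "resnet152":    models.resnet152(pretrained=True).eval(),
}

# Hypothetical file name for a photograph of the finished print.
photo = preprocess(Image.open("fan_print_photo.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    for name, net in held_out.items():
        top1 = torch.softmax(net(photo), dim=1).argmax(dim=1).item()
        print(f"{name}: top-1 class = {top1}, target hit = {top1 == TARGET_CLASS}")
```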

Constraint System as Creativity

Treachery of ImageNet

All 12 prints in the Treachery of ImageNet series

The conceit was that many of these prints would strongly evoke their target concepts in neural networks, in the same way people find Magritte’s painting evocative of an actual, non-representational pipe. The name also emphasizes ImageNet’s role in establishing the somewhat arbitrary ontology of concepts used to train these networks (the canonical ILSVRC subset), which I also tried to highlight by choosing an eclectic set of labels across the series.

Ongoing work

Artists + Machine Intelligence

This publication showcases collaborations with artists, researchers, and engineers as part of Google’s Artists + Machine Intelligence program.
