Endless connections: Using AI with digital collections

SHI Weili
Published in Bluecadet
Mar 10, 2021
User interface of the THF Collection AI Table, using AI to create a “gradient” of visually similar objects

When we created the Connections AI Table with The Henry Ford Museum of American Innovation (THF for short), our goal was to demonstrate the interconnectedness of the museum’s collection. THF curators did the heavy work of identifying meaningful connections between many of the objects; we wanted to extend that experience to as much of the collection as possible. To offer users a truly endless set of connections, we drew on the computational power of artificial intelligence (AI). Through our research and engineering, we found that while AI at its current stage is no substitute for human curators in systematic thinking, it is an effective tool for suggesting visual connections between seemingly unrelated objects. The result is a set of AI-curated connections that invite never-ending interactive exploration, both insightful and full of surprises.

In this post, we share our early work that formed the foundation of the AI aspects of the project. We started prototyping with a full export of THF’s digital collection—containing tens of thousands of object images and related information—and came up with two approaches to create visually meaningful paths between images in the dataset: visual similarity paths and dominant color paths.

Visual Similarity Paths

To start, we used a pre-trained deep learning model (VGG16) to analyze every image and extract the model’s internal representation of that image (technically called a latent representation, in the form of an array of numbers). With this information we could find the location of every object in the “THF universe” (technically, the latent space), as indicated in the following graph:

Tens of thousands of THF collection objects plotted based on visual similarity. The original visual information was 4096-dimensional and was reduced to 2D for plotting using the t-SNE algorithm.

For any two objects in this universe, we can create a visual similarity path by simply selecting a series of objects between them. But more specifically, how do we select these objects?

Naïve Approach: Linear Interpolation

Our first approach was simple. In the latent space, draw a straight line between the two objects and divide it evenly into a few steps. For each step, select the image closest to the dividing point in the latent space. This is almost like taking the 2D visualization above, tracing a line through objects, and presenting those back as a path of visually similar objects, except that we do this in a higher-dimensional space.

Diagram explaining the linear interpolation approach
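
The idea can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the function name `interpolate_path`, the toy 2D vectors, and the brute-force nearest-neighbor search are all assumptions made for the example.

```python
import math

def interpolate_path(vectors, start, end, steps):
    """Return indices of the stored vectors nearest to evenly spaced
    points on the straight line from vectors[start] to vectors[end]."""
    a, b = vectors[start], vectors[end]
    path = []
    for i in range(steps + 1):
        t = i / steps
        # Dividing point on the line between the two endpoints.
        point = [x + t * (y - x) for x, y in zip(a, b)]
        # Brute-force search for the stored vector closest to that point.
        nearest = min(range(len(vectors)),
                      key=lambda j: math.dist(point, vectors[j]))
        path.append(nearest)
    return path

# Made-up 2D "latent space" with five objects.
vectors = [(0.0, 0.0), (1.0, 0.2), (2.1, -0.1), (3.0, 0.0), (1.5, 5.0)]
print(interpolate_path(vectors, 0, 3, 3))  # → [0, 1, 2, 3]
```

Asking for more steps than there are images near the line (for example `steps=6` on this toy data) makes the same index appear more than once.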

The original 4096-dimensional latent space is too sparse for the number of images we have: if we simply draw a line between two images in that space, most of the line runs through empty space, so the closest image to each step may be quite far away. Even if we reduce the latent space to 10 dimensions using PCA (principal component analysis), this still happens, and it shows up as duplicated images in the following example path.

A visual similarity path with duplicate steps in the beginning and end

After reducing the dimension of the latent space to 3, we mostly get non-repeating paths, as the following example shows.

A path without duplication, but the visual connection between many of the steps is unclear

We also control the number of steps we sample along the path. The more steps a path has, the smoother the transition should be. However, the paths formed by linear interpolation only make sense to some degree. After all, by drastically reducing the dimension, we lose a lot of information from the latent representation. Our next approach, shortest path, finds more convincing visual connections because it can take advantage of the information in the original high-dimensional data.

Sophisticated Approach: Shortest Path

First, build an interconnected network (technically, a graph) of object images using the latent representation data. For each image, find the k closest images in the latent space and set up a direct connection to each of them. All other image pairs can only be connected through other images. Then, for two randomly selected images, run a shortest-path algorithm to find a path between them.

Diagram explaining the shortest path approach
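
A minimal sketch of this approach follows. It assumes Dijkstra's algorithm as the shortest-path algorithm and Euclidean distance as the edge weight; the article does not say which were actually used. The function names and toy data are hypothetical, and the brute-force neighbor search is only practical for small examples.

```python
import heapq
import math

def build_knn_graph(vectors, k):
    """Connect every item to its k nearest neighbors.
    Edges are undirected and weighted by Euclidean distance."""
    graph = {i: {} for i in range(len(vectors))}
    for i, v in enumerate(vectors):
        others = sorted((j for j in range(len(vectors)) if j != i),
                        key=lambda j: math.dist(v, vectors[j]))
        for j in others[:k]:
            d = math.dist(v, vectors[j])
            graph[i][j] = d
            graph[j][i] = d
    return graph

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm. Returns a list of node indices,
    or None when the goal cannot be reached."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        if node == goal:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return path[::-1]
        for nxt, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return None

# Made-up vectors laid out along a line, one direct connection each.
vectors = [(0.0, 0.0), (1.0, 0.0), (2.1, 0.0), (3.3, 0.0)]
graph = build_knn_graph(vectors, k=1)
print(shortest_path(graph, 0, 3))  # → [0, 1, 2, 3]
```

For a collection of tens of thousands of images, the pairwise distance search in `build_knn_graph` would likely need to be replaced by a dedicated nearest-neighbor index.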

With this approach we also have a few parameters under our control. As with linear interpolation, we can reduce the dimension of the latent space. But with shortest path, the more dimensions we keep, the more information we use, and therefore the better the result. The benefit of dimension reduction is the time saved when building the graph. With digital collections there is hardly ever a case where the graph must be updated “live” for the user, so we were able to run our algorithm overnight as a daily routine, giving it all the time it needs.

Unlike with linear interpolation, we can’t explicitly choose how many steps the path will have. Instead, we choose how many direct connections to assign to each image when building the graph. The more direct connections, the denser the graph and the shorter the average path, and vice versa. However, we don’t want to assign too few direct connections to each image: that might leave image pairs for which no path can be found, since the graph would not be fully connected.
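
One way to check whether a given number of direct connections leaves the graph fully connected is to count its connected components. The sketch below uses a breadth-first search over an adjacency mapping; the function name and the toy graph are assumptions for illustration.

```python
from collections import deque

def connected_components(graph):
    """Group the nodes of an undirected graph (node -> neighbors)
    into connected components using breadth-first search."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components

# Hypothetical graph with one connection per image: two tight pairs
# that are never linked to each other.
sparse_graph = {0: [1], 1: [0], 2: [3], 3: [2]}
print(len(connected_components(sparse_graph)))  # → 2
```

More than one component means some image pairs have no path between them, so the number of direct connections would need to be raised.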

Example Paths

In the original 4096-dimensional latent space:

Example path

We can also slightly reduce the dimensionality of the data for a moderate speed gain. In the reduced 300-dimensional latent space, the paths are nearly as good, though some connections are occasionally more confusing. We ended up not doing this in production, as the speed gain wasn’t worth the weirdness of some of the connections.

Example path

Note that although we can’t explicitly ask for a fixed-step path, we can always extend a path by forming a new one starting from the endpoint of the existing one. The following examples (with each new path starting from the end of the last path) are again from the 4096-dimensional latent space:

Multiple paths with the end of each path identical to the beginning of the next path, forming an endless path

Or we can form multiple paths from the same beginning object:

Multiple paths starting from the same object

We can play with longer—and sometimes weirder—connections by selecting a random “middle” point and building a compound path from A to Z to B (Z = the random middle):

Multiple paths with the same beginning and same end. The middle-point object of each path appears twice because it functions as both the end of the first half and the beginning of the second half of each path.
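
Both the chained and the compound paths can be built from any path-finding function. The sketch below assumes a hypothetical callable `find_path(start, goal)` that returns a list of objects or `None`; `toy_find_path` simply counts between integers so the example runs on its own.

```python
def compound_path(find_path, a, z, b):
    """Path from a to b through a chosen middle point z.
    The middle point appears twice: once as the end of the first
    half and once as the beginning of the second half."""
    first = find_path(a, z)
    second = find_path(z, b)
    if first is None or second is None:
        return None
    return first + second

def chain_paths(find_path, waypoints):
    """One path per consecutive pair of waypoints, so each path
    starts where the previous one ended."""
    return [find_path(a, b) for a, b in zip(waypoints, waypoints[1:])]

def toy_find_path(start, goal):
    """Stand-in path finder: walk the integers from start to goal."""
    step = 1 if goal >= start else -1
    return list(range(start, goal + step, step))

print(compound_path(toy_find_path, 0, 3, 1))  # → [0, 1, 2, 3, 3, 2, 1]
print(chain_paths(toy_find_path, [0, 2, 1]))  # → [[0, 1, 2], [2, 1]]
```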

Dominant Color Paths

Besides connecting objects by visual similarity, it is also conceptually interesting (and visually appealing) to connect them through objects of gradually changing colors, forming a gradient.

To start, we used the k-Means algorithm to get the dominant color of every image. A color is represented by a 3-dimensional vector. Therefore, we can run the same linear interpolation and shortest path approach described above with the color data. The catch is that not all images are vivid enough to pick up their main colors—a few pale images in the path and the color transition won’t make much sense to the eye. After some experimentation, we figured out the following steps to ensure beautiful color paths:

First, we cropped each image to its central 50% to reduce the influence of background colors, especially since THF’s collection objects are not uniformly documented against the same background. Then, we ran the k-Means algorithm with k=3 (clustering all pixel colors into 3 clusters and selecting the centroid color of each cluster). We sorted the selected colors of an object in descending order of chroma (computed by converting the RGB color into the LAB color space and calculating sqrt(A² + B²)). Essentially, chroma gives us a mathematical representation of the image’s vibrancy or color intensity. We then selected, as the dominant color of the image, the first color that has a chroma no smaller than 2000 (colorful enough) and a cluster size bigger than 25% of the whole cropped image. If none of the colors met the criteria, we discarded the image because it would be confusing to the eye in a color path.
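
The cropping, chroma, and selection steps can be sketched as follows. Several things here are assumptions: the cluster centroids and their shares are taken as given (they would come from a k-Means run with k=3), "central 50%" is read as half the width and half the height, and the chroma threshold is left as a parameter on the standard CIELAB scale, where sqrt(A² + B²) stays well below 200, so the 2000 quoted above presumably refers to a differently scaled value. All function names are hypothetical.

```python
import math

def center_crop(pixels, fraction=0.5):
    """Keep the central part of a 2D grid of pixels (list of rows)."""
    h, w = len(pixels), len(pixels[0])
    ch, cw = max(1, round(h * fraction)), max(1, round(w * fraction))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in pixels[top:top + ch]]

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white point)."""
    def to_linear(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    def f(t):
        return t ** (1 / 3) if t > 216 / 24389 else (24389 / 27 * t + 16) / 116

    r, g, b = (to_linear(c) for c in rgb)
    x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883
    fx, fy, fz = f(x), f(y), f(z)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def chroma(rgb):
    """Chroma of an sRGB color: sqrt(A² + B²) in CIELAB."""
    _, a, b = srgb_to_lab(rgb)
    return math.hypot(a, b)

def select_dominant_color(clusters, min_chroma, min_share=0.25):
    """clusters: list of (rgb_centroid, share_of_pixels) pairs.
    Returns the most colorful centroid that is vivid enough and
    covers enough of the image, or None to discard the image."""
    ranked = sorted(clusters, key=lambda c: chroma(c[0]), reverse=True)
    for rgb, share in ranked:
        if chroma(rgb) >= min_chroma and share > min_share:
            return rgb
    return None

# Made-up clusters: pale background, a red object, a small blue detail.
clusters = [((200, 200, 200), 0.5), ((220, 30, 30), 0.3), ((30, 30, 220), 0.2)]
print(select_dominant_color(clusters, min_chroma=40))  # → (220, 30, 30)
```

In this toy case the blue cluster is vivid but covers too little of the image, and the gray cluster is large but has almost no chroma, so the red centroid is chosen.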

An example image and its k-Means color list, with the first color selected as the dominant color

We used the same linear interpolation and shortest path approaches to find paths in the color data. Because the color space is only 3-dimensional, the images seem to be distributed rather evenly. Therefore, both the “naive” linear interpolation approach and the “sophisticated” shortest path approach work pretty well. In fact, linear interpolation might work better, because its color path feels more “linear” whereas the shortest path can “zig-zag” through the color space.

Example Paths

Below are paths among object images, formed by linear interpolation. Each image has its dominant color below it. Note that when randomly selecting the beginning and ending images we made sure they are distant enough in the color space to allow interesting transitions between them.
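
Choosing endpoints that are far enough apart can be done with simple rejection sampling, as in this sketch. The function name, the distance threshold, and the use of Euclidean distance between color vectors are assumptions; the article does not state how "distant enough" was measured.

```python
import math
import random

def pick_distant_pair(colors, min_distance, rng, max_tries=1000):
    """Randomly pick two different indices whose colors are at least
    min_distance apart. Returns None if no such pair is found."""
    for _ in range(max_tries):
        i, j = rng.sample(range(len(colors)), 2)
        if math.dist(colors[i], colors[j]) >= min_distance:
            return i, j
    return None

# Made-up dominant colors: two near-black images and one near-white one.
colors = [(0, 0, 0), (1, 1, 1), (250, 250, 250)]
pair = pick_distant_pair(colors, min_distance=100, rng=random.Random(0))
print(pair)  # one index is always 2, the only color far from the others
```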

Example path

Paths among object images, formed by shortest path algorithm:

Example path

Where to from here?

The learnings from this phase went directly into the design and development of the Connections AI Table, with integrated AI-generated connections becoming a key component of the experience. From here, we dove into production:

  • Designed a playful, explorative, and meaningful user experience around the technology
  • Optimized the performance for the interactive experience
  • Integrated the Python/TensorFlow-based AI backend with the C++/Cinder-based interactive frontend
  • Streamlined the data processing routine to allow continuous updating of the museum collection data

What is fascinating in looking back on this work is that in a way we had to become curators ourselves. As you can see above, we had to hone the algorithm and its parameters to find connections that were meaningful for our users. All of the examples are technically “correct” to the AI—but it took actual human input and critical thinking to build something that we felt made sense.

To learn more about the full creative process for this project, read Playful Prototyping with Machine Learning.
