KaoKore: Exploring the Intersection of Humanities and ML Research through a Japanese Art Dataset

People + AI Research @ Google
8 min read · Sep 30, 2021

Image: Examples of faces in the KaoKore dataset

By Yingtao Tian, Google Brain Tokyo

This post talks about the paper KaoKore: A Pre-modern Japanese Art Facial Expression Dataset, which is a joint work by Yingtao Tian from Google Research with collaborators from Japan’s National Institute of Informatics (Chikahiko Suzuki, Tarin Clanuwat, Asanobu Kitamoto), University of Montreal (Alex Lamb), and University of Cambridge (Mikel Bober-Irizar).

A paper describing this work was published in the proceedings of ICCC’20, and the work is open-sourced and accessible online.

Humanities research and machine learning have the potential to enrich each other. Machine learning can provide a springboard for new types of research inquiry in the humanities — and even lead to the generation of entirely new forms of art and creative work.

However, the datasets that power machine learning models, and the models themselves, can affect how fruitful the research is. To see how machine learning can, or cannot, be used for humanities research, we looked at how we could apply it to historical Japanese art to decipher cursive script and analyze illustrations.

Before diving into the challenges of using machine learning in these scenarios, it helps to understand Japanese art. In pre-modern Japanese storytelling, a single story is portrayed in one long and continuous painting, usually in the form of a picture scroll or picture book. Below is an example from Tale of the Hollow Tree (宇津保物語, Utsuho Monogatari), a tenth-century Japanese story in the form of a picture book. On the left-hand side of the image, cursive script tells the story, and the adjacent painting illustrates that story and its characters.

Example from Tale of the Hollow Tree. Source: Dataset of Pre-Modern Japanese Text, National Institute of Japanese Literature

An opening for machine learning

Transcribing Japanese cursive writing found in historical literary works like this one is usually an arduous task, even for experienced researchers. So we tested a machine learning model called KuroNet to transcribe these historical scripts. But transcribing the script alone doesn’t tell the whole story; we also need to consider the illustrations. For Japanese humanities research, these illustrations and the faces of characters offer information about the artwork’s content and how it was created. The way painters render or express faces through things like texture, shape, color and jewelry helps identify artwork and shows how its trends changed over time.

Computing transcriptions of historical Japanese cursive writing has become feasible with KuroNet’s learning model. Is it possible to assess illustrations and faces similarly?

An extensive dataset already existed: the Facial Expression Collection, a publicly accessible project from the Center for Open Data in the Humanities. It provides a collection of faces extracted from artwork from Japan’s Late Muromachi Period (16th century) to the Early Edo Period (17th century), and offers the potential for analyzing these faces and bridging Japanese art history and machine learning.

Three examples of painting styles and themes in the collection. From left to right: nobles in 源氏物語 (Genji Monogatari); high-level officials in Chinese dress seen in 鳳闕見聞図説 (Houketsu Kenmon Zusetsu); and legendary spirits depicted in 酒呑童子 (Shuten-dōji).

Creating a new dataset

While the Facial Expression Collection is extensive, it was designed mostly for humanities researchers without a computational background rather than for machine learning and other computational uses. Things like the data format and image sizes made it difficult to adapt for machine learning models.

Building on the data in the Facial Expression Collection, we decided to create a publicly available, machine learning-friendly dataset called KaoKore. The name is shorthand for Kaokatachi Korekushon (顔貌コレクション), which is Japanese for Facial Expression Collection.

Examples of faces in the KaoKore dataset showing a diverse yet coherent range of subjects and styles.

KaoKore contains 8,573 color images of cut-out faces from the Facial Expression Collection. The format and processing make the resulting dataset easy to use with off-the-shelf machine learning models and tools. Additionally, we provide training, development, and test splits to allow comparison of various machine learning models and make KaoKore useful for supervised machine learning.
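As a rough illustration of how predefined splits can be used, the sketch below groups label rows by split. The column names (`image`, `gender`, `status`, `split`), the file names, and the sample rows are assumptions made up for this example; they are not the documented KaoKore file layout, so check the released dataset for the actual schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical label file. Column names and rows are invented for
# illustration and are NOT the documented KaoKore schema.
SAMPLE_LABELS_CSV = """image,gender,status,split
face_0001.jpg,male,noble,train
face_0002.jpg,female,noble,train
face_0003.jpg,male,warrior,dev
face_0004.jpg,male,commoner,test
"""


def load_splits(csv_text):
    """Group label rows by the value of their 'split' column."""
    splits = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        splits[row["split"]].append(
            {"image": row["image"], "gender": row["gender"], "status": row["status"]}
        )
    return dict(splits)


splits = load_splits(SAMPLE_LABELS_CSV)
for name in ("train", "dev", "test"):
    # Report how many labeled faces fall into each split.
    print(name, len(splits.get(name, [])))
```

Keeping the split assignment in the label file itself means every model is trained and evaluated on the same images, which is what makes results comparable.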

The dataset also includes labels from experts in Japanese art who examined the full context of a story and labeled the faces based on gender and social status. The historical depictions of gender in the work follow male and female binaries. (Note that the binary classification is due to historical limitations. For more on this topic, see Resources in Japanese Women’s History.) For social status, labels include “noble,” “warrior,” “incarnation” (in the context of Japanese culture, incarnation means the legendary appearance of gods and spirits in the form of humans in this world), and “commoner.”

Using the data to understand the content depicted in the art

Using this data, we could analyze the labeled categories across the dataset to get a sense of what was depicted in the artwork during these times and arrive at new insights about the artwork. By gender, 77% of the faces are male-presenting and 23% are female-presenting. By social status, the breakdown is 47% noble, 34% warrior, 9% incarnation, and 9% commoner. In this set of artwork, most warriors (97%) and incarnations (91%) are deemed male-presenting and are key figures in traditional Japanese stories. If we exclude these two categories from the gender analysis, the ratio becomes 63% male-presenting and 37% female-presenting. While still skewed toward male-presenting characters, it’s much more balanced than what we may expect for this period.
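The kind of breakdown described above can be sketched in a few lines. The example below computes gender shares over all faces and again after excluding some status categories. The label pairs are toy values made up for illustration, not KaoKore statistics, and `gender_share` is a hypothetical helper name.

```python
from collections import Counter

# Toy (gender, status) pairs. These are made-up values for illustration,
# NOT statistics from the KaoKore dataset.
TOY_LABELS = [
    ("male", "noble"), ("female", "noble"), ("male", "warrior"),
    ("male", "warrior"), ("male", "incarnation"), ("female", "commoner"),
    ("male", "commoner"), ("female", "noble"),
]


def gender_share(labels, exclude_status=()):
    """Return each gender's share among faces whose status is not excluded."""
    kept = [gender for gender, status in labels if status not in exclude_status]
    total = len(kept)
    if total == 0:
        return {}
    return {gender: count / total for gender, count in Counter(kept).items()}


overall = gender_share(TOY_LABELS)
filtered = gender_share(TOY_LABELS, exclude_status=("warrior", "incarnation"))
print("all faces:", overall)
print("without warriors and incarnations:", filtered)
```

Comparing the two results shows how much a few heavily skewed categories can shift the overall ratio.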

The labels that are available in the dataset along with examples for each label.

Two types of experiments help demonstrate the dataset’s value. The first is a quantitative classification task: given an image, the model must predict whether the character is male or female and what their social status is.

The table below shows the results of several common machine learning models using the training, development, and test splits provided in our dataset. These models achieve reasonable but imperfect results (classification accuracy is below 95% for gender and below 85% for social status). As expected, the newer and larger models often achieve better results, but they are still not accurate enough. Our conclusion is that there’s room for better ML models on this dataset: for example, new models and training mechanisms (like few-shot learning or transfer learning) could achieve higher accuracy in the future.

Table: Quantitative test results
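For readers unfamiliar with the metric, classification accuracy is the fraction of test images whose predicted label matches the expert label, computed separately for each task. The sketch below shows that computation. The predictions and labels are toy values invented for this example, not results from the experiments.

```python
def accuracy(predictions, targets):
    """Fraction of predictions that exactly match the target labels."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Toy test-split labels and model outputs (made up for illustration).
gender_true = ["male", "female", "male", "male"]
gender_pred = ["male", "female", "female", "male"]
status_true = ["noble", "warrior", "commoner", "incarnation"]
status_pred = ["noble", "noble", "commoner", "warrior"]

gender_acc = accuracy(gender_pred, gender_true)
status_acc = accuracy(status_pred, status_true)
print(f"gender accuracy: {gender_acc:.2f}")
print(f"status accuracy: {status_acc:.2f}")
```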

Exploring the mechanics of painting styles and machine learning

Since KaoKore itself is based on artworks, we’re investigating creative applications to see whether researchers in the humanities can find novel and artistic ways to engage with the dataset. One way to do that is with generative models.

We first explored Generative Adversarial Networks (GANs), which have been successful in synthesizing high-quality images. The “ancient” faces below are not so ancient after all: they’re all newly generated by the GAN model. The model generated a diverse yet coherent style that reflects different approaches to art.

An unedited range of images produced by StyleGAN trained on the KaoKore dataset. The samples demonstrate that the variety in our dataset is well captured.
Left: generated faces; right: real examples from the KaoKore dataset.

GAN models can generate plausible, convincing images. But more fundamentally, GANs directly generate pixels, which is different from how a human artist paints an illustration. Typically an artist paints the image by layering strokes iteratively on a surface. Therefore, when a GAN makes a mistake, the type of error it makes is quite different from the type of “error” a human painter might produce.

To give the synthesis process a more artwork-like inductive bias, we considered stroke-based rendering. This stroke-based rendering model decomposes images into strokes that resemble how a human artist might create an image with a pencil sketch.

Painting sequences generated by intrinsic style transfer, a neural painting model, based on a few example images in the KaoKore dataset.
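To make the idea of building an image from strokes concrete, here is a deliberately tiny sketch: it greedily layers straight strokes onto a white canvas, each time picking the stroke that most reduces the squared error to a target image. This is a toy stand-in under strong assumptions (a small square grayscale grid, axis-aligned strokes, exhaustive search); it is not the neural painting model used in the experiments, and all function names are hypothetical.

```python
def squared_error(canvas, target):
    """Sum of squared pixel differences between two equally sized grids."""
    return sum(
        (c - t) ** 2
        for canvas_row, target_row in zip(canvas, target)
        for c, t in zip(canvas_row, target_row)
    )


def apply_stroke(canvas, stroke):
    """Return a copy of the canvas with one straight stroke painted on it."""
    orientation, fixed, start, end, value = stroke
    painted = [row[:] for row in canvas]
    for i in range(start, end + 1):
        if orientation == "h":
            painted[fixed][i] = value  # horizontal stroke along row `fixed`
        else:
            painted[i][fixed] = value  # vertical stroke along column `fixed`
    return painted


def candidate_strokes(size, values=(0.0, 1.0)):
    """Enumerate every horizontal and vertical stroke on a size x size grid."""
    for orientation in ("h", "v"):
        for fixed in range(size):
            for start in range(size):
                for end in range(start, size):
                    for value in values:
                        yield (orientation, fixed, start, end, value)


def paint(target, max_strokes=10):
    """Greedily layer strokes on a white canvas (assumes a square target)."""
    size = len(target)
    canvas = [[1.0] * size for _ in range(size)]
    strokes = []
    for _ in range(max_strokes):
        current = squared_error(canvas, target)
        best = min(
            candidate_strokes(size),
            key=lambda s: squared_error(apply_stroke(canvas, s), target),
        )
        painted = apply_stroke(canvas, best)
        if squared_error(painted, target) >= current:
            break  # no stroke improves the canvas any further
        canvas = painted
        strokes.append(best)
    return canvas, strokes


# Target: a dark cross (0.0) on a white (1.0) 5x5 background.
target = [[0.0 if r == 2 or c == 2 else 1.0 for c in range(5)] for r in range(5)]
canvas, strokes = paint(target)
print(len(strokes), "strokes used")
```

Because the output is a sequence of strokes rather than a grid of pixels, the intermediate canvases form a painting sequence, and any mistakes show up as misplaced strokes rather than pixel-level noise.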

We also explored another work, Learning to Paint (Huang et al.). This work is a unique neural painting model that tries to approximate the image in a “minimalist way”: it uses simple brushstrokes, and as few of them as possible. In the image below, the model generates a sequence of vivid paintings, but the style is very different from the previous experiment’s pencil-drawing style. The curved regions the model produces approximate the painting in an abstract way, emphasizing the object’s general arrangement rather than its details.

Painting sequences produced by learning to paint, a neural painting model, based on a few example images in the KaoKore dataset.

The two painting models illustrate fundamentally different mechanisms that lead to the generated styles. The generated examples shown above are visually plausible, but their semantics can be more complicated: they may agree or disagree with experts’ perspectives on what a particular genre of art is. Such matters of nuance provide a topic of discussion for humanities researchers regarding modern computational creativity approaches. We hope this can offer insights into the mechanics of painterly style as well as machine learning processes that benefit both humanities and ML researchers.

The opportunity ahead for Japanese art and beyond

As we’ve seen first-hand on this project, machine learning datasets can contribute to cultural preservation, expand the discourse of humanities research, and unlock new insights and approaches.

We plan to build a machine learning-powered human-in-the-loop annotation mechanism to increase the number of face images in our dataset. With new datasets, we can expand machine learning research and its applications beyond the face images in the KaoKore dataset. We hope this will create more avenues for humanities research and machine learning research to work together — in the domain of Japanese art and beyond.

Sources

[1] Karras et al.: A Style-Based Generator Architecture for Generative Adversarial Networks

[2] Nakano: Neural Painters: A learned differentiable constraint for generating brushstroke paintings

[3] Huang et al.: Learning to Paint With Model-based Deep Reinforcement Learning

[4] Lamb et al.: KuroNet: Regularized Residual U-Nets for End-to-End Kuzushiji Character Recognition

People + AI Research (PAIR) is a multidisciplinary team at Google that explores the human side of AI.