‘Lighter’ Can Still Be Dark: Modeling Comparative Color Descriptions (ACL 2018 Best Short Paper)

Anthony Chen
Apr 22, 2019 · 4 min read


Authors: Olivia Winn, Smaranda Muresan

For the majority of natural language processing problems, we can probably get away with using language representations trained only on textual data (e.g. Word2Vec and ELMo). But it’s important to keep in mind that for some tasks, true understanding of a word or phrase is only likely to emerge when its representation is grounded in something beyond just text.

Task

If I tell you to think of the color “light blue”, you can probably visualize this. But what if I say “lighter blue”? Unlike “light blue”, it’s hard to see what color “lighter blue” is without a reference color. “Lighter” is a comparative adjective.

In this ACL 2018 short paper, the authors propose to model comparative adjectives by grounding them in RGB vector space.

Let’s formalize the task. A color can be represented as a three-dimensional vector in what is called RGB space, where each dimension takes a value between 0 and 255. Given a reference color with its RGB representation and a comparative adjective, we want to produce a vector in RGB space that represents the change from the reference color.
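To make the setup concrete, here is a minimal sketch of applying a predicted change vector to a reference color. The delta values are made up purely for illustration (they are not model output):

```python
import numpy as np

# Hypothetical example: "teal" is roughly (0, 128, 128) in RGB.
reference = np.array([0.0, 128.0, 128.0])  # reference color in RGB space
delta = np.array([0.0, -40.0, -40.0])      # illustrative change vector for "darker"

# The comparative color is the reference plus the predicted change,
# clipped back into the valid [0, 255] range.
comparative = np.clip(reference + delta, 0, 255)
# comparative is now [0., 88., 88.] -- a darker teal
```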

Fig 1. The left hand color is the reference color “teal”. The comparative adjective is “darker”. The right hand side is the color of “darker teal” given the reference color. The band represents the change in “teal”.

Dataset

To learn grounded comparative adjective representations, the authors use a labeled RGB color dataset that was collected by asking participants in a user study to “provide free-form labels to various RGB samples”. They apply some transformations to this data (see the paper for more details) to get their dataset.

Their dataset consists of 415 (reference color label, comparative) tuples, covering 79 unique reference color labels and 81 unique comparatives. A color label is the color in textual form (e.g. blue). Each color label can have many instances in RGB space, since many different RGB values could represent “blue”. The total number of data points is around 20 million.

The authors split this dataset of 20 million data points into several partitions.

Data Split

Seen Pairings are color labels and comparatives that have been seen paired in training. Unseen Pairings are color labels and comparatives that have both been seen in the training set, but never paired together. Unseen Ref. Color and Unseen Comparative are cases where the reference color label or the comparative, respectively, didn’t appear in training. The last line covers the case where neither the color label nor the comparative was in the training set.

By partitioning the data in this way, we can see exactly how well this model generalizes to unseen color labels, comparatives, and pairings.

Model

The input to the model is the comparative adjective as a two-gram (two-gram because there are two-word comparatives like “more electric”). The words are represented via 300-D Word2Vec embeddings. The reference color is represented as its 3-D RGB vector. The model is a two-layer feed-forward neural network whose output is a vector in RGB space indicating the change from the reference color. The loss is a combination of the cosine distance and the length difference from the target color.
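Here is a minimal sketch of that loss. The unweighted sum of the two terms is my assumption; the paper combines cosine distance and length difference, but the exact weighting may differ:

```python
import numpy as np

def change_vector_loss(pred, target):
    """Loss between a predicted and target change vector in RGB space."""
    # Cosine distance: penalizes pointing in the wrong direction,
    # regardless of magnitude.
    cos_dist = 1.0 - np.dot(pred, target) / (
        np.linalg.norm(pred) * np.linalg.norm(target)
    )
    # Length difference: penalizes changing the color too much or too little.
    len_diff = abs(np.linalg.norm(pred) - np.linalg.norm(target))
    # Assumption: simple unweighted sum of the two terms.
    return cos_dist + len_diff
```

Note that cosine distance alone would be blind to magnitude (a tiny nudge and a huge jump in the same direction score identically), which is presumably why the length term is needed.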

Results

The authors evaluate using two metrics: cosine distance and Delta-E. Delta-E is a distance metric for two colors that is based on how the “human eye perceives color differences”.
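For intuition, the simplest Delta-E formula (CIE76) is just Euclidean distance in CIELAB space, a color space designed so that equal distances roughly match equal perceived differences. The paper doesn’t say which Delta-E variant it uses, and later variants (CIE94, CIEDE2000) are more involved, so this sketch assumes the simplest one and that colors have already been converted from RGB to Lab:

```python
import numpy as np

def delta_e_cie76(lab1, lab2):
    # CIE76 Delta-E: Euclidean distance between two colors in CIELAB space.
    # Assumes both inputs are already (L*, a*, b*) triples, not RGB.
    return float(np.linalg.norm(np.asarray(lab1, dtype=float) - np.asarray(lab2, dtype=float)))
```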

The paper has a nice image showing the result of different comparative adjectives applied to color labels on the different data partitions.

One thing that stands out is that even though comparative adjectives “more neon” and “paler” weren’t seen during training, the model does a pretty good job of getting the transformation right. It seems that the Word2Vec embeddings capture something about color (!!!) which is why the model can generalize in this way!

Here are the aggregated results.

The authors also do a cool experiment where they take a reference color and a target color and get the comparative adjective that’s closest to the difference between target and reference color.
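This reverse lookup can be sketched as a nearest-neighbor search: score every comparative by how well its predicted change vector aligns with the actual difference between target and reference. The `predict_change` function and the vocabulary below are hypothetical stand-ins, not the authors’ code:

```python
import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def closest_comparative(reference, target, comparatives, predict_change):
    """Return the comparative whose predicted change vector best matches
    the true difference between target and reference (by cosine similarity)."""
    true_delta = target - reference
    return max(
        comparatives,
        key=lambda c: cosine(predict_change(c, reference), true_delta),
    )
```

With a model that maps “darker” to a negative change vector, asking for the comparative that takes (100, 100, 100) to (50, 50, 50) should return “darker”.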

Conclusion

It’s a niche problem to be sure, but the authors tackle it in an elegant way that generalizes well. It’s rare to see cases where we can tackle a problem such that the model that solves the problem generalizes well, can handle zero-shot situations, and can generate explanations. Overall, I really like this paper!
