Appendix to “Is that a boy or a girl?”
More details of the model and training data
This is an appendix to “Is that a boy or a girl? Exploring a neural network’s construction of gender”.
For the model, I wanted to start with the simplest possible approach, because this was my first machine learning project. I followed this TensorFlow tutorial, which takes an existing deep neural network that has been trained for image classification and retrains only its final layer to classify new categories of images. This seemed like a good place to start, because it saved me from training a network from scratch, which would take a very long time on a single laptop.
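The core idea of that retraining step is that the pretrained network stays frozen and acts as a fixed feature extractor, while a new softmax layer is trained on its output feature vectors (the tutorial calls these "bottleneck" values). Here's a minimal numpy sketch of training such a final layer; the feature dimensions, learning rate, and epoch count are illustrative assumptions, not the tutorial's exact settings:

```python
import numpy as np

def train_final_layer(features, labels, n_classes=2, lr=0.1, epochs=200):
    """Train a softmax classifier on frozen 'bottleneck' features.

    features: (n_samples, n_features) array output by the pretrained network
    labels:   (n_samples,) integer class labels (e.g. 0 = female, 1 = male)
    """
    n, d = features.shape
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # softmax cross-entropy gradient
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(W, b, features):
    """Return the most likely class for each feature vector."""
    return (features @ W + b).argmax(axis=1)
```

With real bottleneck vectors (for example, the 2048-dimensional activations that Inception-style networks produce), this is essentially all the new learning that happens: only W and b are fitted, and everything upstream stays as it was trained on the original task.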
This approach is certainly far from being state-of-the-art for gender classification, though, and I’d be curious to hear suggestions from experts on possible next steps. From academic papers, it sounds like I could experiment with using different crops of the images, and could also train a special-purpose neural network from scratch rather than retraining an existing one. It’s possible that the results I got, where hair and clothing seemed to predominate over finer details of the face, would have been different if I had designed and trained a model specifically for faces, rather than retraining a network that was originally designed to distinguish broad classes of objects from each other.
For training data, I used the Adience dataset, the full version of which consists of more than 20,000 photos of over 2000 people. The Adience team downloaded these from Flickr users who had made their photos available for reuse via Creative Commons. The team specifically retrieved images that had been auto-uploaded from iPhones, with the aim of creating a collection that was as realistic as possible — i.e. including a range of different viewing conditions, and avoiding professional studio portraits. They then used algorithmic techniques to automatically identify the locations and angles of faces, and rotate and crop the images around each face to standardize the composition. Finally, they manually labelled the faces with a gender and age, as well as an ID number, to keep track of the same individual in different images.
All of that saved me a lot of data preparation time, but I did still have to do some work to filter the collection to make it suitable for my intended purpose. In particular, many people are represented multiple times, so I looked through the collection and tried to choose one good quality, representative, front-facing image per person. This helped to ensure that no individual would have too strong an influence on what counted as “male” or “female”. The chosen photo needed to show the person’s features clearly, without misleading details. For example, I excluded images where the person was wearing sunglasses, and images where part of another person’s face was visible in the crop. In many cases, none of the images of a single person were good enough to keep. I also excluded images of children, because their gender is often very hard even for humans to identify. Of the original 2284 individuals represented in the collection, I ended up keeping 815: 427 females and 388 males. The tutorial suggests having at least 100 images from each category, so this seemed sufficient to start with, but ideally I would have many thousands or even millions of examples.
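The filtering itself was manual, but the mechanical parts of the process described above — grouping the Adience images by person ID, skipping children, and keeping at most one acceptable image per person — could be sketched like this. The column names loosely follow the Adience fold files, and the is_good_quality callback stands in for my by-eye judgments (front-facing, no sunglasses, no second face in the crop); treat both as assumptions rather than a faithful record of the dataset's exact schema:

```python
import csv
from collections import defaultdict

# Assumed adult age-group labels in the Adience metadata (children excluded).
ADULT_AGE_GROUPS = {"(15, 20)", "(25, 32)", "(38, 43)", "(48, 53)", "(60, 100)"}

def select_one_per_person(fold_path, is_good_quality):
    """Pick at most one usable image per person from a tab-separated fold file.

    fold_path:       path to an Adience-style metadata file
    is_good_quality: callable(row) -> bool standing in for manual checks
    """
    by_person = defaultdict(list)
    with open(fold_path, newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row["gender"] not in ("f", "m"):  # skip missing/ambiguous labels
                continue
            if row["age"] not in ADULT_AGE_GROUPS:  # skip children
                continue
            by_person[row["face_id"]].append(row)
    chosen = {}
    for face_id, rows in by_person.items():
        candidates = [r for r in rows if is_good_quality(r)]
        if candidates:  # a person may have no acceptable image at all
            chosen[face_id] = candidates[0]
    return chosen
```

The one-image-per-person rule is what keeps a prolific Flickr uploader from dominating the training set, and the "no candidates" branch is how a person can drop out entirely, which is why only 815 of the 2284 individuals survived.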
Citation for Adience dataset and how it was prepared (also includes some discussion of other available datasets):
Eran Eidinger, Roee Enbar, and Tal Hassner, "Age and Gender Estimation of Unfiltered Faces," IEEE Transactions on Information Forensics and Security (TIFS), special issue on Facial Biometrics in the Wild, vol. 9, no. 12, pp. 2170–2179, Dec. 2014 (PDF)