TLDR: I took a subset of 18.5K portraits from the dataset of the Kaggle competition Painter by Numbers and grouped them by style and gender. Then I used the Facer library by John W. Miller to build average faces from these portrait groups, as well as a time-lapse of average faces from portraits dating from the Middle Ages to the 20th century. Check out my Society6 page for prints of these portraits.
I used the metadata from the Painter by Numbers dataset, in which portraits make up less than 20% of the paintings. The metadata is detailed and convenient: it includes authors, styles, titles, and years of creation. After filtering, I had about 18.5K paintings labeled as portraits. However, my attempts to build average faces by artistic style without additional data cleaning produced strange, genderless faces.
Therefore, I had to break these portraits down into categories: group, male, female, child, and other portraits. The “other” portraits were those where I wasn’t able to determine the subject’s gender, e.g. some of the Cubist paintings. Note: This creates a bias towards more human-like faces, and the bias is stronger for the more abstract styles of art.
I used only the male and female portraits, as there were too few portraits of children. From experience, I knew that Facer needs at least a dozen faces to produce stable results of acceptable quality, so I subdivided the portraits by style and picked the 24 styles that contained enough pictures of both genders for averaging. The results, along with a table of gender labels, are available in a GitHub project.
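The style selection described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the project: the function name, the `(style, gender)` label format, and the threshold of 12 (taken from "at least a dozen") are assumptions, and the sample labels are made up.

```python
from collections import Counter

# Assumed threshold: the text only says Facer needs "at least a dozen" faces.
MIN_FACES = 12


def styles_with_enough_faces(labels, min_faces=MIN_FACES):
    """Return the styles that have at least `min_faces` male AND female portraits.

    `labels` is an iterable of (style, gender) pairs; any gender other than
    "male" or "female" (e.g. "child", "group", "other") is ignored.
    """
    counts = Counter(
        (style, gender)
        for style, gender in labels
        if gender in ("male", "female")
    )
    styles = {style for style, _ in counts}
    return sorted(
        style
        for style in styles
        if counts[(style, "male")] >= min_faces
        and counts[(style, "female")] >= min_faces
    )


# Made-up labels, for illustration only.
sample_labels = (
    [("Baroque", "male")] * 15
    + [("Baroque", "female")] * 12
    + [("Cubism", "male")] * 20
    + [("Cubism", "female")] * 3   # too few female portraits
    + [("Rococo", "child")] * 30   # children are excluded
)
selected_styles = styles_with_enough_faces(sample_labels)
```

With these sample labels only Baroque passes, because Cubism has too few female portraits and the Rococo entries are all children.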
Finally, I averaged the portraits in 50-year windows between 1500 AD and 2000 AD, shifting the window by 10 years for each frame, and compiled the results into a time-lapse video. Because every two adjacent frames share 40 years of portraits, the video turned out quite smooth.
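To make the windowing concrete, here is a minimal sketch of how such overlapping periods could be generated. The function name and the exact boundary convention (windows aligned to the start year, end year exclusive) are assumptions; the post does not say how the boundaries were handled.

```python
def sliding_windows(start=1500, end=2000, width=50, step=10):
    """Return (first_year, end_year_exclusive) pairs covering start..end.

    Each window spans `width` years and starts `step` years after the
    previous one, so adjacent windows overlap by `width - step` years.
    """
    return [(year, year + width) for year in range(start, end - width + 1, step)]


def portraits_in_window(portraits, window):
    """Select (title, year) pairs whose year falls inside the window."""
    first_year, end_year = window
    return [p for p in portraits if first_year <= p[1] < end_year]


windows = sliding_windows()
```

Under these assumptions the time-lapse would have one frame per window, starting with 1500 to 1550 and ending with 1950 to 2000, and each frame would be the average of the portraits returned by `portraits_in_window` for that window.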
Also, as an experiment, I uploaded some of these pictures to a Society6 page, where they are available as prints, mugs, and clocks.