- Google’s Cloud Vision API is a commercial cloud service that accepts any photograph as input and uses deep learning to extract a wealth of information about it, including the objects and activities it depicts, recognizable logos, text (via OCR in almost 80 languages), the level of violence, an estimate of visual sentiment and even the precise location on Earth the image appears to depict.
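As a rough illustration of how several of these capabilities can be requested together, the sketch below assembles a JSON body for the Cloud Vision REST method `images:annotate`. The pairing of capabilities with feature types (for example, treating violence as part of `SAFE_SEARCH_DETECTION` and location as `LANDMARK_DETECTION`) is an assumption, the image URL is hypothetical, and the code only builds the payload. It does not send the request or handle authentication.

```python
import json

# Feature types from the Cloud Vision REST API. Pairing them with the
# capabilities described above is an illustrative assumption.
FEATURES = {
    "labels": "LABEL_DETECTION",          # objects and activities
    "logos": "LOGO_DETECTION",            # recognizable logos
    "text": "TEXT_DETECTION",             # OCR text
    "violence": "SAFE_SEARCH_DETECTION",  # includes a violence likelihood
    "faces": "FACE_DETECTION",            # faces and emotion likelihoods
    "location": "LANDMARK_DETECTION",     # where the image appears to be
}


def build_annotate_request(image_uri, max_results=10):
    """Return a body for POST https://vision.googleapis.com/v1/images:annotate.

    Only builds the payload; sending it requires credentials (not shown).
    """
    features = [
        {"type": feature_type, "maxResults": max_results}
        for feature_type in FEATURES.values()
    ]
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": features,
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical image URL, for illustration only.
    body = build_annotate_request("https://example.com/news-photo.jpg")
    print(json.dumps(body, indent=2))
```

Bundling all features into one request means each image is fetched and analyzed once, which matters at the scale of news archives.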
- In total, the Vision API applied 9,853 unique labels to the images, with the most common being “person” (27% of images), “profession” (14%), “vehicle” (10%), “sports” (7%), “speech” (6%), and “people” (5%).
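Each percentage is a share of images carrying that label. Because a single image can receive several labels, these shares need not sum to 100%. A minimal sketch of that tally, assuming each image's labels are available as a list of strings (the toy data is invented for illustration and is not GDELT output):

```python
from collections import Counter


def label_stats(images):
    """Count unique labels and each label's share of images.

    `images` is a list where each entry holds the labels for one image.
    A label is counted at most once per image.
    """
    per_label = Counter()
    for labels in images:
        per_label.update(set(labels))  # dedupe labels within an image
    total = len(images)
    shares = {label: count / total for label, count in per_label.items()}
    return len(per_label), shares


if __name__ == "__main__":
    # Invented toy data, not real Vision API output.
    images = [
        ["person", "speech"],
        ["vehicle"],
        ["person", "sports", "people"],
        ["profession", "person"],
    ]
    unique, shares = label_stats(images)
    print(unique)  # 6
    for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{label}: {share:.0%}")
```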
- The Vision API appears to apply the “person” label mainly when a single person or a small number of people are the main subject of the photograph, such as a speaker standing at a podium.
- The map below colors each country by the density of human faces in all imagery monitored by GDELT from that country’s news media, i.e., the total number of human faces recognized across all images from that country divided by the total number of images from that country.
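The density measure is a simple per-country ratio of faces to images, which works out to the average number of recognized faces per image. A minimal sketch, assuming per-image records of (country, face count); the record layout and sample values are hypothetical:

```python
from collections import defaultdict


def face_density(records):
    """Return {country: total faces / total images}.

    `records` is an iterable of (country, faces_in_image) pairs, one per image.
    """
    faces = defaultdict(int)
    images = defaultdict(int)
    for country, face_count in records:
        faces[country] += face_count
        images[country] += 1
    return {country: faces[country] / images[country] for country in images}


if __name__ == "__main__":
    # Hypothetical sample: two images from "A", three from "B".
    sample = [("A", 2), ("A", 0), ("B", 1), ("B", 1), ("B", 4)]
    print(face_density(sample))  # {'A': 1.0, 'B': 2.0}
```

Dividing by all images, including those with no faces, keeps countries comparable even when their outlets publish very different numbers of photographs.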
- The results also reinforce why only deep learning systems with very large label vocabularies, such as Google’s Cloud Vision API, are adequate for news imagery: a simpler system designed to recognize just a few classes of imagery would offer little utility given the enormous diversity of the world’s news imagery.
- Via @Forbes: “What does artificial intelligence see in a quarter billion global news photographs?”