Case Study: Fragments and Jaguar Vision

Petr Sigut · Published in Fragments · Jul 2, 2018 · 5 min read

This is the first in a series of articles spotlighting our partners and the ways in which they envision using the Fragments platform. These case studies introduce the companies we have partnered with, describe the real-life applications they will use Fragments for, and offer inspiration as to the many diverse ways in which the platform might be used.

To kick things off, we’re very pleased to present Whale & Jaguar, a machine learning startup based in Colombia. They will use the Fragments platform to train and improve products such as their ‘Jaguar Vision’ with the help of a global team of annotators.

The following is a guest post by Whale & Jaguar:

Jaguar Vision: An algorithm for understanding people’s way of seeing.

We’re a multidisciplinary team merging scientific and technological research with digital communication strategies. We’re physicists, political scientists and historians interested in data, images and the human behavior which derives meaning from information. That’s why we work on the development of technology capable of seeing, reading and analyzing the language we use to communicate on digital networks.

Male jaguar killed in the Sierra Nevada de Santa Marta

Visual content exists alongside its textual descriptor. As a form of communication, it can be deconstructed and measured. We want to understand people’s ways of seeing, and to do so we train artificial intelligence algorithms that study and classify images at massive scale. We are dedicated to extracting knowledge from data, but how?

To turn the machine into a connoisseur of the shape of things which can be seen, we show it the steps a human takes. We feed it images from a visual database, each one labeled with tags describing its content (“mountain”, “house”, “river flowing”), and select the most relevant tags for each case.
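As a minimal sketch of what this labeled data might look like, the following pairs each image with human-supplied tags and keeps only the tags frequent enough across the corpus to be useful for training. The structure and the `select_relevant_tags` helper are illustrative assumptions, not Whale & Jaguar's actual pipeline.

```python
from collections import Counter

# Each training example pairs an image with the tags humans gave it.
dataset = [
    {"image": "img_001.jpg", "tags": ["mountain", "river flowing", "sky"]},
    {"image": "img_002.jpg", "tags": ["house", "mountain"]},
    {"image": "img_003.jpg", "tags": ["river flowing", "mountain", "tree"]},
]

def select_relevant_tags(dataset, min_count=2):
    """Keep only tags that appear at least `min_count` times overall,
    dropping rare labels that a model could not learn from."""
    counts = Counter(tag for ex in dataset for tag in ex["tags"])
    return {
        ex["image"]: [t for t in ex["tags"] if counts[t] >= min_count]
        for ex in dataset
    }

relevant = select_relevant_tags(dataset)
```

In a real pipeline, relevance would likely involve more than raw frequency, but the filtering step is the same idea: decide which labels are trustworthy signals before training.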

Then, we ask about its composition. How are objects distributed across the space defined by the image? Centered, placed in a corner, randomly scattered, confined to a quadrant? An algorithm capable of recognizing these characteristics in a visual object could be combined with digital network analytics (which measure audience sentiment and opinion, the effectiveness of the visible, and the visual communication patterns characteristic of an entity) to give us a better understanding of the way we relate to the Internet’s visual environment.
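The composition question above can be sketched as a simple rule: given an object's center point (for instance, from a bounding box), classify whether it sits in the middle of the frame or in one of the four quadrants. The function name and the tolerance value are assumptions for illustration only.

```python
def composition(cx, cy, width, height, center_tol=0.15):
    """Classify where an object sits in the frame.

    (cx, cy) is the object's center in pixels; width/height are the
    image dimensions. Returns 'centered' when the center lies within
    center_tol of the frame's midpoint, else a quadrant label.
    """
    nx, ny = cx / width, cy / height  # normalize coordinates to [0, 1]
    if abs(nx - 0.5) <= center_tol and abs(ny - 0.5) <= center_tol:
        return "centered"
    horiz = "left" if nx < 0.5 else "right"
    vert = "top" if ny < 0.5 else "bottom"
    return f"{vert}-{horiz}"
```

A production system would learn such distinctions from annotated examples rather than hard-code them, but the labels a classifier predicts could look exactly like these.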

As part of the process, we will train a convolutional neural network (CNN), one of the most effective machine learning algorithms for image classification. A CNN works by mimicking the response of neurons to visual stimuli, and by doing so, recognizes visual patterns shared by a group of images and learns the shapes of their elements. This is accomplished by recognition units called filters: hundreds, thousands or even millions of them. Each filter is trained to recognize a certain particularity within a region of the image: corners, legs, feet, noses, faces, water, etc. Identified together, these features allow accurate identification of the image content: dog, cat, human, mountain, house, river flowing.
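To make the idea of a filter concrete, here is a small sketch in plain Python: a hand-made vertical-edge filter slid over an image, the core operation a CNN's convolutional layer performs (a trained network learns the filter values rather than having them written by hand). The filter responds strongly where brightness changes from left to right and stays near zero on flat regions.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as CNN layers use it):
    slide the kernel over the image and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A vertical-edge filter: negative weights on the left, positive on the right.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# A 4x4 image: dark left half (0), bright right half (9).
image = [[0, 0, 9, 9]] * 4
response = conv2d(image, vertical_edge)  # strong response at the edge
```

A real CNN stacks many such filters in layers, with early layers finding edges and corners and deeper layers combining them into legs, faces, and whole objects.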

Training our neural network models requires a large set of images tagged according to their content — this is where our friends at Fragments, with their platform for micro-task apps, play a crucial role. A micro-task app on the Fragments platform will enable us to engage a scalable human workforce in both the collection and annotation of our large data sets. Using the Fragments platform, we will be able to split data-set collection and labeling into a large number of micro-tasks and distribute them among a large pool of workers, quickly obtaining structured and labeled data. Workers’ answers to questions such as “Is this a cherry?” or “How are the elements distributed across the image?” will give us a better understanding of people’s way of seeing.
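The split-and-aggregate workflow described above might look like the following sketch: one question per image, several workers per task for redundancy, and a majority vote to settle each label. The data shapes and function names are assumptions for illustration; they are not the Fragments API.

```python
from collections import Counter

def make_microtasks(image_ids, question):
    """One micro-task per image; answers are filled in by workers."""
    return [{"image": img, "question": question, "answers": []}
            for img in image_ids]

def aggregate(task):
    """Majority vote over worker answers.
    Returns None when there are no answers or the vote is tied."""
    top_two = Counter(task["answers"]).most_common(2)
    if not top_two:
        return None
    if len(top_two) == 2 and top_two[0][1] == top_two[1][1]:
        return None  # tie: the task would need more workers
    return top_two[0][0]

tasks = make_microtasks(["img_1", "img_2"], "Is this a cherry?")
tasks[0]["answers"] = ["yes", "yes", "no"]   # clear majority
tasks[1]["answers"] = ["no", "yes"]          # tie: unresolved
```

Redundancy plus majority voting is a common way to get reliable labels from a crowd; tied or low-agreement tasks can simply be re-issued to more workers.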

The first application of Jaguar Vision will occur in digital networks. We’ll study content produced by multiple entities, corresponding to different segments of the population, and find the visual patterns to which the audience reacts with positive, neutral and/or negative sentiment. Thus, we can determine whether there are relationships between the visual training of an audience and its response to certain types of published images.

We want Jaguar Vision to let us study both shared and exclusive features in specific sets of images, such as the Gráficas Molinari image archive, which require special treatment and an investigative point of view in order to identify, for example, how women were portrayed and defined through visual characteristics in the mid-twentieth century. We intend to compare the contents of such image sets with textual sources digitized by the largest libraries in our country, enabling efficient analysis of the narrative of our history through cultural heritage objects.

Our founding parable accompanies this research. In the jungle of advanced information and the vast ocean of data, two mammals have the natural instincts to survive and succeed: jaguars and whales. The jaguar gives us speed and accuracy; the whale, depth and immersion. The merging of both natures leads us to innovation, and in our drive to make the jaguar as agile as the whale in traversing seas loaded with information, Jaguar Vision is born.
