Making sense of 1 Billion image tags (part 1/2)

Netra, Inc. · Netra Blog
Published Mar 27, 2018 · 5 min read

Netra’s visual intelligence APIs have analyzed and tagged over 250 million images to date and generated over 1B image tags

Image credit: Pixabay. Labeled for reuse.

We recently shared a joint case-study with our partners at Kantar TNS highlighting the value of using image recognition to measure brand consumption trends. One thing we didn’t discuss was how you can use raw image tags to extract insight.

We’ve been fortunate to work very closely with our partners to better understand how to analyze and interpret image tags generated by Netra’s APIs, and wanted to share a few operational pointers that may be helpful as you think about extracting insights from imagery:

Part 1 — Data Prep

Understand your data source

While image recognition can be applied to any image regardless of source, this study with Kantar TNS utilized social media imagery and we’ll only consider images from social for the rest of the article.

Social media can be full of noise; it’s swimming with bots and with posts and pictures that carry no valuable consumer signal. This isn’t a new problem, and recent events have led some social platforms to start thinking about the health of their social ecosystem, and perhaps a much-needed cleanup. Despite the noise, social media continues to be a rich, growing data pool full of signals on consumer preferences, trends, and other insights.

Filter out some of the noise BEFORE applying image recognition

Before processing images, you can remove a good chunk of noise by filtering out posts/images based on specific hashtags, combinations of hashtags, posting frequency, or hashtag volume. For example, handles that post 50+ times a day or use 100+ hashtags per post may be more likely to be a bot account than a person.

The specific criteria you use will vary project by project, but spend some time thinking about your particular vertical and brainstorming different filtering criteria. For example, if you’re looking to analyze audiences who post about specific car brands, you may want to filter out known handles of automotive dealers.

You’ll still have some noise in your data, but the signal will be much stronger having cleaned the data up a bit prior to image analysis.
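The pre-filtering rules above can be sketched in a few lines of Python. This is a minimal illustration, not Netra's actual pipeline: the post fields (`handle`, `date`, `hashtags`) and the thresholds are assumptions taken from the examples in this section.

```python
from collections import Counter

def filter_posts(posts, max_daily_posts=50, max_hashtags=100, blocked_tags=None):
    """Drop posts that look like bot or spam activity before image analysis.

    Each post is assumed to be a dict like
    {"handle": ..., "date": ..., "hashtags": [...]}.
    """
    blocked = set(blocked_tags or [])
    # Count posts per (handle, day) to spot handles posting at bot-like rates.
    daily = Counter((p["handle"], p["date"]) for p in posts)
    kept = []
    for p in posts:
        if daily[(p["handle"], p["date"])] > max_daily_posts:
            continue  # handle posts too often in one day: likely a bot
        if len(p["hashtags"]) > max_hashtags:
            continue  # hashtag stuffing
        if blocked & set(p["hashtags"]):
            continue  # known spam or dealer hashtags
        kept.append(p)
    return kept
```

The `blocked_tags` argument is where vertical-specific criteria go, such as hashtags known to be used by automotive dealers.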

Filter out more noise AFTER applying image recognition

After the images have been tagged, you can use specific image tags to remove even more noise from your data set. Netra’s software tags images with context such as “advertising,” “font,” “text,” etc., and now that your image data is structured, you can easily filter out posts that contain these tags or any other image tags you want to remove from your analysis.

An example of a tweeted image tagged by Netra’s API with advertising, presentation, font…

Analyze tags by %-of-images and %-of-users

Let’s say you’ve analyzed 1,000,000 images across 100,000 anonymous handles (10 images per handle on average). You’ll first want to convert tag counts to percentages — start with calculating %-of-images containing a given tag to produce something like this:

Fictional numbers used for demonstration purposes only

In the example above, if you were to just look at the %-of-images containing a certain logo, you may conclude that your overall audience has more of an affinity towards Smartwater than towards the other brands listed. But since you’re analyzing a collection of anonymous users, you’ll also want to consider the %-of-users in your analysis that have posted about each brand:

Fictional numbers used for demonstration purposes only

The combination of %-of-images and %-of-users tells a much different story. We can see that while a higher %-of-images are about Smartwater, this brand is shared visually by a much smaller %-of-users than the Ben & Jerry’s and LA Lakers brands. While there is only a small fraction within your audience that likes Smartwater, they’re super passionate about this brand as they post a ton of images (3X more image volume than Ben & Jerry’s) containing the logo.

Furthermore, there is a large segment in your audience that likes Ben & Jerry’s and another equally-sized segment that likes the LA Lakers, but the segment that posts about Ben & Jerry’s seems to be much more passionate about the brand, as there are 2X as many images tagged with Ben & Jerry’s as with LA Lakers.

Analyzing your data by %-of-images and %-of-users can help you identify which brands are most common across your audience and which brands are most talked about by your audience.
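Both metrics can be computed in one pass over the tagged posts. A minimal sketch, again assuming posts shaped like `{"handle": ..., "tags": [...]}` — not Netra's actual tooling:

```python
from collections import defaultdict

def tag_percentages(posts):
    """Compute %-of-images and %-of-users for each image tag.

    %-of-images: share of all images carrying the tag.
    %-of-users:  share of all handles with at least one such image.
    """
    n_images = len(posts)
    all_handles = {p["handle"] for p in posts}
    image_count = defaultdict(int)
    users_per_tag = defaultdict(set)
    for p in posts:
        for tag in set(p["tags"]):  # count each tag once per image
            image_count[tag] += 1
            users_per_tag[tag].add(p["handle"])
    return {
        tag: {
            "pct_images": 100.0 * image_count[tag] / n_images,
            "pct_users": 100.0 * len(users_per_tag[tag]) / len(all_handles),
        }
        for tag in image_count
    }
```

A tag with high `pct_images` but low `pct_users` points to a small, highly passionate segment, which is exactly the Smartwater pattern described above.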

Benchmark the results

Now that you’ve converted your data into percentages, you’ll need to compare it to a larger data set that is more representative of social data overall. If your source is Twitter, for example, you’ll want to compare the percentages you calculated above to a benchmark percentage representative of a much larger population of Twitter users. At Netra, we have a database of millions of tagged social posts across platforms, and use this data to measure how much an audience over- or under-indexes on posting images about a given brand, activity, or object compared to the rest of the social universe.

For example, if you spend any time on social media it’s no surprise that a large percentage of users post pictures of food, babies, and/or pets. In your analysis, these may be the most common objects/scenes that appear across all images and users.

Fictional numbers used for demonstration purposes only

By analyzing your tags against a larger benchmark, you’ll be able to pull out unique insights about your data that aren’t obvious at first glance. For example, this audience posts far more pictures of sports cars and race tracks than the average social audience, indicating they’re much more into car racing, performance cars, or NASCAR, and they tend to be dog people rather than cat people. Even though a large population of users are sharing pictures of pizza, this under-indexes relative to the benchmark.
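The over/under-indexing comparison is a simple ratio. A common convention (an assumption here, not necessarily Netra's formula) sets 100 as "on par with the benchmark":

```python
def brand_index(audience_pct, benchmark_pct):
    """Index an audience percentage against a benchmark percentage.

    100 = on par with the benchmark; >100 over-indexes; <100 under-indexes.
    """
    return 100.0 * audience_pct / benchmark_pct
```

So an audience where 24% of images show race tracks, against a 4% benchmark, indexes at 600 — a strong over-index — while pizza at 5% against a 10% benchmark indexes at 50.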

Now you can start analyzing the data to extract insights.

Part 2 — Bringing it all together (link)

Netra develops image and video recognition APIs to help enterprise structure and make sense of their visual media. Netra’s API ingests photo or video URLs and, within milliseconds, automatically tags it for visual content such as brand logos, objects, scenes, and people with demographic classification. If you’re interested in learning more, visit our website or say hello at info@netra.io !
