Teaching Machines to See Beyond The Hashtag
The pixels are the source of truth.
This is a statement I find myself frequently sharing on calls with prospective customers learning about Clarifai’s new visual search and custom training products. We are a 40-person team here in New York focused on building solutions for organizations to fully understand and manage their visual content. These workflows range from tagging and categorization to searching and generating product recommendations.
The capture and analysis of user-generated content continues to be a primary focus for consumer brands in almost every industry. Brand managers want to know how their message is being received and engaged with. Ecommerce operators want to include authentic, real-world product shots within product pages and emails to lift conversion. News organizations want to quickly find media being shared from the ground at a developing event. Hospitality marketers are trying to ensure you feel inspired to share the essential and captivating moments of your trip.
UGC = u gotta content
The primary challenge of locating that image of a customer enjoying their new gadget or lying poolside at the cabana is doing so efficiently, at scale, and in a consistent manner. More than 3 trillion photos will be shared in 2017 alone.
The firehoses of today’s main social channels are incredibly noisy, given the sheer volume of an always-on connected world and the over-reliance on existing hashtags for filtering and query building. #blogpost #thoughtleadership #deeplearning
The noise, spam and growth hacking associated with common searches today is a real point of friction for high quality analysis.
Here’s a look at #wine on Instagram.
My heart goes out to all the analysts and interns at agencies, social listening tools and big brands that have to sift through this noisy deluge of content. And good luck to the produce companies navigating these types of searches without NSFW filtering.
Relying on hashtags as a final proxy for the underlying content is akin to relying on a quick glance at someone’s resume to make a final determination of who they are, what they do, and how they might be able to work with you.
Back to the original thought. The pixels are the source of truth.
Sophisticated computer vision enables the pixels to speak for themselves. Quickly and accurately find the content your team needs regardless of the available user metadata: geotags, descriptions and hashtags.
Want pizza? Let’s use our machine learning tags to search across a sample image library of 750,000 food items that contain no metadata, user #hashtags or descriptions. Just raw JPEGs.
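In code, searching by machine-generated tags boils down to filtering on predicted concepts and their confidence scores. A minimal sketch, assuming each image already carries concept scores from an image-recognition model (the `LIBRARY` data here is hand-written purely for illustration):

```python
# Stand-in for an index of model predictions. In practice these concepts
# and confidence scores come from an image-recognition API, not by hand.
LIBRARY = {
    "img_001.jpg": {"pizza": 0.98, "food": 0.95, "cheese": 0.87},
    "img_002.jpg": {"salad": 0.93, "food": 0.91, "vegetable": 0.88},
    "img_003.jpg": {"pizza": 0.91, "restaurant": 0.85, "food": 0.82},
}

def search(concept, min_confidence=0.9):
    """Return image ids whose predicted tags include `concept`
    above a confidence threshold, best matches first."""
    hits = [
        (image_id, tags[concept])
        for image_id, tags in LIBRARY.items()
        if tags.get(concept, 0.0) >= min_confidence
    ]
    return [image_id for image_id, _ in sorted(hits, key=lambda h: -h[1])]

print(search("pizza"))  # → ['img_001.jpg', 'img_003.jpg']
```

No filenames, captions or hashtags are consulted; the only signal is what the model saw in the pixels.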
Want to find fine dining scenes that look just like this? Let’s use our search UI (powered by the same underlying API) to instantly surface visually similar images.
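Visual similarity search of this kind is typically built on embeddings: each image is mapped to a numeric vector, and “looks like this” becomes “has a nearby vector.” A minimal sketch with hypothetical 4-dimensional embeddings (real models produce vectors with hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings keyed by image id, purely for illustration.
EMBEDDINGS = {
    "fine_dining_1.jpg": [0.90, 0.10, 0.30, 0.70],
    "fine_dining_2.jpg": [0.85, 0.15, 0.35, 0.65],
    "beach_1.jpg":       [0.10, 0.90, 0.80, 0.20],
}

def visually_similar(query_id, top_k=2):
    """Rank every other image by cosine similarity to the query image."""
    query = EMBEDDINGS[query_id]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in EMBEDDINGS.items()
        if other != query_id
    ]
    scored.sort(key=lambda s: -s[1])
    return [image_id for image_id, _ in scored[:top_k]]

print(visually_similar("fine_dining_1.jpg", top_k=1))  # → ['fine_dining_2.jpg']
```

The second fine-dining shot ranks first because its vector points in nearly the same direction as the query’s, while the beach shot does not.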
As you can see here, understanding the objects, colors, scenes and context within each image instantly allows you to locate the imagery important to tomorrow’s gallery, campaign and brand report. Try curating those results by hand. #nothappening
Because I am neither a growth hacker nor a 2011 Instagram user, I elected not to include much of a description on a recent visit to California’s beautiful Gold Coast.
A bit of General Tagging below and my photo automatically becomes significantly more valuable to a variety of parties. With image recognition, this photo is now actionable and worth indexing for analytics and social listening tools. More importantly, the subsequent end users can have an understanding of me as a consumer, visitor and traveler. Prior to this, this image was merely a black box that would have slipped past any search for object-focused (#vineyard, #man), scene-related (#landscape, #nature) and contextual (#leisure, #summer) tags.
For the technically inclined, here’s how that response would look.
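A rough sketch of the shape such a tagging response takes, as a Python dict — the field names here are illustrative of a concepts-with-confidence-scores payload, not an exact schema:

```python
# Illustrative only: approximate shape of a tagging-API response.
response = {
    "outputs": [{
        "data": {
            "concepts": [
                {"name": "vineyard",  "value": 0.99},
                {"name": "landscape", "value": 0.98},
                {"name": "nature",    "value": 0.97},
                {"name": "summer",    "value": 0.95},
                {"name": "leisure",   "value": 0.93},
            ]
        }
    }]
}

# Keep every concept above a confidence threshold.
tags = [
    concept["name"]
    for output in response["outputs"]
    for concept in output["data"]["concepts"]
    if concept["value"] >= 0.95
]
print(tags)  # → ['vineyard', 'landscape', 'nature', 'summer']
```

Each concept arrives with a confidence score, so downstream tools can tune their own precision threshold rather than trusting every tag equally.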
Moving beyond the hashtag rests on two primary technologies.
Search: Enable developers to easily tap into AI-powered search so they can browse, find, and recommend images by keyword, visual similarity, or a combination of both.
Custom Training: Filter, organize, predict and make recommendations with your own taxonomy. This taxonomy could be a pre-existing one or simply a new and niche request from a client. Within minutes of training via the UI or the API, users have a robust and private computer vision model specific to their own media. Streamline existing business workflows (moderation, metadata input, categorizing) and expose new search features with your own real-time filters.
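To make the custom-training idea concrete, here is a deliberately simplified sketch of the concept — a nearest-centroid classifier over image embeddings that learns a client-specific taxonomy from a handful of labeled examples. This illustrates the workflow, not Clarifai’s actual implementation, and the labels and vectors are hypothetical:

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(examples):
    """examples: {label: [embedding, ...]} -> {label: centroid}."""
    return {label: centroid(vecs) for label, vecs in examples.items()}

def predict(model, embedding):
    """Assign the label whose centroid is closest (Euclidean) to the input."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(model, key=lambda label: dist(model[label], embedding))

# A niche, client-specific taxonomy trained from a few labeled embeddings.
model = train({
    "rooftop_bar": [[0.9, 0.1], [0.8, 0.2]],
    "poolside":    [[0.1, 0.9], [0.2, 0.8]],
})
print(predict(model, [0.85, 0.15]))  # → 'rooftop_bar'
```

The point is the shape of the workflow: supply a few labeled examples under your own vocabulary, get back a private model that applies that vocabulary to every new image.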
Digital asset managers, stock photography sites and social analytics tools can now expose fine-grained, specific filtering for any type of request within their own product experiences. Being able to instantly view this content becomes as straightforward as including it within the standard menu options.
The last point I find myself emphasizing on each call is that this technology is real, it’s live and it’s ready for you to experiment with, test and implement in a production environment. There are no pre-canned demos or “Contact Us” sales forms with the hope of a callback within 48 hours. The reason we’ve built a real brand and business within this space is partly because of this fact:
- Make predictions with General, NSFW, Food, Travel taxonomies on any image or video in the world
- Start training your own model — guide
- Add images and start visually searching across them
If you are managing, ingesting, investing in and/or trying to understand visual content (images, gifs, video), I’d love to brainstorm and think aloud together as to what you can do. rdawidjan @ clarifai.com — as a bonus, I may just set you up with the ability to do all the above…with your own Instagram content.
The faster you see that we’ve taught machines to see, the better.