Images as the universal inputs

Ryan Dawidjan
Artificial Intelligence with a Vision
5 min read · Feb 10, 2017

What happens when computer vision meets an Instagram account.

Ben thinking aloud just a few months prior.

Some smart people seem to agree and see a similar future. 👀 🤖

sharks with cameras > sharks with lasers

I agree too, and here at Clarifai we’re teaching machines to see. We’re teaching them to see beyond the hashtag. We’re teaching them to understand visual content through a variety of lenses on the world around us: broad ones (General Model), content-specific ones (Food 🌯, Travel ✈️️, etc.), and even ones specific to your own context and taxonomy.

We ultimately believe that the pixels are the source of truth, and that being able to understand that truth at scale unlocks new product experiences and business workflows.

Demo Time

Back to Ben. In his recent post, Cameras, ecommerce and machine learning, he explores the implications of machines being able to understand images for what they truly are.

…thinking about what it might mean that images and video will become almost as transparent to computers as text has always been.

We should expect that every image ever taken can be searched or analyzed, and some kind of insight extracted, at massive scale.

At a high level, images can become a valuable and actionable structured data source rather than being stored as dumb black boxes in your database. You currently see this with the large players (FB, TWTR, AAPL, GOOG, AMZN) processing and predicting on each image you send, upload, or view in personal and professional contexts. The little guys and everyone in between should have the same capabilities.

With repeatable and structured understanding as a baseline, images can serve as inputs into complex systems like recommendation engines, discovery feeds, and ad profile development. Put more plainly, auth’ing into a service with your Pinterest or Instagram account should tell the application a lot more about you than your name and email.

At its core, training a network is about providing a collection of labeled inputs to produce a predicted output. Concepts can be trained with images as straightforward as objects (shoe vs. tie), as abstract as style (Ryan’s apparel preferences), and as interesting as distinct patterns (flat-lay photos).

Below is a quick example of what’s available today with our APIs and UIs as it relates to the thoughts presented in Ben’s post. To make it a bit more concrete, I used 535 images from Ben’s personal (public) Instagram account.

Search by tag

You could always search text for ‘dog’ but could never search pictures for a dog — now you’ll be able to do both, and, further, start to get some understanding of what might actually be happening.

Good thought. Let’s search by concept using 11,000 of them (objects, feelings, scenery, patterns, context).

Each photo is indexed with its predicted General Model tags so it can be surfaced and searched for later. With the API, a JSON response looks like the sketch below.
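For illustration, here’s a minimal sketch of that request/response cycle in Python, assuming Clarifai’s v2 REST predict endpoint; the API key, model identifier, and image URL are placeholders, not values from this post.

```python
import requests

API_KEY = "YOUR_API_KEY"                     # placeholder
MODEL_ID = "general-v1.3"                    # placeholder for the General Model's id
IMAGE_URL = "https://example.com/photo.jpg"  # placeholder

# Ask the model to predict concepts for a single image (assumed v2 REST shape).
resp = requests.post(
    "https://api.clarifai.com/v2/models/" + MODEL_ID + "/outputs",
    headers={"Authorization": "Key " + API_KEY},
    json={"inputs": [{"data": {"image": {"url": IMAGE_URL}}}]},
)

# Predicted concepts come back nested under outputs -> data -> concepts,
# each with a name and a confidence value between 0 and 1.
for concept in resp.json()["outputs"][0]["data"]["concepts"]:
    print(concept["name"], round(concept["value"], 3))
```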

Understanding tags

What happens to ecommerce recommendations when a system might be able to infer things about your taste from your Instagram or Facebook photos, without needing tags or purchase history — when it can see your purchase history in your selfies?

Standard Instagram auth → name, username, profile description.

With image recognition, Ben’s account becomes an interest and psychographic profile for ‘book’, ‘architecture’, ‘car’, and ‘painting’, given the frequency and confidence of the predictions. That’s a strong signal for the content, products, and users you’d recommend he explore within your new application.
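As a purely hypothetical sketch of how that profile might be derived, the helper below (the function name and thresholds are made up for illustration) collapses per-image tag predictions into an account-level profile by keeping tags that recur often with high confidence.

```python
from collections import defaultdict

def interest_profile(predictions, min_count=5, min_confidence=0.85):
    """Collapse per-image tag predictions into an account-level profile.

    `predictions` is a flat list of (tag, confidence) pairs pooled across
    every photo in the account; the thresholds are arbitrary examples.
    """
    counts = defaultdict(int)
    confidence_sums = defaultdict(float)
    for tag, confidence in predictions:
        counts[tag] += 1
        confidence_sums[tag] += confidence

    # Keep tags that are both frequent and confidently predicted.
    profile = [
        (tag, counts[tag], confidence_sums[tag] / counts[tag])
        for tag in counts
        if counts[tag] >= min_count
        and confidence_sums[tag] / counts[tag] >= min_confidence
    ]
    return sorted(profile, key=lambda row: -row[1])  # most frequent first
```

Run over the 535 photos above, a helper like this is what would surface ‘book’, ‘architecture’, ‘car’, and ‘painting’ as the dominant signals.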

Search visually

This is an ‘I know it when I see it’ type of lookup: here’s an image of a product, landscape, or object; find me more like this.

Similar content within Ben’s account.

Product recommendations from an antiques retailer, using one of his images.
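Under the hood, a search like this is typically a nearest-neighbor lookup over image embeddings. Here’s a generic sketch of that idea, cosine similarity over precomputed feature vectors; this is the general technique, not Clarifai’s actual search implementation.

```python
import numpy as np

def most_similar(query_vec, index_vecs, k=5):
    """Return indices of the k indexed images most similar to the query.

    Assumes each image has already been mapped to a fixed-length feature
    vector, e.g. an activation from a convolutional net's late layers.
    """
    index = np.asarray(index_vecs, dtype=float)
    query = np.asarray(query_vec, dtype=float)

    # Cosine similarity is the dot product of L2-normalized vectors.
    index = index / np.linalg.norm(index, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = index @ query

    return np.argsort(-scores)[:k]  # best matches first
```

Production systems swap the brute-force scan for an approximate nearest-neighbor index, but the ranking idea is the same.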

Customize your understanding

Move beyond our taxonomy and train your own neural network in real time to predict the presence of distinct, subjective, and context-specific imagery.

Here I created the concept ‘ben_grid’ using just a handful of images as positive examples, then used it to automatically find the others.

I can continue to supply training examples to refine and instantly re-train the model.
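To make the workflow concrete, here’s a rough sketch of that label → train → predict loop using the Clarifai Python client of the era; treat the method names, model id, and URLs as assumptions to check against the docs.

```python
from clarifai.rest import ClarifaiApp  # the 2.x-era client (assumed)

app = ClarifaiApp(api_key="YOUR_API_KEY")  # placeholder key

# 1. Label a handful of photos as positive examples of the custom concept.
for url in ["https://example.com/grid1.jpg",
            "https://example.com/grid2.jpg"]:
    app.inputs.create_image_from_url(url=url, concepts=["ben_grid"])

# 2. Create a model around the concept and train it; re-running train()
#    after adding more examples is the instant re-train step above.
model = app.models.create(model_id="ben-grid-model", concepts=["ben_grid"])
model.train()

# 3. Score a new image; a high ben_grid value flags a match.
result = model.predict_by_url(url="https://example.com/candidate.jpg")
print(result["outputs"][0]["data"]["concepts"])
```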

Next Steps

We’ll continue serving professional developers and building technology (APIs, UIs, integrations) to enable this new reality. Ultimately, convolutional neural nets are just another powerful tool in the toolbox. We care more deeply about solving problems for developers, product owners, and businesses. If you think this could be the tool for you, there’s nothing holding you back. It’s live, real, and production ready. 💻 🖱️ ⌨️

I think Ben’s concluding thought is an astute one.

When we can turn images into data, we’ll find lots of sets of images that we never really thought of as data before, and lots of problems that didn’t look like image recognition problems.

We have uncovered A LOT of valuable applications for a wide variety of industries but we’re just as excited to uncover the next ones with our developer community, customers, and partners over the coming years. If you’d like to think aloud together and be a valued input, I’m all eyes and ears. rdawidjan @ clarifai.com

Watch the full visual exploration in the video below.
