Conde Nast genius() project

Paolo Genta — Mon, 08 Oct 2018 13:24:37 GMT

This is an excerpt from the talk I had the opportunity to give at the last AWS Summit in Milan, during the Machine Learning session.

At Condé Nast Italy, we have often wondered how to improve people’s browsing experience on our websites.

How can we use the growing range of AI tools to provide an experience based not only on captivating visual effects but also on content that is more likely to be read?

CN.challenge().engageUsers()

To answer these questions, we reviewed our websites’ traffic and identified four macro-areas:

Behaviors: understand how users behave on our websites, what they like to read, and how they move from one article to another. This type of analysis covers both logged-in and anonymous users.
Clustering: can we assign people to clusters, to understand how we can profile them?
Tailor-made: content suggested according to the user’s real interests; better management of ADV formats based on navigation (how often do you ignore an ad because it has nothing to do with what you like?); personalized newsletter tools (how often do we trash newsletters just because of poor content?).
Prediction: tools to predict the best time of day to publish fresh content, based on users’ past navigation.

CN.genius()

Based on these four interesting challenges, we built the Condé Nast genius() project, with the aim of gradually adding intelligence and Italian ingenuity to the Condé Nast sites.

CN.genius() logo

CN.genius() is, in fact, a work in progress that will allow us to move from an ordinary news site to a complex platform that interacts with users, serving content that is tailor-made for them, using cutting-edge technologies.

Evolution of CN.genius() project

CN.genius().showMobile()

What does CN.genius() look like in the real world? Here are two examples:

This red box appears on the desktop version of the VanityFair website if you have been away from it for some time.
The suggested content is based on your previous navigation on the site. Let me explain.

Do you read a lot of content about music, new album releases or news about your favorite artist?
The A.I.-powered box suggests the best article for you, based on your tastes.

On the mobile version of the site, instead, while scrolling through an article you will find these amazing cards, which you can fold.

The cards are a mix of personal suggestions, based on your browsing history, and suggestions based on the most viewed articles of the last hour in the same category as the article you are reading.
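The mixing step can be pictured with a minimal Python sketch. It assumes the two ranked lists of article IDs already exist; the function name `build_cards`, the interleaving order and the limit of four cards are illustrative assumptions, not the actual CN.genius() logic.

```python
from itertools import zip_longest


def build_cards(personal, trending, current_article_id, limit=4):
    """Interleave personal and trending suggestions.

    Skips duplicates and the article the user is currently reading.
    All names and the limit are hypothetical, for illustration only.
    """
    cards = []
    seen = {current_article_id}
    for pair in zip_longest(personal, trending):
        for article_id in pair:
            if article_id is None or article_id in seen:
                continue
            seen.add(article_id)
            cards.append(article_id)
            if len(cards) == limit:
                return cards
    return cards


cards = build_cards(
    personal=["a1", "a2", "a3"],   # ranked from the user's browsing history
    trending=["a2", "t1", "a0"],   # most viewed in the category, last hour
    current_article_id="a0",
)
print(cards)  # ['a1', 'a2', 't1', 'a3']
```

Interleaving keeps both sources visible even when only a few cards fit on screen; a weighted mix would be an equally plausible choice.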

What process leads to these kinds of visual widgets? Let’s start from the beginning, with the basics.

CN.analyze().overview()

A user interacts with VanityFair by reading articles, browsing image galleries, and searching for something of interest: we can sum up the interaction in the following high-level, self-drawn diagram.

Attributes of a common article

Focusing on articles, the user experience is composed of a mix of text, images, videos and tags. Every user can browse the website following their own path: by category, for example, or by following one particular author they really like.

CN.analyze().overview().addAI()

Let’s add some A.I.

The attributes of a page are the basic ingredients for an A.I. engine that, by processing the data, can suggest articles or navigation paths that may be relevant to the user.

So we have the basic ingredients to let our user evolve, from a simple reader to a happy user!

A user is happy if they get better suggestions about what to read.

Let’s see how the AWS infrastructure helped us implement data collection and how we used its different A.I. engines to create the CN.genius() experience.

CN.genius().user().identity()

Let’s capture the clickstream data

We started from the user’s perspective: we needed to identify our audience in a robust and secure way, so we chose AWS Cognito.

AWS Cognito gives us a UUID, based on a combination of machine and browser.

So different browsers mean different UUIDs, which we can later merge if the user registers or logs in to our platform.
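A minimal Python sketch of the merge idea: once an anonymous ID is linked to a registered user, every event recorded under that ID can be attributed to the same person. The class `IdentityRegistry`, the function `merged_history` and the sample IDs are hypothetical; this is not how AWS Cognito stores or merges identities internally.

```python
from collections import defaultdict


class IdentityRegistry:
    """Maps anonymous per-browser IDs to a registered user ID (hypothetical)."""

    def __init__(self):
        self._owner = {}  # anonymous id -> registered user id

    def link(self, anonymous_id, user_id):
        """Record that this browser's anonymous ID belongs to a known user."""
        self._owner[anonymous_id] = user_id

    def resolve(self, anonymous_id):
        """Return the registered user ID if known, else the anonymous ID."""
        return self._owner.get(anonymous_id, anonymous_id)


def merged_history(events, registry):
    """Group (anonymous_id, article_id) events by resolved identity."""
    history = defaultdict(list)
    for anonymous_id, article_id in events:
        history[registry.resolve(anonymous_id)].append(article_id)
    return dict(history)


registry = IdentityRegistry()
registry.link("uuid-laptop", "user-42")  # user logged in from the laptop
registry.link("uuid-phone", "user-42")   # and later from the phone

events = [("uuid-laptop", "a1"), ("uuid-phone", "a2"), ("uuid-tablet", "a3")]
history = merged_history(events, registry)
print(history)
```

The tablet was never used to log in, so its history stays under the anonymous ID until a link is recorded.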

CN.genius().user().clickstream()

All the information about the page (article, photo gallery or video), from now on the user’s clickstream, together with the anonymous AWS Cognito token, is sent to an AWS Kinesis Data Streams stream, which scales data ingestion to match the traffic peaks that every news site can have (e.g. did you hear about the new royal wedding?).
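As an illustration, a clickstream event could be serialized and pushed to the stream roughly as follows. This is a sketch in Python rather than the JavaScript that would run on the page; the field names, the stream name and the choice of the anonymous ID as partition key are assumptions. Only `put_record(StreamName=..., Data=..., PartitionKey=...)` is the actual boto3 Kinesis call, and the client is passed in so the logic can be tried without AWS credentials.

```python
import json
import time


def build_record(identity_id, page):
    """Serialize one clickstream event (field names are hypothetical)."""
    event = {
        "identity_id": identity_id,     # anonymous AWS Cognito ID
        "timestamp": int(time.time()),  # seconds since the epoch
        **page,                         # e.g. url, type, category
    }
    return {
        "Data": json.dumps(event).encode("utf-8"),
        # Same visitor -> same shard, so per-visitor ordering is preserved.
        "PartitionKey": identity_id,
    }


def send_event(kinesis_client, stream_name, identity_id, page):
    """Send one event through a client created with boto3.client('kinesis')."""
    record = build_record(identity_id, page)
    return kinesis_client.put_record(StreamName=stream_name, **record)


record = build_record(
    "anonymous-id-123",
    {"url": "/music/new-album", "type": "article", "category": "music"},
)
print(json.loads(record["Data"])["category"])  # music
```

With real credentials, the call would look like `send_event(boto3.client("kinesis"), "clickstream", identity_id, page)`, where `"clickstream"` is a placeholder stream name.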

The data now follow two paths: the first one moves the data into a data lake, via AWS Kinesis Firehose, for further analysis and for the marketing department’s data visualization dashboards.

First clickstream path: from ingestion to S3 for further analysis and for marketing dashboards

CN.genius().suggestedArticles()

The second path is the real-time analysis of the data and their aggregation, thanks to the analyses performed by AWS Kinesis Data Analytics: for example, how many sports articles were seen in the last hour? How many news articles?
The analysis is not done after the fact but directly on the stream, which AWS Kinesis Data Streams retains for 24 hours.
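To make the idea of aggregating directly on the stream concrete, here is a small Python sketch of a one-hour sliding-window counter per category. AWS Kinesis Data Analytics expresses this kind of query in SQL over the stream, so this is only a stand-in for the concept; the class name is hypothetical and events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque


class HourlyCategoryCounter:
    """Counts page views per category over a sliding time window."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self._events = deque()    # (timestamp, category), oldest first
        self._counts = Counter()  # category -> views inside the window

    def add(self, timestamp, category):
        """Register one view and drop events that left the window."""
        self._events.append((timestamp, category))
        self._counts[category] += 1
        self._expire(timestamp)

    def _expire(self, now):
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_category = self._events.popleft()
            self._counts[old_category] -= 1
            if self._counts[old_category] == 0:
                del self._counts[old_category]

    def counts(self, now):
        """Return the views per category seen in the last window."""
        self._expire(now)
        return dict(self._counts)


counter = HourlyCategoryCounter()
counter.add(0, "sport")
counter.add(1800, "news")
counter.add(3000, "sport")
print(counter.counts(now=3599))  # {'sport': 2, 'news': 1}
print(counter.counts(now=5400))  # {'sport': 1}
```

Each event is added once and removed once, so the counts are always up to date without rescanning the stored data.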

Clickstream is analyzed by Kinesis Data Analytics.

This type of data is then saved in a data lake and fed into the machine learning engine, which runs on an AWS EC2 instance with an AWS Deep Learning AMI.
The results are then moved into an instance of Amazon Elasticsearch Service, which provides very fast queries, and from there, through an AWS Lambda written in Node.js and the related API Gateway endpoints, returned to the front end of the site.

The processed data are returned to the users in the form of suggested articles.

CN.genius().images()

But there’s more than articles and text.
Images are a great asset for Condé Nast: indeed, we have more than 10M images in our digital asset management tool.
Often, however, the images are not tagged correctly or have no tags at all (mainly for lack of time)!

Try to imagine a photographer at a catwalk show who needs to upload a large batch of images as quickly as possible: they simply don’t have time to tag every image. This is a problem, because we lose this valuable information.

Image analysis workflow

We have therefore developed an image analysis system, to automatically tag every image in our database and give new tools to our editors.
All our images and galleries of images are stored on an AWS S3 bucket.

Every time we store an image, a Node.js AWS Lambda function is triggered and passes the image to AWS Rekognition, the machine learning service that detects objects in images.

The response is a JSON document that contains all the identified tags, each with a confidence value (e.g. 90% is high confidence, 30% is poor).
If the tags include keywords that identify a human being, like “face” or “woman”, the Lambda calls AWS Rekognition again to check whether that face belongs to a celebrity.
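The two-step flow can be sketched as follows. The original Lambda is written in Node.js, so this Python version is only an illustration. `detect_labels` and `recognize_celebrities` are the actual boto3 Rekognition calls; the set of "human" labels, the 50% threshold and the shape of the returned JSON are assumptions.

```python
from urllib.parse import unquote_plus

# Assumption: labels we treat as "a human being is in the picture".
HUMAN_LABELS = {"Person", "Human", "Face", "Woman", "Man"}


def tag_image(rekognition, bucket, key, min_confidence=50.0):
    """Detect labels, then look for celebrities only if a person is found."""
    image = {"S3Object": {"Bucket": bucket, "Name": key}}
    response = rekognition.detect_labels(Image=image, MinConfidence=min_confidence)
    tags = [
        {"name": label["Name"], "confidence": label["Confidence"]}
        for label in response["Labels"]
    ]
    celebrities = []
    if any(tag["name"] in HUMAN_LABELS for tag in tags):
        faces = rekognition.recognize_celebrities(Image=image)
        celebrities = [
            {"name": face["Name"], "confidence": face["MatchConfidence"]}
            for face in faces["CelebrityFaces"]
        ]
    return {"bucket": bucket, "key": key, "tags": tags, "celebrities": celebrities}


def handler(event, context, rekognition=None):
    """Entry point for an S3 'object created' event."""
    if rekognition is None:
        import boto3  # imported lazily so the sketch loads without AWS set up
        rekognition = boto3.client("rekognition")
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append(tag_image(rekognition, bucket, key))
    return results
```

Running the second call only when a person is detected avoids paying for celebrity recognition on pictures of handbags or shoes.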

The final JSON is stored in our managed AWS Aurora database on RDS and, on top of this flow, we created some serverless applications to give new tools to our editors.

Each serverless application combines a single-page application, written with Vue.js, and a serverless backend of AWS Lambda functions written in Node.js.

Here are the first three:

CN.genius().tagManager()

Depending on the quality of the pictures and the way a shot is taken, AWS Rekognition can sometimes fail: it can confuse some objects with others, or fail to recognize a new celebrity.

Tags found in a picture (light ones are removed)

So we created a tool to add or remove tags from an image, in a conservative way. We do not overwrite the original tags; we add our changes on top of them, to keep our database as clean as possible.
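One way to picture this conservative approach is to keep the machine-generated tags untouched and store the editor's additions and removals separately, computing the visible tags on demand. The sketch below assumes exactly that; the function name and the case-insensitive matching are illustrative choices, not the actual tagManager() implementation.

```python
def effective_tags(original_tags, added=(), removed=()):
    """Combine original tags with editor changes without mutating them.

    original_tags: tags as returned by the automatic tagging
    added:         tags an editor added by hand
    removed:       tags an editor marked as wrong
    """
    removed_keys = {tag.lower() for tag in removed}
    seen = set()
    result = []
    for tag in list(original_tags) + list(added):
        key = tag.lower()
        if key in removed_keys or key in seen:
            continue
        seen.add(key)
        result.append(tag)
    return result


original = ["Handbag", "Dog", "Person"]
visible = effective_tags(
    original,
    added=["Scarlett Johansson"],
    removed=["dog"],  # the service mistook something for a dog
)
print(visible)   # ['Handbag', 'Person', 'Scarlett Johansson']
print(original)  # ['Handbag', 'Dog', 'Person']
```

Because the original list is never modified, an editor's correction can always be undone, and the raw output of the tagging service stays available for later analysis.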

CN.genius().galleryManager()

Another application is the galleryManager(), a real gallery creator: we can create a gallery on the fly, simply by searching for a tag like “handbag” or “skirt”, for a celebrity like “Scarlett Johansson”, or for a famous stylist like “Giorgio Armani”.

With a simple drag and drop, we can reorganize images or discard some of them (e.g. we don’t want to show someone with a former fiancé!) and then save the new gallery, which the machine learning engine can now suggest to users interested in that tag.

CN.genius().automaticGalleryManager()

The automaticGalleryManager() is an idea derived from galleryManager(). Every night we consolidate the most recent images of every celebrity tagged in our RDS database into a so-called ‘last items gallery’. CN.genius() can then serve users the latest gallery of their favorite celebrity.
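The nightly consolidation can be sketched as a grouping step: take the tagged rows, group them by celebrity, and keep the newest images for each. The row fields (`celebrity`, `image_key`, `uploaded_at`), the gallery size and the function name below are assumptions for illustration; the real job presumably reads these rows from the RDS database.

```python
from collections import defaultdict


def build_latest_galleries(rows, size=10):
    """Return {celebrity: [image keys]}, newest first, at most `size` each."""
    by_celebrity = defaultdict(list)
    for row in rows:
        by_celebrity[row["celebrity"]].append(row)
    galleries = {}
    for name, items in by_celebrity.items():
        # ISO 8601 timestamps sort correctly as plain strings.
        newest = sorted(items, key=lambda r: r["uploaded_at"], reverse=True)
        galleries[name] = [r["image_key"] for r in newest[:size]]
    return galleries


rows = [
    {"celebrity": "Scarlett Johansson", "image_key": "sj-01.jpg", "uploaded_at": "2018-09-01T10:00:00"},
    {"celebrity": "Scarlett Johansson", "image_key": "sj-02.jpg", "uploaded_at": "2018-10-01T10:00:00"},
    {"celebrity": "Giorgio Armani", "image_key": "ga-01.jpg", "uploaded_at": "2018-09-15T10:00:00"},
]
galleries = build_latest_galleries(rows, size=1)
print(galleries)
```

With `size=1`, each celebrity's gallery holds only the single most recent image, so the first entry contains `sj-02.jpg`.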

There are a lot of things left to do, mistakes to fix, and new technologies we can integrate into our workflow: a nice refactoring is well under way at the moment…
Teaser: we’re working on a new graph database, on a new machine learning engine and on some voice stuff…

But I hope to show you some of these changes in more detail in a next chapter, not too far away…
