Automatic Curation of Photo Memories through Unsupervised Learning

Manuel Costa
Storyo
Mar 1, 2019

Introduction

Automatic discovery of stories based on a device's camera roll or a cloud backup system has become an important strategy to increase the return rate of apps dealing with photos, and photo memories in particular. The discovery of these stories can follow different strategies, and probably the most common today is to use deep learning to classify photos based on their content, or to build recaps like "best of 2018". You can then create stories for photos classified as beach, sunset, flowers, and so on through an endless set of categories, or simply select the "best ones" according to some criteria and constrain the selection to a specific date span.

Another possible approach to searching for stories is to get closer to how our brain tends to organize memories. Stories are strongly related to the space and time where an event occurred, and the temporal scale of our memory tends to increase on special occasions like travel. On the other hand, home events or moments that happen quite frequently, like going to an office of one's company in a different city or even a different country, tend to blur together and are remembered as less relevant moments.

An additional aspect of automatic stories worth emphasizing is that they can be used to communicate naturally with users even when the app is not explicitly running. For example, if a relevant story is found, a user can be sent a push notification, which promotes return rate and increases conversion.

When searching for stories on a mobile device, one should also look at the degrees of freedom available to execute them. To be viable, deep learning has to run on the GPU or dedicated hardware, and it should be noted that iOS does not allow access to the GPU while an app is in the background, both to avoid impacting the app running in the foreground and to avoid unexpected battery drain.

In the Storyo app, and now in the Storyo SDK, we have always focused on giving the best experience to final users by designing a story discovery strategy that matches how people tend to remember moments, while keeping in mind how mobile operating systems work. In this article we disclose the work we have been doing to discover stories, tuned based on the statistics we collect and the feedback we receive.

This article starts by describing the strategy used to create stories optimized for a mobile environment, followed by the steps of the recipe used to produce a set of Smart Albums that can be transformed into stories. We then disclose some statistics collected through the Storyo app (2.x for iOS and 1.x for Android) and close with some final considerations.

Finding Stories

At Storyo we have always been concerned with speed and have designed our algorithms to provide a seamless experience for our users. Because stories can represent an important entry point for customers and an important path to increased loyalty, this search has been optimized throughout the years.

Before going into detail, remember that photos include time and space meta-information, provided the user has configured these data points to be active.

As we presented in "Automatic Photo Curation through Storytelling and Deep Learning", this information can be used to create the skeleton of a narrative for a preset of photos to be used in a photo curation flow. Remember that the way we organize memories in our brain is strongly related to where and when a moment happened, and that the scale of memory changes depending on whether an event is part of daily life or something happening in a new or unusual geography.

Again, space and time can play an important role not only as a narrative foundation but also in the curation of memories. We just need a way to organize data that follows how the brain works, is computationally efficient, and is designed for the degrees of freedom available in a mobile world.

It is worth emphasizing once more that a memory can be a key gateway for your marketing communications. Background execution can be used to identify a new set of photos, upload this fact to your server, and later push-notify users to promote the visualization of a memory that can be converted into a sale.

It is relevant to note that iOS is the most restrictive operating system: you can request your app to run once in a while in the background if you follow Apple's rules. Remember again, though, that in iOS it is not possible to use the GPU and/or specialized hardware in the background, and content analysis through deep learning needs that hardware to finish in a reasonable amount of time.

For all these reasons, we created a curation flow based on two fundamental and independent steps. First, we organize memories based only on space and time meta-information, using only the CPU. Later, when the app is in the foreground, we leverage all of the hardware power to create a story and run the curation flow explained in "Automatic Photo Curation through Storytelling and Deep Learning", eventually enriched with contextual data as illustrated in "Data-driven storytelling based on photo metadata".

Now that we have a strategy identified, we will explain our algorithmic cooking recipe, designed to run 100% on the device. We start by defining the concept of "frequent location", which plays a key role in the memory curation flow.

Frequent location

Overview

The concept of frequent location is crucial in our strategy to discover story candidates, because frequent locations will be used as key reference regions. It is important to note that a Frequent Location does not have to be the user's home. It can be abroad, if your user has offices in more than one country, for example; it can be a weekend house; or it can simply be a place the user likes to visit and photograph frequently.

The major issue with these regions is that our brain starts mixing up moments in areas with many occurrences, and the way people cope is by reducing temporal memory and confining it to short moments. On the other hand, when you are not in a frequent location, as happens while traveling on vacation to a new country over several days, you tend to remember the trip as a whole.

Formal Definition

Now let's formalize what a Frequent Location is in the Storyo app and Storyo SDK. A Frequent Location is

  • a Space Cluster confined to a relatively small area, computed over the entire camera roll;
  • with a Temporal Expression above some threshold, measured as the number of different months with at least one picture taken.

Several clustering methods can be used to segment photos into space clusters. Many of them use random factors in their flow, which can produce different sets of clusters when executed twice on the same set of photos. To avoid this, we use the same method applied to narrative creation, which always produces the same result (Mean Shift).

Because you can have very different scales, such as pictures spread across the US, Europe, and Asia, but also the neighborhood scale around a user's home, we cluster on the two space variables hierarchically until the bounding box of a cluster or sub-cluster is below some threshold. In the end we use the leaf nodes as candidates to be a Frequent Location.
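As an illustration, the hierarchical clustering just described can be sketched as follows. This is a minimal, pure-Python sketch using a flat-kernel Mean Shift; the bounding-box threshold, the bandwidth heuristic, and the helper names are illustrative assumptions, not Storyo's actual parameters.

```python
import math

def mean_shift_labels(points, bandwidth, max_iter=50, tol=1e-6):
    """Flat-kernel Mean Shift. Deterministic: every point seeds a mode,
    so the same input always yields the same clusters."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    modes = [list(p) for p in points]
    for _ in range(max_iter):
        new_modes = []
        for m in modes:
            window = [p for p in points if dist(p, m) <= bandwidth]
            new_modes.append([sum(c) / len(window) for c in zip(*window)])
        shift = max(dist(a, b) for a, b in zip(modes, new_modes))
        modes = new_modes
        if shift < tol:
            break
    # modes closer than half a bandwidth collapse into one cluster label
    labels, centers = [], []
    for m in modes:
        for j, c in enumerate(centers):
            if dist(m, c) <= bandwidth / 2:
                labels.append(j)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

def hierarchical_clusters(points, max_bbox_deg=0.05):
    """Recursively re-cluster until each leaf's bounding box is small enough;
    the leaf clusters become Frequent Location candidates."""
    lats, lons = zip(*points)
    bbox = max(max(lats) - min(lats), max(lons) - min(lons))
    if bbox <= max_bbox_deg or len(points) == 1:
        return [points]
    labels = mean_shift_labels(points, bandwidth=bbox / 4)
    if len(set(labels)) == 1:
        return [points]          # cannot split further at this bandwidth
    leaves = []
    for k in set(labels):
        sub = [p for p, l in zip(points, labels) if l == k]
        leaves.extend(hierarchical_clusters(sub, max_bbox_deg))
    return leaves
```

Because every photo seeds a mode and there is no random initialization, re-running the clustering on the same camera roll always reproduces the same leaf clusters.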

After finishing this clustering process, we have to check whether the candidate clusters respect the time expression rules. We have two rules for Temporal Expression:

  • Time representation — it must cover at least x different months (e.g., Jan 2017, Mar 2017, Apr 2017, Jun 2017 and Jan 2018);
  • Cyclic behaviour prevention — it must cover at least z different months independently of the year (e.g., Jan, Mar, Apr, Jun).

Of course, these rules can always be fine-tuned or changed. The key point is that one component gives a notion of time representation, while a second one prevents classifying as frequent locations travel destinations with cyclic behaviour, like someone traveling on vacation to the same snowboarding resort every year.
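The two Temporal Expression rules can be expressed as a small predicate. The thresholds below are placeholders for the x and z mentioned above, not Storyo's real values.

```python
def respects_temporal_expression(year_months, min_year_months=5, min_months=4):
    """year_months: iterable of (year, month) pairs, one per photo in the cluster.
    Rule 1 (time representation): at least `min_year_months` distinct
    (year, month) pairs. Rule 2 (cyclic behaviour prevention): at least
    `min_months` distinct months regardless of year."""
    distinct_year_months = set(year_months)
    distinct_months = {month for _, month in distinct_year_months}
    return (len(distinct_year_months) >= min_year_months
            and len(distinct_months) >= min_months)
```

With these placeholder thresholds, the article's example (Jan 2017, Mar 2017, Apr 2017, Jun 2017, Jan 2018) passes both rules, while five Februaries in five different years passes rule 1 but is rejected by rule 2.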

Finding Countries

To get a notion of the home country, which helps us understand our user and predict what is "abroad", we now find the country for each photo.

The most common way in the mobile world to associate a geographic coordinate with a geographic label is to use the reverse geocoding services offered by Google or Apple, or to consume some third-party online service. Whichever service is used, it implies a dependency on network access to do the task, and that is something we want to avoid.

To steer clear of this network shortcoming, we store the boundaries of all countries in the app bundle / Storyo SDK and simply check whether a geographic point is inside one of the polygons that compose a country (i.e., a polygon for the continental area, several for islands and, eventually, an enclave). Storing these coordinates at a reasonable resolution takes approximately 2 MB of compressed data, which seems quite balanced for the advantages it brings to discovering travel memories, as will soon be explained.
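A minimal sketch of this offline country lookup, assuming the bundled boundaries are available as lists of (lon, lat) polygons per country code; the standard ray-casting test decides whether a point falls inside a polygon.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):          # edge crosses the horizontal ray
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def country_of(lon, lat, countries):
    """`countries` maps a country code to its list of polygons
    (continental area, islands and, eventually, an enclave)."""
    for code, polygons in countries.items():
        if any(point_in_polygon(lon, lat, p) for p in polygons):
            return code
    return None                               # point not inside any country
```

A real implementation would also index polygons by bounding box so that most countries are skipped without running the full test.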

Home information

After computing Frequent Locations and associating a country with each photo, one can infer home information. Although it can be inaccurate, we assume that the user's home is located in the Frequent Location with the most expressive time representation.

The home country is also important, as it will be used to infer travels abroad. We assume, again, that the home country is the one with the largest number of different months, independently of the number of photos taken.
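The home-country inference can be sketched as follows, counting distinct months per country rather than photo counts; the input shape is an assumption for illustration.

```python
from collections import defaultdict

def infer_home_country(photos):
    """photos: iterable of (country_code, (year, month)) pairs.
    The home country is the one observed in the most distinct months,
    regardless of how many photos were taken in each."""
    months_per_country = defaultdict(set)
    for country, year_month in photos:
        if country is not None:          # photos without geo data are skipped
            months_per_country[country].add(year_month)
    return max(months_per_country, key=lambda c: len(months_per_country[c]))
```

Note that a burst of hundreds of vacation photos in one month does not outweigh a home country photographed lightly but steadily across many months.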

Travel

By their nature, often associated with unusual and important events of a user's life, travel scenarios deserve special attention as memories. Travel can happen in different scenarios and our search uses the following references:

  1. Travel, abroad — the user lives in New York and travels to Europe, where one or more countries are visited during the journey;
  2. Travel, domestic — living in New York and visiting Yosemite National Park;
  3. Travel, domestic and abroad — living in New York and visiting Montreal, Canada, by car.

It is also important to emphasise that travel discovery uses both geographic and time references to split data and isolate travels. We assume that if there are pictures taken in a frequent place, the pictures taken before and after do not belong to the same travel; we also assume that if a long enough date span occurred between consecutive pictures, they do not belong to the same travel story (Fig. 1).

Figure 1 — Illustrative example of the criteria used to split photos to isolate travel candidates (frequent place and date span)

Travel Abroad — based on inferred Home Country

To find travels happening abroad, we use the home country as a first geographic reference, and also Frequent Locations to avoid flagging as travel what is part of the user's daily life, like having an office in a different country. The key steps to find this type of travel are specified below.

Split photos by home country

1. Remove photos without geographic data — filter out photos that have no geographic data;

2. Find first travel candidates by splitting photos based on home country — organize photos chronologically, go from the oldest to the newest and, every time a photo or a sequential group of photos is flagged as home, use the gap between abroad photos to do a split and throw out the home-country photos;

Split photos by Frequent Location

3. For each group found in 2), do the same thing, but use Frequent Locations as the reference. If a photo or set of photos is found to belong to a Frequent Location, remove those photos and split the remainder at the gap created.

Split photos by date span

4. Finally, for each group found in 3), do the same thing again, but now use the date span between consecutive photos: if it is above some threshold, split the group.

The clusters of photos produced in step 4) represent independent, unusual travel stories, classified as Travel Abroad.
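The four steps above can be sketched as repeated applications of one split routine: drop the separator photos (home-country photos, then Frequent Location photos) and split the remainder at the gaps they leave and, finally, at long date spans. The field names and the three-day gap threshold are illustrative assumptions.

```python
from datetime import date, timedelta

def split_on(photos, is_separator, max_gap=None):
    """Drop photos flagged by `is_separator` and split the remainder at the
    dropped runs and (optionally) at date gaps longer than `max_gap`.
    `photos` must be sorted chronologically."""
    groups, current, prev = [], [], None
    for p in photos:
        if is_separator(p):
            if current:
                groups.append(current)
            current, prev = [], None
            continue
        if prev is not None and max_gap is not None and p["date"] - prev > max_gap:
            groups.append(current)
            current = []
        current.append(p)
        prev = p["date"]
    if current:
        groups.append(current)
    return groups

def travels_abroad(photos, home_country, frequent_ids, max_gap=timedelta(days=3)):
    photos = [p for p in photos if p["country"] is not None]            # step 1
    groups = split_on(photos, lambda p: p["country"] == home_country)   # step 2
    groups = [g2 for g in groups                                        # step 3
              for g2 in split_on(g, lambda p: p["cluster"] in frequent_ids)]
    return [g3 for g in groups                                          # step 4
            for g3 in split_on(g, lambda p: False, max_gap=max_gap)]
```

Each returned group is one Travel Abroad candidate; note how the same `split_on` routine implements all three splitting criteria.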

Relative to Frequent Locations

We will now follow a similar approach for travel stories based on Frequent Locations independently of the country. The key steps are as follows.

Split photos by Frequent Location

1. Remove photos without geographic data — filter out photos that have no geographic data;

2. Find first travel candidates by splitting photos based on Frequent Locations — organize photos chronologically, go from the oldest to the newest and, every time a photo or a sequential group of photos is flagged as a Frequent Location, use the gap between non-Frequent-Location photos to do a split and throw out the Frequent Location photos;

Split photos by date span

3. For each group found in 2), do the same thing, but use the date span as the reference. To prevent splitting too much, you can weight this span by the distance to the closest Frequent Location: far away from a Frequent Location, you use a larger span before splitting; closer, the likelihood of independent events increases, so a smaller span can be used.

Filter out Travel candidates

4. Finally, we filter out trial travel sets based on the distance to the closest Frequent Location and on time criteria.

The independent groups of photos that remain are independent travels obtained by using Frequent Locations as reference points.

Small moments

A simple and quite efficient way to organize photos into moments is to look at the time span and the geographic distance between consecutive photos. If either the time or the geographic span is above some threshold, you split them into different groups.
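A sketch of this moment segmentation, using the haversine formula for geographic distance; the three-hour and five-kilometre thresholds are illustrative, not Storyo's values.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def small_moments(photos, max_gap=timedelta(hours=3), max_km=5.0):
    """Split a chronologically sorted list of photos every time the time span
    OR the geographic distance between consecutive photos exceeds a threshold."""
    groups = [[photos[0]]]
    for prev, cur in zip(photos, photos[1:]):
        if (cur["date"] - prev["date"] > max_gap
                or haversine_km(prev["loc"], cur["loc"]) > max_km):
            groups.append([])
        groups[-1].append(cur)
    return groups
```

Because this pass needs no country or Frequent Location data, it also works on camera rolls where geographic information is scarce, using the time criterion alone.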

Photo Classification

Now that we have several organization perspectives, we will classify each photo based on the previous groups, trying to fill in some blanks for photos with no geographic data. This classification can be used for different things, but here we focus only on how to use it to produce Smart Albums.

It is important to note that all previous organizations except "Small Moments" were based on photos stamped with geographic information. This means there will be no Frequent Locations or Travel groups if no photos have geographic data.

It is also important to highlight that you can have several photos stamped with geographic data and others without it, even if the user always used the same Camera app. This is especially relevant in the Android ecosystem, as will be emphasised later.

Note also that we used two different approaches to find Travel events; we did so to minimize inference errors, by considering a travel category that merges both grouping strategies.

If we organize camera roll photos chronologically in a spreadsheet, with photos as rows and the associated group as a column, for the scenario of a New Yorker going by car from New York to Montreal in Canada, you can have something like Table 1.

This example illustrates two situations that can happen often. The first is having the two travel classifications not matching in all records; the second is having photos without geographic data in the middle of photos with geographic information. To improve our inference in both situations, we start by adding another column to our table that merges both travel classifiers: if a photo was classified as travel in at least one column, we classify it as travel.

Our classification of individual photos ends by inferring the travel classification for photos without geographic data, like photos 3 and 4 in the example. To do so, we look at the start and end dates of the Travel Merged column and flag all photos without geographic data taken within that date range with the same travel classification id.
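The merge-and-backfill step can be sketched as follows, assuming each row carries the two travel classifier ids (or None) and a flag for missing geographic data; in practice the two classifiers' ids would need to be reconciled before merging.

```python
def merge_and_backfill(rows):
    """rows: chronologically sorted dicts with 'date', 'has_geo', and
    'travel_abroad' / 'travel_fl' ids (or None). Adds a 'travel_merged'
    column, then back-fills geo-less photos that fall inside a merged
    travel's date range."""
    # merge: travel in at least one column means travel in the merged column
    for r in rows:
        ta, tf = r.get("travel_abroad"), r.get("travel_fl")
        r["travel_merged"] = ta if ta is not None else tf
    # date range covered by each merged travel id
    spans = {}
    for r in rows:
        tid = r["travel_merged"]
        if tid is not None:
            lo, hi = spans.get(tid, (r["date"], r["date"]))
            spans[tid] = (min(lo, r["date"]), max(hi, r["date"]))
    # back-fill photos without geo data taken within a travel's date range
    for r in rows:
        if r["travel_merged"] is None and not r["has_geo"]:
            for tid, (lo, hi) in spans.items():
                if lo <= r["date"] <= hi:
                    r["travel_merged"] = tid
                    break
    return rows
```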

Smart Albums

Creation

After the classification process is finished, we can create Smart Albums. To do so, we look exclusively at the Travel Merged and Moments columns. The first albums are identified by Travel Merged; then we look at the Moments Id column for the remaining photos. The creation process ends by organizing all albums chronologically.
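The album creation step can be sketched as a simple grouping over the two columns, with albums sorted chronologically by their earliest photo; the field names are illustrative.

```python
def smart_albums(rows):
    """Group photos into albums: travel groups first (Travel Merged column),
    then the remaining photos by their Moments id; albums are returned in
    chronological order of their earliest photo."""
    albums = {}
    for r in rows:
        key = (("travel", r["travel_merged"]) if r["travel_merged"] is not None
               else ("moment", r["moment_id"]))
        albums.setdefault(key, []).append(r)
    return sorted(albums.values(), key=lambda album: min(r["date"] for r in album))
```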

The number of columns used in the classification can be extended with information such as calendar events (e.g., Christmas, Easter, etc.) to enrich your UI, or to leverage filters that segment albums based on a set of criteria defined according to the requirements of your app.

Figure 2 — Example of a UI showing Smart Albums where events
happening in a Travel context are slightly emphasized

Filtering out photos in Albums

Another important functionality you can implement is the use of filters to remove album photos. For example, the WhatsApp app injects photos shared in chat channels into the user's camera roll, and these can be pure pollution in a Smart Album.

From Smart Album to a Story

Note that a Smart Album is just a set of photos; no computer vision or deep learning content analysis has been applied yet. All the work so far is based entirely on photo meta-information.

To create a story in video or photo book format, you can trigger the curation flow explained in "Automatic Photo Curation through Storytelling and Deep Learning" to filter out undesired images, group similar ones, and select the best. You can also use the elastic capacity explained in "Data-driven storytelling based on photo metadata" to transform an original set of 500 travel photos into a sharable one-minute video or an affordable 50-page photo book, all in an automatic flow.

Apple and Android geolocation in photos: a very different reality

We will now share some insights, based on data collected through the Storyo apps for Android and iOS, that can help you decide how to use automatic discovery of stories.

Apple realised early on the long-term importance of location stamped in pictures, has been promoting its use during the installation flow of new devices, and seems to strike quite a good balance between GPS error and battery consumption. As expected, errors tend to happen mostly indoors when no Wi-Fi is present.

Figure 3 — % of camera rolls with at least
one geo-tagged photo

On the other hand, the penetration of geo-tagged photos on Android continues to be far more modest than on iOS. The Storyo user base shows photos stamped with geographic data in more than 75% of camera rolls on iOS, while on Android penetration is near 25%.

Even when Android camera rolls do have photos stamped with geographic data, around 50% of their photos tend to have none. This seems to be caused mostly by GPS/battery consumption optimizations and/or the error margin Google allows for photos to be stamped with latitude and longitude. For example, if you leave Google Maps running in the background on an Android device and start taking pictures, they will all end up with geographic data stamped, unless you are indoors and no Wi-Fi is present.

Independently of the cause of this disparity, we consider it quite relevant that you design your solution with this reality in mind.

Conclusion

In this white paper we disclosed the algorithmic recipe used in the Storyo SDK to curate photo memories based on the space and time variability of photos in a user's camera roll. The recipe is optimized for mobile environments and designed to be CPU-based, to avoid background execution restrictions.

You can use automatically curated Albums as a gateway to communicate with your customers, fostering loyalty and promoting conversion where photos are mostly produced.

It is important to note that the description in this and our previous articles, "Automatic Photo Curation through Storytelling and Deep Learning" and "Data-driven storytelling based on photo metadata", represents an implementation based on the patent "Systems and methods for automatic narrative creation", issued by the USPTO on Jun 12, 2018 (9,996,537), for automatic video creation.

Our company believes that data is a powerful ally in the creation of long-lasting memories and that technology can power products and services that better engage users in the re-discovery and celebration of their experiences through our SDK geared for photo printing partners.

For more information, please contact us at hello@storyoapp.com.

Next article: Deep learning strategies for content analysis photo curation.

CTO and cofounder of Storyo, an independent consultant in machine learning and computer vision (https://manuelc.net)