SnapLoc — Places of interest in a city, from geo-tagged photos

SnapLoc is a product I built as my final project at Metis, a data science bootcamp. It combines automatic image classification with spatio-temporal analysis to recommend places of interest when you travel to a new city. The idea came out of a need: I find most current POI recommendation services painful to use. I love to travel, and whenever I am in a new city and want to know the most interesting places to explore near me, I usually look up photos on services like Instagram or 500px.

I mean, the choice is between this:

“Hawk Hill is a 923-foot peak in the Marin Headlands, just north of the Golden Gate Bridge and across the Golden Gate strait from San Francisco, California. The hill is within the Golden Gate National Recreation Area”

and

Hawk Hill at sunrise (Photo Credit : David Yu)

If you are around the Golden Gate, I highly recommend going to Hawk Hill at sunrise, as you would see something similar to the pic above.

How did it all start?

I started looking at the pictures people take and post on various services, and saw that most of them convey our interests and can be categorized into food, natural scenes, urban scenes, wildlife, birds, etc. I could also see a pattern: some places had more pictures taken than others, and different kinds of pictures were taken at different locations and at different times of day. The question I wanted to explore was how to use these spatio-temporal patterns to build an application that parses them and makes recommendations based on the user's preferences. I also referred to this paper to check the feasibility of the idea.

How is this application useful?

  1. To recommend the most interesting places based on user preferences. For example, a small cupcake shop that everyone has been posting about, but that you would never know about unless you searched for it.
  2. To give the growing number of photo enthusiasts a way to see great pictures of well-known places. For example, there is an awesome view of the Golden Gate from Hawk Hill, but many do not know that.
  3. And finally, to take the location recommender a step further and suggest a travel itinerary for a new place the user wants to visit. That makes it a photo opportunity recommender and a travel itinerary recommender as well, based on hotspots and landmarks.

Damn, how do you implement this awesome idea?

To accomplish this, I pulled about 1 million geo-tagged images from the Flickr API and classified them into common categories of interest. For the first part of my pipeline, I trained a deep neural net on these images so that it could classify any new image with high accuracy. Using a CNN also means more categories can be added at any time in the future, so the classification stays automatic.
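As a rough illustration of the data collection step, here is a minimal sketch of how a geo-tagged photo query against the Flickr REST API could be built. It assumes the `flickr.photos.search` method with a bounding box. The helper name `build_search_url`, the chosen `extras` fields and the example bounding box are assumptions made for illustration, not the exact query used in this project.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"

def build_search_url(api_key, bbox, page=1, per_page=250):
    """Build a flickr.photos.search URL for geo-tagged photos.

    bbox is (min_lon, min_lat, max_lon, max_lat) in degrees.
    build_search_url is a hypothetical helper, not part of any Flickr SDK.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "bbox": ",".join(str(v) for v in bbox),
        "has_geo": 1,                           # only photos with a location
        "extras": "geo,date_taken,views,url_m", # lat/lon, timestamp, popularity, image URL
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,                    # plain JSON instead of JSONP
    }
    return FLICKR_REST + "?" + urlencode(params)

# Example: one page of results for an illustrative box around New York harbor
url = build_search_url("YOUR_API_KEY", (-74.05, 40.65, -73.85, 40.85))
print(url)
```

Fetching each page of that URL and keeping the latitude, longitude, date taken and view count per photo would give the kind of metadata table that the later clustering and ranking steps work on.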

CNN Classification: I trained a convolutional neural network based on Inception V3 and got 0.81 accuracy on these 6 classes. The benefit of using a CNN here is that I can add to these categories. For example, if I want to explore adventures or activities going on around me, I can use transfer learning to classify that new category and recommend it to users. For classification, I referred to this notebook.

I removed the last layer of the trained Inception V3 model and added a layer for my own labels instead:

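from keras.layers import Dense
from keras.models import Model
from keras.optimizers import SGD

# inception_v3() is assumed to be a helper that builds Inception V3 and loads the saved weights
base = inception_v3('/mnt/inception-v3_weights.h5')
# Freeze the pretrained layers so only the new output layer is trained
for layer in base.layers:
    layer.trainable = False
# Skip the original softmax layer and attach a 6-way softmax for my categories
features = base.layers[-2].output
predictions = Dense(6, activation='softmax')(features)
model = Model(inputs=base.input, outputs=predictions)

sgd = SGD(lr=1e-3, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])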

For the second part, I took the location data of these images and ran density-based clustering, which gave me hotspots in terms of geolocation. These are the places, along with their categories, that people are talking about a lot. I then ranked the images based on the number of user preference signals, popularity of the photo, distance from the user, and time of day. That is how I created a personalized recommender of images based on category, location and time.
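To make the ranking idea concrete, here is a minimal sketch that combines the four signals into one score. The `Hotspot` fields, the equal weights, the 10 km cutoff, the log scaling of views and the sample data are all assumptions made for illustration, not the values used in the project.

```python
import math
from dataclasses import dataclass

@dataclass
class Hotspot:
    name: str
    category: str
    lat: float
    lon: float
    views: int      # popularity proxy, e.g. total photo views
    peak_hour: int  # hour of day (0-23) when most photos were taken

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def hour_gap(h1, h2):
    """Circular difference between two hours of the day (0 to 12)."""
    d = abs(h1 - h2) % 24
    return min(d, 24 - d)

def score(spot, user_lat, user_lon, hour, preferences, max_km=10.0):
    """Equal-weight average of four signals, each scaled to 0..1 (illustrative weights)."""
    preference = preferences.get(spot.category, 0.0)       # user interest in the category
    popularity = min(1.0, math.log10(1 + spot.views) / 5)  # log scale, capped at 1
    distance = haversine_km(user_lat, user_lon, spot.lat, spot.lon)
    closeness = max(0.0, 1.0 - distance / max_km)           # 0 beyond max_km
    timeliness = 1.0 - hour_gap(hour, spot.peak_hour) / 12  # 1 at the peak hour
    return 0.25 * (preference + popularity + closeness + timeliness)

def rank(spots, user_lat, user_lon, hour, preferences):
    """Sort hotspots from best to worst score for this user, place and time."""
    return sorted(spots, key=lambda s: score(s, user_lat, user_lon, hour, preferences),
                  reverse=True)

# Made-up sample data: a user near the Golden Gate Bridge at 6 am who prefers nature
spots = [
    Hotspot("Cupcake shop", "food", 37.7749, -122.4194, views=800, peak_hour=15),
    Hotspot("Hawk Hill", "nature", 37.8255, -122.4995, views=5000, peak_hour=6),
]
ranked = rank(spots, 37.8199, -122.4783, hour=6, preferences={"nature": 1.0, "food": 0.2})
print([s.name for s in ranked])
```

Scaling every signal to the same 0..1 range keeps any single one, such as raw view counts, from dominating the score. In practice the weights would need tuning.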

Spatial Clustering: For clustering, I used DBSCAN to build density-based clusters from latitude and longitude. I also converted these clusters into polygons to see their varying shapes and sizes, for which I used shapely and convex hulls. Finally, I queried a KD-tree to get the nearest clusters and ranked them based on user preferences, distance and time. This way I can provide personalized locations that are relevant to the user at that time. I based the spatial cluster analysis on the work in this GitHub notebook.
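One detail worth spelling out: because DBSCAN runs here on raw latitude and longitude with a Euclidean metric, its `eps` radius is measured in degrees, not meters. The short sketch below converts the `eps = 0.0007` from the clustering code into approximate ground distances at the latitude of the map center (40.678361). It assumes a spherical Earth, and `degrees_to_meters` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def degrees_to_meters(eps_deg, latitude_deg):
    """Return the (north-south, east-west) ground distance in meters spanned by eps_deg."""
    meters_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    north_south = eps_deg * meters_per_degree
    # A degree of longitude shrinks with the cosine of the latitude
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

ns, ew = degrees_to_meters(0.0007, 40.678361)
print(f"north-south: {ns:.0f} m, east-west: {ew:.0f} m")
```

So the neighborhood radius is about 78 m north-south but only about 59 m east-west, which means the clusters are slightly stretched. If that matters, scikit-learn's DBSCAN also accepts `metric='haversine'` with coordinates in radians, which measures distances on the sphere instead.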

I ran the DBSCAN clustering algorithm on the image locations to get density clusters (DBSCAN also helps remove outliers and bad clusters). You can see the result in the map below.

# Imports required
import folium
import numpy as np
import pandas as pd
from shapely.geometry import Polygon, mapping
from sklearn.cluster import DBSCAN

# Density-based spatial clusters of photo locations
# (metadata is the DataFrame of photo metadata with latitude/longitude columns)
data = metadata[['latitude', 'longitude']]
db = DBSCAN(eps=0.0007, min_samples=8, metric='euclidean', algorithm='auto')
db.fit(data)

# Visualization of clusters with shapely and geojson
coords = data.to_numpy()
cluster_labels = db.labels_
# DBSCAN labels noise points as -1, so don't count that label as a cluster
n_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
clusters = pd.Series([coords[cluster_labels == n] for n in range(n_clusters)])
maploc1 = folium.Map(tiles='cartodbpositron', location=[40.678361, -74.019592], zoom_start=11)
for cluster in clusters:
    # A polygon needs at least 3 distinct points
    if len(np.unique(cluster, axis=0)) <= 2:
        print('bad cluster ' + str(cluster))
        continue
    # GeoJSON expects (longitude, latitude) order
    inverted = [[x[1], x[0]] for x in cluster.tolist()]
    ring = Polygon(inverted)
    ring_hull = ring.convex_hull
    folium.GeoJson(mapping(ring_hull)).add_to(maploc1)
maploc1
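The nearest-cluster lookup with a KD-tree can be sketched as follows. This is a minimal example under stated assumptions: the centroid coordinates are made up, `to_local_xy_km` and `nearest_clusters` are hypothetical helper names, and the degrees are projected to approximate kilometres first so that the tree's Euclidean distances are meaningful.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical cluster centroids as (latitude, longitude) pairs
centroids = np.array([
    [40.6892, -74.0445],
    [40.7061, -73.9969],
    [40.7484, -73.9857],
    [40.7794, -73.9632],
])

KM_PER_DEGREE = 111.195  # spherical Earth with a radius of 6371 km

def to_local_xy_km(latlon, ref_lat):
    """Project (lat, lon) degrees to approximate kilometres around ref_lat."""
    latlon = np.asarray(latlon, dtype=float)
    x = latlon[..., 1] * KM_PER_DEGREE * np.cos(np.radians(ref_lat))
    y = latlon[..., 0] * KM_PER_DEGREE
    return np.stack([x, y], axis=-1)

ref_lat = centroids[:, 0].mean()
tree = cKDTree(to_local_xy_km(centroids, ref_lat))

def nearest_clusters(user_lat, user_lon, k=2):
    """Return (cluster index, distance in km) for the k closest centroids."""
    query = to_local_xy_km([user_lat, user_lon], ref_lat)
    dist_km, idx = tree.query(query, k=k)
    # query() returns scalars when k=1, so normalize to 1-D arrays
    dist_km, idx = np.atleast_1d(dist_km), np.atleast_1d(idx)
    return list(zip(idx.tolist(), dist_km.tolist()))

result = nearest_clusters(40.7070, -73.9970, k=2)
print(result)
```

The tree is built once from the cluster centroids, so each user query only costs a lookup. The returned candidates can then be re-ranked by preference and time of day.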

How will the application look?

I also made a demo of my application using Flask. You enter the latitude, longitude and time; on the first page you select the cluster of photos you want to look at, and clicking on that cluster shows the images in that category. Besides the Golden Gate, here is another example that we can look at.

How could I miss out on these?

  1. With more time, I could have made the application more personalized, using user profiling and interests to give suggestions based on, say, just wildlife photography.
  2. I can complete the itinerary creation for a new place: a map of landmarks and hotspots that can be covered in a day, based on the current location and the time a person wants to spend in that area.
  3. And who says that only Flickr’s geotagged images can be used? The whole world is open, and I can take tagged images from Instagram, Facebook and many other sites to do the same.

Here is the link to the GitHub repo for this project: