A Convolutional Neural Network to Identify Nearest City in a Photo

Mike Curry
15 min read · Mar 31, 2020


A CNN for image recognition is trained on results from the Google Images API for city names. The array of probabilities provided by a softmax output layer is examined to determine subtle similarities between cities. Plans for a “City Vibes” swiping app are discussed here.

This project originally started as a feature for the iOS implementation of another app, Best Places to Live (link). In the third release cycle of our project, I teamed up with an iOS student to build out the feature described here. Fortunately for my partner, he was offered a paid position midway through development. As a consequence, the front end of this feature has not yet been completed. Despite this setback, I have included my work on this feature as a stand-alone story in my project. I think the idea is fun, original, and would be fairly straightforward to develop. For the machine learning aspects of this project I have a proof of concept, and I have a skeleton outline of a database architecture that would make this app function at a production level.

The overall idea of this project is to have a user swipe right or left on random images of cities. If the user likes what they see in the photo, the app should find cities that have photos with similar content. Likewise, the app should shuffle cities with disliked features towards the bottom of the list.

The beauty of using a neural network to target cities by random photos is that the patterns it picks up on are subtle. Beyond the general subject matter of a photo, the NN picks up on lighting, texture, edges, color scheme, and honestly who knows what else. The patterns are not predetermined; they are whatever the NN can find in the photos to optimize its target accuracy. Adding to this subtlety is how top results are selected from Google Images. The photos shown in Google Images for a given keyword are continuously sorted by popular user association. For example, ‘New York’ generally has grandiose skyline photos towards the top of the stack. ‘Denver’ tends to feature mountainous backgrounds with rocky crags. ‘Los Angeles’ has more street view photos.

The dual layer of subtlety, from both the NN and the automatic sorting in Google Images, gives the app two ways to tap into a user’s subconscious preference for the ‘vibes’ of a given city photo.

The slightly more technical vision of this project is to: train a neural network for image recognition on random photos of cities; have the model output an array of probabilities representing each photo’s match to each city; have a user swipe left or right for each random city photo displayed; keep a running score of probable matches for each city per user; output a list of top matched cities for a user based upon images they have swiped.

Steps completed thus far:

  1. Use a script to pull random Google Images results into a file system, using city names as keywords.
  2. Design a CNN optimized for photo recognition that can access the file system, uses ‘city name’ as its target, and outputs an array of probabilities.
  3. Design a database architecture that stores hosted photo URLs with output probability arrays, users’ running-total data, and a simple model for keeping users’ city match scores.

Starting with the end in mind, I first roughly sketched out a plan for a workable data architecture.

Emphasis on ‘rough’ diagram

The core of the data for this app can be stored in two relatively simple tables. The first table should contain information about each photo: a url to the hosted image; a queue position key value; an array of probabilities for the percentage match to each city. The second table should contain basic user information: their position in the photo queue, and the running match-score for each city. The second table could either live in the Back End or on the user’s device.
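As a rough illustration of the two tables described above, here is a minimal sketch using Python's built-in sqlite3 module. Every table and column name (photo_info, user_swipe_data, queue_position, city_probs, city_scores) is hypothetical, the example.com URL is a placeholder, and storing the arrays as JSON text is just one assumed option for a prototype rather than a production design.

```python
import json
import sqlite3

# In-memory database for illustration only; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photo_info (
    queue_position INTEGER PRIMARY KEY,  -- order in which photos are served
    url            TEXT NOT NULL,        -- link to the hosted image
    city_probs     TEXT NOT NULL         -- JSON array: softmax output, one value per city
);
CREATE TABLE user_swipe_data (
    user_id        INTEGER PRIMARY KEY,
    queue_position INTEGER NOT NULL DEFAULT 0,  -- next photo to show this user
    city_scores    TEXT NOT NULL                -- JSON array: running match score per city
);
""")

n_cities = 3  # toy size; the real project uses 256 cities
conn.execute(
    "INSERT INTO photo_info VALUES (?, ?, ?)",
    (0, "https://example.com/photo0.jpg", json.dumps([0.7, 0.2, 0.1])),
)
conn.execute(
    "INSERT INTO user_swipe_data VALUES (?, ?, ?)",
    (1, 0, json.dumps([0.0] * n_cities)),
)

# Look up the next photo for user 1 through their queue position.
(position,) = conn.execute(
    "SELECT queue_position FROM user_swipe_data WHERE user_id = 1"
).fetchone()
url, probs_json = conn.execute(
    "SELECT url, city_probs FROM photo_info WHERE queue_position = ?", (position,)
).fetchone()
probs = json.loads(probs_json)
```

Keeping the queue position on the user row means the photo table never needs to know about individual users, which fits the option of storing the second table on the device.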

To complete the core of the app, a very simple model should update user match-score data. This model should take the current user score values and either add or subtract the current user input image array based on included Boolean values. Because of the simplicity of this model, it should live natively in the BE of the app and does not justify a separate DS API. Of course, this model could be greatly improved upon with only slightly more complex arithmetic to calculate the running total, but the general idea is there with just addition and subtraction.
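The add-or-subtract update could look something like the sketch below. The function name update_scores and the liked flag are illustrative assumptions, and the three-city arrays are toy values rather than real model output.

```python
import numpy as np

def update_scores(scores, photo_probs, liked):
    """Add the photo's city probabilities on a right swipe, subtract them on a left swipe."""
    sign = 1.0 if liked else -1.0
    return np.asarray(scores, dtype=float) + sign * np.asarray(photo_probs, dtype=float)

scores = np.zeros(3)                                           # one running total per city
scores = update_scores(scores, [0.7, 0.2, 0.1], liked=True)    # right swipe
scores = update_scores(scores, [0.1, 0.1, 0.8], liked=False)   # left swipe
print(scores)  # approximately [0.6, 0.1, -0.7]
```

A slightly richer version might weight recent swipes more heavily, but plain addition and subtraction is enough to show the idea.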

The general flow through this diagram is:

The user initiates the app:

  1. A request is sent to the User Swipe Data table for the queue position
  2. This key is used to access the next photo in the Photo Info table
  3. A fresh photo URL is served
  4. User input, left/right swipe plus PhotoID, is sent to BE model
  5. Model gets current swipe data and does arithmetic to update running total, updates user data table
  6. This cycle loops

A separate button returns the resulting recommendation:

  1. Sends a request to User Swipe Data table
  2. Model sorts running totals
  3. Returns a list of top matched cities
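The recommendation step boils down to sorting the running totals. Here is a minimal sketch, assuming the scores are held in a NumPy array aligned with a list of city names; top_cities is a hypothetical helper name.

```python
import numpy as np

def top_cities(scores, city_names, k=3):
    """Return the k city names with the highest running scores, best first."""
    order = np.argsort(scores)[::-1][:k]   # indices sorted from highest to lowest score
    return [city_names[i] for i in order]

city_names = ["columbus", "portland", "santa_fe"]
scores = np.array([0.6, 0.1, -0.7])
print(top_cities(scores, city_names, k=2))  # ['columbus', 'portland']
```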

THE SCRIPT:

Getting a script to pull photos into a file system from the Google Images API was made very easy, since this is something that has been done before. After a couple of hours of searching, I had something that was immediately adaptable to my purpose. It allowed me to download multiple images from Google for each city and put them in separate folders all at once: https://github.com/hardikvasa/google-images-download

As a mock-up of what I expected to be able to do with my Google Images script, I started small with just nine photos across three cities. I manually downloaded these and put the photos in folders in the same root directory as my project notebooks. Once I had my CNN working on this micro-set, I went to work on pulling photos from the Google Images API on a slightly larger scale. I was thrilled to have a directory of files with cities downloaded to my local machine.

Files created by script with photos in each

With my newly adopted script, I could fill my city folders with as many random city photos as I specified. Again, I took the road of simplicity and only did four photos for each of the 256 cities. Good thing I went small, because it took me several hours to download these with sad home internet speeds.

THE CNN:

I decided to force all of my photos to be the same size. Maybe there is another way of doing this, but I figured that the input vectors were all going to need to be the same shape when they got plugged into the neural net. For this I used a preprocessing tool from Keras. Again I started small, with a resolution of 224 by 224 by 3 (for colors). I’m not sure if there is any reason for this size exactly, but I found that a lot of Keras examples use this exact size, so I superstitiously followed the apparent convention.

from tensorflow.keras.preprocessing import image

def process_img_path(img_path):
    return image.load_img(img_path, target_size=(224, 224))

Loading one of my photos to take a look:

image.load_img('./data/santa_fe/sfe1.jpg', target_size=(224,224))
Random Santa Fe image 224x224

Then I wrote a short loop to apply this size requirement to each photo and append it, as an array of values, to my X data.

import os

data = []
for city in ['columbus', 'portland', 'santa_fe']:
    for file in os.listdir('./data/' + city):
        # print(file)  # to check if right files...
        if file.endswith('.jpg'):
            path = os.path.join('./data', city, file)
            img = process_img_path(path)   # load and resize to 224x224
            x = image.img_to_array(img)    # convert to a (224, 224, 3) array
            data.append(x)
            # print(x)

For fun, we can take a peek at our vectorized photos:

Vectorized Photo Array

A quick look at the shape of this array shows that we have 9 photos, each 224 by 224 pixels with 3 color channels. Golden. Later the first number became 256 * n, where n is the number of photos per city.
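To make that shape concrete, here is a small sketch that stacks a list of per-photo arrays into one 4-D array. The blank arrays are placeholders standing in for real photos, and the use of np.stack is an assumption about how the list gets combined.

```python
import numpy as np

# Placeholder "photos": nine blank 224x224 RGB arrays standing in for real images.
data = [np.zeros((224, 224, 3), dtype=np.float32) for _ in range(9)]

X = np.stack(data)   # axes: (n_photos, height, width, channels)
print(X.shape)       # (9, 224, 224, 3)
```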

On my first attempt at creating a functional CNN, I created a simple target vector by manually creating an array. Each number represented a city.

y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

For a bigger set of photos, I had to get a little more python savvy:

# Create target vector
y = []
for i in range(0, 256):
    j = [i] * 4  # in this case 4 photos / city
    for p in j:
        y.append(p)
y = np.asarray(y)

Every time I write something this simple that works, I feel like a mad-scientist!

Inspecting y yields:

array([ 0, 0, 0, …, 255, 255, 255])
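For comparison, the same kind of target vector can be built in one line with NumPy's repeat. This is only an alternative sketch, and it assumes, as the loop above does, that photos are loaded grouped by city with exactly four per city.

```python
import numpy as np

n_cities, photos_per_city = 256, 4
y = np.repeat(np.arange(n_cities), photos_per_city)  # 0,0,0,0,1,1,1,1,...,255

print(y.shape)  # (1024,)
print(y[:8])    # [0 0 0 0 1 1 1 1]
```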

Perfect, except that it needed to be made binary:

from keras.utils import to_categorical
y_binary = to_categorical(y)

This turned my 0–255 numbered target vector into zeros and ones like:

array([[1., 0., 0., ..., 0., 0., 0.],
[1., 0., 0., ..., 0., 0., 0.],
[1., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 1.],
[0., 0., 0., ..., 0., 0., 1.],
[0., 0., 0., ..., 0., 0., 1.]], dtype=float32)
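What this conversion does can be sketched in plain NumPy: each integer label becomes a row of zeros with a single 1.0 at the label's index. The one_hot helper below is a hypothetical stand-in to show the idea, not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels into rows with a single 1.0 at the label's index."""
    return np.eye(num_classes, dtype=np.float32)[labels]

y = np.array([0, 0, 1, 2])   # toy labels for three cities
y_binary = one_hot(y, 3)
print(y_binary)
```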

Next, I split my data into train and test. If I had a much bigger set, I would probably use a train/validate/test split, but this project is primarily a proof of concept. Plus, I can’t afford the computational power to run anything much bigger than four photos per city.

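from sklearn.model_selection import train_test_split

# stack the list of image arrays into one (n_photos, 224, 224, 3) array before splitting
x_train, x_test, y_train, y_test = train_test_split(np.array(data), y_binary, test_size=0.25, stratify=y_binary)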

In my “bigger” set with four photos for each of the 256 cities, I used three pics to train and one to test. Admittedly, this is not even remotely close to producing something that could be proven accurate, but that’s not really the point of this project.
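To show what a three-to-one split per city means, here is a hand-rolled NumPy sketch that holds out one random photo per city. It is not the train_test_split call used in this project, just an illustration on toy sizes, and it assumes every city has the same number of photos.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cities, photos_per_city = 3, 4                      # toy sizes
y = np.repeat(np.arange(n_cities), photos_per_city)   # integer city label per photo

test_idx = []
for city in range(n_cities):
    city_idx = np.flatnonzero(y == city)     # positions of this city's photos
    test_idx.append(rng.choice(city_idx))    # hold out one random photo per city
test_idx = np.array(test_idx)
train_idx = np.setdiff1d(np.arange(len(y)), test_idx)

print(len(train_idx), len(test_idx))  # 9 3
```

The index arrays can then be used to slice both the photo array and the target array, so the two stay aligned.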

I had a method for clearly defining my X and y vectors; now I needed to work out a model.

I wanted to create a neural network that could take in my array of photos, use each city as a target, and output an array of probabilities that told how likely each photo was to be each city.

After a lot of searching and digging, I found a number of very helpful articles on building NNs with Keras from Jason Brownlee, my current hero. https://machinelearningmastery.com/category/deep-learning/ Seriously, how is this guy so awesome!?!

I also found some helpful information about how to get probabilities from a NN using a softmax output layer and a categorical crossentropy loss function from Haihan Lan, another current hero of mine. https://towardsdatascience.com/the-softmax-function-neural-net-outputs-as-probabilities-and-ensemble-classifiers-9bd94d75932
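As a quick illustration of those two ideas, here is a NumPy sketch of softmax and categorical crossentropy on a single toy example with three cities. The function names and the logits are made up for the example; Keras handles all of this internally.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-12):
    """Negative log of the probability assigned to the true class, per sample."""
    return -np.sum(y_true * np.log(y_prob + eps), axis=-1)

logits = np.array([[2.0, 1.0, 0.1]])    # raw network outputs for one photo
probs = softmax(logits)                 # one probability per city
loss = categorical_crossentropy(np.array([[1.0, 0.0, 0.0]]), probs)
print(probs, loss)
```

The loss shrinks as the probability on the true city approaches 1, which is what pushes the network toward confident, correct outputs during training.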

After learning from these masters in my field, building my CNN model with a softmax output layer looks deceptively simple:

#imports
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, GlobalAveragePooling2D
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.models import Sequential, Model
#instantiate resnet50 & sew on my own head -- input layer shape
res = ResNet50(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
# sew my tail on this beast
x1 = res.output
x1 = GlobalAveragePooling2D()(x1)
predictions = Dense(y_train[0].shape[0], activation='softmax')(x1)
model = Model(res.input, predictions)

Above, I’ve taken the beastly ResNet50 deep NN and simply Frankensteined on my own input and output layers. I’m not going to pretend that I understand how ResNet’s layer structure was dreamed up. However, I think it’s worth taking a moment to appreciate what’s under the hood of this model, so prepare to scroll a ways. My model layers can be inspected using model.summary():

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 224, 224, 3) 0
conv1_pad (ZeroPadding2D)       (None, 230, 230, 3)  0           input_1[0][0]
conv1_conv (Conv2D)             (None, 112, 112, 64) 9472        conv1_pad[0][0]
conv1_bn (BatchNormalization)   (None, 112, 112, 64) 256         conv1_conv[0][0]
conv1_relu (Activation)         (None, 112, 112, 64) 0           conv1_bn[0][0]
pool1_pad (ZeroPadding2D)       (None, 114, 114, 64) 0           conv1_relu[0][0]
pool1_pool (MaxPooling2D)       (None, 56, 56, 64)   0           pool1_pad[0][0]
conv2_block1_1_conv (Conv2D)    (None, 56, 56, 64)   4160        pool1_pool[0][0]
conv2_block1_1_bn (BatchNormali (None, 56, 56, 64)   256         conv2_block1_1_conv[0][0]
conv2_block1_1_relu (Activation (None, 56, 56, 64)   0           conv2_block1_1_bn[0][0]
conv2_block1_2_conv (Conv2D)    (None, 56, 56, 64)   36928       conv2_block1_1_relu[0][0]
conv2_block1_2_bn (BatchNormali (None, 56, 56, 64)   256         conv2_block1_2_conv[0][0]
conv2_block1_2_relu (Activation (None, 56, 56, 64)   0           conv2_block1_2_bn[0][0]
conv2_block1_0_conv (Conv2D)    (None, 56, 56, 256)  16640       pool1_pool[0][0]
conv2_block1_3_conv (Conv2D)    (None, 56, 56, 256)  16640       conv2_block1_2_relu[0][0]
conv2_block1_0_bn (BatchNormali (None, 56, 56, 256)  1024        conv2_block1_0_conv[0][0]
conv2_block1_3_bn (BatchNormali (None, 56, 56, 256)  1024        conv2_block1_3_conv[0][0]
conv2_block1_add (Add)          (None, 56, 56, 256)  0           conv2_block1_0_bn[0][0]
                                                                 conv2_block1_3_bn[0][0]
conv2_block1_out (Activation)   (None, 56, 56, 256)  0           conv2_block1_add[0][0]
conv2_block2_1_conv (Conv2D)    (None, 56, 56, 64)   16448       conv2_block1_out[0][0]
conv2_block2_1_bn (BatchNormali (None, 56, 56, 64)   256         conv2_block2_1_conv[0][0]
conv2_block2_1_relu (Activation (None, 56, 56, 64)   0           conv2_block2_1_bn[0][0]
conv2_block2_2_conv (Conv2D)    (None, 56, 56, 64)   36928       conv2_block2_1_relu[0][0]
conv2_block2_2_bn (BatchNormali (None, 56, 56, 64)   256         conv2_block2_2_conv[0][0]
conv2_block2_2_relu (Activation (None, 56, 56, 64)   0           conv2_block2_2_bn[0][0]
conv2_block2_3_conv (Conv2D)    (None, 56, 56, 256)  16640       conv2_block2_2_relu[0][0]
conv2_block2_3_bn (BatchNormali (None, 56, 56, 256)  1024        conv2_block2_3_conv[0][0]
conv2_block2_add (Add)          (None, 56, 56, 256)  0           conv2_block1_out[0][0]
                                                                 conv2_block2_3_bn[0][0]
conv2_block2_out (Activation)   (None, 56, 56, 256)  0           conv2_block2_add[0][0]
conv2_block3_1_conv (Conv2D)    (None, 56, 56, 64)   16448       conv2_block2_out[0][0]
conv2_block3_1_bn (BatchNormali (None, 56, 56, 64)   256         conv2_block3_1_conv[0][0]
conv2_block3_1_relu (Activation (None, 56, 56, 64)   0           conv2_block3_1_bn[0][0]
conv2_block3_2_conv (Conv2D)    (None, 56, 56, 64)   36928       conv2_block3_1_relu[0][0]
conv2_block3_2_bn (BatchNormali (None, 56, 56, 64)   256         conv2_block3_2_conv[0][0]
conv2_block3_2_relu (Activation (None, 56, 56, 64)   0           conv2_block3_2_bn[0][0]
conv2_block3_3_conv (Conv2D)    (None, 56, 56, 256)  16640       conv2_block3_2_relu[0][0]
conv2_block3_3_bn (BatchNormali (None, 56, 56, 256)  1024        conv2_block3_3_conv[0][0]
conv2_block3_add (Add)          (None, 56, 56, 256)  0           conv2_block2_out[0][0]
                                                                 conv2_block3_3_bn[0][0]
conv2_block3_out (Activation)   (None, 56, 56, 256)  0           conv2_block3_add[0][0]
conv3_block1_1_conv (Conv2D)    (None, 28, 28, 128)  32896       conv2_block3_out[0][0]
conv3_block1_1_bn (BatchNormali (None, 28, 28, 128)  512         conv3_block1_1_conv[0][0]
conv3_block1_1_relu (Activation (None, 28, 28, 128)  0           conv3_block1_1_bn[0][0]
conv3_block1_2_conv (Conv2D)    (None, 28, 28, 128)  147584      conv3_block1_1_relu[0][0]
conv3_block1_2_bn (BatchNormali (None, 28, 28, 128)  512         conv3_block1_2_conv[0][0]
conv3_block1_2_relu (Activation (None, 28, 28, 128)  0           conv3_block1_2_bn[0][0]
conv3_block1_0_conv (Conv2D)    (None, 28, 28, 512)  131584      conv2_block3_out[0][0]
conv3_block1_3_conv (Conv2D)    (None, 28, 28, 512)  66048       conv3_block1_2_relu[0][0]
conv3_block1_0_bn (BatchNormali (None, 28, 28, 512)  2048        conv3_block1_0_conv[0][0]
conv3_block1_3_bn (BatchNormali (None, 28, 28, 512)  2048        conv3_block1_3_conv[0][0]
conv3_block1_add (Add)          (None, 28, 28, 512)  0           conv3_block1_0_bn[0][0]
                                                                 conv3_block1_3_bn[0][0]
conv3_block1_out (Activation)   (None, 28, 28, 512)  0           conv3_block1_add[0][0]
conv3_block2_1_conv (Conv2D)    (None, 28, 28, 128)  65664       conv3_block1_out[0][0]
conv3_block2_1_bn (BatchNormali (None, 28, 28, 128)  512         conv3_block2_1_conv[0][0]
conv3_block2_1_relu (Activation (None, 28, 28, 128)  0           conv3_block2_1_bn[0][0]
conv3_block2_2_conv (Conv2D)    (None, 28, 28, 128)  147584      conv3_block2_1_relu[0][0]
conv3_block2_2_bn (BatchNormali (None, 28, 28, 128)  512         conv3_block2_2_conv[0][0]
conv3_block2_2_relu (Activation (None, 28, 28, 128)  0           conv3_block2_2_bn[0][0]
conv3_block2_3_conv (Conv2D)    (None, 28, 28, 512)  66048       conv3_block2_2_relu[0][0]
conv3_block2_3_bn (BatchNormali (None, 28, 28, 512)  2048        conv3_block2_3_conv[0][0]
conv3_block2_add (Add)          (None, 28, 28, 512)  0           conv3_block1_out[0][0]
                                                                 conv3_block2_3_bn[0][0]
conv3_block2_out (Activation)   (None, 28, 28, 512)  0           conv3_block2_add[0][0]
conv3_block3_1_conv (Conv2D)    (None, 28, 28, 128)  65664       conv3_block2_out[0][0]
conv3_block3_1_bn (BatchNormali (None, 28, 28, 128)  512         conv3_block3_1_conv[0][0]
conv3_block3_1_relu (Activation (None, 28, 28, 128)  0           conv3_block3_1_bn[0][0]
conv3_block3_2_conv (Conv2D)    (None, 28, 28, 128)  147584      conv3_block3_1_relu[0][0]
conv3_block3_2_bn (BatchNormali (None, 28, 28, 128)  512         conv3_block3_2_conv[0][0]
conv3_block3_2_relu (Activation (None, 28, 28, 128)  0           conv3_block3_2_bn[0][0]
conv3_block3_3_conv (Conv2D)    (None, 28, 28, 512)  66048       conv3_block3_2_relu[0][0]
conv3_block3_3_bn (BatchNormali (None, 28, 28, 512)  2048        conv3_block3_3_conv[0][0]
conv3_block3_add (Add)          (None, 28, 28, 512)  0           conv3_block2_out[0][0]
                                                                 conv3_block3_3_bn[0][0]
conv3_block3_out (Activation)   (None, 28, 28, 512)  0           conv3_block3_add[0][0]
conv3_block4_1_conv (Conv2D)    (None, 28, 28, 128)  65664       conv3_block3_out[0][0]
conv3_block4_1_bn (BatchNormali (None, 28, 28, 128)  512         conv3_block4_1_conv[0][0]
conv3_block4_1_relu (Activation (None, 28, 28, 128)  0           conv3_block4_1_bn[0][0]
conv3_block4_2_conv (Conv2D)    (None, 28, 28, 128)  147584      conv3_block4_1_relu[0][0]
conv3_block4_2_bn (BatchNormali (None, 28, 28, 128)  512         conv3_block4_2_conv[0][0]
conv3_block4_2_relu (Activation (None, 28, 28, 128)  0           conv3_block4_2_bn[0][0]
conv3_block4_3_conv (Conv2D)    (None, 28, 28, 512)  66048       conv3_block4_2_relu[0][0]
conv3_block4_3_bn (BatchNormali (None, 28, 28, 512)  2048        conv3_block4_3_conv[0][0]
conv3_block4_add (Add)          (None, 28, 28, 512)  0           conv3_block3_out[0][0]
                                                                 conv3_block4_3_bn[0][0]
conv3_block4_out (Activation)   (None, 28, 28, 512)  0           conv3_block4_add[0][0]
conv4_block1_1_conv (Conv2D)    (None, 14, 14, 256)  131328      conv3_block4_out[0][0]
conv4_block1_1_bn (BatchNormali (None, 14, 14, 256)  1024        conv4_block1_1_conv[0][0]
conv4_block1_1_relu (Activation (None, 14, 14, 256)  0           conv4_block1_1_bn[0][0]
conv4_block1_2_conv (Conv2D)    (None, 14, 14, 256)  590080      conv4_block1_1_relu[0][0]
conv4_block1_2_bn (BatchNormali (None, 14, 14, 256)  1024        conv4_block1_2_conv[0][0]
conv4_block1_2_relu (Activation (None, 14, 14, 256)  0           conv4_block1_2_bn[0][0]
conv4_block1_0_conv (Conv2D)    (None, 14, 14, 1024) 525312      conv3_block4_out[0][0]
conv4_block1_3_conv (Conv2D)    (None, 14, 14, 1024) 263168      conv4_block1_2_relu[0][0]
conv4_block1_0_bn (BatchNormali (None, 14, 14, 1024) 4096        conv4_block1_0_conv[0][0]
conv4_block1_3_bn (BatchNormali (None, 14, 14, 1024) 4096        conv4_block1_3_conv[0][0]
conv4_block1_add (Add)          (None, 14, 14, 1024) 0           conv4_block1_0_bn[0][0]
                                                                 conv4_block1_3_bn[0][0]
conv4_block1_out (Activation)   (None, 14, 14, 1024) 0           conv4_block1_add[0][0]
conv4_block2_1_conv (Conv2D)    (None, 14, 14, 256)  262400      conv4_block1_out[0][0]
conv4_block2_1_bn (BatchNormali (None, 14, 14, 256)  1024        conv4_block2_1_conv[0][0]
conv4_block2_1_relu (Activation (None, 14, 14, 256)  0           conv4_block2_1_bn[0][0]
conv4_block2_2_conv (Conv2D)    (None, 14, 14, 256)  590080      conv4_block2_1_relu[0][0]
conv4_block2_2_bn (BatchNormali (None, 14, 14, 256)  1024        conv4_block2_2_conv[0][0]
conv4_block2_2_relu (Activation (None, 14, 14, 256)  0           conv4_block2_2_bn[0][0]
conv4_block2_3_conv (Conv2D)    (None, 14, 14, 1024) 263168      conv4_block2_2_relu[0][0]
conv4_block2_3_bn (BatchNormali (None, 14, 14, 1024) 4096        conv4_block2_3_conv[0][0]
conv4_block2_add (Add)          (None, 14, 14, 1024) 0           conv4_block1_out[0][0]
                                                                 conv4_block2_3_bn[0][0]
conv4_block2_out (Activation)   (None, 14, 14, 1024) 0           conv4_block2_add[0][0]
conv4_block3_1_conv (Conv2D)    (None, 14, 14, 256)  262400      conv4_block2_out[0][0]
conv4_block3_1_bn (BatchNormali (None, 14, 14, 256)  1024        conv4_block3_1_conv[0][0]
conv4_block3_1_relu (Activation (None, 14, 14, 256)  0           conv4_block3_1_bn[0][0]
conv4_block3_2_conv (Conv2D)    (None, 14, 14, 256)  590080      conv4_block3_1_relu[0][0]
conv4_block3_2_bn (BatchNormali (None, 14, 14, 256)  1024        conv4_block3_2_conv[0][0]
conv4_block3_2_relu (Activation (None, 14, 14, 256)  0           conv4_block3_2_bn[0][0]
conv4_block3_3_conv (Conv2D)    (None, 14, 14, 1024) 263168      conv4_block3_2_relu[0][0]
conv4_block3_3_bn (BatchNormali (None, 14, 14, 1024) 4096        conv4_block3_3_conv[0][0]
conv4_block3_add (Add)          (None, 14, 14, 1024) 0           conv4_block2_out[0][0]
                                                                 conv4_block3_3_bn[0][0]
conv4_block3_out (Activation)   (None, 14, 14, 1024) 0           conv4_block3_add[0][0]
conv4_block4_1_conv (Conv2D)    (None, 14, 14, 256)  262400      conv4_block3_out[0][0]
conv4_block4_1_bn (BatchNormali (None, 14, 14, 256)  1024        conv4_block4_1_conv[0][0]
conv4_block4_1_relu (Activation (None, 14, 14, 256)  0           conv4_block4_1_bn[0][0]
conv4_block4_2_conv (Conv2D)    (None, 14, 14, 256)  590080      conv4_block4_1_relu[0][0]
conv4_block4_2_bn (BatchNormali (None, 14, 14, 256)  1024        conv4_block4_2_conv[0][0]
conv4_block4_2_relu (Activation (None, 14, 14, 256)  0           conv4_block4_2_bn[0][0]
conv4_block4_3_conv (Conv2D)    (None, 14, 14, 1024) 263168      conv4_block4_2_relu[0][0]
conv4_block4_3_bn (BatchNormali (None, 14, 14, 1024) 4096        conv4_block4_3_conv[0][0]
conv4_block4_add (Add)          (None, 14, 14, 1024) 0           conv4_block3_out[0][0]
                                                                 conv4_block4_3_bn[0][0]
conv4_block4_out (Activation)   (None, 14, 14, 1024) 0           conv4_block4_add[0][0]
conv4_block5_1_conv
(Conv2D)    (None, 14, 14, 256)  262400      conv4_block4_out[0][0]            __________________________________________________________________________________________________ conv4_block5_1_bn (BatchNormali (None, 14, 14, 256)  1024        conv4_block5_1_conv[0][0]         __________________________________________________________________________________________________ conv4_block5_1_relu (Activation (None, 14, 14, 256)  0           conv4_block5_1_bn[0][0]           __________________________________________________________________________________________________ conv4_block5_2_conv (Conv2D)    (None, 14, 14, 256)  590080      conv4_block5_1_relu[0][0]         __________________________________________________________________________________________________ conv4_block5_2_bn (BatchNormali (None, 14, 14, 256)  1024        conv4_block5_2_conv[0][0]         __________________________________________________________________________________________________ conv4_block5_2_relu (Activation (None, 14, 14, 256)  0           conv4_block5_2_bn[0][0]           __________________________________________________________________________________________________ conv4_block5_3_conv (Conv2D)    (None, 14, 14, 1024) 263168      conv4_block5_2_relu[0][0]         __________________________________________________________________________________________________ conv4_block5_3_bn (BatchNormali (None, 14, 14, 1024) 4096        conv4_block5_3_conv[0][0]         __________________________________________________________________________________________________ conv4_block5_add (Add)          (None, 14, 14, 1024) 0           conv4_block4_out[0][0]                                                                             conv4_block5_3_bn[0][0]           __________________________________________________________________________________________________ conv4_block5_out (Activation)   (None, 14, 14, 1024) 0           conv4_block5_add[0][0]            
__________________________________________________________________________________________________ conv4_block6_1_conv (Conv2D)    (None, 14, 14, 256)  262400      conv4_block5_out[0][0]            __________________________________________________________________________________________________ conv4_block6_1_bn (BatchNormali (None, 14, 14, 256)  1024        conv4_block6_1_conv[0][0]         __________________________________________________________________________________________________ conv4_block6_1_relu (Activation (None, 14, 14, 256)  0           conv4_block6_1_bn[0][0]           __________________________________________________________________________________________________ conv4_block6_2_conv (Conv2D)    (None, 14, 14, 256)  590080      conv4_block6_1_relu[0][0]         __________________________________________________________________________________________________ conv4_block6_2_bn (BatchNormali (None, 14, 14, 256)  1024        conv4_block6_2_conv[0][0]         __________________________________________________________________________________________________ conv4_block6_2_relu (Activation (None, 14, 14, 256)  0           conv4_block6_2_bn[0][0]           __________________________________________________________________________________________________ conv4_block6_3_conv (Conv2D)    (None, 14, 14, 1024) 263168      conv4_block6_2_relu[0][0]         __________________________________________________________________________________________________ conv4_block6_3_bn (BatchNormali (None, 14, 14, 1024) 4096        conv4_block6_3_conv[0][0]         __________________________________________________________________________________________________ conv4_block6_add (Add)          (None, 14, 14, 1024) 0           conv4_block5_out[0][0]                                                                             conv4_block6_3_bn[0][0]           __________________________________________________________________________________________________ conv4_block6_out 
(Activation)   (None, 14, 14, 1024) 0           conv4_block6_add[0][0]            __________________________________________________________________________________________________ conv5_block1_1_conv (Conv2D)    (None, 7, 7, 512)    524800      conv4_block6_out[0][0]            __________________________________________________________________________________________________ conv5_block1_1_bn (BatchNormali (None, 7, 7, 512)    2048        conv5_block1_1_conv[0][0]         __________________________________________________________________________________________________ conv5_block1_1_relu (Activation (None, 7, 7, 512)    0           conv5_block1_1_bn[0][0]           __________________________________________________________________________________________________ conv5_block1_2_conv (Conv2D)    (None, 7, 7, 512)    2359808     conv5_block1_1_relu[0][0]         __________________________________________________________________________________________________ conv5_block1_2_bn (BatchNormali (None, 7, 7, 512)    2048        conv5_block1_2_conv[0][0]         __________________________________________________________________________________________________ conv5_block1_2_relu (Activation (None, 7, 7, 512)    0           conv5_block1_2_bn[0][0]           __________________________________________________________________________________________________ conv5_block1_0_conv (Conv2D)    (None, 7, 7, 2048)   2099200     conv4_block6_out[0][0]            __________________________________________________________________________________________________ conv5_block1_3_conv (Conv2D)    (None, 7, 7, 2048)   1050624     conv5_block1_2_relu[0][0]         __________________________________________________________________________________________________ conv5_block1_0_bn (BatchNormali (None, 7, 7, 2048)   8192        conv5_block1_0_conv[0][0]         __________________________________________________________________________________________________ conv5_block1_3_bn (BatchNormali 
(None, 7, 7, 2048)   8192        conv5_block1_3_conv[0][0]         __________________________________________________________________________________________________ conv5_block1_add (Add)          (None, 7, 7, 2048)   0           conv5_block1_0_bn[0][0]                                                                            conv5_block1_3_bn[0][0]           __________________________________________________________________________________________________ conv5_block1_out (Activation)   (None, 7, 7, 2048)   0           conv5_block1_add[0][0]            __________________________________________________________________________________________________ conv5_block2_1_conv (Conv2D)    (None, 7, 7, 512)    1049088     conv5_block1_out[0][0]            __________________________________________________________________________________________________ conv5_block2_1_bn (BatchNormali (None, 7, 7, 512)    2048        conv5_block2_1_conv[0][0]         __________________________________________________________________________________________________ conv5_block2_1_relu (Activation (None, 7, 7, 512)    0           conv5_block2_1_bn[0][0]           __________________________________________________________________________________________________ conv5_block2_2_conv (Conv2D)    (None, 7, 7, 512)    2359808     conv5_block2_1_relu[0][0]         __________________________________________________________________________________________________ conv5_block2_2_bn (BatchNormali (None, 7, 7, 512)    2048        conv5_block2_2_conv[0][0]         __________________________________________________________________________________________________ conv5_block2_2_relu (Activation (None, 7, 7, 512)    0           conv5_block2_2_bn[0][0]           __________________________________________________________________________________________________ conv5_block2_3_conv (Conv2D)    (None, 7, 7, 2048)   1050624     conv5_block2_2_relu[0][0]         
__________________________________________________________________________________________________ conv5_block2_3_bn (BatchNormali (None, 7, 7, 2048)   8192        conv5_block2_3_conv[0][0]         __________________________________________________________________________________________________ conv5_block2_add (Add)          (None, 7, 7, 2048)   0           conv5_block1_out[0][0]                                                                             conv5_block2_3_bn[0][0]           __________________________________________________________________________________________________ conv5_block2_out (Activation)   (None, 7, 7, 2048)   0           conv5_block2_add[0][0]            __________________________________________________________________________________________________ conv5_block3_1_conv (Conv2D)    (None, 7, 7, 512)    1049088     conv5_block2_out[0][0]            __________________________________________________________________________________________________ conv5_block3_1_bn (BatchNormali (None, 7, 7, 512)    2048        conv5_block3_1_conv[0][0]         __________________________________________________________________________________________________ conv5_block3_1_relu (Activation (None, 7, 7, 512)    0           conv5_block3_1_bn[0][0]           __________________________________________________________________________________________________ conv5_block3_2_conv (Conv2D)    (None, 7, 7, 512)    2359808     conv5_block3_1_relu[0][0]         __________________________________________________________________________________________________ conv5_block3_2_bn (BatchNormali (None, 7, 7, 512)    2048        conv5_block3_2_conv[0][0]         __________________________________________________________________________________________________ conv5_block3_2_relu (Activation (None, 7, 7, 512)    0           conv5_block3_2_bn[0][0]           __________________________________________________________________________________________________ conv5_block3_3_conv 
(Conv2D)    (None, 7, 7, 2048)   1050624     conv5_block3_2_relu[0][0]         __________________________________________________________________________________________________ conv5_block3_3_bn (BatchNormali (None, 7, 7, 2048)   8192        conv5_block3_3_conv[0][0]         __________________________________________________________________________________________________ conv5_block3_add (Add)          (None, 7, 7, 2048)   0           conv5_block2_out[0][0]                                                                             conv5_block3_3_bn[0][0]           __________________________________________________________________________________________________ conv5_block3_out (Activation)   (None, 7, 7, 2048)   0           conv5_block3_add[0][0]            __________________________________________________________________________________________________ global_average_pooling2d (Globa (None, 2048)         0           conv5_block3_out[0][0]            __________________________________________________________________________________________________ dense (Dense)                   (None, 256)          524544      global_average_pooling2d[0][0]    ================================================================================================== Total params: 24,112,256 Trainable params: 524,544 Non-trainable params: 23,587,712 __________________________________________________________________________________________________

Like I said, it’s a beast. Notice the last few lines: over 24 million parameters, of which just over half a million are trainable, all of them in the final dense layer! Mind boggling.
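As a sanity check on those totals, the trainable count can be reproduced by hand from the shapes in the summary. The sketch below is plain arithmetic, not the project's code; the constant and function names are mine, and the numbers are read off the summary, on the understanding that the dense layer is a standard fully connected layer with one bias per unit.

```python
# Shapes and counts read off the model summary.
POOLED_FEATURES = 2048        # output width of global_average_pooling2d
DENSE_UNITS = 256             # output width of the added dense layer
FROZEN_PARAMS = 23_587_712    # "Non-trainable params" (the frozen ResNet50 base)


def dense_param_count(n_inputs: int, n_units: int) -> int:
    """One weight per input/unit pair, plus one bias per unit."""
    return n_inputs * n_units + n_units


trainable = dense_param_count(POOLED_FEATURES, DENSE_UNITS)
total = FROZEN_PARAMS + trainable

print(f"trainable params: {trainable:,}")
print(f"total params:     {total:,}")
print(f"trainable share:  {trainable / total:.1%}")
```

The two counts agree with the summary (524,544 trainable, 24,112,256 total), so only about 2% of the network is actually being fitted to the city photos.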

It feels a bit like cheating to add just a half-dozen lines of code to repurpose it. The fifty layers of ResNet50 have been trained on millions of photos to identify all kinds of particular objects. Perhaps targeting cities captured in pictures is a wildly different affair, but it seems like a good place to start for picking up on the subtle content of photos.
