RoboTagger: Using Google Vision for Fun and Profit

I like making photos and videos. I am also fairly pedantic about tagging those photos and videos with meaningful keywords in Adobe Lightroom so that I can find them later. Searching and sorting by dates and places only gets you so far when you have lots of content spanning several decades.
My problem is that I started making photos long before the advent of modern digital asset management (DAM) tools. As a result, I still have a sizable backlog of content that I’ve simply never gotten around to tagging. It would be awesome if Lightroom could analyze the content of those photos and tag them for me automatically, yet that is the one area where it lags sorely behind the state-of-the-art technology developed by companies like Google and Facebook for their respective cloud services.
Much to my delight, the fine folks at Google recently rolled out their Vision API, which does a marvelous job of image content analysis. It is incredibly simple to use: upload your photo to Google and, after a second or two, it returns detailed analysis results. Each type of result is billed separately; since I am only interested in descriptive labels and landmarks, the total cost would be just $3 per 1,000 photos. Google further enticed me with a free trial offer: up to 60 days with a $300 credit. Sweet!
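Under the hood, "uploading a photo" is just a JSON POST to the Vision API's `images:annotate` endpoint with the image bytes inlined as Base64. Here is a minimal sketch in Python (the plug-in itself is written against Lightroom's Lua SDK), requesting only the two result types I care about; the `api_key` parameter and result limits are my own choices:

```python
import base64
import json
import urllib.request

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_annotate_request(image_bytes):
    """Ask Vision for the two result types I care about: labels and landmarks."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [
                {"type": "LABEL_DETECTION", "maxResults": 20},
                {"type": "LANDMARK_DETECTION", "maxResults": 5},
            ],
        }]
    }

def annotate(image_path, api_key):
    """POST one photo to the Vision API and return the parsed JSON response."""
    with open(image_path, "rb") as f:
        payload = json.dumps(build_annotate_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        VISION_ENDPOINT + "?key=" + api_key,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

An API key is the simplest way to authenticate a toy script like this; the real plug-in goes through the Google Cloud Platform sign-in flow instead.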
Now, if only there was a way to harness the power of Google Vision to analyze my whole Lightroom catalog… without having to manually upload any photos.
I happen to have a particular set of skills. Skills I have acquired over a very long career. Skills that make me more than capable of solving this dilemma.
Lightroom has an SDK that enables anyone to write plug-ins to extend its functionality in many ways. I have developed several of them, both professionally and for personal use. I promptly set out to write a proof-of-concept plug-in that would do the following:
- Authenticate with the Google Cloud Platform
- Send a thumbnail of the selected photo to Google Vision API
- Present the user with the analysis results
- Tag the photo with the selected labels and landmark names
Presto! Done by lunch with time to spare for dessert. Piece of cake, indeed.
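The last two steps boil down to flattening the API response into keyword strings the user can pick from. A sketch, assuming the response shape the Vision API documents (`labelAnnotations` and `landmarkAnnotations` entries with `description` and `score` fields); the confidence cutoff is a hypothetical knob of my own:

```python
def keywords_from_response(response, min_score=0.70):
    """Flatten label and landmark annotations into a deduplicated keyword list,
    dropping low-confidence guesses (the 0.70 cutoff is my own choice)."""
    keywords = []
    for result in response.get("responses", []):
        annotations = (result.get("labelAnnotations", []) +
                       result.get("landmarkAnnotations", []))
        for ann in annotations:
            if ann.get("score", 0.0) >= min_score:
                keywords.append(ann["description"])
    # Preserve order while removing duplicates across the two annotation types.
    return list(dict.fromkeys(keywords))

sample = {"responses": [{
    "labelAnnotations": [{"description": "mountain", "score": 0.95},
                         {"description": "blur", "score": 0.40}],
    "landmarkAnnotations": [{"description": "Matterhorn", "score": 0.88}],
}]}
print(keywords_from_response(sample))  # ['mountain', 'Matterhorn']
```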
All I needed to do next was repeat those steps roughly 200,000 times. The only hiccup was the turnaround time of around 1–2 seconds per photo, and I did not really want to wait through 4–5 days of non-stop processing. So I reduced the elapsed wall-clock time by making the plug-in issue multiple content analysis requests in parallel, somewhat arbitrarily capping the maximum at the number of CPU cores in the host PC. I could have also combined multiple photos into larger batch requests, but that would have complicated the plug-in unnecessarily in exchange for relatively minor performance gains. So far, Google has not flagged me as a Denial of Service (DoS) attack.
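The fan-out itself is simple. Here is a sketch of the same idea outside Lightroom, with a hypothetical `analyze_photo` standing in for the real Vision API round trip; since the work is network-bound rather than CPU-bound, a thread pool capped at the core count is plenty:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def analyze_photo(path):
    """Stand-in for the real Vision API round trip (network-bound, ~1-2 s each)."""
    return (path, ["label"])  # hypothetical placeholder result

def analyze_catalog(paths):
    """Issue content analysis requests in parallel, capped at the CPU core
    count, mirroring the plug-in's self-imposed concurrency limit."""
    max_workers = os.cpu_count() or 4
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with the catalog.
        return list(pool.map(analyze_photo, paths))
```

With N cores, the wall-clock time drops roughly by a factor of N, since each worker spends nearly all of its time waiting on the network.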
Speaking of Google, according to my account dashboard I still have $287.95 and 43 days left in my free trial. Let ’er rip!

If you are interested in this plug-in and the gory technical details, check out the GitHub repository.
If you like this article, hit the “Like” button and tell your friends.
