At Lyft we foster a culture of curiosity because we believe curiosity is the root of innovation, and by following our inquiries we sometimes end up working on unconventional problems. This year at Moscone Center for WWDC we had one of those “what if” moments where, just by articulating the question, we’d committed our next few days to the terminal.
Apple welcomed us in an imaginative and playful way: a wall of sentences pared down to the essence of apps, but without removing the poetry.
What if we could get the text of every sentence, in the right order and associated with the right color?
I asked, and I’m not going to lie: I wasn’t trying to make the ephemeral endure, I was just wondering if it could be done before the conference was over.
This post will go over the steps I took to get this done, but the tl;dr is: it’s possible; go here and check it out.
I took ~50 (rather sloppy) photos of the wall and wrote a program to do image stitching using a cylindrical projection.
The overall challenge of this step is to recognize “things” that are similar between images in order to understand the transformations that need to be applied to each image so it “stitches” to the adjacent photos.
Find keypoints in all images
Keypoints are the locations that define what stands out in an image. The most important property of the algorithm that detects these points is its repeatability, that is: how reliably it finds the same points under different viewing conditions. I ended up using the SURF algorithm for keypoint detection, which is implemented in OpenCV.
Find the descriptor of each keypoint
For each keypoint, its neighborhood can be analyzed and represented by a feature vector. These vectors should be distinctive and unique, independent of photometric deformations. That is: the algorithm should be able to find the same vectors regardless of the rotation/transformation of the image.
Match these vectors between adjacent images
The naive approach here would be to find the Euclidean distance between every feature vector and assume that the closest ones are our matches. This is a visualization of the results of this brute force approach:
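The brute-force step needs nothing beyond NumPy; this is a minimal sketch (the function name and the toy vectors are mine):

```python
import numpy as np

def brute_force_match(desc_a, desc_b):
    # Pairwise Euclidean distances via broadcasting: shape (n, m).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # For every vector in desc_a, the index of the closest vector in desc_b.
    return dist.argmin(axis=1)

a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([[1.1, 0.9], [0.1, 0.0]])
matches = brute_force_match(a, b)  # a[0] -> b[1], a[1] -> b[0]
```

OpenCV packages the same idea as `cv2.BFMatcher`, which adds refinements like cross-checking to throw away asymmetric matches.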
Transform the image so it “stitches”
Based on the matching vectors we can “understand” the different transformations the images have, and apply another transformation to make the images “straight”. This can be done by estimating a homography matrix and extracting the rotation and translation that we need to apply to each image so these “sectors” match.
It turns out that OpenCV 3.0 supports all of this out of the box. This is a C++ example of a program that creates a panorama for you. After some tweaking, this is what the output looked like:
Hello flat image
The goal now is to create a flat version of the panorama with no ghosting and no artifacts at image edges. This took some fine-tuning of the feature vectors from the previous step and some manual color correction. This was the result:
Hello black and white & OCR
With the flat image and using OpenCV again, I detected all the contours (letters) and created an (inverse) mask to remove the blue-ish background and only keep the letters. The goal here was to isolate the words so we could convert the image to black & white (or grayscale). Once there was no background, it was just a matter of defining a threshold and checking every pixel to decide whether it would become black or white.
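The final per-pixel thresholding pass is simple to sketch in NumPy (the contour masking itself relies on `cv2.findContours`; the threshold value below is illustrative):

```python
import numpy as np

def to_black_and_white(gray, threshold=128):
    # Pixels brighter than the threshold become white, the rest black.
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

img = np.array([[10, 200],
                [130, 90]], dtype=np.uint8)
bw = to_black_and_white(img)  # [[0, 255], [255, 0]]
```

OpenCV offers the same operation as `cv2.threshold`, including adaptive variants that pick a local threshold instead of a single global one.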
This image was already good enough to run Tesseract on. To get the best results, I trained Tesseract on the San Francisco Mono font before running it.
Now that I had all the sentences, the next step was to associate every sentence with a color and a position. In order to do that, I had to figure out how to:
- Find a list of contours in the image that are sorted from left to right but also line by line from top to bottom.
- Recognize sentences from those contours
- Find the most indicative color of the sentence
- Find the closest color from a fixed list of colors
My strategy was to find every contour (letter), cluster them by vertical proximity (one cluster per line), and sort the contours in each cluster by their x value.
With that, it was just a matter of iterating through every line and every contour in order, checking its median color, and merging it with the previous contour until a dot was found or the distance between the colors was “too big”.
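The clustering and color-matching strategy can be sketched in plain Python (the contour records, gap threshold, and palette below are illustrative; the real values come from `cv2.findContours` and the flat image):

```python
import math

def cluster_into_lines(contours, line_gap=20):
    # Group contours whose y positions sit within line_gap pixels of the
    # previous one (one cluster per text line), then sort each line by x.
    lines = []
    for c in sorted(contours, key=lambda c: c["y"]):
        if lines and abs(c["y"] - lines[-1][-1]["y"]) <= line_gap:
            lines[-1].append(c)
        else:
            lines.append([c])
    return [sorted(line, key=lambda c: c["x"]) for line in lines]

def closest_color(rgb, palette):
    # Nearest palette entry by Euclidean distance in RGB space.
    return min(palette, key=lambda p: math.dist(rgb, p))

contours = [{"x": 50, "y": 10}, {"x": 5, "y": 12}, {"x": 20, "y": 100}]
lines = cluster_into_lines(contours)          # two lines, first sorted by x
palette = [(255, 0, 0), (0, 0, 255)]
best = closest_color((200, 30, 40), palette)  # (255, 0, 0)
```

Euclidean distance in raw RGB is the simplest choice; a perceptual space like Lab would match colors more faithfully, but with a small fixed palette the difference rarely changes the answer.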
This is the script I used for this task.