To Catch a Thief | The Juice

Zumo Labs presents The Juice, a weekly newsletter focused on computer vision problems (and sometimes just regular problems). Get it while it’s fresh.

Michael Stewart
Zumo Labs
May 24, 2021 · 3 min read

Week of May 17–21, 2021

____

Porch piracy has been on the rise, with 43 percent of American consumers reporting they had a package stolen in 2020. Pretty brazen box-nabbings, considering many folks were home full-time during the pandemic. But the simple fact is that folks can’t always be on the lookout for a package. A computer vision model, on the other hand, could be.

This week, Hugo trained a package detector exclusively on synthetic data, and he documented the process for our blog here. Readers will be rewarded with a picture of Gnocchi, Hugo’s cat. No spoilers, but it’s a good sign when the most time-consuming part of the process is sourcing and labeling the test set.

____

#Surveillance

For the folks who won’t train their own package detection model, there’s the Ring ecosystem. What began as a clever video doorbell that failed to make a deal on Shark Tank has since grown into an absolutely massive civilian surveillance network owned by Amazon. Even if you don’t mind your neighbors knowing all of your comings and goings, the thousands of partnerships Amazon has struck with local law enforcement may make you uneasy.

Amazon’s Ring is the largest civilian surveillance network the US has ever seen, via The Guardian.

#FacialRecognition

Speaking of Amazon and the boys in blue, the company has extended its moratorium on police use of its facial matching software, Rekognition. The moratorium was originally due to expire in June, but continued scrutiny and activism have apparently led to an indefinite extension. Critics point out that the software is imperfect and performs worse on people of color, especially those with darker skin tones.

Amazon extends moratorium on police use of facial recognition software, via Reuters.

#Bias

And in case there was any doubt big tech hasn’t fixed algorithmic bias, Twitter this week revealed the results of their in-depth review of their cropping algorithm. In short, Twitter’s “saliency algorithm” favors white people over black people. That’s bad enough, but their PR team’s misrepresentation of that margin of bias added insult to injury for some.

Twitter’s Photo Crop Algorithm Favors White Faces and Women, via Wired.

#Vintnernet

If you’ve made it this far, you could probably use a drink. How about a glass of wine? Cornell engineers have developed a computer vision-powered system that allows grape growers to predict their yields earlier in the season, and to do so much more cheaply. The new tool allows them to walk (or, apparently, golf-cart) through their vineyard while shooting video on a cellphone camera; the footage is then analyzed in the cloud.

Cheap, user-friendly smartphone app predicts vineyard yields, via The Cornell Chronicle.

#HardwareGap

“Evolution has managed to develop a neural architecture that can accomplish many tasks. Several studies have shown that our visual system can dynamically tune its sensitivities to the common. Creating computer vision systems that have this kind of flexibility remains a major challenge, however.”

Understanding the differences between biological and computer vision, via VentureBeat.

#ImageNetIdol

YouTube personality Yannic Kilcher wrote a song using lyrics sourced from ImageNet class labels. Then he used OpenAI’s CLIP model and BigGAN to generate a music video that syncs up with them. Turns out androids do dream, and it looks like this.

AI made this music video, via YouTube.

____

📄 Paper of the Week

Self-supervised object detection from audio-visual correspondence

Audio and video work quite well together, especially since there is so much video data available to train on. The group in this paper takes advantage of the shared signal to train an object detector entirely with self-supervision. The model also builds a good internal representation of object classes (cat, airplane, instrument), and with just a single label can be aligned to ground-truth classes. One interesting extension of this paper would be to add text as an additional modality.
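For the curious, here is a rough idea of what "aligning with a single label" could look like in practice. This is a minimal Python sketch, not the paper's actual procedure: it assumes the self-supervised model has already grouped detections into numbered clusters, and that we have one hand-labeled example per class. The function name `align_clusters` and the toy data are hypothetical, made up purely for illustration.

```python
def align_clusters(cluster_ids, labeled_examples):
    """Name every detection by the cluster it falls into.

    cluster_ids: list of cluster ids, one per detection (assumed to come
        from a self-supervised model; hypothetical input).
    labeled_examples: dict mapping a detection index to its class name,
        with one labeled example per class (an assumption of this sketch).
    """
    # Each labeled example gives its whole cluster a human-readable name.
    cluster_to_class = {
        cluster_ids[index]: name for index, name in labeled_examples.items()
    }
    # Detections in clusters without any labeled example stay "unknown".
    return [cluster_to_class.get(c, "unknown") for c in cluster_ids]


# Toy data: six detections grouped into three clusters, three labels total.
toy_clusters = [0, 0, 1, 2, 1, 2]
toy_labels = {0: "cat", 2: "airplane", 5: "instrument"}
predicted = align_clusters(toy_clusters, toy_labels)
```

The takeaway is that once the clusters are good, naming them is the cheap part: one label per cluster names every detection in it.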

____

Think The Juice was worth the squeeze? Sign up here to receive it weekly.
