Deep Learning and Google Street View Can Predict Neighborhood Politics from Parked Cars
It’s likely that your car says something about you. The make and model, whether it’s foreign or domestic, and how expensive it is can provide information about who owns it. This doesn’t work for everyone, of course, but over a large enough population, the statistics can be fairly reliable indicators. The United States spends a quarter billion dollars collecting socioeconomic information by hand through community surveys every year, but if there were a big enough database of what types of cars can be found in which neighborhoods, that data collection could be done more affordably, more frequently, and cover much larger areas. And there is a big database of neighborhood street pictures, in the form of Google Street View imagery.
Researchers from Stanford University have applied deep learning-based computer vision techniques to 50 million images across 200 regions to identify 22 million cars, which is roughly 8 percent of all automobiles in the United States. Based on the types of cars and their locations, the researchers estimated the income, race, education, and voting patterns of the people living in those areas. The results they derived from pictures are impressively accurate.
In principle, using a convolutional neural network (CNN) to identify cars in a street view image seems like a straightforward problem. However, to accurately estimate demographic statistics, it was necessary to know the make, model, year, and trim level of each vehicle. Many vehicles don’t change a whole heck of a lot from year to year, so to train the CNN, the researchers relied on both random humans on Mechanical Turk and car experts they recruited on Craigslist. Ultimately, the CNN was trained well enough to classify vehicles in street view images into one of 2,657 categories, accounting for nearly every visually distinct car, truck, and van sold in the United States since 1990. The CNN managed to chew through all 50 million images in two weeks with an accuracy of around 90 percent, a task that would have taken a trained human over 15 years to complete.
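To put the two-weeks-versus-15-years comparison in per-image terms, here is a rough back-of-the-envelope calculation. It assumes both the CNN and the hypothetical human work around the clock with no breaks, which the article does not state, so the numbers are illustrative only.

```python
# Back-of-the-envelope throughput comparison using the article's figures.
# Assumption (not from the article): both the CNN and the human work
# nonstop, 24 hours a day, and a year is taken as 365 days.

IMAGES = 50_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_image(total_days, n_images=IMAGES):
    """Average time spent on one image if the job takes total_days."""
    return total_days * SECONDS_PER_DAY / n_images


cnn_seconds = seconds_per_image(14)          # two weeks
human_seconds = seconds_per_image(15 * 365)  # fifteen years
speedup = human_seconds / cnn_seconds

print(f"CNN:   {cnn_seconds:.4f} s per image")
print(f"Human: {human_seconds:.2f} s per image")
print(f"Speedup: about {speedup:.0f}x")
```

Under those assumptions the human would spend a little under ten seconds on each image, while the CNN averages a few hundredths of a second.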
Once all the automobile data were collected, the researchers took demographic survey results and 2008 presidential election results for some sample areas and trained a relatively simple regression model to identify positive and negative associations between vehicles, demographics, and voting preferences.
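As a minimal sketch of what such a regression step might look like, the example below fits ordinary least squares to synthetic data. Everything in it is an assumption for illustration: the three vehicle categories, the made-up "true" weights, the noise level, and the choice of plain least squares as a stand-in. The article does not say which regression model the researchers used.

```python
import numpy as np

# Illustrative only: synthetic data, hypothetical categories, and plain
# least squares standing in for the researchers' unspecified model.
rng = np.random.default_rng(0)

n_regions = 200
categories = ["sedan", "pickup", "luxury"]  # hypothetical vehicle groups

# Each row: fraction of detected cars per category in one region.
counts = rng.integers(1, 500, size=(n_regions, len(categories)))
shares = counts / counts.sum(axis=1, keepdims=True)

# Made-up associations, used only to generate a synthetic target
# (think of it as a rescaled demographic or voting statistic).
true_weights = np.array([0.2, -0.5, 0.9])
target = shares @ true_weights + rng.normal(0.0, 0.01, size=n_regions)

# Fit one weight per vehicle category. No intercept column is added
# because the shares in each row already sum to 1.
weights, *_ = np.linalg.lstsq(shares, target, rcond=None)

for name, w in zip(categories, weights):
    direction = "positive" if w > 0 else "negative"
    print(f"{name}: {w:+.3f} ({direction} association)")
```

The sign of each fitted weight is what corresponds to the "positive and negative associations" described above: a positive weight means regions with more of that vehicle type tend to score higher on the target, and a negative weight means the opposite.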
Posted on 7wData.be.