Capstone Project Idea: Crowd-Sourced Vehicle-Inferred Economic Data

I’m thinking of proposing this as a product idea for our U.C. Berkeley capstone project. If we get enough data and enough interest, it could become a thesis paper. I would greatly value your views on this (saif.ahmed at berkeley.edu // 212.729.6544).

First, credit goes to a friend for the idea: this is a variation on one he proposed, which used Google Street View images.

The Concept: Create a smartphone app which takes intermittent photos, runs inference to identify car models, and aggregates the results into location-centric wealth data. The simple explanation is: the more BMWs you see, the more wealth is in the area.
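As a rough illustration of the aggregation step, here is a minimal sketch. It assumes detections arrive as (latitude, longitude, car model) tuples and that each model maps to a rough price proxy. The `PRICE_PROXY` table, its dollar figures, the grid size, and the function names are all hypothetical placeholders, not real data or a settled design.

```python
import math
from collections import defaultdict
from statistics import median

# Hypothetical price proxies (USD) per detected car model; placeholder values only.
PRICE_PROXY = {
    "bmw_3_series": 45000,
    "toyota_corolla": 22000,
    "honda_civic": 24000,
}


def grid_cell(lat, lon, cell_deg=0.01):
    """Snap a coordinate to a grid cell (0.01 degrees is roughly 1 km)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def wealth_index(detections, cell_deg=0.01, min_count=1):
    """Return the median price proxy per grid cell.

    Detections whose model is not in PRICE_PROXY are skipped, and cells
    with fewer than min_count usable detections are dropped.
    """
    by_cell = defaultdict(list)
    for lat, lon, model in detections:
        price = PRICE_PROXY.get(model)
        if price is not None:
            by_cell[grid_cell(lat, lon, cell_deg)].append(price)
    return {
        cell: median(prices)
        for cell, prices in by_cell.items()
        if len(prices) >= min_count
    }


# Made-up detections: three in one cell, two in another (one unrecognized).
detections = [
    (37.8712, -122.2601, "bmw_3_series"),
    (37.8715, -122.2605, "bmw_3_series"),
    (37.8718, -122.2608, "toyota_corolla"),
    (37.8044, -122.2712, "honda_civic"),
    (37.8044, -122.2712, "unknown_model"),
]
index = wealth_index(detections)
```

The median is used here rather than the mean so that a single exotic car does not dominate a cell; a real model would likely also need to weight by sample size and correct for who is contributing photos.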

The technology is no longer difficult. Car model datasets are available, so classifiers are straightforward to train. Neural network inference now ports easily to mobile phones, with full template apps available. This means we can focus on the much more interesting task of putting it all together and creating an aggregation model which infers wealth from the car models it sees. We can also build map-based visualizations and, to gauge the strength of our model, correlate its output with other indicators such as stale census data, political contribution data, and rental rates.
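The correlation check against other indicators could look something like the sketch below. It assumes we already have one wealth-index value and one benchmark value (say, median rent) per area, paired by position. The function names are illustrative, and Spearman rank correlation is one reasonable choice here, not a decided method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Rank correlation only asks whether areas are ordered the same way by both measures, which seems a fair first test for a wealth proxy whose units are not directly comparable to rents or census income.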

The thought is that we’d have lots of people contributing lots of data, so in aggregate we’d be able to infer good data. If we run short on data, we can use other geo-tagged data as a temporary alternative, specifically Twitter posts with automotive images.

This would make a great product because low-latency economic data is always in high demand. Costs for such private datasets are sky-high (with most purchases by hedge funds), which indicates high value and clear demand-side use cases. Supply-side use cases are harder to pin down, but they are something our group could consider.

If we did this project, we’d end up using many traditional data science techniques (regression, time series analysis) for model validation, as well as cutting-edge ones (convolutional neural networks, ResNets, object detection) for the car recognition that feeds the aggregation model. Finally, we’d get to work with cool visualizations, especially map-based ones, and a variety of datasets (Zillow Home Price Dataset, Quandl, Census data, Twitter, Instagram).

What do I (Saif) bring to the table? I know how to train image models really well, and I can set up a project to do so on cars pretty easily. I can also port such models decently well to Android phones. So some of the technical risk here is mitigated. I’d love to find a group that is passionate about figuring out cool ways to build aggregator models by joining interesting datasets (automotive data, geographic data, demographic data).

I’d also love to streamline the entire object detection training and mobile inference process on a cluster (perhaps spinning it off as an open source project?). Google already offers this via TensorFlow, but it is tedious. There is a lot of opportunity here to make a mark just by cleaning up Google’s templates and creating something turnkey and scalable.