How Computer Vision Shines a Light in the Dark

A Q&A with Bill Cai, Data Scientist at One Concern

Published in One Concern, Aug 12, 2019

Traditional hazard models don’t work unless they use the highest-resolution data possible, which is expensive and often impossible to obtain. However, One Concern’s data scientists have developed novel AI techniques for correcting various data sources and scaling those insights across multiple domains. This enables our platform to create precise impact assessments in places where they were never possible before.

Bill Cai is a computer vision expert who has engineered and applied these strategies to the data we’ve collected through Digital Anthropology. Today, he sat down to discuss his background and approach to technical challenges.

What do you do as a data scientist at One Concern?

The data science team is full of people from very diverse backgrounds, often experts from very specific domains. My background is in computer vision.

At One Concern, my first project was to process all these videos that Trigg, our Data Acquisition lead, took in Bogotá. He dumped a bunch of videos on me and basically said, “Do something with it.” So given all these data sets — Trigg tagged around 10,000 buildings — I asked, “How can I extrapolate this and apply it to one million buildings?” So we labeled a subset of these images with the building material, and then extrapolated that data and applied the model to millions of buildings in Bogotá.
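As a rough illustration of the label-then-extrapolate workflow described above, here is a minimal sketch in Python. The interview does not say which model One Concern used, so this swaps in a simple k-nearest-neighbors classifier. The feature vectors, material names, and the `knn_predict` helper are all hypothetical stand-ins for real image features.

```python
import math
from collections import Counter


def knn_predict(labeled, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest labeled examples."""
    nearest = sorted(labeled, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical hand-labeled subset: (feature vector, building material).
# In practice the features would come from street-level images.
labeled_subset = [
    ((0.90, 0.10), "masonry"),
    ((0.80, 0.20), "masonry"),
    ((0.85, 0.15), "masonry"),
    ((0.10, 0.90), "concrete"),
    ((0.20, 0.80), "concrete"),
    ((0.15, 0.95), "concrete"),
]

# Unlabeled buildings: the much larger set the labels are extrapolated to.
unlabeled = [(0.88, 0.12), (0.05, 0.90)]

predictions = [knn_predict(labeled_subset, vec) for vec in unlabeled]
print(predictions)
```

The point of the sketch is the shape of the workflow: a small labeled subset drives predictions for a far larger unlabeled set.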

Now, I’m working on a project called Resilience, which involves dependency modeling. That spans both data science and software engineering.

We know that not all catastrophe modelers use AI. Why are techniques like ML and computer vision so important to One Concern’s work?

There are two parts to hazard modeling where we apply AI. The first is hazard modeling itself: Given data on rainfall and elevation, what is the inundation level? The second is data generation: How do you get the weather data or building type in the first place? Only with good environmental data can you understand the impact of a flood, earthquake, or fire.

A big problem that all hazard modelers face is that there isn’t enough data. Even the best physical models (which don’t use machine learning) require the best clean data possible, and it often doesn’t exist. Many of the sources we have now, such as data on California’s wildfires, were only started in the last few years.

So Trigg and I created “the recycling bin of data.” We take all the data that some consider junk — like street photos and traffic cameras, which are all very widely available — and recycle it into something valuable. And to process large quantities of unstructured data, you need AI to extract the useful features.

Can you tell me about an especially interesting AI problem you’ve solved?

Right now, we know the building material for every building in the US, and almost every building in Japan. The question is: How can we leverage this data and extrapolate it to the rest of the world?

Our process is continually improving as we get more data, but the most popular approach is supervised learning. If you have tons of images of apples and oranges, this method classifies images as either apples or oranges. But if you go to a different country, you won’t just find apples and oranges — there might be pears, bananas, and other fruits you’ve never seen before. That was a challenge for us: when we went to Dhaka, we saw buildings that weren’t at all like the ones we’ve seen in Japan or the US.

As a result, we’re moving toward a semi-supervised approach. That allows us to think of it as a clustering problem instead of a classification problem. When we go to a new place and encounter new building materials, we don’t have to collect thousands of examples and completely retrain the model. We can create clusters instead, which is much more scalable.
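To make the clustering idea concrete, here is a minimal sketch in Python, assuming each building is already represented by a feature vector. It is not One Concern's actual method. It uses a simple nearest-centroid rule with a distance threshold, and the `assign_clusters` helper, the centroids, and the threshold value are hypothetical.

```python
import math


def assign_clusters(features, centroids, threshold):
    """Assign each vector to its nearest centroid, or open a new cluster if none is close.

    `centroids` maps a cluster name to a centroid vector, must be non-empty,
    and is updated in place. Returns one cluster name per input vector.
    """
    assignments = []
    for vec in features:
        name, dist = min(
            ((n, math.dist(vec, c)) for n, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            # Too far from every known material: treat it as a new, unnamed cluster.
            name = f"unknown_{len(centroids)}"
            centroids[name] = list(vec)
        assignments.append(name)
    return assignments


# Hypothetical centroids for materials already seen in other countries.
known = {"concrete": [0.0, 0.0], "wood": [5.0, 5.0]}

# Hypothetical features from a new city; the last two resemble nothing seen before.
new_city = [[0.2, -0.1], [4.8, 5.1], [10.0, 0.0], [10.2, 0.3]]

labels = assign_clusters(new_city, known, threshold=2.0)
print(labels)
```

Under these assumptions, the two unfamiliar buildings land in a shared new cluster without any retraining. A person can then inspect a handful of examples from that cluster and give it a name.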

What made you interested in working at One Concern?

For my on-site interview, I was at the office all day and got to meet every single person on the Data Science team. I was pretty tired by the end, but I had so many good conversations.

That made a difference. Beyond the technical work, I was also wondering whether these were people I wanted to spend most of my waking hours with. And I was glad that the people I interviewed with, and now work with, are people like that.

What experiences and background did you have before this role?

I finished my undergrad at UChicago, and then moved on to MIT to do computer vision research for my Master’s.

Something I really liked at MIT was the hands-on work with electronics and robotics. For context, my lab was making an autonomous boat, and I worked primarily on the cameras and navigation. But another job I had was to steer the “safety kayak.” I rode alongside the autonomous boat, and if it broke down halfway, I would be the one dragging it. Other times, I acted as an obstacle that the boat learned to navigate around. I got a free trip to Paris and Amsterdam this way [laughs].

You went to Amsterdam for your boat?

The autonomous boat is being made for the city of Amsterdam, and it’ll be deployed in three years’ time. The concept is actually pretty cool. The boat can be for people, but it’s also an autonomous dumpster — it’ll drag trash around. Or it can be a mobile bridge. If the city wants a bridge in an area, they’ll join together a swarm of these boats to form one. It’s dynamic infrastructure.

Finally, what does ‘resilience’ mean to you?

First, resilience means different things to different people.

For tech companies, it’s whether their customers can still access their products. For logistics companies, it’s the number of packages delivered. And for insurers, it’s the risk exposure they have.

Personally, I’m concerned with long-term resilience. After Hurricane Katrina, there were two wards right next to each other, both of which were flattened. But one bought insurance, and one didn’t. You can see right now that the ward that didn’t buy insurance is still 90 percent uninhabited, whereas the one that bought insurance has bounced back.

So to me, resilience is really about how communities can return to their long-term growth after disaster.

If you enjoyed this Q&A, keep an eye out for the rest of our Data Science Showcase in the coming weeks!

Want to help us build planetary-scale resilience?
Check out careers at One Concern to see how you can help.
