Data Science Projects at Windfall

Cory Tucker
Apr 30, 2018 · 5 min read
Coefficient weights fitted for a customer model.

Our vision at Windfall Data is to make the most complete and accurate model of net worth. To accomplish this, we’re fusing a variety of different data points into a source of truth for consumer data. Data science plays a key role in achieving this goal at Windfall, and is responsible for improving the accuracy of our data sets, predictive models, and campaigns for customers.

Day-to-day data science at Windfall spans several kinds of projects. While the main goal is to continually improve the coverage and accuracy of our net worth models, accomplishing this involves a range of work:

  • Macro-Level Analysis: Building accurate models of net worth involves analyzing data from a variety of different sources including political contributions, stocks, and many others. Macro-level analysis is exploratory work focused on macroeconomic trends that we can apply.
  • Real Estate Modeling: We collect data from a variety of different sources and enrich these data sets with custom models. This includes building models that predict prices of mansions and apartment complexes.
  • Net Worth Modeling: Using our numerous sources of consumer data as input, these projects focus on building the most accurate estimate of net worth and on evaluating the impact of deploying new models.
  • Custom Modeling: In addition to providing net worth estimates, we also provide custom modeling services that help customers act on their data.
  • Ad-hoc Analysis: Given our company size, data science sometimes works on projects not directly related to exploratory analysis or modeling. For these projects, data science provides product analytics support.

To focus the efforts of our data science team, we have a regular roadmap review process and daily syncs to coordinate with engineering. Data science plays a key role in product development, because our product is data. Our team not only helps guide the product, but also works on a wide variety of customer-facing projects.

One of the primary focuses of data science at Windfall is to explore new data sources and determine if we can utilize these sources to improve the accuracy of our net worth models. We also use different data sources to validate the output of our models. Since we are trying to identify every affluent household in the US, we first need to be able to answer the question:

How many affluent households are there in the US?

To answer this question, we use publicly available data such as census data, along with third-party data and internal models. One of the data sets we previously explored was the Federal Reserve's Survey of Consumer Finances. This type of analysis helps us build a baseline for understanding affluent households, but is not used as direct input for our models.
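A baseline count like this can be sketched as a weighted sum over survey records. The example below is only an illustration: it assumes each record carries a net worth and a sampling weight (the number of real households it represents), and the field names, the $1M threshold, and the sample values are all hypothetical rather than taken from any real survey.

```python
from dataclasses import dataclass


@dataclass
class SurveyRecord:
    net_worth: float  # household net worth in dollars (hypothetical field)
    weight: float     # number of real households this record represents


def estimate_affluent_households(records, threshold):
    """Return the weighted count and share of households at or above threshold."""
    total = sum(r.weight for r in records)
    affluent = sum(r.weight for r in records if r.net_worth >= threshold)
    share = affluent / total if total else 0.0
    return affluent, share


# Made-up records for illustration only.
sample = [
    SurveyRecord(net_worth=50_000, weight=4_000),
    SurveyRecord(net_worth=250_000, weight=3_000),
    SurveyRecord(net_worth=1_500_000, weight=2_000),
    SurveyRecord(net_worth=12_000_000, weight=1_000),
]

count, share = estimate_affluent_households(sample, threshold=1_000_000)
print(f"{count:,.0f} affluent households ({share:.1%} of total)")
```

Using the weights matters because surveys of wealth often oversample wealthy households, so an unweighted count would likely overstate their share.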

A factor map showing which asset classes are strong indicators of affluent households.

Macro-level analysis is useful for setting baselines, validating output at an aggregate level, and exploring questions for future product development. Some additional types of questions our data science team explores include:

How do changes in the stock market impact net worth?
How does an individual’s net worth change over time?

Answering these types of questions involves exploratory analysis, and usually provides input to other data science projects at Windfall.

Property ownership is a strong indicator of net worth, and we build predictive models to enhance our real estate data. We also use real estate data to validate other data sources and to build a more complete household profile.

One of the data science challenges we face is dealing with incomplete and missing data. This means some properties may be missing square footage, last sale price, taxable value, or other important attributes. We’ve built an in-house AVM (automated valuation model) to handle properties that do not include estimated prices, such as mansions, apartment complexes, and other types of property outside of single-family homes. Validating estimates for these types of properties often requires additional research.
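To make the missing-data problem concrete, here is a deliberately simple stand-in for a valuation model. It is not Windfall's AVM: it assumes hypothetical property records, fills a missing square footage with the median for the same property type, and prices each property with the median dollars per square foot of comparable properties that do have a sale price.

```python
from statistics import median

# Hypothetical property records; None marks a missing attribute.
properties = [
    {"id": "a", "type": "single_family", "sqft": 2000, "last_sale_price": 600_000},
    {"id": "b", "type": "single_family", "sqft": 1500, "last_sale_price": 450_000},
    {"id": "c", "type": "single_family", "sqft": None, "last_sale_price": None},
    {"id": "d", "type": "mansion", "sqft": 10_000, "last_sale_price": 8_000_000},
    {"id": "e", "type": "mansion", "sqft": 12_000, "last_sale_price": None},
]


def estimate_values(records):
    """Fill missing sqft with the per-type median, then price by $/sqft."""
    by_type = {}
    for r in records:
        by_type.setdefault(r["type"], []).append(r)

    estimates = {}
    for group in by_type.values():
        known_sqft = [r["sqft"] for r in group if r["sqft"] is not None]
        rates = [
            r["last_sale_price"] / r["sqft"]
            for r in group
            if r["sqft"] is not None and r["last_sale_price"] is not None
        ]
        if not known_sqft or not rates:
            continue  # not enough comparable data to estimate this type
        typical_sqft = median(known_sqft)
        rate = median(rates)
        for r in group:
            sqft = r["sqft"] if r["sqft"] is not None else typical_sqft
            estimates[r["id"]] = {
                "value": sqft * rate,
                "sqft_imputed": r["sqft"] is None,
            }
    return estimates


estimates = estimate_values(properties)
```

Keeping a flag such as `sqft_imputed` alongside each estimate makes it easy to route the least certain valuations to the additional research mentioned above.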

A heatmap showing differences in housing prices based on location.

Real estate is just one example of the many data sources we enhance with predictive models to provide more complete and accurate data as input to our net worth models.

Data science at Windfall is responsible for improving the coverage and accuracy of our net worth models. We use an active learning process to continually improve the model, which includes the following iterative steps:

  • Defining metrics for tracking model accuracy
  • Performing feature extraction on our source data sets
  • Exploring new algorithms and model fitting methodologies
  • Evaluating the impact of updates to models
  • Productizing updates to the models
  • Validating model output prior to customer deployment
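The first and fourth steps above (defining a metric and evaluating the impact of an update) can be sketched as follows. This is an illustrative assumption, not the metric Windfall uses: it picks median absolute percentage error on a holdout set, with made-up dollar values, and assumes every actual value is positive.

```python
from statistics import median


def median_ape(actual, predicted):
    """Median absolute percentage error; robust to a few extreme households."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return median(abs(p - a) / a for a, p in zip(actual, predicted))


# Made-up holdout values in dollars, for illustration only.
actual = [200_000, 1_000_000, 5_000_000, 400_000]
current_model = [260_000, 800_000, 6_500_000, 400_000]
candidate_model = [220_000, 900_000, 5_500_000, 300_000]

current_error = median_ape(actual, current_model)
candidate_error = median_ape(actual, candidate_model)

# Only promote the candidate if it beats the model already in production.
improved = candidate_error < current_error
```

A percentage-based metric is one reasonable choice here because net worth spans several orders of magnitude, so raw dollar errors would be dominated by the wealthiest households.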

At Windfall, data science is responsible for putting models into production.

Establishing intervals for net worth predictions.

In addition to providing useful data to our customers, data science also builds custom models for internal customer scoring and profiling. This can include building segmentation models with clustering or look-alike models with supervised learning. For custom modeling projects, we identify how to label households for modeling, train offline models, and evaluate the performance of different approaches before making recommendations to customers.

Training a Logistic Regression model with regularization
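As a minimal sketch of what a look-alike model of this kind might involve, the example below fits a logistic regression with an L2 penalty using plain batch gradient descent. The feature names, labels, and hyperparameters are hypothetical, and a production model would more likely rely on an established library than on hand-written optimization.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_l2(X, y, l2=0.1, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent with an L2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The penalty shrinks weights toward zero; the bias is not penalized.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that household x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical, pre-scaled features: [home_value_score, donation_score].
X = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]  # 1 = household resembles the customer's best segment

w, b = train_logistic_l2(X, y)
score_high = predict_proba(w, b, [0.85, 0.7])
score_low = predict_proba(w, b, [0.15, 0.2])
```

Raising `l2` pulls the fitted coefficients closer to zero, which trades some fit on the training households for more stable scores on new ones.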

Once a custom model has been deployed, data science is responsible for determining the impact of the campaign. This is similar to A/B testing, and we measure not only the change in key metrics, but also the statistical significance of these changes.
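One common way to check significance for a campaign comparison like this is a two-proportion z-test. The sketch below assumes the key metric is a conversion rate and uses made-up counts; it is an example of the general technique, not a description of Windfall's measurement process.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two-sided p-value) for treatment b versus control a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Made-up campaign counts: control vs. households targeted by the model.
lift, z, p_value = two_proportion_z_test(conv_a=100, n_a=5000, conv_b=150, n_b=5000)
significant = p_value < 0.05
```

Reporting the lift together with the p-value keeps both questions visible: how large the change was, and how likely a change that size would be from chance alone.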

An additional function of the data science team is to perform deep dives as necessary. This often involves performing visualization work or other analysis, such as performing attribution of a marketing or fundraising campaign. Many of these tasks can be templated, and we will build additional tools to automate common tasks as we scale our team.

A visualization of where affluent households are located in the continental US.

These are just some of the types of data science projects that we are working on at Windfall. If these projects sound interesting, we’re growing our data science and engineering teams.

Windfall

Insights from Windfall

Cory Tucker

Written by

CTO and Co-Founder of @WindfallData
