Improving Data Science Processes to Speed Innovation at Realtor.com

Nicole Murphy
Realtor.com Innovation Blog
4 min read · Sep 2, 2021

Ben Litvinas, vice president, data science at Realtor.com

Realtor.com is one of the nation’s most popular home search destinations, with more than 100 million users visiting the site each month. With hundreds of thousands of property listings, it is also a data-driven organization. But like many others, the company grappled with how to operationalize machine learning at scale. And while the product team was focused on developing features that used complex algorithms, the company initially lacked the cultural mindset to treat machine learning models as building blocks for more personalized consumer experiences.

Identifying an opportunity to streamline processes

As part of a company-wide initiative to understand site users on a deeper level, the marketing team enlisted the data science team’s help. The marketing team created a list of 30 different predictive dimensions that they wanted to identify for each site visitor. To deliver on this request, the data science team would need to develop a large number of algorithms. Typically, it would take several months to complete each model, making the total time to complete the project too long to be viable.

This was the impetus to develop an acceleration framework, an internal project that leverages advanced data science and machine learning techniques, processes and tooling to create automated and scalable workflows for the data science team to access, aggregate and explore data. It also enabled the team to utilize that data to develop production-ready machine learning models exceptionally quickly.

Productionalizing data science models

When the data science team began the project, it would take between four and six months to deploy a productionized algorithm. The team realized that most of the data scientists’ time was spent not on machine learning, but on wrangling data from disparate sources and developing workflows that required multiple handoffs to move an algorithm into production.

The incredible growth over the past 3–5 years in the ecosystem of tools and shared best practices for building machine learning systems enabled this project to take shape. These advancements have helped make even small teams productive at scale. For example, the team now uses Metaflow, open-sourced by Netflix in late 2019, which helps data scientists and engineers build and manage projects while leveraging the scale of the cloud. This means that many of the same core AWS services that already powered much of the team’s tech stack, such as AWS Batch, Step Functions, and S3, now had a much more convenient workflow. Additionally, Metaflow provides tools to seamlessly track, manage, and collaboratively share data science projects across teams.

The output of the project was twofold. First, the team created a standardized access framework for data from multiple cloud platforms, which manifests as a Python library that any data scientist can easily use. Second, the initiative created standardized processes, infrastructure, and tooling for deploying machine learning algorithms as directed acyclic graphs (DAGs), making them automated, scalable, and simple.
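To make the two pieces concrete, here is a minimal sketch in plain Python: a stand-in for a unified data-access helper, plus a tiny DAG runner that executes named model-building steps in dependency order. All names, data, and structure here are invented for illustration; the actual framework is internal to Realtor.com.

```python
from collections import deque

def load_table(source: str, table: str) -> list:
    """Hypothetical stand-in for a standardized access layer
    over multiple cloud data platforms."""
    sample_data = {
        ("warehouse", "visits"): [{"user_id": 1, "pages_viewed": 12}],
    }
    return sample_data.get((source, table), [])

class Dag:
    """Minimal DAG of named steps, run in topological order."""
    def __init__(self):
        self.steps, self.deps = {}, {}

    def step(self, name, func, depends_on=()):
        self.steps[name] = func
        self.deps[name] = list(depends_on)
        return self

    def run(self):
        # Kahn's algorithm: a step runs once all its dependencies have run.
        indegree = {n: len(d) for n, d in self.deps.items()}
        ready = deque(n for n, k in indegree.items() if k == 0)
        results, order = {}, []
        while ready:
            name = ready.popleft()
            results[name] = self.steps[name](results)
            order.append(name)
            for other, deps in self.deps.items():
                if name in deps:
                    indegree[other] -= 1
                    if indegree[other] == 0:
                        ready.append(other)
        return order, results

# A toy three-step pipeline: extract -> featurize -> train.
dag = (
    Dag()
    .step("extract", lambda r: load_table("warehouse", "visits"))
    .step("featurize",
          lambda r: [{"active": row["pages_viewed"] > 10} for row in r["extract"]],
          depends_on=["extract"])
    .step("train",
          lambda r: f"model over {len(r['featurize'])} rows",
          depends_on=["featurize"])
)
order, results = dag.run()
```

In a real deployment, each step would be a containerized task scheduled on cloud infrastructure rather than an in-process function, but the dependency-ordered execution is the same idea.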

Faster results, better collaboration

The biggest advantages of the migration to the acceleration framework are time savings and increased collaboration. The team has shaved months off the time it takes to build a productionized machine learning model, and at the same time improved the overall collaboration of the data science organization with other departments within the company, particularly with engineering teams who were required to implement the data science team’s models.

Most notably, this has impacted the speed of the business. Previously, the data science team would be asked by the product or marketing teams to develop algorithms, only to complete development and realize that the business need that spurred the development of the algorithm had shifted.

By developing a repeatable, automated deployment methodology and tooling, the team was able to coordinate data science activities into the same agile development sprints that the rest of the organization uses, and at the same time focus on delivering measured, stepwise improvements to machine learning models. This enables the team to quickly develop a prototype algorithm, deploy it to market, gather user feedback on its efficacy, and iterate. Overall, this led to a better consumer experience, a more positive and collaborative environment within the data science organization, and ultimately a more effective use of the company’s data in production.

Understanding consumer needs to develop a more personalized experience

To date, the team has built 21 different productionized machine learning algorithms leveraging the acceleration framework. These algorithms are focused on identifying and understanding consumers’ underlying preferences. Some examples of these algorithms include:

  1. What is the intent of the visitor? Are they a dreamer who enjoys browsing properties, or an active searcher looking to buy a house in the short term?
  2. Does the consumer currently have financing or do they need financing?
  3. Does the consumer currently work with a real estate agent, or would they like to be matched with an agent in the network?

Machine learning is interwoven into many aspects of Realtor.com, from personalized searches and recommendations, to forecasting housing trends, to helping consumers find the best agent to help them along their home shopping journey. The acceleration framework helps power these machine learning applications, enabling the team to quickly iterate and deliver value to the business while improving the consumer experience.
