Using AI to Discover Fixer-Upper Deals

Tony Hua
9 min read · Dec 13, 2023

--

This article was produced as part of the final project for Harvard’s AC215 Fall 2023 course.
Team: Tony Hua, Faisal Karim, Michael Leung, Yaseen Mohmand, Alvaro Ramirez, Evan Wan
Teaching Staff: Pavlos Protopapas, Rashmi Banthia
Project Github Repo: https://github.com/tonyhua18/AC215_rehab_image_detection_ai
Video: https://youtu.be/GhD88KlUKAE

Introduction

In 2020, with $170K saved for a down payment, I was poised to finally fulfill my family’s dream of homeownership. As renters, my immigrant family had always seen home ownership as a cornerstone of building generational wealth, a way to stop “paying someone else’s mortgage.” Little did I know, fears of a pandemic-induced recession would quickly subside, unveiling a real estate market frenzy. In the greater San Francisco Bay Area, homes began selling for $100K to $200K over the asking price, often to all-cash buyers. With a down payment covering merely 17% of the average $1M home price in the Bay Area, my goal now seemed increasingly out of reach.

Fast forward to 2023, and the situation hasn’t eased. The real estate landscape is still dominated by unaffordable, sky-high prices, fueled by soaring interest rates and a low supply of properties. Amidst this, I pivoted to an alternate buying strategy: fixer-uppers. Searching through Facebook groups and calling real estate wholesalers, I witnessed the potential in these sometimes overlooked properties. Fixer-upper homes were more affordable than new builds and offered high margins for instant equity upon renovation. But as I reached out to sellers, I again found myself competing with 20, 30, 60, sometimes over 100 other seasoned investors with ALL CASH.

The challenge became clear: how could someone who isn’t already well connected to the real estate investor community discover these fixer-upper deals before everyone else?

Problem Statement

From experience, the competitive and unaffordable nature of today’s real estate market makes finding reasonably priced homes a daunting task. Fixer-uppers, while offering a unique opportunity, are hard to come by. They get snapped up quickly by experienced, well-connected investors, and platforms like Zillow and Redfin do not provide easy ways to find them. This leads to a laborious process of searching by zip code and manually reviewing listings. And by the time a property is easily discoverable through keyword searches, it’s often purposely labeled by sellers and already on the radar of numerous investors.

Proposed Solution

In my Harvard graduate class (shoutout to Harvard AC215!), I connected with a remarkable team who shared a passion for applying technology to real-world challenges. Together, our combined expertise in computer science and machine learning became the catalyst for developing a novel approach to identifying potential fixer-upper properties.

Our team’s solution to the challenge was a computer vision AI model designed to autonomously scan through property images and identify potential fixer-uppers. As a possible future plugin for real estate marketplaces, our predicted labels would be overlaid on marketplace search results and easily accessible to homebuyers via filters. As part of our class project, we also developed a fully automated, scalable, production-level pipeline for our AI model.

As a disclaimer, our approach may seem unnecessarily complex, even overkill, but our team’s primary aim was to maximize our learning; it is a class project, after all. If you’re intrigued by our solution and want to learn more, feel free to reach out via email. We’re excited to share our journey and discuss the potential of this technology in transforming the way we find real estate opportunities!

The following sections delve into the methods our team employed to construct our AI pipeline, the obstacles we overcame, the valuable insights we gained, and our collective journey in creating something innovative. We hope you find our story both informative and enjoyable!

Technologies Used

For our AI pipeline, we outlined the following solution architecture and deployment plan. The sections below will dive into each component more closely.

Two workflows for MLOps teams: data scientists can easily train and manage models while collaborating seamlessly with software developers to deliver the final working product to customers.

Data Collection and Preprocessing

We began by searching for existing datasets of fixer-upper homes but, as expected, found none that met our needs. Consequently, we turned to web scraping, using Python and BeautifulSoup4 to gather 13,783 images of renovated homes and 8,716 images of fixer-uppers from Craigslist. Scraped images were automatically stored in Google Cloud Storage buckets, and a simple command-line interface (CLI) was built as the user interface for our scraper. We containerized the scraping tool with Docker and hosted it on Google Compute Engine as a standalone microservice.
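
As a rough illustration of the scrape-and-store flow, here is a minimal sketch. The search URL, bucket name, and file naming below are hypothetical, and the real CLI tool adds pagination, rate limiting, and error handling.

```python
# Minimal sketch of the scrape-and-store flow (URL and bucket name are hypothetical).
import requests
from bs4 import BeautifulSoup
from google.cloud import storage

SEARCH_URL = "https://sfbay.craigslist.org/search/rea?query=fixer"  # hypothetical query
BUCKET_NAME = "rehab-detection-raw-images"                          # hypothetical bucket

def scrape_listing_images(search_url: str, bucket_name: str) -> None:
    """Download listing images and upload them to a Cloud Storage bucket."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)

    page = requests.get(search_url, timeout=30)
    soup = BeautifulSoup(page.text, "html.parser")

    # Collect image URLs from <img> tags on the results page.
    image_urls = [img["src"] for img in soup.find_all("img") if img.get("src")]

    for i, url in enumerate(image_urls):
        image_bytes = requests.get(url, timeout=30).content
        blob = bucket.blob(f"fixer/raw/{i:06d}.jpg")
        blob.upload_from_string(image_bytes, content_type="image/jpeg")

if __name__ == "__main__":
    scrape_listing_images(SEARCH_URL, BUCKET_NAME)
```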

In the preprocessing stage, we developed two more microservices. The first was Label Studio, a web interface to simplify and manage our dataset labeling pipeline if needed. The second resized images to 224x224 pixels with 3 color channels, compressed them into TFRecords, and used DVC to version control the dataset in preparation for CNN model training. We also utilized Dask to parallelize workflows, enhancing our tool’s speed and scalability. CLIs were built for these two microservices as well.
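
The resize-and-serialize step looks roughly like the sketch below; the local paths and label encoding are illustrative, and the real microservice reads from Cloud Storage and fans the work out with Dask.

```python
# Minimal sketch: resize images to 224x224x3 and serialize them into a TFRecord file.
# Paths and label values are illustrative.
import glob
import tensorflow as tf

def image_to_example(path: str, label: int) -> tf.train.Example:
    raw = tf.io.read_file(path)
    img = tf.io.decode_jpeg(raw, channels=3)
    img = tf.image.resize(img, [224, 224])
    jpeg = tf.io.encode_jpeg(tf.cast(img, tf.uint8)).numpy()
    return tf.train.Example(features=tf.train.Features(feature={
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[jpeg])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))

with tf.io.TFRecordWriter("fixer_upper.tfrecord") as writer:
    for path in glob.glob("images/fixer/*.jpg"):      # label 1 = fixer-upper
        writer.write(image_to_example(path, 1).SerializeToString())
    for path in glob.glob("images/renovated/*.jpg"):  # label 0 = renovated
        writer.write(image_to_example(path, 0).SerializeToString())
```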

Model Training

For model training, we employed Google’s Vertex AI to orchestrate on-demand custom training jobs and the earlier stages of our pipeline, optimizing for time and budget. With the architecture out of the way, we experimented with various CNN models for transfer learning, including VGG16, MobileNetV2, EfficientNet-b0, and EfficientNet-b7. Our baseline model, the original VGG16 with 3 additional custom layers, achieved a validation accuracy of 69% but was relatively large (162.15 MB) and slow for production use (~13 minutes to execute).
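
For context, the baseline follows the standard Keras transfer-learning pattern, roughly as sketched below; the head layer widths and optimizer settings are illustrative rather than our exact configuration.

```python
# Sketch of a VGG16 transfer-learning baseline: a frozen convolutional base
# plus a small custom head. Layer widths and optimizer settings are illustrative.
import tensorflow as tf

base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # transfer learning: keep the ImageNet features frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # fixer-upper vs. renovated
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```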

To address this, we applied techniques like knowledge distillation to create a smaller and faster custom model based on VGG16. This reduced our model size to 0.04 MB (from 162.15 MB) and the inference time to 11.4 seconds, with a slight trade-off in accuracy. We also integrated WandB for tracking metrics and managing model weights, facilitating automatic hyperparameter sweeps and automatic model updates to our web application.
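
Conceptually, each distillation training step combines a hard-label loss with a soft-label loss against the teacher’s temperature-softened predictions, roughly as sketched below; the temperature, loss weighting, and two-class logit outputs are illustrative assumptions rather than our exact training loop.

```python
# Condensed sketch of teacher-student knowledge distillation: the small student
# matches the teacher's softened predictions in addition to the true labels.
import tensorflow as tf

def distillation_step(teacher, student, optimizer, x, y, temperature=5.0, alpha=0.1):
    """One training step; teacher and student output logits over the 2 classes."""
    teacher_logits = teacher(x, training=False)
    with tf.GradientTape() as tape:
        student_logits = student(x, training=True)
        # Hard-label loss against the ground-truth class indices.
        hard_loss = tf.keras.losses.sparse_categorical_crossentropy(
            y, student_logits, from_logits=True)
        # Soft-label loss against the teacher's temperature-softened distribution.
        soft_loss = tf.keras.losses.KLDivergence()(
            tf.nn.softmax(teacher_logits / temperature),
            tf.nn.softmax(student_logits / temperature))
        loss = alpha * tf.reduce_mean(hard_loss) + (1.0 - alpha) * soft_loss
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    return loss
```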

To further improve model compression, we also experimented with another knowledge distillation technique: online distillation, also known as Deep Mutual Learning (DML). The technique first appeared in Zhang et al.’s original publication, which describes how two simple student models learn from each other. Both student models are untrained and small in size by design.

Its advantages are manifold:

  1. It does not require a large, pre-trained teacher model.
  2. It requires much less distillation time than regular (teacher-student) distillation.
  3. It can even outperform regular distillation in accuracy for the same number of epochs.

In our experiments, the student-student model (student_DML_2) outperformed the regular distillation model (student_distill), improving accuracy from 69.35% to 69.91%, and it reduced distillation time from 17.3 minutes to 0.5 minutes.
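
A minimal sketch of one mutual-learning update is shown below: each student minimizes its own cross-entropy plus a KL term toward the other student’s current predictions. The alternating update order, optimizer setup, and two-class logit outputs are illustrative assumptions rather than our exact training loop.

```python
# Minimal sketch of Deep Mutual Learning (online distillation): two small students
# train together, each adding a KL term toward the other's current predictions.
import tensorflow as tf

kld = tf.keras.losses.KLDivergence()
ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def mutual_learning_step(student_a, student_b, opt_a, opt_b, x, y):
    # Update student A toward the labels and toward student B's predictions.
    probs_b = tf.nn.softmax(student_b(x, training=False))
    with tf.GradientTape() as tape:
        logits_a = student_a(x, training=True)
        loss_a = ce(y, logits_a) + kld(probs_b, tf.nn.softmax(logits_a))
    opt_a.apply_gradients(zip(tape.gradient(loss_a, student_a.trainable_variables),
                              student_a.trainable_variables))

    # Then update student B symmetrically, using student A's refreshed predictions.
    probs_a = tf.nn.softmax(student_a(x, training=False))
    with tf.GradientTape() as tape:
        logits_b = student_b(x, training=True)
        loss_b = ce(y, logits_b) + kld(probs_a, tf.nn.softmax(logits_b))
    opt_b.apply_gradients(zip(tape.gradient(loss_b, student_b.trainable_variables),
                              student_b.trainable_variables))
    return loss_a, loss_b
```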

Deployment

In the final phase of our project, we developed an automated and scalable architecture for deploying our Django backend API and React frontend web application. Our CI/CD pipeline, orchestrated by GitHub Actions and Ansible playbooks, automated testing and deployment to ensure consistency across environments.

We also containerized all our microservices with Docker for a swift and repeatable deployment process. Kubernetes clusters were employed to orchestrate these containers, enhancing our application’s scalability and manageability. Load balancers distribute incoming traffic to ensure responsiveness, while NGINX reverse proxies manage external access, enhancing security and traffic flow.

This integrated setup, combining GitHub Actions, Ansible, Docker, Kubernetes, load balancers, and reverse proxies, streamlines our development process. It allows seamless integration of changes into our testing and production environments, significantly reducing manual intervention and boosting the efficiency, reliability, and scalability of our application deployment.

Challenges and Lessons Learned

Through this journey, we learned a lot from some very difficult obstacles we encountered along the way. Two challenges stood out, requiring extensive troubleshooting and debugging to resolve. They are discussed below.

Challenge 1: Low accuracy from noisy dataset as training inputs

Our training dataset was scraped from real estate listing websites, and images retrieved from a query using the keyword “fixer” are not always consistent. For instance, a property with a cluttered living room was listed by a sales agent as a fixer property; it is not a good training example for the neural network to learn from. Rather, a good training input image would show features like boarded-up windows or stud-exposed walls.

We definitely violated the principle that a good-quality dataset beats the best algorithm in an AI model. We did not realize how important this is until we saw it for ourselves.

Resolution:

To improve accuracy, we experimented with various CNN architectures as the base model, primarily VGG16, EfficientNet-b0, and EfficientNet-b7. None of them stood out, with VGG16 performing slightly better. This further underscores the importance of quality data.

Alternatively, we could have manually relabeled each input image. Obviously, that is a prohibitively daunting task, so we did not pursue it. It would be interesting to know how much improvement would result from a given number of corrected labels.

We also applied data augmentation, a standard regularization technique, to reduce variance errors. Keras has great documentation, including code examples that facilitate the use of its libraries, and we adapted its data augmentation code for our purposes.
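
A minimal version, in the spirit of the Keras documentation examples, might look like the sketch below; the specific transforms and parameters are illustrative.

```python
# Minimal data-augmentation sketch using Keras preprocessing layers;
# the specific transforms and parameters are illustrative.
import tensorflow as tf

data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# Placed at the front of the model so the random transforms are applied
# during training only and become no-ops at inference time.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = data_augmentation(inputs)
# ... the rest of the network (e.g., the VGG16 base and custom head) follows here.
```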

Challenge 2: Inference was too slow for a production model

Our first baseline model, deployed end to end with our full-stack application, took over 30 seconds to return a page of property results to our website visitors. This was unacceptable, as users would assume the webpage was not loading after only 5 seconds of a blank screen. We implemented many changes and eventually reduced response times to under a second in some cases, with a maximum of about 6 seconds for more complex web pages.

To start, our inference model required approximately 0.9 seconds to process each image. Individually, this duration was manageable, but a page of roughly 300 images meant several minutes of sequential processing (about 300 × 0.9 s ≈ 4.5 minutes).

To address this bottleneck, we optimized our workflow with several strategies:

  • Implementing parallel processing to run inference on multiple images concurrently (a minimal sketch follows this list)
  • Streaming our data, loading results to users as soon as individual images were predicted rather than batch-processing an entire page
  • Adding telemetry to our websites to monitor performance, which also helped us debug and set baselines
  • Using quantization, pruning, and distillation techniques to produce smaller “student” models
  • Deploying load balancing and multi-pod architectures to scale inference as needed
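
The first two strategies combine into a simple parallel, streaming pattern, sketched below; `predict_image()` is a hypothetical helper wrapping the model, and the production version runs behind our Django API rather than as a standalone script.

```python
# Minimal sketch of parallel, streaming inference: images on a results page are
# predicted concurrently and yielded as soon as each finishes, instead of
# waiting for the whole batch. predict_image() is a hypothetical helper.
from concurrent.futures import ThreadPoolExecutor, as_completed

def stream_predictions(image_urls, predict_image, max_workers=8):
    """Yield (url, label) pairs as individual inferences complete."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(predict_image, url): url for url in image_urls}
        for future in as_completed(futures):
            yield futures[future], future.result()

# Usage: the web layer can push each result to the client as it arrives.
# for url, label in stream_predictions(page_image_urls, predict_image):
#     send_partial_result(url, label)
```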

Closing Thoughts

Overall, we had an amazing time working on this project. We all agreed that we learned a lot over the past few weeks and are extremely proud of the pipeline we have created. Several of us plan to continue working on this application and extending it! To truly share our experiences unfiltered, here are some direct quotes from the team:

“I learned a lot, was exposed to new software, and really appreciate the end-to-end nature; unlike other classes where you learn just a small component, models are not stuck in a Jupyter notebook, we built something useful, fast-paced…” — Tony Hua

“For a hardware person who lives with clock cycles, logic gates, or die sizes for many years, this class is an eye-opening experience into state-of-the-art AI application software development. Now I got a first-person view from the driver seat about React, Kubernetes, Docker containers, etc. I had absolutely no clue before when I heard about them from my SW colleagues. But I am now empowered, and equipped with lots of powerful tools in my AI/ML toolbox. Thank you Harvard. Thank you Pavlos.” — Michael

“The thing I loved the most about the class was the time spent collaborating with my colleagues.” — Yaseen

With that, we thank you all for reading this lengthy blog post about our work, our goals, our challenges, and our progress so far. Again, we look forward to finishing and deploying the full application in the future, so let us know if you’d like to stay updated! Once again, a huge thank you to the entire teaching team and all the wonderful colleagues and friends we’ve met along the way!
