How I Redesigned Contractor Search at BuildZoom

Silas Tsui
Engineering @ BuildZoom
6 min read · Aug 24, 2018

Over the past four months, I’ve been working at BuildZoom, a construction marketplace that leverages building permit and license data to help connect homeowners with the best contractors for their project.

I had an amazing experience as a software engineering intern here. One of the best things about working at BuildZoom is how much they focus on mentorship and collaboration. I got to work on meaningful, impactful projects across the entire stack, ranging from data engineering and performance optimizations to communications infrastructure and full-stack product features. In this article, I’m going to talk about how I redesigned contractor search using Elasticsearch during my internship.

Introduction

Elasticsearch (ES) is a distributed, full-text search engine exposed through an HTTP interface. It’s an awesome tool for full-text search, relevancy ranking, and aggregations. At BuildZoom, we use it for things like contractor listing pages and project distribution.

If you haven’t used ES before, I would recommend checking out their documentation. It’s very comprehensive and a great starting point.

Redesigning Contractor Search

Listing Page for General Contractors in San Francisco, CA

Motivation

When users search for contractors on BuildZoom, we can roughly sort their searches into one of two categories: exploration and search.

Exploration
Homeowners often come to BuildZoom looking for services (e.g. bathroom remodel, roofing, etc.) or types of contractors (e.g. electrician, general contractors, etc.). When a user is just browsing, we want to show results with contractors that are most qualified for their project.

Search
Contractors often search for their own profiles, and homeowners often search for the contractors that BuildZoom matched them with. When a user is looking for a specific contractor, we want to show results with the contractor they’re looking for.

The search algorithm we want to use on the listing page varies depending on the use case, so we want to infer what type of search a user wants to perform based on their search terms. The most straightforward way to do this is to match their search terms against a list of contractor_types and project_types. If their search term matches, the user is probably making an exploratory search; otherwise, they’re probably making a full-text search.

# matches contractor type -> exploratory
search_term = 'electrician'
# matches project type -> exploratory
search_term = 'bathroom remodel'
# doesn't match contractor/project type -> fulltext
search_term = 'oaktown builders'
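A minimal sketch of this inference step, assuming two small hypothetical lookup lists (the real contractor_types and project_types lists are not shown here) and a simple exact match after normalizing case and whitespace:

```ruby
# Hypothetical lookup lists for illustration; the real lists are larger.
CONTRACTOR_TYPES = ['electrician', 'general contractor', 'roofer'].freeze
PROJECT_TYPES = ['bathroom remodel', 'kitchen remodel', 'roofing'].freeze

# Returns :exploratory when the term matches a known contractor or
# project type, and :fulltext otherwise.
def infer_search_type(search_term)
  normalized = search_term.to_s.strip.downcase
  known_types = CONTRACTOR_TYPES + PROJECT_TYPES
  known_types.include?(normalized) ? :exploratory : :fulltext
end

infer_search_type('electrician')      # matches a contractor type
infer_search_type('bathroom remodel') # matches a project type
infer_search_type('oaktown builders') # matches neither list
```

A production version would likely also handle plurals and synonyms, which this sketch ignores.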

Architecture

Our architecture for Elasticsearch uses a service to convert front-end params from a controller to a standardized SearchConditions struct with all the available search configuration. The SearchConditions are then passed to ContractorElasticsearch, a module that translates attributes from the struct to Elasticsearch clauses. The resulting clauses are then combined together to construct our query.

This intermediate layer makes it easy for multiple controllers to interact with ES while keeping the code clean and maintainable. It also allows developers to construct ES queries without diving deep into Elasticsearch’s domain specific language.
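To make the layering concrete, here is a rough sketch of how a struct and a translating module could fit together. The field names on SearchConditions and the method names inside ContractorElasticsearch are assumptions made for illustration, not the actual BuildZoom code:

```ruby
# Hypothetical fields; the real struct carries more configuration.
SearchConditions = Struct.new(
  :search_term, :lat, :lon, :radius_miles, :exclude_alerts,
  keyword_init: true
)

module ContractorElasticsearch
  module_function

  # Translates the search term into a scoring clause, if one was given.
  def must_clauses(conditions)
    return [] if conditions.search_term.nil?

    [{ match: { business_name: { query: conditions.search_term, fuzziness: :auto } } }]
  end

  # Translates non-scoring restrictions into filter clauses.
  def filter_clauses(conditions)
    clauses = []
    clauses << { term: { has_alert: false } } if conditions.exclude_alerts
    if conditions.lat && conditions.lon
      clauses << { geo_distance: {
        distance: "#{conditions.radius_miles || 50}mi",
        location: { lat: conditions.lat, lon: conditions.lon }
      } }
    end
    clauses
  end

  # Combines the individual clauses into one bool query.
  def build_query(conditions)
    { query: { bool: {
      must: must_clauses(conditions),
      filter: filter_clauses(conditions)
    } } }
  end
end

conditions = SearchConditions.new(
  search_term: 'oaktown builders', lat: 37.77, lon: -122.41, exclude_alerts: true
)
query = ContractorElasticsearch.build_query(conditions)
```

A controller only needs to fill in the struct; the Elasticsearch syntax stays inside the module.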

Algorithm Design

I started by defining what attributes we wanted to consider in our search queries.

For exploration, we wanted to show off the best contractors on BuildZoom, so I prioritized attributes like BuildZoom score, photo quality, and relevant experience.

For full-text search, we wanted to display contractors with business names that closely matched the search query. It was important to create a string matching algorithm that worked really well.

Implementation

Creating the search algorithm was somewhat of an empirical process because I had to find a good balance between all the different attributes. I didn’t want any one attribute to dominate, but I still wanted some attributes to have more weight than others.

Bool Query

I used Elasticsearch’s bool query for this. It matches boolean combinations of other queries, making it perfect for ranking the relevance of different contractors. It takes attributes like buildzoom_score and string_match_score into account to produce a single search_score.

# starting point for our bool query
search_definition = {
  query: {
    bool: {
      should: [],
      must: [],
      filter: [],
      must_not: []
    }
  }
}

The filter and must_not clauses include or exclude contractors without affecting the search_score. We can use them to include contractors in a specific geofence or exclude contractors with an alert.

# returns contractors within 50 miles of SF without an alert
filter: [
  { term: { has_alert: false } },
  { geo_distance: {
    distance: 50,
    unit: 'miles',
    distance_type: 'arc',
    # longitudes west of the prime meridian are negative
    location: { lat: 37.77, lon: -122.41 }
  } }
]

The must clause both filters contractors and contributes to the search_score. This is very useful for string matching: in a single clause, we can drop contractors that don’t match the search term and rank the rest by how closely they match the query string.

# returns contractors ranked by how closely they match the query terms
must: [
  { match: {
    business_name: {
      query: 'oaktown builders',
      fuzziness: :auto
    }
  } }
]

The should clause purely affects the search_score. We use this for things like photos and reviews, which provide a boost to the contractor’s ranking.

# provides a 2.5 point boost for contractors with photos
should: [
  { constant_score: {
    filter: { term: { has_photos: true } },
    boost: 2.5
  } }
]

Fuzziness

String matching relies heavily on fuzziness. The fuzziness parameter sets the maximum Levenshtein edit distance allowed between a search term and a matching term in the results. If fuzziness is set to 1, searching for Silas could return results like Sylas or Silus, which are each one character substitution away.
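To make the distance concrete, here is a small standalone sketch of the classic Levenshtein computation (insertions, deletions, and substitutions each cost one edit). It is only an illustration of the metric; Elasticsearch computes fuzzy matches internally and, by default, also counts a swap of two adjacent characters as a single edit.

```ruby
# Classic dynamic-programming Levenshtein distance, keeping one row at a time.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution (or match)
      ].min
    end
    previous = current
  end
  previous.last
end

levenshtein('silas', 'sylas') # => 1
levenshtein('silas', 'silus') # => 1
levenshtein('silas', 'silas') # => 0
```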

Function Score

I primarily used the function score query to optimize string matching. The function score query has two components: query and functions. I used query to return contractors that were somewhat similar to the search term, then used functions to rank the contractors based on how closely they matched.

Suppose a user was searching for “my way construction”. The query component returns a list of contractors that fuzzy matched at least one of ['my', 'way', 'construction']. Then, the functions component sorts each business name into one of four buckets:

  • exact business name match: my way construction
  • exact word match: my way general construction
  • fuzzy business name match: my bay construction
  • fuzzy word match: my bay electricians

Contractor search scores are multiplied by a factor depending on how closely they match. The closer the match, the higher the resulting search_score.
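One way such a query could be laid out is sketched below. The weights, the business_name.raw exact-match subfield, and the choice of filters for each bucket are all assumptions for illustration; the actual values and clauses used at BuildZoom are not shown in this article. With score_mode set to max, a contractor that falls into several buckets gets the weight of the closest one, and boost_mode multiply scales the base query score by that weight. Contractors that only fuzzy match a single word hit none of the filters, so they fall into the fourth bucket and are not boosted.

```ruby
# Hypothetical multipliers for the three closest buckets.
MATCH_WEIGHTS = { exact_name: 8, exact_words: 4, fuzzy_name: 2 }.freeze

def name_match_query(search_term)
  {
    query: {
      function_score: {
        # Base query: fuzzy match on at least one word of the search term.
        query: { match: { business_name: { query: search_term, fuzziness: 'AUTO' } } },
        functions: [
          # Exact business name match (assumes a lowercased keyword subfield).
          { filter: { term: { 'business_name.raw' => search_term.downcase } },
            weight: MATCH_WEIGHTS[:exact_name] },
          # Exact word match: every search word appears in the name.
          { filter: { match: { business_name: { query: search_term, operator: 'and' } } },
            weight: MATCH_WEIGHTS[:exact_words] },
          # Fuzzy business name match: every search word fuzzy matches.
          { filter: { match: { business_name: { query: search_term, operator: 'and', fuzziness: 'AUTO' } } },
            weight: MATCH_WEIGHTS[:fuzzy_name] }
        ],
        score_mode: 'max',
        boost_mode: 'multiply'
      }
    }
  }
end

query = name_match_query('my way construction')
```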

Analyzers

Analyzers are used to specify how we want to index data for different fields. When we index a contractor’s business name into ES, we remove redundant words, clean up symbols, and tokenize strings.

For example, if a contractor’s business name is “Dave & Silas Electrical Contractors”, it would get indexed both as the whole normalized name, ["dave and silas electrical contractors"], and as individual tokens with redundant words removed, ["dave", "silas", "electrical"].
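The effect of those two analysis paths can be imitated in plain Ruby. This is only a simulation of the outcome, not an Elasticsearch analyzer definition, and the list of redundant words is a hypothetical stand-in:

```ruby
# Hypothetical stop list; the real set of redundant words is not shown here.
REDUNDANT_WORDS = %w[and contractors].freeze

# Lowercases, spells out "&", strips other symbols, and collapses whitespace.
def normalize_name(name)
  name.downcase
      .gsub('&', ' and ')
      .gsub(/[^a-z0-9\s]/, ' ')
      .split
      .join(' ')
end

# Splits the normalized name into words and drops the redundant ones.
def tokenize_name(name)
  normalize_name(name).split - REDUNDANT_WORDS
end

name = 'Dave & Silas Electrical Contractors'
normalize_name(name) # => "dave and silas electrical contractors"
tokenize_name(name)  # => ["dave", "silas", "electrical"]
```

In Elasticsearch itself, the same steps would typically be expressed as a character filter, a tokenizer, and token filters attached to the field's analyzer.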

Results

Exploratory Queries

Exploratory queries still return contractors with relevant experience and high BuildZoom scores on a clean, aesthetically-pleasing page.

Before and After for “kitchen remodelers”

Full-text Queries

Full-text queries now show contractors that match the search terms much more closely.

Before and After for “my way construction”

Conclusion

Thank you for reading! I hope you were able to gain some insight into how you can leverage Elasticsearch for full-text search from this article. Elasticsearch is incredibly powerful and it definitely feels like I only scratched the surface of its full capabilities with this project.

If you’re interested in learning more or have any questions, feel free to contact me.

If you want to work on projects like this, BuildZoom is hiring for both internships and full-time positions!
