How ElasticSearch Powers Search at HouseJoy
Startups are all about steep learning curves. You try several things, but only a few of them work, and even fewer lead to impactful results. The same is true on the tech side of a startup. What follows is the story of how we found a solution that helped us improve our user experience.
We at HouseJoy are trying to solve many fundamental problems in the home services market, such as the accessibility, availability and reliability of quality service providers, as well as on-time arrival and high-quality service fulfilment.
There are a little over a dozen categories available on our platform, and quite a number of server instances processing booking requests. Each step in booking and service fulfilment produces a lot of data: searched keywords, real-time supply-demand matching through slot management, route optimization to minimize the distance travelled by service providers per job, server-health monitoring and much more. We needed an engine that could process this data quickly and return the required result set. We typically use relational databases to store and search data, but we needed something more powerful to search large datasets quickly. Hence, we started to look at open-source frameworks like Sphinx, Lucene, Solr and ElasticSearch.
Using Lucene alone is challenging because it is a library with shortcomings in areas such as distributed support. Solr is also a good solution for providing an indexed search engine over HTTP, but ElasticSearch proved to have a superior distributed model and to be much easier to use. Built on top of Lucene, it provides full-text search that is considerably easier to work with than raw Lucene. ElasticSearch lets us execute and combine many types of search and filter queries. Because of this, we identified a few cases where we could use ElasticSearch extensively, both to provide a rich experience to users and to improve internal operations.
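To make "combining search and filter queries" concrete, here is a minimal sketch of an Elasticsearch `bool` query body built in Python. The field names `description`, `city` and `status` and the function `build_job_search` are hypothetical, and `city`/`status` are assumed to be mapped as `keyword` so that `term` filters match exact values.

```python
import json


def build_job_search(keyword: str, city: str, status: str) -> dict:
    """Hypothetical bool query: one full-text clause plus exact-value filters."""
    return {
        "query": {
            "bool": {
                # "must": analyzed full-text clause that contributes to the relevance score
                "must": [{"match": {"description": keyword}}],
                # "filter": exact-value clauses that narrow results without affecting the score
                "filter": [
                    {"term": {"city": city}},
                    {"term": {"status": status}},
                ],
            }
        }
    }


body = build_job_search("deep cleaning", "Bangalore", "COMPLETED")
print(json.dumps(body, indent=2))
```

Filter clauses skip scoring, and Elasticsearch can cache frequently used filters, which is why exact-value conditions usually belong in `filter` rather than `must`.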
Previously, searching job details was handled entirely by our primary datastore, MySQL. MySQL is efficient for exact-value matches, but our searches were not limited to those. To keep them fast, we had to index the right columns in MySQL and restrict searches to a limited set of attributes.
As we grew, we kept collecting more data points to find meaningful patterns, and searching across this structured and unstructured data became a big challenge. Take booking details, for example: a booking contains not only booking data but also metadata related to the job and its service, which includes:
- Generation of sub-tasks of types such as execution, subscription, pick-up/drop, and so on.
- Automated service-provider allocation.
- Status transition models for the fulfilment of the job.
- Geolocation discovery model for on-ground service providers.
- Quotation accept/reject flow.
- Payment transaction data.
- Ratings and reviews.
Looking at the features in our product, it was clear that the datastore had to provide a lot more than the basic existing filters. Features like a recommendation engine, user profiling and segmentation, and hub demand-supply modeling all required a completely new datastore.
A combination of ElasticSearch and Kibana turned out to be a perfect match for our use case. ElasticSearch gives us extensive search over our data, which, combined with Kibana, brings out rich visual insights. The data visualizations Kibana provides meant that we could phase out many of the expensive analytics and reporting services we were using. We’ve also used it to build internal dashboards that help us monitor system health, job fulfilment activity, etc.
Kibana helped us get started with creating visualizations over the data. It reduces building a quick dashboard to a few clicks and drags, with all kinds of visualizations ranging from time series, heat maps and geolocations to graphs, charts and more. This is exactly what our Marketing and Product teams needed: a platform where they can easily pull out data and present it visually.
Its ease of configuration doesn’t come at the cost of features. Kibana does a good job of abstracting the complex parts into a separate section called Dev Tools, which has a console where advanced users can create and test complex queries.
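As an illustration of the kind of query one might test in the Dev Tools console, the sketch below builds a daily booking histogram split by city, the sort of query behind a time-series visualization. The fields `booking_date` and `city` and the function `bookings_per_day` are hypothetical. In the console, the same JSON would follow a request line such as `GET bookings/_search`, where `bookings` is a placeholder index name.

```python
import json


def bookings_per_day(start: str, end: str, top_cities: int = 5) -> dict:
    """Hypothetical aggregation: daily booking counts, split by the busiest cities."""
    return {
        "size": 0,  # return only aggregation buckets, no individual hits
        "query": {
            "range": {
                # "format" tells Elasticsearch how to parse the gte/lt strings
                "booking_date": {"gte": start, "lt": end, "format": "yyyy-MM-dd"}
            }
        },
        "aggs": {
            "per_day": {
                # one bucket per calendar day (calendar_interval requires ES 7.2+)
                "date_histogram": {"field": "booking_date", "calendar_interval": "1d"},
                "aggs": {
                    # within each day, the cities with the most bookings
                    "top_cities": {"terms": {"field": "city", "size": top_cities}}
                },
            }
        },
    }


body = bookings_per_day("2019-01-01", "2019-02-01")
print(json.dumps(body, indent=2))
```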
As powerful as it looks, Kibana is of no use if your index does not contain the data points. This is where Elasticsearch comes in. Although Elasticsearch is often described as schema-less, we suggest defining a schema (mapping) for your data before actually indexing it. Elasticsearch does a good job of identifying most data types, but some fields will have to be configured yourself. Mapping every field by hand would be a tedious task, so we created the mapping, indexed the documents and made them available for querying.
Within this mapping, some fields needed to be modified, such as:
- Dates that were detected as strings because they were in a different format
- Fields that should not be tokenized, such as city names
- Geolocation fields
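A minimal sketch of how those three fixes could look in an explicit mapping, assuming Elasticsearch 7.x or later (no mapping types). The index layout, the field names (`booking_date`, `city`, `provider_location`, `description`) and the `dd-MM-yyyy HH:mm:ss` date format are hypothetical stand-ins, not our actual schema.

```python
import json
from datetime import datetime

# Hypothetical mapping for a "bookings" index (Elasticsearch 7.x+ syntax).
BOOKINGS_INDEX = {
    "mappings": {
        "properties": {
            # Strings like "15-03-2019 10:30:00" are not in a default dynamic
            # date format, so dynamic mapping would treat them as text;
            # declaring the format makes this a real date field.
            "booking_date": {
                "type": "date",
                "format": "dd-MM-yyyy HH:mm:ss||strict_date_optional_time",
            },
            # keyword: stored as one untokenized term, so "New Delhi" stays a
            # single value for exact filters and aggregations.
            "city": {"type": "keyword"},
            # geo_point accepts {"lat": ..., "lon": ...} and enables geo queries and maps.
            "provider_location": {"type": "geo_point"},
            # ordinary analyzed full-text field
            "description": {"type": "text"},
        }
    }
}

# Python equivalent of the Java-style date pattern above, useful for
# validating dates before sending documents to Elasticsearch.
PY_DATE_FORMAT = "%d-%m-%Y %H:%M:%S"


def is_valid_booking_date(value: str) -> bool:
    try:
        datetime.strptime(value, PY_DATE_FORMAT)
        return True
    except ValueError:
        return False


print(json.dumps(BOOKINGS_INDEX, indent=2))
```

A body like this would be sent with `PUT /bookings` before any documents are indexed; changing the type of an existing field later generally means reindexing into a new index.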
We provide search over Level-0, Level-1, Level-2 and Level-3 category services. Each category (Level-0) has sub-services (Level-1/Level-2), each sub-service has products (Level-2/Level-3), and each product can have any number of attributes. Searching at any level is difficult with a traditional database. ElasticSearch accepts data in JSON format and maps the individual fields in each document (object) into an indexed form. This indexed form is processed according to the field type, and several internal data structures make indexing and matching those indexed representations efficient (the search engine part). The search happens over the indexed representations, while the original representation is used to return content to the user.
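One common way to make every level searchable is to denormalize: store each product as a single document that carries its full category path. The sketch below assumes hypothetical fields `level0` to `level3`, illustrative category names and attributes, and a `multi_match` query that boosts matches at more specific levels; the boost weights are arbitrary.

```python
import json


def category_doc(path: list, attributes: dict) -> dict:
    """Flatten a category path (Level-0 down to Level-2 or Level-3) into one document."""
    doc = {f"level{i}": name for i, name in enumerate(path)}
    doc["attributes"] = attributes  # arbitrary per-product attributes
    return doc


def search_any_level(text: str) -> dict:
    """One query over all levels; more specific levels get higher (arbitrary) boosts."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["level3^3", "level2^2", "level1", "level0"],
            }
        }
    }


# Illustrative documents: products can sit at Level-2 or Level-3.
deep = category_doc(["Appliance Repair", "AC Service", "Split AC", "Gas Refill"], {"brand": "any"})
shallow = category_doc(["Cleaning", "Home Cleaning", "Sofa Cleaning"], {})
print(json.dumps(deep, indent=2))
print(json.dumps(search_any_level("split ac"), indent=2))
```

Because the whole path lives in one document, a single query can match a Level-0 category name or a Level-3 product name without joins.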
We also use edge_ngram to provide autocomplete over our documents. Our edge_ngram settings are shown below.
The edge_ngram tokenizer first breaks text down into words whenever it encounters one from a list of specified characters, then it emits N-grams of each word where the start of the N-gram is anchored to the beginning of the word.
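Our production settings are not reproduced in this text, so the sketch below uses the values from the example in the Elasticsearch edge_ngram tokenizer documentation (`min_gram` 2, `max_gram` 10, letters and digits as token characters) as an assumption. The analyzer names, the `title` field and the `edge_ngrams` helper are hypothetical; the helper is a pure-Python approximation of what the index-time analyzer emits.

```python
import re

# Hypothetical index settings, modeled on the Elasticsearch docs example.
AUTOCOMPLETE_SETTINGS = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "autocomplete_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                # index time: split into words, emit prefixes, lowercase them
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "autocomplete_tokenizer",
                    "filter": ["lowercase"],
                },
                # search time: no n-grams, so the typed text is matched as-is
                "autocomplete_search": {"tokenizer": "lowercase"},
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "autocomplete",
                "search_analyzer": "autocomplete_search",
            }
        }
    },
}


def edge_ngrams(text: str, min_gram: int = 2, max_gram: int = 10) -> list:
    """Approximate the autocomplete analyzer's output for a piece of text."""
    tokens = []
    # token_chars letter+digit: every other character acts as a word boundary
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()
        # prefixes of length min_gram..max_gram, anchored at the start of the word
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens


print(edge_ngrams("AC Repair"))  # ['ac', 're', 'rep', 'repa', 'repai', 'repair']
```

Using a separate search analyzer is the usual design choice here: if the query text were also split into prefixes, typing "rep" would match anything starting with "re".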
First, we use edge_ngram for autocomplete; if that returns results, we return them. If it returns no results, we run a fuzzy search with the same keyword. The fuzzy search query generates all possible matching terms within the maximum edit distance specified in fuzziness and then checks the term dictionary to find out which of those generated terms actually exist in the index.
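The two-step strategy can be simulated in memory to show the mechanics. This is a sketch under stated assumptions, not Elasticsearch internals: rather than generating candidate terms and looking them up, it scans a small hypothetical term dictionary and keeps terms within the allowed edit distance, which yields the same matches. The thresholds mirror Elasticsearch's default `AUTO` fuzziness, and the distance counts an adjacent transposition as one edit (optimal string alignment) because Elasticsearch's `fuzzy_transpositions` defaults to true. The function names and `TERMS` are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and
    adjacent transpositions (optimal string alignment)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def auto_fuzziness(term: str) -> int:
    """Default "AUTO": exact match for 1-2 chars, 1 edit for 3-5, 2 edits beyond."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def search(keyword: str, term_dictionary: set) -> list:
    kw = keyword.lower()
    # Step 1: autocomplete, i.e. terms that start with what the user typed
    hits = sorted(t for t in term_dictionary if t.startswith(kw))
    if hits:
        return hits
    # Step 2: fuzzy fallback, keeping dictionary terms within the allowed edits
    max_edits = auto_fuzziness(kw)
    return sorted(t for t in term_dictionary if osa_distance(kw, t) <= max_edits)


TERMS = {"plumbing", "painting", "carpentry", "cleaning"}
print(search("plu", TERMS))       # ['plumbing'] via prefix match
print(search("plumbnig", TERMS))  # ['plumbing'] via fuzzy fallback (one transposition)
```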
The diagram below shows how the client interface connects to the search platform and how the search platform connects to the ElasticSearch engine.
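One way to picture the search-platform layer is sketched below: clients call a single platform function, and only that function talks to Elasticsearch through its REST `_search` endpoint, applying the autocomplete-then-fuzzy strategy described earlier. The endpoint URL, the `title` field, the result size and all function names are hypothetical. Only the standard library is used, and the `send` parameter lets the HTTP call be swapped for a fake.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical Elasticsearch endpoint


def build_search_request(index: str, body: dict) -> urllib.request.Request:
    """Wrap a query body in a POST to the index's _search endpoint."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def _http_send(req: urllib.request.Request) -> dict:
    # Real network call: returns the parsed JSON search response.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def platform_search(index: str, keyword: str, send=_http_send, field: str = "title") -> list:
    """Search-platform entry point: clients call this, never Elasticsearch directly."""
    # 1) autocomplete: plain match against a field assumed to be edge_ngram-analyzed
    body = {"size": 10, "query": {"match": {field: keyword}}}
    hits = send(build_search_request(index, body))["hits"]["hits"]
    if hits:
        return hits
    # 2) fallback: the same keyword with fuzzy matching
    body = {"size": 10, "query": {"match": {field: {"query": keyword, "fuzziness": "AUTO"}}}}
    return send(build_search_request(index, body))["hits"]["hits"]
```

Keeping the fallback in the platform layer rather than in each client means the query strategy can change without touching the clients.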
We have now configured Kibana over the ElasticSearch indexes and created dashboards to view the data. The next step is to extend this platform into a user-segmentation service and a recommendation engine that can suggest which services users should book, based on factors like their previous bookings, current trends and inventory availability.
The next time you’re in Bangalore and happen to pass near Domlur, please visit our office. We would love to hear from you and chat over coffee about ideas that could enhance our tech solutions.
You can always bookmark us. We will be posting several tech articles in this series.