Building Search System from Scratch

Bhagwati Malav
Published in Hash#Include
3 min read · Sep 19, 2021

Search plays a key role in most products because it lets users find what they are looking for efficiently. Such large-scale systems are rarely built in a single iteration: they require continuous improvement, optimisation, and tweaking, and often several supporting systems that together make search better.

Think about a catalog with millions of products from which we need to show the user the n most relevant results first. Getting the right products into that first result set is hard because many factors come into play, e.g. personalisation attributes, relevancy attributes, performance attributes, and the click/CTR feedback loop, all while keeping the system highly available, reliable, fault-tolerant, and elastic, with minimal response time.

The ultimate search engine would basically understand everything in the world, and it would always give you the right thing. And we’re a long, long ways from that.
- Larry Page

I am starting this article series to keep learning about search systems and to share what I learn about various search products, cluster design, benchmarking, app design, rate limiting/rules to prevent abuse, and overall architecture. We will focus on building and discussing a basic system; please feel free to discuss or give suggestions.

No system design is perfect; everything comes with pros and cons.

I will be writing about the following in upcoming posts.

  1. Introduction to Search Problem Statement
  2. Designing an Elasticsearch cluster from scratch
  3. Designing Search-As-You-Type System
  4. Designing Autosuggest System
  5. Designing Search System
  6. Designing Search App (this is again a critical piece of any search system, as the system has to scale end to end)

So let's get started with:

Introduction to Search Problem Statement

Let's start with “How do I design an Elasticsearch cluster from scratch?”

We are given the following problem statement:

Design a global search experience for a super app where users can search products across various verticals with a minimal 99th-percentile response time (< 50 ms).

We will use the same problem statement in the next set of articles.
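To make the latency target concrete, here is a minimal sketch of how one might check a 99th-percentile budget against a set of measured response times. The function names, the nearest-rank percentile method, and the sample latencies are assumptions made for illustration; the problem statement only gives the target itself.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_target(samples_ms, pct=99, budget_ms=50.0):
    """True if the pct-th percentile latency is strictly under the budget."""
    return percentile(samples_ms, pct) < budget_ms


# Hypothetical measurements: 98 fast requests, one slower, one outlier.
latencies = [12.0] * 98 + [45.0, 120.0]

print(percentile(latencies, 99))        # p99 of the hypothetical samples
print(meets_latency_target(latencies))  # compared against the 50 ms budget
```

Note that a single slow outlier does not break a 99th-percentile target, which is why percentiles are usually preferred over the maximum or the average when stating such a requirement.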

We should always come up with must-have, good-to-have, and optional attributes while designing any system.

Must-Have attributes (Basic):

  1. System must be highly available.
  2. System must respond in under 50 ms. No matter what architecture, tech stack, or framework we use internally, if this fails, the system is not serving its intended purpose.
  3. System must show the most relevant results.
  4. System must support spelling correction.
  5. System must support context understanding/intent identification.
  6. System must be fault tolerant, reliable, elastic, and secure.
  7. System must be protected from public abuse, DDoS, and traffic from bots/automated systems, with rate limiting in place so that we don’t waste system resources and can serve actual users.
  8. … Will keep updating it.
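The rate limiting requirement above can be sketched with a token bucket, which is one common technique; the list does not prescribe a particular algorithm, so the class name, parameters, and the injectable clock below are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Hypothetical per-client limiter: tokens refill at a fixed rate up to
    a maximum capacity, and each request spends one token."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so the bucket can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend tokens if the request may proceed."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage: allow bursts of 3 requests, refilling 2 tokens per second.
bucket = TokenBucket(rate_per_sec=2, capacity=3)
for i in range(5):
    print(i, bucket.allow())
```

In a real deployment one bucket would typically be kept per client key (user, IP, or API token), and rejected requests would get an error response instead of reaching the search cluster.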

Good-to-Have attributes:

  1. System can have learning-to-rank support (this only makes sense if the system can already provide relevant results). This allows relative ordering of search results.
  2. Personalised search experience.
  3. CTR (click-through rate) feedback loop.
  4. Support for considering product performance metrics in search ranking.
  5. Set up a statistics system to see how the search system's internal components perform, e.g. intent/classification results, zero results, non-zero results, spell correction correctness, etc. This would help a lot in figuring out issues with internal components and improving them.
  6. … Will keep updating it.
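As a rough illustration of the statistics item above, the sketch below tracks two of the signals mentioned: the share of searches returning zero results, and a click-through rate. The class and method names are hypothetical, and CTR is simplified here to clicks per search; a production system would more likely measure clicks per impression of a result.

```python
from dataclasses import dataclass


@dataclass
class SearchStats:
    """Hypothetical in-memory counters for basic search health metrics."""
    searches: int = 0
    zero_result_searches: int = 0
    clicks: int = 0

    def record_search(self, result_count):
        """Count one search and note whether it returned nothing."""
        self.searches += 1
        if result_count == 0:
            self.zero_result_searches += 1

    def record_click(self):
        """Count one click on a search result."""
        self.clicks += 1

    @property
    def zero_result_rate(self):
        return self.zero_result_searches / self.searches if self.searches else 0.0

    @property
    def ctr(self):
        # Simplification: clicks per search, not per impression.
        return self.clicks / self.searches if self.searches else 0.0


# Usage with made-up traffic: four searches, two of them empty, one click.
stats = SearchStats()
for count in [10, 0, 3, 0]:
    stats.record_search(count)
stats.record_click()
print(stats.zero_result_rate, stats.ctr)
```

A rising zero-result rate usually points at gaps in spelling correction or intent identification, which is the kind of issue this statistics system is meant to surface.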

So this is what we can consider a minimal problem statement. Thanks for reading my article.

I will write on “How to design an Elasticsearch cluster from scratch” in the next article, along with discussing how to decide on master, data, and other nodes, capacity estimation, benchmarking, and rate limiting at the ES cluster.
