Optimizing Search Index Generation using secondary cache
Nov 4 · 5 min read
Bharat Venkat, Sravan Rekula, Hemadri Ananta
Introduction
To support Walmart Search, a Full Index is generated periodically, and incremental updates are applied via real-time stream processing. Together they keep the Walmart search index current. The Full Index is implemented as a Spark-based batch job that does a full table scan on the underlying Item Store (Apache Cassandra). The requirement for Full Index generation was to capture the current state of the entire Walmart Item Catalog and that…


