Optimizing Search Index Generation Using a Secondary Cache

Bharat Kumar Venkat
Nov 4 · 5 min read

Bharat Venkat, Sravan Rekula, Hemadri Ananta

Introduction

To support Walmart Search, a Full Index is generated periodically, and incremental updates are applied through real-time stream processing. Together, they keep the Walmart search index current. The Full Index is implemented as a Spark-based batch job that performs a full table scan on the underlying Item Store (Apache Cassandra). The requirement for Full Index generation was to capture the current state of the entire Walmart Item Catalog and that…

WalmartLabs

Using technology, data and design to change the way the world shops. Learn more about us - http://walmartlabs.com/