Walmart Global Tech Blog



Search Model Serving Using PyTorch and TorchServe


Search Model Serving GPU Metrics

Walmart Search has embarked on the journey of adopting Deep Learning in the search ecosystem to improve search relevance. For our pilot use case, we served the computationally intensive BERT Base model at runtime, with the objective of achieving low latency and high throughput.

We built a highly scalable model serving platform to enable fast runtime inferencing for our evolving models using TorchServe, which provides the flexibility to support multiple execution modes (such as eager mode and TorchScript).
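At a high level, serving a model with TorchServe involves packaging the model into a `.mar` archive and starting the server against a model store. A minimal sketch of that flow (the file names, model name, and handler are illustrative assumptions, not our production setup):

```shell
# Package a saved model into a model archive (.mar).
# bert_base.pt and handler.py are placeholder names for illustration.
torch-model-archiver --model-name bert_base \
    --version 1.0 \
    --serialized-file bert_base.pt \
    --handler handler.py \
    --export-path model_store

# Start TorchServe with the archived model registered at startup.
torchserve --start \
    --model-store model_store \
    --models bert_base=bert_base.mar

# Send an inference request to the default inference port (8080).
curl -X POST http://localhost:8080/predictions/bert_base -T sample_input.txt
```

The handler script is where request preprocessing, model invocation, and response postprocessing live; TorchServe ships built-in handlers for common cases and accepts custom ones like the placeholder above.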

Evolution

One monolithic Search Query Understanding application was responsible for understanding the user’s intent behind the search query. Through a single Java Virtual Machine (JVM)-hosted web application, it loaded and served multiple models. Experimental models were loaded onto the same query understanding application. These models were large, and computation was expensive.

With this approach, we faced the following limitations:

  • Inability to refresh a model with the latest version, or to add a new experimental model, without a full application deployment
  • Increased memory pressure on a single application
  • Slow startup time
  • Minimal realized performance gain from concurrent model execution, due to CPU limitations
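The first limitation is one that TorchServe's management API addresses directly: models can be registered, scaled, and unregistered at runtime without redeploying the serving application. A sketch using the default management port (8081); the archive URL and model name are illustrative assumptions:

```shell
# Register a new (experimental) model version at runtime.
curl -X POST "http://localhost:8081/models?url=bert_base_v2.mar"

# Scale up workers for the newly registered model.
curl -X PUT "http://localhost:8081/models/bert_base_v2?min_worker=2"

# Unregister a retired model without touching the rest of the fleet.
curl -X DELETE "http://localhost:8081/models/bert_base_v2"
```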


Written by Pankaj Takawale

Leading Machine Learning Model Serving Platform & Query Understanding at Walmart Search. Passionate about engineering excellence on large Distributed Systems
