Jan 23, 2023

Search Model Serving GPU Metrics

Walmart Search has embarked on the journey of adopting deep learning in the search ecosystem to improve search relevance. For our pilot use case, we served the computationally intensive BERT Base model at runtime, with the objective of achieving low latency and high throughput.
We built a highly scalable model-serving platform on TorchServe to enable fast runtime inferencing for our evolving models. TorchServe gives us the flexibility to support multiple execution modes.
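As a rough illustration of how a model is exposed through TorchServe, the sketch below mimics the preprocess/inference/postprocess lifecycle of a TorchServe custom handler. The class name, the stub model, and the request payloads are hypothetical; in a real deployment the handler would extend ts.torch_handler.base_handler.BaseHandler and be packaged with torch-model-archiver.

```python
# Minimal sketch of a TorchServe-style custom handler. QueryHandler and
# stub_model are hypothetical stand-ins; a production handler would extend
# ts.torch_handler.base_handler.BaseHandler and load a real BERT model.

class QueryHandler:
    """Follows TorchServe's request lifecycle: preprocess -> inference -> postprocess."""

    def __init__(self, model):
        # In TorchServe, the model is loaded in initialize() from the .mar archive.
        self.model = model

    def preprocess(self, requests):
        # Extract the raw query text from each request body.
        return [r.get("data") or r.get("body") for r in requests]

    def inference(self, inputs):
        # Run the model on the batch of queries.
        return [self.model(text) for text in inputs]

    def postprocess(self, outputs):
        # TorchServe expects exactly one response element per request.
        return [{"score": score} for score in outputs]

    def handle(self, requests):
        return self.postprocess(self.inference(self.preprocess(requests)))


# Stub standing in for a BERT relevance scorer.
stub_model = lambda text: len(text) / 100.0

handler = QueryHandler(stub_model)
print(handler.handle([{"data": "red running shoes"}]))  # → [{'score': 0.17}]
```

Separating the lifecycle into these three stages is what lets TorchServe batch requests and swap models without changing the serving layer.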
Evolution

One monolithic Search Query Understanding application was responsible for understanding the user's intent behind the search query. Through a single Java Virtual Machine (JVM)-hosted web application, it loaded and served multiple models. Experimental models were loaded onto the same query understanding application. These models were large, and computation was expensive.
With this approach, we faced the following limitations: