Lester Solbakken in Towards Data Science: Machine-learned model serving at scale. Most ML libraries and platforms use all available CPU resources for single model inference. This breaks down for concurrent evaluations. (Jan 4, 2022)
Lester Solbakken in Towards Data Science: Using approximate nearest neighbor search in real world applications. From text search and recommendation to ads and online dating, ANN search rarely works in isolation. (Dec 17, 2020)
Lester Solbakken in Towards Data Science: Stateful model serving: how we accelerate inference using ONNX Runtime. There’s a difference between stateless and stateful machine-learned model serving. (Dec 14, 2020)
Lester Solbakken in Towards Data Science: From research to production: scaling a state-of-the-art machine learning system. How we implemented a production-ready question-answering application and reduced response time by more than two orders of magnitude. (Nov 12, 2020)
Lester Solbakken in Towards Data Science: Efficient open-domain question-answering on Vespa.ai. Open-domain question-answering has emerged as a benchmark for measuring a system’s capability to read, represent, and retrieve knowledge. (Oct 1, 2020)