TensorRT: The Missing Piece for Computer Vision Models at Scale
Computer vision is one of the most popular fields in the realm of machine learning, and it has a wide range of applications that can be implemented at tiket.com. We currently have several computer vision modules that help us maintain the quality of our accommodation detail page, such as inappropriate content detection, scene detection, image similarity, image quality assessment (using both blur-bokeh and OCR models), and object detection. We named our computer vision project Aurora. It enables us to keep our platform clean and safe for our users. By using computer vision, we can automate many tasks that would otherwise be done manually, which saves us time and resources.
The Aurora project was launched in 2020 as we recognized the vast amount of image data that tiket.com had to manage. With millions of accommodations listed on our site, each with hundreds to thousands of images, it became clear that it would be impossible for humans to process or review all of these images individually.
In this article, we would like to share the challenges we faced in bringing our machine learning models to production and making them available for use by other teams. As an example, we will focus on one of the modules, scene detection. How we built that model was covered previously in this Medium article.
When the Aurora project started in 2020, we decided to use CPUs for image processing because they met our performance needs at that point. In 2022, however, we faced a new challenge: the Accommodation team asked to use Aurora to manage their massive room data, which included hundreds of millions of images, at least four times more than we were processing at the time. It became evident that our existing approach could not handle the increased volume, in terms of either processing time or cost. In light of this, we decided to adopt NVIDIA TensorRT, an SDK for high-performance deep learning inference that includes an inference optimizer and a runtime delivering low latency and high throughput. TensorRT lets developers improve the performance of their deep learning models by increasing inference speed and reducing memory requirements.
We benchmarked our Scene Detection model, an EfficientNet B2 built with TensorFlow. The inference performance comparison was run on an E2 machine with 2 vCPUs and 1 Tesla T4 GPU. We devised 3 scenarios:
- Inference using CPU
- Inference using GPU with TensorFlow model
- Inference using GPU with TensorRT
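For readers who want to picture the third scenario, here is a minimal sketch of one possible route from a TensorFlow SavedModel to a TensorRT-optimized model, using the TF-TRT converter that ships with TensorFlow 2.x. It is an assumption that this route applies here: this article does not say which conversion path we used, and the function name, directory arguments, and the `use_fp16` switch are hypothetical. It also assumes a recent TensorFlow 2.x build with TensorRT support and an NVIDIA GPU.

```python
def convert_saved_model_to_tensorrt(
    input_dir: str, output_dir: str, use_fp16: bool = False
) -> None:
    """Convert a TensorFlow SavedModel with TF-TRT and save the result.

    Hypothetical helper for illustration only. TensorFlow is imported
    lazily so the module can be loaded on machines without it.
    """
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # FP32 keeps the original numeric precision; FP16 usually trades a
    # little accuracy for speed. Which one suits a model is an assumption
    # to validate against your own accuracy checks.
    precision = (
        trt.TrtPrecisionMode.FP16 if use_fp16 else trt.TrtPrecisionMode.FP32
    )

    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_dir,
        precision_mode=precision,
    )
    # Replace supported subgraphs with TensorRT ops.
    converter.convert()
    # Write the converted SavedModel to disk.
    converter.save(output_dir)
```

The converted model is loaded and called like any other SavedModel, which keeps the serving code for the TensorFlow and TensorRT scenarios nearly identical.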
Our benchmark covered only the inference step. We did not benchmark preprocessing or postprocessing, because TensorRT currently applies only to our inference step. We ran 8141 images at 4 different batch sizes, where the batch size is the number of images processed at one time. We considered 2 metrics:
- Average inference time per image
- GPU memory usage
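The first metric can be sketched as a small timing harness. This is an illustration, not our actual benchmark code: `fake_infer` is a stand-in for a real model call, and the batch sizes in the usage section are hypothetical placeholders, since the four sizes we used are not listed in this article. Only the image count (8141) comes from the benchmark described above.

```python
import time
from typing import Callable, Dict, List, Sequence


def make_batches(items: Sequence, batch_size: int) -> List[Sequence]:
    """Split items into consecutive batches; the last one may be smaller."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def average_time_per_image(
    infer: Callable[[Sequence], object],
    images: Sequence,
    batch_size: int,
    warmup_batches: int = 1,
) -> float:
    """Return the mean inference time in seconds per image."""
    if not images:
        raise ValueError("images must not be empty")
    batches = make_batches(images, batch_size)

    # Warm-up calls are not timed, so one-off costs such as engine
    # loading do not distort the average.
    for batch in batches[:warmup_batches]:
        infer(batch)

    # With a real GPU framework, `infer` should return materialized
    # results (for example NumPy arrays) so asynchronous work is counted.
    start = time.perf_counter()
    for batch in batches:
        infer(batch)
    elapsed = time.perf_counter() - start
    return elapsed / len(images)


def run_benchmark(
    infer: Callable[[Sequence], object],
    images: Sequence,
    batch_sizes: Sequence[int],
) -> Dict[int, float]:
    """Measure average time per image for each batch size."""
    return {bs: average_time_per_image(infer, images, bs) for bs in batch_sizes}


if __name__ == "__main__":
    def fake_infer(batch: Sequence) -> list:
        # Stand-in for a model call; replace with real inference.
        return [0 for _ in batch]

    images = list(range(8141))      # same image count as our benchmark
    batch_sizes = [1, 8, 16, 32]    # hypothetical placeholder values
    for bs, seconds in run_benchmark(fake_infer, images, batch_sizes).items():
        print(f"batch size {bs}: {seconds * 1e3:.6f} ms per image")
```

Dividing total elapsed time by the number of images, rather than by the number of batches, keeps the results comparable across batch sizes.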
As demonstrated in the chart, TensorRT on GPU outperformed the TensorFlow model on both CPU and GPU, running up to 17 times faster for single-image prediction. TensorFlow's GPU performance depends heavily on the batch size: increasing the batch size improves it significantly. TensorRT on GPU already performs better at small batch sizes and becomes faster still as the batch size grows, although not proportionally.
As shown in the benchmark results, the TensorFlow model on GPU consumed significantly more memory than TensorRT. This is because TensorFlow reserves a larger portion of the GPU's memory, while TensorRT allocates only the memory it needs, leaving the rest available to other processes.
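By default, TensorFlow maps nearly all of the memory on every visible GPU to its process, so its reported usage reflects reservation more than actual need. The sketch below shows how TensorFlow 2.x can be asked to allocate GPU memory on demand instead. The helper name is hypothetical, and this article does not state whether the setting was used in our benchmark.

```python
def enable_gpu_memory_growth() -> int:
    """Ask TensorFlow to grow GPU memory on demand.

    Returns the number of GPUs configured. Hypothetical helper for
    illustration; TensorFlow is imported lazily so the module can be
    loaded on machines without it.
    """
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must run before the GPU is first used; TensorFlow raises a
        # RuntimeError if the device has already been initialized.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Calling this once at process start, before any model is loaded, makes memory measurements between TensorFlow and other runtimes easier to compare.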
In summary, TensorRT on GPU can boost performance by up to 64 times compared with using a CPU, and is 4 times more efficient than TensorFlow on GPU. With the growing volume of images that need to be processed, TensorRT in combination with a GPU can be a highly viable solution for our company.
Stay tuned for our next in-depth benchmark results and implementation of TensorRT!! 😄