What if I need full control over my cluster size?
Apache Spark Structured Streaming deployed on Databricks is an excellent framework for running real-time workflows at scale. However, Databricks job clusters use Optimized Autoscaling, which can be somewhat aggressive for many 24/7 streaming workloads. Even though a few tuning parameters can slow down the default behavior, we can easily run into situations where we need tighter control over the cluster's size, mainly for cost-saving purposes.
For example, if the streaming query contains shuffle operations, e.g. when the query involves an aggregation or a…
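One way to take full control is to pin the cluster to a fixed size rather than hand Databricks an autoscaling range. As a rough sketch, the difference shows up in the job's cluster specification: supplying `num_workers` instead of an `autoscale` block (field names follow the Databricks Jobs API; the concrete values below are illustrative, not a recommendation):

```python
# Illustrative Databricks Jobs API cluster specs (all values are hypothetical).

# Autoscaling cluster: Databricks may resize between min_workers and max_workers.
autoscaling_cluster = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "autoscale": {"min_workers": 2, "max_workers": 16},
}

# Fixed-size cluster: pinning num_workers gives full control over cluster size.
fixed_cluster = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 8,
}


def is_fixed_size(cluster_spec: dict) -> bool:
    """A spec is fixed-size when it pins num_workers and omits autoscale."""
    return "num_workers" in cluster_spec and "autoscale" not in cluster_spec
```

The trade-off is predictable cost and stable shuffle-partition sizing at the price of losing elasticity during traffic spikes.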
Real-time machine learning inference at scale has become an essential part of modern applications. GumGum's Verity engine powers the industry's most sophisticated contextual targeting product by analyzing thousands of pieces of digital content every second, around the clock. This is a challenging undertaking that requires deploying deep learning models in an event-driven streaming architecture on an elastic, cloud-native cluster.
At GumGum, we use Apache Kafka, a high-throughput, scalable streaming platform, to connect the various components of our machine learning pipelines. Until recently, we deployed the underlying inference micro-services solely on Amazon ECS, which is a great choice due to its security…
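The core loop such an inference micro-service runs is consume, infer, produce. Here is a minimal sketch of that pattern with the Kafka client factored out: `run_model` is a hypothetical stand-in for the deep learning call, and in production `raw_messages` would be polled from a Kafka topic (e.g. via the confluent-kafka client) rather than passed in as a list:

```python
import json


def run_model(payload: dict) -> dict:
    # Hypothetical stand-in for a deep learning inference call.
    return {"id": payload["id"], "label": "safe" if payload.get("text") else "unknown"}


def process_messages(raw_messages):
    """Deserialize incoming events, run inference, and collect result records.

    In production, raw_messages would come from a Kafka consumer and the
    results would be produced back to an output topic.
    """
    results = []
    for raw in raw_messages:
        event = json.loads(raw)           # deserialize the Kafka message value
        results.append(run_model(event))  # run inference on the event payload
    return results
```

Keeping the inference logic separate from the Kafka plumbing like this also makes it easy to unit-test the model path without a broker.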
At GumGum, we use Computer Vision (CV) to leverage page visuals for Verity, our contextual targeting and brand suitability product. We process millions of images every hour, and at this rate our long-term inference costs dwarf the upfront training costs, so we tackled this issue head-on. In this post, I'll benchmark and highlight the importance of multi-threading for I/O operations and batch processing for inference. Note that implementing these strategies may be overkill if your application's scale is on the order of a few thousand images an hour.
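The two strategies can be sketched roughly as follows: a thread pool overlaps the image downloads, which are I/O-bound, and the fetched images are then grouped into fixed-size batches before being handed to the model. `fetch_image` and `model_predict` are hypothetical stand-ins for the real HTTP download and model forward pass:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_image(url: str) -> bytes:
    # Hypothetical stand-in for an HTTP download; I/O-bound, so threads overlap well.
    return url.encode()


def model_predict(batch: list) -> list:
    # Hypothetical stand-in for a batched deep learning forward pass.
    return [len(img) for img in batch]


def infer(urls, batch_size=32, workers=16):
    # 1) Multi-threaded I/O: download images concurrently with a thread pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        images = list(pool.map(fetch_image, urls))
    # 2) Batch processing: feed the model fixed-size batches instead of one image at a time.
    preds = []
    for i in range(0, len(images), batch_size):
        preds.extend(model_predict(images[i:i + batch_size]))
    return preds
```

Threads work here despite the GIL because the download workers spend their time waiting on the network; batching then amortizes per-call model overhead across many images.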
Let’s look at our application components: