For Highly Interactive Apps, Prediction Is Not Enough!

Eugene Wu
Published in thewulab · Aug 28, 2020

This blog post introduces our recent VLDB 2020 paper, titled Khameleon: Continuous Prefetch for Interactive Data Applications. This is joint work with my Ph.D. student Haneen Mohammed, Cornell Ph.D. student Ziyun Wei, and Ravi Netravali at UCLA. Haneen will be presenting this work in the Human-In-The-Loop track of VLDB at 6:00 PM EST on 9/3.

Data-intensive interactive applications (e.g., data visualizations, image exploration, and even web applications) are increasingly shifting to cloud-based architectures. Although this provides massive and scalable server-side resources, it also introduces the network as a major bottleneck towards interactivity. Both round-trip latencies as well as limited network bandwidths (think using cafe wifi, or when traveling abroad) can cause user-level response times to spike.

As an example, consider the image exploration application below. The user can hover (the white circle that moves) over any of the 10,000 pixel-sized thumbnails on the left to see a full-size version of the image on the right (each image is 1–2Mb). The user can easily generate bursts of 30+ requests per second, which can overwhelm the network and cause severe congestion. The chart on the right compares the AT&T LTE network bandwidth (red) with a user's actual bandwidth demand when interacting with the application. This mismatch is what causes the significant delays seen in the GIF below. Notice how the images update rapidly after the mouse stops on the left: the images finally arrive long after the mouse has moved on!

(Left) Image exploration application exhibits significant response delays due to network congestion. (Right) User bandwidth demand far exceeds the actual LTE network capacity.

Prefetching masks communication delays by trading bandwidth now for higher cache hit rates later.

To do so, the client predicts what the user will do next, and proactively issues those requests. The hope is that by the time the user makes the next request, that response has arrived and been cached.

This crucially assumes that the predicted prefetch request is what the user later asks for. Thus, the bulk of prefetching work focuses on improving the predictor: deeply understanding user interaction patterns, limiting the user's action space, and engineering features. 70–80% accuracy is considered very good for a modern prefetching predictor.

Sadly, the basic premise — trading bandwidth for responsiveness — breaks down for interactive applications.

The combination of large response sizes and bursty requests due to high interactivity is already more than the network can handle. For example, an LTE network offers ~5Mb/s, which has capacity for only 2–3 images per second, compared to the 30+ requests per second that users can generate. Further, the point of interactive applications is to give users more freedom to interact, which ultimately makes prediction much more challenging.

To summarize the reasons why interactive applications are so challenging:

  1. User interactions (dragging a slider, moving a mouse) are coupled with the number of requests sent to the server. Simply put, more user interactions → more requests → more data returned by the server → more network congestion.
  2. During user interactions, there’s almost no time between consecutive requests to perform prefetching. Even if the prediction was perfect, by the time the prefetched response arrives, the user has already moved on.

All of this is to say:

For Interactive Applications, Prediction is Not Enough

Continuous Prefetch with Khameleon

We designed Khameleon as a client-server communication layer for interactive applications. The application can specify custom predictors, progressive encoding schemes (see below), scheduling priorities, bandwidth thresholds, and more. Khameleon takes care of the actual communication, prefetching, and cache management, letting the developer focus on improving the application itself.

The framework is designed with the following principles in mind.

  1. Decouple user interactions from network requests. User requests are registered with the client cache. The moment the cache can respond, it sends an upcall to the application. In this way, the client never sends explicit requests to the server (including prefetch requests), and the server is never forced to return stale data (see the sketch after this list).
  2. Avoid network congestion. To do so, the server continuously pushes likely response data to the client at the network’s capacity (or a specified threshold). This turns communication into a scheduling problem of what responses to push next.
  3. Predictions as scheduling hints. Without any information, the scheduler simply degrades to random choice. The client can periodically send its predictions of what the user might do in the future, which lets the scheduler allocate more resources to more likely responses.
  4. Progressive encoding to control response quality. There's not much wiggle room when the network is 5Mb/s and responses are 2Mb. Thus, crucial to our design is that responses are progressively encoded (online or offline) into sequences of blocks, such that any prefix is good enough to render.
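To illustrate the first two principles, here is a minimal sketch, with hypothetical names rather than Khameleon's actual API, of a client cache that registers requests locally and fires upcalls as the server pushes blocks:

```python
# A sketch of principles 1 and 2 (hypothetical names, not Khameleon's API):
# the app registers interest with a local cache and never messages the
# server; the server pushes progressively encoded blocks on its own.
class ClientCache:
    def __init__(self):
        self.blocks = {}    # request id -> blocks received so far
        self.pending = {}   # request id -> upcall awaiting its first block

    def register(self, req_id, upcall):
        # "Requesting" data just registers interest; no message goes out.
        if self.blocks.get(req_id):
            upcall(self.blocks[req_id])          # cache hit: respond now
        else:
            self.pending[req_id] = upcall

    def on_push(self, req_id, block):
        # Server-initiated push; any prefix of blocks is renderable.
        self.blocks.setdefault(req_id, []).append(block)
        if req_id in self.pending:
            self.pending.pop(req_id)(self.blocks[req_id])
        # A fuller version would keep firing upcalls as later blocks
        # arrive and refine the rendered quality.

cache = ClientCache()
cache.register("img_42", lambda blocks: print(f"render {len(blocks)} block(s)"))
cache.on_push("img_42", b"\xff\xd8...")  # first pushed block fires the upcall
```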

The following are examples of progressive encoding. On the left, we illustrate JPEG encoding: 20kb renders a lower-resolution black-and-white image, an additional 20kb adds color, and the full image is 2Mb. Progressive encodings have also been developed for database query processing, time series, visualizations, and web applications; developing effective encodings for Khameleon is an exciting future direction.

Progressive encoding splits a response into a sequence of blocks, so that any prefix of blocks can be rendered. (Left) JPEG progressively encodes images. (Right) Examples of progressive encoding for visualizations and database queries.
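To make the "any prefix is renderable" property concrete, here is a toy progressive encoder based on byte interleaving. Real schemes like progressive JPEG are far more sophisticated, so treat this purely as a sketch:

```python
# A toy progressive encoding (NOT real progressive JPEG): each block refines
# the previous ones, so any prefix decodes to a usable lower-quality result.
def encode_progressive(data: bytes, n_blocks: int = 4) -> list[bytes]:
    # Block i carries every n_blocks-th byte starting at offset i, so early
    # blocks give a coarse "every 4th sample" preview of the response.
    return [data[i::n_blocks] for i in range(n_blocks)]

def decode_prefix(blocks: list[bytes], n_blocks: int = 4) -> bytes:
    # Interleave whatever blocks have arrived; samples from missing blocks
    # are filled in by repeating the nearest received block.
    out = bytearray()
    for j in range(len(blocks[0])):
        for i in range(n_blocks):
            src = blocks[i] if i < len(blocks) else blocks[-1]
            if j < len(src):
                out.append(src[j])
    return bytes(out)

signal = bytes(range(16))
blocks = encode_progressive(signal)
print(decode_prefix(blocks[:1]))  # coarse preview from a one-block prefix
print(decode_prefix(blocks))      # full fidelity once all blocks arrive
```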

Progressive encoding, when combined with scheduling, provides a huge space of options. If the server has no clue what the user may do, it can aggressively hedge by sending one block for every possible request. On the other hand, if it knows exactly what the user will want, it can commit to sending the full response. We formalize this into a novel optimization problem (see the paper for details), and implement a scheduler that takes only 30μs to schedule each block.
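A greedy caricature of this hedging behavior might look like the sketch below. It is hypothetical and simplified; the actual scheduler solves the optimization problem formalized in the paper:

```python
# Greedy sketch of hedging: each step, push the block with the highest
# expected marginal utility = P(request) * (quality gain of its next block).
# A flat distribution spreads single blocks across all requests; a confident
# prediction streams one response to completion.
def schedule(probs: dict[str, float], sent: dict[str, int],
             utility: list[float]) -> str | None:
    """probs: predicted P(user asks for req); sent: blocks already pushed;
    utility[k]: render quality after k blocks (concave)."""
    def gain(req):
        k = sent.get(req, 0)
        if k + 1 >= len(utility):
            return 0.0                        # response fully sent
        return probs.get(req, 0.0) * (utility[k + 1] - utility[k])
    best = max(probs, key=gain)
    return best if gain(best) > 0 else None

quality = [0.0, 0.5, 0.8, 0.95, 1.0]
probs = {"img_a": 0.6, "img_b": 0.3, "img_c": 0.1}
sent = {}
for _ in range(6):                            # push the next 6 blocks
    req = schedule(probs, sent, quality)
    sent[req] = sent.get(req, 0) + 1
print(sent)  # {'img_a': 3, 'img_b': 2, 'img_c': 1} -- a hedged schedule
```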

Does Khameleon work?

The following GIF shows the image exploration application from the intro, now ported to Khameleon. The application consistently responds within 30–50ms, whereas state-of-the-art prefetching schemes take upwards of 50 seconds.

Using Khameleon, the application consistently responds to user interactions within 30–50ms

A major question is how Khameleon fares against state-of-the-art prefetching techniques. Since our experiments are evaluated on pre-recorded user traces, we can design perfect baselines: the prefetcher uses a perfect prediction model that knows the user's future requests exactly. When the user makes a request, it prefetches either the next one (Perfect-1) or the next five (Perfect-5) requests. We make sure to never prefetch more than the network has capacity for.
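For concreteness, a Perfect-k baseline run over a pre-recorded trace behaves roughly like this sketch (hypothetical code, not our actual experiment harness):

```python
# Sketch of the Perfect-k baseline: when request i arrives, prefetch the
# next k requests (the oracle knows the trace), never exceeding the
# network's per-step capacity.
def perfect_k(trace: list[str], k: int, capacity_per_step: int) -> list[list[str]]:
    fetched, plan = set(), []
    for i, req in enumerate(trace):
        step = [req] if req not in fetched else []
        for nxt in trace[i + 1 : i + 1 + k]:          # oracle lookahead
            if nxt not in fetched and len(step) < capacity_per_step:
                step.append(nxt)
        fetched.update(step)
        plan.append(step)                             # fetches issued this step
    return plan

print(perfect_k(["a", "b", "c", "d"], k=1, capacity_per_step=2))
# [['a', 'b'], ['c'], ['d'], []]
```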

The left chart reports the average response time for the image application on the AT&T LTE network. Perfect-1 takes over 50 seconds on average to respond, compared to 30ms for Khameleon. This is because Perfect-1 neither guards against bursty requests nor scales response quality down in the face of potential congestion.

Well sure, Khameleon responds faster, but maybe the quality is garbage. The right chart reports, on average, how quickly each method converges to full quality. Since the baselines are all-or-nothing, fewer than 30% of requests see an image even after waiting 10 seconds. In contrast, Khameleon converges within 1 second. This was super exciting to see!

Dynamically reducing quality ultimately improves responsiveness AND response quality

Our paper does a deep dive to understand when and why Khameleon works well: it studies different network conditions and additional baselines, and conducts an ablation study. It also reports our results from porting Falcon, an excellent state-of-the-art prefetching system custom-built for cross-filter visualizations. Many of its prefetching policies, such as how it performs prediction, are hardcoded into its codebase. Long story short, it took less than 100 LOC to port Falcon to Khameleon, and doing so let us easily replace its predictor with one that further reduces response times by 10x.

Looking Forward

We are excited by these results, and look forward to the many extensions of this work. These include extending the scheduler to support multiple clients and multiple servers, devising novel encoding schemes for different applications, and releasing the framework for use in interactive applications!
