Computing Like a Human II

Approximation Will Be Everywhere

In part I, we gave an introduction to Approximate Computing, and shared an overview of several applications that are adopting imprecise solutions in order to be practical with respect to performance and energy. We also described the opportunity to innovate across the full computing stack to maximize the benefits of Approximate Computing.

In this part, we describe a subset of industry and academic solutions that are leveraging the outsized returns on performance and energy by marginally softening accuracy across multiple components of the computing stack such as databases, middleware, hardware and visualization (i.e. beyond just algorithms).

Examples of Approximation in Database Querying and Visualization

Interactive queries allow analysts to expect response times ranging from sub-second to minutes. Unlike batch queries, such queries allow ad-hoc and exploratory analysis. Maintaining interactivity despite large data sets is challenging due to the need for high computational power. However, queries can stay interactive over massive data sets by trading-off accuracy for response time. This is achieved by either running queries on a reduced data set (using sampling techniques) or reducing the computational complexity of operations with alternative algorithms.

Two simple types of sampling

Some of the interesting examples of the former approach include BlinkDB[3], which adaptively builds and maintains a set of multi-dimensional samples from original data to answer a range of queries on 17 TBs of data (on a 100-node EC2 cluster) in <2 seconds (over 200x faster than state of the art), within an error of 2–10%; SnappyData[11], which uses curated stratified samples to also support similarly high volume queries without losing any perceived accuracy; and InfoBright [12], which solves for complex queries such as “group by” in databases by sampling based on the quality of intermediate results [11]. In all of these approaches, the resultant query performance and quality significantly depends on the quality of samples. As a result, decades of research in statistical sampling can now be applied towards making DBMS (DataBase Management System) queries more interactive.

The latter approach of reduced computational complexity is deployed by Oracle 12C [9], Facebook’s Presto and Druid[13] among others, and solves for approximate aggregates such as approximate ‘count distinct’ and approximate ‘percentile’. These approaches provide a scalable solution for solving aggregates, but do not guarantee an upper bound on the error for any specific input set. Thus, these approaches should be applied for applications with higher tolerance to absolute error.

Another class of solutions allow a DBMS to continuously learn and improve; i.e. the more queries issued in the past, the faster and more accurate answers in the future. Park et. al [15] presented such an approach in Intelli that exploit past queries’ approximate answers in speeding up new queries by using principled statistical methodology and demonstrate speeding up >70% benchmarked queries by up to 10X compared to existing sampling-based approximation engines. The continuous learning (and improvement) capability allows for definite error bounds, provided the underlying database query engine has limited raw errors.

Still another class of approximate querying solutions exploit the inherent limitation of a human eye. M4 from SAP [9] ensures pixel-perfect line visualizations for high-volume time series data while reducing data volumes by two orders of magnitude and latencies by one order of magnitude. Similarly, Park et al [16] propose ‘Visualization-Aware Sampling’ to achieve pre-determined visualization quality nearly 400X faster with a small subset of the entire subset.

Original 2B-tuple dataset, visualized in 71 mins (left) and a 31M-tuple sample, visualized in 3 secs (right) [16]

Examples of Approximation in Middleware

Middleware resides between front-end applications and backend platforms

Middleware is the software that resides between applications (client side software) and the backend architecture and platforms (such as a DBMS), often with an emphasis on networked computing. Middleware can apply approximation to provide better programmability, performance, scalability and security of applications at significantly reduced energy consumption. The following are examples where Middleware adopts approximation.

Verdict [12] is designed to be middleware specifically standing between application and DBMS. Verdict gets the original query from the user, transforms it and sends the new query(s) to the DBMS, and gets some raw results back. Verdict then calculates error estimates and returns these along with the approximate answers to the user. It uses the bootstrap method [13] for error estimation, enabling it to support complex queries.

Zimmer et al. [4] in MAPS defines approximate content-based information filtering protocols in a peer-to-peer publishing environment such that any user subscribes to and monitors only a subset of data nodes, and receives notifications about events from these nodes only. Experimental evaluation shows outsized returns in MAPS’ efficiency, even under diverse publishing scenarios. This is very relevant for any networking applications such as in IoT to reduce bandwidth and latency.

Implementing approximation strategies in middleware enables a generalizable system where the client application and/or backend systems can be easily replaced. At the same time, any performance gains will likely be less than those achieved by adopting approximation at a specific application layer such as DBMS (such as the methods discussed above).

Approximate Neural Networks

Neural Network Architecture

Neural Networks (NNs) are compute networks inspired by biological processes. A powerful capability of NNs is function approximation. In theory any function can be approximated by NNs, but constructing such a network is not trivial. NNs have been successful in image recognition and classification applications such as identifying faces, objects and traffic signs, powering vision in robots and self driving cars, but require high computational power for high accuracy. Practical implementation of NNs requires approximation in multiple components of the compute stack.

Approximation in both software and system enables SqueezeNet [7] to identify smaller NN architectures that offer less communication across servers during distributed training, requires less bandwidth to export a new model from the cloud and are more feasible to deploy on FPGAs and other hardware with limited memory. SqueezeNet achieves AlexNet [1]-level accuracy on ImageNet (the current open benchmark) with 50X fewer parameters and 510X smaller size. More recently, Ristretto [8], a hardware-oriented approximation of neural networks, enables energy-efficient inference by quantizing trained 32-bit floating point network to the smallest possible bit-width representation. Given a maximum error tolerance of 1%, Ristretto successfully can condense CaffeNet [2] and SqueezeNet to 8-bit, which will reduce area and power consumption of NN hardware implementation by multiple orders of magnitude.

Applying approximate approaches in application, middleware, DBMS and hardware in conjunction will increase the aggregate inexactness. However, in several cases the overall gain in efficiency is preferred, despite the compounding error. This is best exemplified by Google’s Neural Machine Translation (GNMT) described next.

Google’s Neural Machine Translation: Example of Simultaneous Approximation in Multiple Compute Components

Data from side-by-side evaluations, where human raters compare the quality of translations for a given source sentence. Scores range from 0 to 6, with 0 meaning “completely nonsense translation”, and 6 meaning “perfect translation.” [14]

GNMT[14] is an excellent practical and commercial example where multiple components of the compute stack simultaneously exploit approximation, and achieve new system-level capabilities. GNMT, as an end-to-end learning approach for automated translation, solves for translating rare words and reducing computational expense in training and in translation inference. GNMT applies algorithmic approximation in combination with NNs to naturally handle translation of rare words and improve the overall translation capability of the system. Further, GNMT employs low-precision arithmetic in system and hardware during inference to reduce translation speeds. Using a human side-by-side evaluation on a set of isolated simple sentences, GNMT reduces translation errors by an average of 60% compared to Google’s phrase-based production system (previous state-of-the-art in machine translation).

Approximation Will Alter the Computing Paradigm

Function approximation will be widely adopted

The next few years will see the adoption of inexactness in millions of exciting applications. Approximate computing solutions across multiple components of the stack are likely to be the only way we will meet the competing goals of quick results at scale and with increased energy efficiency.

In part III we will introduce adjacent branches of computing that both employ and advance Approximate Computing.