Why Randomness should be Embraced and Not Feared

Carlos E. Perez
Published in Intuition Machine · Feb 14, 2019


Previously I wrote about how the design strategy of biology differs from the design strategy of human technology. The difference lies in their priorities. Human technology, and computer technology in particular, is built to ensure strict and correct behavior. Even quantum computing is designed around taming the noise of the quantum realm to build predictable quantum circuitry. Biological systems, in contrast, do not require strict correctness; they are designed to favor robustness.

What are the design strategies for systems that favor robustness? One strategy that immediately comes to mind is redundancy: biological systems are redundant many times over. Another is diversity: monocultures are far more susceptible to extinction. But what about the most unreasonably effective strategy of them all, randomness? How does randomness as a strategy lead to robustness?

We can take inspiration from many new human inventions that employ randomness as a strategy.

Information dispersal methods. Many are familiar with how redundant disk arrays are made more robust through redundancy. There is another kind of storage redundancy that uses randomization as its strategy. Michael Rabin introduced the idea of information dispersal as a means of providing security, load balancing, and fault tolerance. Basically, you slice your data into multiple parts and then randomly disperse those parts across a network of storage devices. The method is provably time-efficient and highly fault-tolerant.
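
To make the idea concrete, here is a minimal Python sketch of random dispersal. It is not Rabin's actual scheme (which uses erasure coding so that any m of n pieces suffice to reconstruct); it simply slices the data into fragments and scatters a few copies of each fragment across randomly chosen storage nodes, so the data survives node failures. All names and parameters are illustrative.

```python
import random

def disperse(data: bytes, num_nodes: int = 10, fragment_size: int = 4, copies: int = 3):
    """Slice `data` into fragments and scatter each fragment's copies
    across randomly chosen, distinct storage nodes."""
    fragments = [data[i:i + fragment_size] for i in range(0, len(data), fragment_size)]
    placement = {idx: set(random.sample(range(num_nodes), copies))
                 for idx in range(len(fragments))}   # fragment index -> nodes holding it
    return fragments, placement

def recover(fragments, placement, alive_nodes):
    """Reassemble the data from whatever nodes are still alive."""
    pieces = []
    for idx, frag in enumerate(fragments):
        if placement[idx] & alive_nodes:             # at least one surviving copy
            pieces.append(frag)
        else:
            raise RuntimeError(f"fragment {idx} lost")
    return b"".join(pieces)

fragments, placement = disperse(b"randomness breeds robustness")
alive = set(random.sample(range(10), 8))             # two nodes have failed
print(recover(fragments, placement, alive))
```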

Spread Spectrum methods. Spread spectrum is a communication technique that spreads a signal across multiple frequencies. The original motivation was secure communication; the additional benefits are increased resistance to interference and a limited power flux density. The technique uses randomization to spread a narrowband signal across a wider band of frequencies. This resists jamming and also hides the fact that communication took place at all. The latter side effect is, in fact, problematic for methods that seek understanding by sampling signals.
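
Here is a minimal sketch of the frequency-hopping flavor of spread spectrum, assuming a shared seed between the two endpoints (the channel numbers and seed are made up for illustration):

```python
import random

CHANNELS = list(range(2402, 2481))   # hypothetical 1 MHz channels in the 2.4 GHz band

def hop_sequence(shared_seed: int, length: int):
    """Both endpoints derive the same pseudo-random channel sequence from a
    shared seed; an eavesdropper without the seed sees the signal smeared
    across the whole band."""
    rng = random.Random(shared_seed)
    return [rng.choice(CHANNELS) for _ in range(length)]

tx_hops = hop_sequence(shared_seed=0xC0FFEE, length=8)
rx_hops = hop_sequence(shared_seed=0xC0FFEE, length=8)
assert tx_hops == rx_hops            # transmitter and receiver stay in sync
print(tx_hops)
```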

Warehouse logistics. Amazon employs a “random stow” system for placing products in its warehouses. Details can be found in “Behind Amazon’s Well-Oiled Machine”. The system places products across the warehouse based on forecasted order frequency, which creates diversity in every stow area and reduces bottlenecks. It is not uncommon to find unrelated items stored in the same location. Random stow also reduces selection mistakes: because the system does not place similar items next to each other, the chances of picking up a similar but incorrect item from the same location are greatly reduced. Contrast this with how a conventional library organizes and stores its books. Organization is only important for systems with limited memory (i.e. our brains).
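
A toy sketch of the random-stow idea (the bin names, item categories, and the “no similar item in the same bin” rule are my own simplification of the description above):

```python
import random

rng = random.Random(42)

def stow(item_category: str, bins: dict) -> str:
    """Place an arriving item in a random bin that does not already hold a
    similar (same-category) item, so a picker is unlikely to grab a
    similar-looking but wrong product from the same location."""
    candidates = [name for name, contents in bins.items()
                  if item_category not in contents]
    chosen = rng.choice(candidates)
    bins[chosen].append(item_category)
    return chosen

bins = {f"bin-{i}": [] for i in range(6)}
for item in ["phone-case", "phone-case", "novel", "usb-cable", "novel", "phone-case"]:
    print(item, "->", stow(item, bins))
print(bins)
```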

Scalable Distributed Consensus. Achieving consensus in distributed systems is an expensive and time-consuming operation. Blockchains are an example of this problem: Bitcoin ensures that money is not double-spent through a decentralized consensus mechanism. Unfortunately, this is a time-consuming as well as energy-consuming process; a typical Bitcoin transaction can take over half an hour to confirm (i.e. assuming 3 blocks). A recent innovation from Dfinity proposes to sidestep this consensus bottleneck by employing cryptographically verifiable randomization.
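
A toy sketch of the general idea of a verifiable random beacon used for leader selection. This is not Dfinity’s actual protocol (which derives its randomness from threshold signatures); it only shows how a publicly recomputable random value lets every node agree on a proposer without a round of voting. The validator names and the hash construction are illustrative.

```python
import hashlib

VALIDATORS = ["alice", "bob", "carol", "dave", "erin"]

def next_beacon(prev_beacon: bytes, round_no: int) -> bytes:
    """Toy random beacon: every node can recompute (and hence verify) it,
    yet its output looks random and is hard to predict far in advance."""
    return hashlib.sha256(prev_beacon + round_no.to_bytes(8, "big")).digest()

def select_proposer(beacon: bytes) -> str:
    """Map the beacon output to a validator; every node arrives at the
    same choice without exchanging any votes."""
    return VALIDATORS[int.from_bytes(beacon, "big") % len(VALIDATORS)]

beacon = b"genesis"
for round_no in range(1, 6):
    beacon = next_beacon(beacon, round_no)
    print(f"round {round_no}: proposer = {select_proposer(beacon)}")
```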

Random Forests. By training many decision trees, each on a random subset of the data (and of the features), one creates a more robust model than a single decision tree trained on all the data. Random forests are known to be more robust against errors and noise, and they overfit the training data less.
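
A quick illustration using scikit-learn: compare a single decision tree against a random forest on a synthetic, noisy classification task (exact scores will vary, but the forest typically generalizes better):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic task with 10% label noise to make overfitting costly.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

print("single tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```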

Decision Support. An analogous idea to random forests can be applied to human decision making. Daniel Kahneman employs the randomness in human judgment to improve decision making. An HBR article, “Noise: How to Overcome the High, Hidden Cost of Inconsistent Decision Making”, describes this approach. The key observation is that if you ask people to make the same decision multiple times, they will inevitably arrive at different answers, because their attention differs from moment to moment and they use different methods for making decisions. This variability exists even among experienced professionals. It also helps us better appreciate the utility of the Wisdom of the Crowds. To form a wise crowd, the following criteria are essential: diversity of opinion, independence, and decentralized knowledge. Ultimately, the greater the diversity of knowledge, the more robust the aggregate decision will be. This example highlights the value of disparate viewpoints and isn’t really randomness per se. However, it hints at the society-of-mind approach to cognition, and this is likely the same general mechanism employed by Deep Learning.
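
As a rough numerical illustration of the wisdom-of-crowds point (my own toy simulation, not Kahneman’s procedure): averaging many independent, unbiased but noisy judgments shrinks the error of the aggregate roughly with the square root of the crowd size.

```python
import random

TRUE_VALUE = 100.0

def noisy_judgment(rng):
    """Each judge is unbiased but noisy: estimate = truth + Gaussian noise."""
    return TRUE_VALUE + rng.gauss(0, 20)

rng = random.Random(0)
for crowd_size in (1, 10, 100, 1000):
    errors = []
    for _ in range(500):                               # repeat the experiment
        crowd = [noisy_judgment(rng) for _ in range(crowd_size)]
        errors.append(abs(sum(crowd) / crowd_size - TRUE_VALUE))
    print(f"crowd of {crowd_size:4d}: mean error ~ {sum(errors) / len(errors):.2f}")
```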

All of the above are macro-scale examples that show the benefits of leveraging randomness to achieve greater robustness. The key takeaway is that randomness has its benefits. The common strategy of ensuring correctness by reducing randomness is flawed; it is, in fact, the presence of randomness that leads to more robust solutions.

Recent papers in Deep Learning bring the effectiveness of randomization into focus. Ben Recht has written about how random search can be as effective as more complex reinforcement learning strategies. He writes:

Random search with a few minor tweaks outperforms all other methods on these MuJoCo tasks and is significantly faster.
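
A toy version of that kind of random search, in the spirit of Recht’s augmented random search but applied to a made-up smooth reward function instead of a MuJoCo rollout (the step sizes and the reward function are purely illustrative):

```python
import numpy as np

def reward(theta):
    """Stand-in for an expensive policy rollout (e.g. a MuJoCo episode);
    here just a smooth function with its optimum at theta = [1, -2, 3]."""
    return -np.sum((theta - np.array([1.0, -2.0, 3.0])) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(3)
step, noise = 0.1, 0.3

for _ in range(2000):
    # Sample a random direction, probe the reward on both sides of it,
    # and move along the direction weighted by the reward difference.
    delta = rng.standard_normal(3)
    r_plus = reward(theta + noise * delta)
    r_minus = reward(theta - noise * delta)
    theta += step * (r_plus - r_minus) * delta

print("found:", np.round(theta, 3))   # should land close to [1, -2, 3]
```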

Then there’s the paper “Gradient Descent Provably Optimizes Over-parameterized Neural Networks”, which shows that to achieve linear convergence one only needs to begin with an over-parameterized network and use random initialization. Over-parameterization has the effect of keeping every weight close to its original random initialization throughout training. This allows the system to exploit a “strong convexity-like property” that yields linear convergence towards a global optimum.
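
A small numpy sketch of that claim (an illustration of the phenomenon, not the paper’s proof): train a very wide two-layer ReLU network from random initialization on a handful of samples and measure how little the weights move relative to where they started. The widths, learning rate, and data are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 20, 5, 4000                   # few samples, heavily over-parameterized

X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W0 = rng.standard_normal((width, d))        # random initialization, kept for comparison
a = rng.choice([-1.0, 1.0], size=width)     # output layer fixed at random signs; only W is trained
W = W0.copy()

lr = 0.05
for _ in range(2000):
    Z = X @ W.T                             # pre-activations, shape (n, width)
    H = np.maximum(Z, 0.0)                  # ReLU hidden activations
    pred = H @ a / np.sqrt(width)
    err = pred - y
    # Gradient of 0.5 * ||pred - y||^2 with respect to W.
    grad = ((err[:, None] * (Z > 0) * a).T @ X) / np.sqrt(width)
    W -= lr * grad

final_pred = np.maximum(X @ W.T, 0.0) @ a / np.sqrt(width)
print("training loss:", 0.5 * np.sum((final_pred - y) ** 2))
print("relative weight movement:", np.linalg.norm(W - W0) / np.linalg.norm(W0))
```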

There’s also the paper “Rethinking ImageNet Pre-training”, which demonstrates that training from random initialization can match ImageNet pre-training. To make it even more confusing, the paper “Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet” employs a “Bag of Local Features” to achieve competitive results.

Finally, there is a recent counterintuitive result from “Analyzing and Improving Representations with the Soft Nearest Neighbor Loss” (Nicholas Frosst, Nicolas Papernot, Geoffrey Hinton), which shows that increasing the entanglement of the representations of different classes in the hidden layers of a network makes the network more robust to adversarial methods and also leads to better generalization. The authors offer a counterintuitive explanation:

Surprisingly, we find that maximizing the entanglement of representations of different classes in the hidden layers is beneficial for discrimination in the final layer.
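
For concreteness, here is a numpy sketch of the soft nearest neighbor loss as I read it from the paper (the temperature T and the toy data are my own choices). A low value means each point’s soft nearest neighbors share its class (disentangled); a high value means the classes are intermingled, which is the quantity the authors encourage in the hidden layers.

```python
import numpy as np

def soft_nearest_neighbor_loss(x, y, T=1.0):
    """Soft nearest neighbor loss over a batch of representations x with
    labels y: for each point, the (log) fraction of its exp-distance-weighted
    neighbors that share its class, averaged and negated."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    sims = np.exp(-sq_dists / T)
    np.fill_diagonal(sims, 0.0)                       # exclude i == j
    same_class = (y[:, None] == y[None, :]).astype(float)
    numer = np.sum(sims * same_class, axis=1)
    denom = np.sum(sims, axis=1)
    return -np.mean(np.log(numer / denom + 1e-12))

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
separated = np.concatenate([rng.normal(-5, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
entangled = rng.normal(0, 1, (100, 2))                # both classes mixed together
print("separated classes:", soft_nearest_neighbor_loss(separated, y))
print("entangled classes:", soft_nearest_neighbor_loss(entangled, y))
```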

But what is entanglement? Entanglement and chaos are mechanisms that give rise to randomness. From the perspective of Kolmogorov complexity, a perfectly random string is incompressible: there is no generator (i.e. program) shorter than the string itself that produces it. A generator is less compressible when every component of it is causally dependent on every other component.
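
A quick illustration of the incompressibility point using zlib: a highly regular byte string has a short generator and compresses dramatically, while (pseudo)random bytes barely compress at all.

```python
import os
import zlib

structured = b"abc" * 1000          # highly regular: a short generator exists
random_bytes = os.urandom(3000)     # (pseudo)random: no shorter description

print("structured:", len(zlib.compress(structured)), "bytes after compression")
print("random    :", len(zlib.compress(random_bytes)), "bytes after compression")
```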

To conclude, one should not dismiss randomness as a defect of a system. Rather, it is a key characteristic that leads to a more robust system. This may be difficult to accept given our educational and biological biases against randomness, that is, the assumption that correct system functioning is achieved only by reducing randomness. This approach risks throwing the baby out with the bathwater. Randomness is an intrinsic feature of these systems and should be leveraged appropriately.

Note: I use the word randomness here, but I actually mean diversity. ;-) That is because there is no such thing as randomness.

Further Reading

Exploit Deep Learning: The Deep Learning AI Playbook
