LightOn’s Summer Series #3 — How I Learned to Stop Worrying and Love Random Projections

LightOn
LightOn
Oct 28 · 7 min read

Welcome to LightOn’s Summer Series! For our third installment, we go on a scientific odyssey through high-dimensional spaces. Specifically, we take a closer look at random projections, the operation implemented by our first optical processing unit.

What a horrible night to have a curse

Figure 1. As dimensionality grows, samples are laying in an increasingly sparse environment. For instance, if each dimension can be discretized in 5 bins, 3 samples will cover 60% of the space in 1D, 12% in 2D, and only 2.4% in 3D. This is the cornerstone of the curse of dimensionality.

In our previous installment, we ventured into the field of optical computing and noted that high-dimensional data raise peculiar challenges. However, these difficulties are not restricted to the hardware side; algorithms are also struggling with higher dimensions. As individual samples are getting richer, the emergence of this so-called fat data becomes challenging in two ways:

  • It’s bigger: handling fatter data implies an increase in resource use (sampling time, compute, storage, and energy). Demanding applications require a parsimonious usage of such resources in our everyday life as much as when we venture into the unknown : from swarms of sensors able to foresee landslides [1], to lightning-fast MRI scans, or power-starved probes flying to the edge of our solar system — and beyond.
  • It’s stranger: with richer data comes the so-called curse of dimensionality (Figure 1). High-dimensional data is not only expensive to work with, it is also hard to handle. As dimensionality increases, data samples become harder to group and separate. Thus, insights are more difficult to obtain, because they require exponentially more observations to be statistically significant. To paraphrase Rutherford D. Rogers, one gets drowned in information but starved for knowledge.

Thus, to break the curse, researchers have sought ways to reduce the dimensionality of data — without compromising the underlying information.

Dr. Johnson-Lindenstrauss: how I learned to stop worrying and love random projections

Figure 2. A vector of dimension n is projected to dimension d by a random projection. Entries for the random projection matrix are drawn i.i.d. from a Gaussian distribution. Thanks to the Johnson-Lindenstrauss lemma, guarantees exist regarding the distorsion caused by the projection process.

Dimensionality reduction schemes such as PCA, projection pursuit, or trainable autoencoders are part of the common toolkit leveraged by data scientists. Ironically, these techniques often fail to scale to truly high-dimensional data, for they become too expensive to compute.

The Johnson-Lindenstrauss lemma [2] ensures that a set of samples in a sufficiently high-dimensional space can be projected into a lower dimension with an approximate preservation of distances — and thus with limited loss of information. Furthermore, the distributional variant of the lemma [3, 4] shows that the optimal linear transformation from a large space to a smaller one is simply… a random projection!

In other words, with no a priori knowledge on the underlying data, random projections (Figure 2) are the best linear dimensionality reduction scheme.

While it is possible to design better compression schemes with domain-specific knowledge, the development of such algorithms is a resource-intensive endeavor — standards such as JPEG took years to develop, and longer to be widely adopted, even with large communities backing them. In contrast, random projections are domain-agnostic, oblivious, and easy to use. In a wider sense, they constitute a universal compression algorithm.

Honey, I (randomly) shrunk the data

Making big data small again

Figure 3. A single-pixel camera, as enabled by compressed sensing. Repeated one-pixel sensing of a randomly modulated scene allow for its reconstruction with minimal loss.

As the distortions caused by random projections are bounded, algorithms used to study the original objects can be applied directly to the projected data with minimized loss of generality.

In large dimensions, basic data science tools such as SVD are too expensive to compute. By performing a random projection first, unreasonably large data become manageable again: randomized SVD becomes blazing fast [5], kernel machines go large-scale [6], compressed least-squares regression simply works [7], nearest neighbors delivers credible results again [8] — and much more.

At even larger scales, big data workflows involve real-time processing of gargantuan streams of ones and zeroes. For these streams, even basic tasks like counting become prohibitively expensive. Thus, practitioners rely on hashing and sketching, two common practices that can also benefit from random projections [9].

Finally, in compressed sensing [10, 11] random projections techniques are leveraged at the sensor level (Figure 3). Compressed sensing has yielded numerous insights, revolutionizing sensing modalities — proving that random projections are more than just a mathematical oddity.

Optimization with a side of randomization

Figure 4. Reservoir computing and echo-state networks can leverage random projections to create dynamical “reservoirs” capable of modelling complex phenomena. Operations related to reservoir are fixed; only the readout neurons have to be learned.

Applications of random projections go beyond sampling and embedding data. Because they enable algorithms to work on a separate representation, random projections have applications in differential privacy [12], or homomorphic encryption [13].

In machine learning, they open up new algorithmic possibilities. Chaotic systems modeling can rely on reservoir computing [14], where recurrent computations repeatedly leverage a random matrix [15, 16] (Figure 4). Random features can power anomaly detection algorithms [17] but also provide meaningful embeddings in natural language processing, and accelerate transfer learning dramatically.

Finally, as the properties of random projections are independent of the underlying data distribution, strong mathematical bounds and guarantees can be derived. This comes handy in the context of linear algebra, where the subfield of Randomized Numerical Linear Algebra (RandNLA) focuses on using random projections as new kinds of pre-conditioners. Schemes for the randomized reduction of tensors [18], linear programs [19], and semi-definite programs [20] have even been proposed.

Random projections not only enable hardware to deal efficiently with big data; they also power new class of algorithms, capable of taming high-dimensional data.

Fast & Enormous: fulfilling the promises of random projections with LightOn’s OPU

Large scale random projections can feature trillions of parameters, and thus be difficult to evaluate using existing computing capabilities. Many of the exciting prospects highlighted above have remained just that — distant prospects.

At LightOn, we have developed a first Optical Processing Unit (OPU), a co-processor capable of performing random projections at the speed of light. This OPU is not only faster than any existing hardware; it handles random projections of unprecedented sizes, thereby opening brand new computational avenues.

In the next and final installment, we will explore the underlying technology powering this first OPU, and will provide a showcase of its abilities.

Our Summer Series exploring the story behind LightOn:

Stay updated on our advancements by subscribing to our newsletter. Liked what you read and eager for more? You can check out our website, as well as our publications. Seeing is believing: you can request an invitation to LightOn Cloud, and take one of our Optical Processing Unit for a spin. Want to be part of the photonics revolution? We are hiring!

References

[1] — Seth Spielman, et al. SenSlide — A Sensor Network Based Landslide Prediction System. SenSys, 2005.

[2] — William B. Johnson, Joram Lindenstrauss. Extensions of Lipschitz Mappings into a Hilbert Space. Contemporary Mathematics 26:189–206, 1984.

[3] — Sanjoy Dasgupta, Anupam Gupta. An Elementary Proof of the Johnson-Lindenstrauss Lemma. International Computer Science Institute, Technical Report 22(1):1–5, 1999.

[4] — Dimitris Achlioptas. Database-Friendly Random Projections: Johnson-Lindenstrauss with Binary Coins. Journal of Computer and System Sciences 66(4):671–687, 2003.

[5] — Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review, 53(2), 217–288, 2010.

[6] — Ali Rahimi, Ben Recht. Random Features for Large-Scale Kernel Machines. NeurIPS, 2008.

[7] — Odalric-Ambrym Maillard, Rémi Munos. Compressed Least-Squares Regression. NeurIPS, 2009.

[8] — Piotr Indyk, Rajeev Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. ACM, 1998.

[9] — Saehoon Kim, Seungjin Choi. Bilinear Random Projections for Locality-Sensitive Binary Codes. CVPR, 2015.

[10] — Emmanuel Candes, Justin Romberg, Terence Tao. Stable Signal Recovery from Incomplete and Inaccurate Measurements. Communications on Pure and Applied Mathematics 59(8):1207–1223, 2006.

[11] — David L. Donoho. Compressed Sensing. IEEE Transactions of Information Theory 54(4):1289–1306, 2006.

[12] — Jeremiah Blocki, et al. The Johnson-Lindenstrauss Transform Itself Preserves Differential Privacy. IEEE FOCS, 2012.

[13] — Wenjun Lu, Avinash L. Varna, Min Wu. Confidentiality-Preserving Image Search: a Comparative Study Between Homomorphic Encryption and Distance-Preserving Randomization. IEEE Access, 2014.

[14] — Mantas Lukoševičius, Herbert Jaeger. Reservoir computing approaches to recurrent neural network training. Computer Science Review, 2009.

[15] — Jonathan Dong, et al. Scaling-up Echo-State Networks with Multiple Light Scattering. IEEE SSP, 2018.

[16] — Jonathan Dong, et al. Optical Reservoir Computing Using Multiple Light Scattering for Chaotic Systems Prediction. IEEE Journal of Selected Topics in Quantuum Electronics 26.1:1–12, 2019.

[17] — Nicolas Keriven, Damien Garreau, Iacopo Poli. NEWMA: a New Method for Scalable Model-Free Online Change-Point Detection. Pre-print.

[18] — Tamas Sarlos. Improved Approximation Algorithms for Large Matrices via Random Projections. IEEE FOCS, 2006.

[19] — Rachel Minster, Arvind K. Saibaba, Misha E. Kilmer. Randomized Algorithms for Low-Rank Tensor Decompositions in the Tucker Format. Pre-print, 2019.

[20] — Ky Vu, Pierre-Louis Poirion, Leo Liberti. Random Projections for Linear Programming. Mathematics of Operations Research 43(4) 1051–1071, 2018.

[21] — Andreas Bluhm, Daniel Stilck França. Dimensionality Reduction of SDPs Through Sketching. Linear Algebra and its Applications 563:461–475, 2019.

The author

Julien Launay, Machine Learning R&D engineer at LightOn AI Research.

Thanks to Laurent Daudet

LightOn

Written by

LightOn

We are a technology company developing Optical Computing for Machine Learning. Our tech harvests Computation from Nature, We are at lighton.ai

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade