Before everything switched online, we, at LightOn, were regularly organizing meetups about all things random matrices. Even though we could not offer coffee, croissant, and pizza, we decided to host an online meetup.
Sketching is a standard dimensionality reduction technique that consists in hitting the data matrix with a sketching matrix S to produce a compressed representation. The compressed representation can then be used for a number of inference needs.
But, what happens if we don’t have access to the full data matrix? For instance, we may want to compress data locally on each one of the billion edge devices and then aggregate them on the cloud. It’s not just a large scale e-commerce issue, the same issue can also be found in a Big Science setting. The Square Kilometer Array, for instance, will produce raw data at a whopping 157 TB/s. Collecting all the data in one place seems quite unpractical: assuming you store this data on DVD disks, without a keep case, after one hour and a half you end up with an Olympic-size pool collection of DVDs!
To remedy this issue, can we build a sketching matrix such where we only need a subset of the data at a time? Rakshith’s answer is yes! It can be done by applying a block diagonal sketching matrix.
While this seems like a logical solution, do we pay extra? Apparently not!
For an appropriate choice of block size, block-diagonal matrices can be as effective as a dense Gaussian matrix.
Note that blocks can be of different sizes. In fact, naively taking each block dimension to be M divided by the number of blocks is not optimal, because it does not take advantage of the low-dimensional structure of the data matrix.
Low coherence values mean that each block has more or less the same importance, so each data block can be projected roughly to the same dimension.
A high dynamic range of coherence values instead points to a smarter strategy of drawing a number of random projections for each block proportional to the corresponding coherence.
Interestingly, the bounds derived for such a choice of sketching dimensions match state-of-the-art bounds for non-blocked sketching, nicely closing the circle!
At the end of the presentation, Rakshith pointed to two other papers of his that you can check out, on Decentralized Sketching of Low-Rank Matrices and Graph Sketching for Graph Neural Networks. If the subject sparked your curiosity about sketching techniques, this survey by David Woodruff can keep you entertained for a while. Finally, at LightOn we compute dense random projections literally at the speed of light. If you want to try out these algorithms or a new idea you had, you can register to the LightOn Cloud or apply to the LightOn Cloud for Research Program!
Localized sketching for matrix multiplication and ridge regression by Rakshith S Srinivasa, Mark A Davenport, Justin Romberg, ArXiV 2003.09097
LightOn is a hardware company that develops new optical processors that considerably speed up Machine Learning computation. LightOn’s processors open new horizons in computing and engineering fields that are facing computational limits. Interested in speeding your computations up? Try out our solution on LightOn Cloud! 🌈
We would like to thank Victoire Louis for undertaking most of the effort in the organization of the meetup.