Can Machines Learn Faster?

Hasan Kemik
Published in CARBON CONSULTING
Dec 6, 2020

In this article, we’ll talk about LSH, hash tables and SLIDE.

SLIDE: Sub-LInear Deep learning Engine

In my previous article, we talked about LSH and hashing functions. To briefly wrap up:

  • LSH was used to find the similarity between documents.
  • Hashing functions were used as helpers to represent the smallest, most similar parts of the documents (see the toy sketch after this list).
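To make that recap concrete, here is a toy sketch of the idea. It is my own simplified illustration, not the full MinHash/LSH pipeline from the previous article: documents are broken into shingles (here, word 3-grams), the shingles are hashed into buckets, and documents are compared by the overlap of their hashed shingles. The function names and the use of Python's built-in `hash` are assumptions made for brevity.

```python
def shingles(text, k=3):
    # Break a document into word k-grams ("shingles").
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def hashed_shingles(text, k=3, buckets=1024):
    # Hashing functions act as helpers: each shingle is mapped to a bucket id.
    return {hash(s) % buckets for s in shingles(text, k)}

def jaccard(a, b):
    # Similarity between two documents = overlap of their hashed shingles.
    return len(a & b) / len(a | b)

doc1 = "machines can learn faster with smart algorithms"
doc2 = "smart algorithms can help machines learn faster"
print(jaccard(hashed_shingles(doc1), hashed_shingles(doc2)))
```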

As we are all aware, the amount of data produced by individuals is growing day by day. According to the IDC Data Age 2025 whitepaper sponsored by Seagate, 175 ZB of data will be produced in 2025.

Also, with the ongoing debate about whether Moore’s Law still holds, computational power may not be enough, or the cost of the hardware may exceed the budget.

Trying to produce smarter algorithms can help solve that issue, but how?

Chen & et. al. in the paper ‘SLIDE : IN DEFENSE OF SMART ALGORITHMS OVER HARDWARE ACCELERATION FOR LARGE-SCALE DEEP LEARNING SYSTEMS‘ states a new way of using LSH to solve the training cost problems of deep learning algorithms.

Chen & et. al. states that they have successfully outperformed a Tesla V100 GPU with a 44 Core CPU where the GPU takes 3.5 hours and the CPU with SLIDE algorithm takes 1 hour to train on the same dataset with the same algorithm.

SLIDE basically amounts to choosing, for a given input, the neurons that will be activated, and performing calculations only on those neurons, so that computational power is spent on the meaningful calculations.

The process of choosing the activated neurons is actually choosing the neurons that are most likely to be activated.

As in document similarity, LSH is used to find the most probable neurons in the network using hash tables; instead of text shingles, the inputs to the layers are hashed. Because the tables are built from each layer’s weights, a single table cannot serve the whole network: for every single layer, a separate LSH table needs to be prepared and kept up to date, as sketched below.
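Here is a minimal sketch of that per-layer hash table idea. It is my own simplified version, not the authors’ implementation: each neuron’s weight vector is hashed with signed random projections (a SimHash-style scheme), the layer input is hashed with the same projections, and the neurons landing in the same bucket are treated as the “probably activated” candidates. Class and parameter names are assumptions for illustration.

```python
import numpy as np

class LayerLSH:
    def __init__(self, weights, n_bits=8, seed=0):
        # weights: (n_neurons, input_dim) matrix of one layer
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, weights.shape[1]))
        self.table = {}
        for neuron_id, w in enumerate(weights):
            self.table.setdefault(self._hash(w), []).append(neuron_id)

    def _hash(self, vector):
        # Sign pattern of the random projections -> bucket key
        return tuple((self.planes @ vector > 0).astype(int))

    def query(self, layer_input):
        # Neurons sharing the input's bucket are the candidate active set
        return self.table.get(self._hash(layer_input), [])

# Because the table is built from the layer's weights, each layer needs
# its own LayerLSH instance, rebuilt/updated as the weights change.
weights = np.random.randn(1000, 64)   # a toy layer with 1000 neurons
lsh = LayerLSH(weights)
x = np.random.randn(64)
print(lsh.query(x))                   # ids of neurons likely to fire for x
```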

After the neurons are selected, the forward and backward passes are completed only over them. To demonstrate what we’ve discussed, let’s walk through an example network.

Figure 1: An example of a fully connected neural network with ReLU activation. (Layer-to-layer connections are simplified with a single bigger arrow to reduce the complexity of the figure.)

As can be seen in Figure 1, when the input 5 is fed through the network,

In the first layer:

  • Neurons 1 and 2 will be activated.
  • Neuron 3 will not be activated, because the layer uses the ReLU activation function, which maps negative values to 0, so its computation is not needed.

In the second layer:

  • Only the 2nd neuron will be activated, for the same reason as the first layer’s 3rd neuron.

In the last layer:

  • Neuron 1 will be activated.

With a little intuition and knowledge, we can tell which neurons are related to the output and which are not. After the selection process is completed, we can compute the output in 5 steps instead of 11.

As you can see, we’ve reduced our computation by more than 50% even with this simple observation.
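A hedged sketch of how that payoff looks in code: instead of computing every neuron in a layer, we compute only the ids returned by the selection step (e.g. `LayerLSH.query` above). The function name and shapes here are my own, chosen only to mirror the “5 steps instead of 11” idea from the example.

```python
import numpy as np

def sparse_layer_forward(x, W, b, active_ids):
    # The full layer output would be relu(W @ x + b) over all neurons;
    # here we only pay for the rows listed in active_ids.
    out = np.zeros(W.shape[0])
    for i in active_ids:
        out[i] = max(0.0, W[i] @ x + b[i])   # ReLU on selected neurons only
    return out

W = np.random.randn(3, 4)      # a small layer: 3 neurons, 4 inputs
b = np.zeros(3)
x = np.random.randn(4)
print(sparse_layer_forward(x, W, b, active_ids=[0, 1]))  # neuron 2 is skipped
```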

When we turn back to our question, ‘Can Machines Learn Faster?’, the answer is obviously YES.

Acknowledgments

I want to thank Mr. Meysam Asgari-Chenaghlu for reviewing and helping with the creation of this article.
