DeepPipe2 has finally started learning CIFAR10. DeepPipe2 is a deep-learning framework written in Elixir. It runs on the GPU.
A promise to myself
In 2020, I decided to make a deep-learning library that uses the GPU. I had made a prototype, DeepPipe1, before. However, it wasn’t practical enough, and it doesn’t use the GPU. Back then I did not understand Elixir well, and the code was confused. However, I thought I should be able to do better this time. I started the DeepPipe2 project. I promised myself to complete the library, including CNN, by the end of June 2020.
In February 2020, DeepPipe2 started learning with a DNN. Development proceeded smoothly, because CUDA has an excellent matrix calculation library called cuBLAS. I was also lucky that there was an excellent library called Matrex, which uses CBLAS. I learned a lot from the Matrex code. I would like to thank its author, Mr. Versilov.
To implement CNN, I initially thought of using the cuDNN library. However, I changed my mind and decided to write the CNN myself. I was not sure about the specifications of cuDNN, and I thought writing my own CNN would be a good lesson for me. But this was a fairly difficult task.
Fighting segmentation faults
At the beginning of CNN development, I was plagued by segmentation faults. I embedded debugging code in the CUDA code to investigate the cause. The cause was mostly memory allocation errors. I was confused because the image data of a CNN is a 4D tensor, and because the CUDA kernel works in parallel. These were also causes of bugs at the beginning.
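To illustrate why 4D image data is easy to get wrong, here is a minimal sketch of mapping a 4D index to an offset in a flat buffer. It assumes an NCHW layout (batch, channel, height, width); the layout DeepPipe2 actually uses is not stated here, and `flat_index` is a hypothetical helper, not part of the library. An out-of-range index in a CUDA kernel would read or write outside the allocation, which is one common source of segmentation faults.

```python
def flat_index(n, c, h, w, shape):
    """Offset of element (n, c, h, w) in a flat buffer, assuming NCHW layout.

    `shape` is (N, C, H, W). Raises IndexError instead of silently
    computing an offset outside the buffer.
    """
    N, C, H, W = shape
    if not (0 <= n < N and 0 <= c < C and 0 <= h < H and 0 <= w < W):
        raise IndexError(f"index {(n, c, h, w)} is outside shape {shape}")
    # Row-major order: w varies fastest, then h, then c, then n.
    return ((n * C + c) * H + h) * W + w


if __name__ == "__main__":
    shape = (2, 3, 4, 5)  # 2 images, 3 channels, 4 x 5 pixels
    print(flat_index(1, 2, 3, 4, shape))  # last element of the buffer
```

The buffer for this shape must hold N * C * H * W elements, so the largest valid offset is one less than that product.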
Later in development, I struggled to get the correct behavior. In some cases, the loss value did not decrease well. It was a mistake in the CNN code. A CNN works on many channels. I had a hard time with deconvolution. When the stride is 2 or more, the gradient has to be dilated. I prepared code for gradient calculation by numerical differentiation, and patiently pursued the correctness of backpropagation.
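The idea of checking backpropagation against numerical differentiation can be sketched in a few lines. The example below is not DeepPipe2 code: it is a simplified 1D, single-channel illustration in plain Python, and all function names are hypothetical. It computes the input gradient of a strided convolution by dilating the upstream gradient (inserting stride - 1 zeros between elements) and then compares the result with a central-difference estimate.

```python
def conv1d(x, w, stride):
    """Valid 1D cross-correlation: y[i] = sum_k x[i*stride + k] * w[k]."""
    out_len = (len(x) - len(w)) // stride + 1
    return [sum(x[i * stride + k] * w[k] for k in range(len(w)))
            for i in range(out_len)]


def dilate(g, stride):
    """Insert (stride - 1) zeros between neighbouring elements of g."""
    if not g:
        return []
    out = [0.0] * ((len(g) - 1) * stride + 1)
    for i, v in enumerate(g):
        out[i * stride] = v
    return out


def conv1d_backward_input(g, w, stride, n):
    """Gradient of sum_i g[i] * y[i] with respect to x (length n).

    The upstream gradient g is dilated, then scattered through the kernel,
    which is equivalent to a full convolution with the flipped kernel.
    """
    gd = dilate(g, stride)
    dx = [0.0] * n
    for p, gv in enumerate(gd):
        for k, wv in enumerate(w):
            dx[p + k] += gv * wv
    return dx


def numerical_grad(f, x, eps=1e-4):
    """Central-difference estimate of the gradient of scalar f at x."""
    grad = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += eps
        xm[j] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))
    return grad


def max_abs_diff(a, b):
    return max(abs(u - v) for u, v in zip(a, b))


if __name__ == "__main__":
    x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 1.5]
    w = [0.5, -1.0, 2.0]
    g = [1.0, -2.0, 0.5]  # upstream gradient, one value per output element
    stride = 2

    def loss(v):
        return sum(gi * yi for gi, yi in zip(g, conv1d(v, w, stride)))

    analytic = conv1d_backward_input(g, w, stride, len(x))
    numeric = numerical_grad(loss, x)
    print("max difference:", max_abs_diff(analytic, numeric))
```

If the two gradients disagree by more than rounding error, the backward pass is wrong. The same comparison extends to 2D convolutions with several channels, at the cost of a much slower numerical pass.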
The last challenge
In May 2020, the CNN of DeepPipe2 started working properly. But it did not learn well with the CIFAR10 dataset. I improved the initial values. However, the accuracy did not increase as expected. I found that dropout was wrong and fixed it. Still, DeepPipe2 could not learn CIFAR10 well. I was almost in despair. I tried improvements for many nights. I let the computer run long calculations through the night.
June came. DeepPipe2 still could not be trained well on the CIFAR10 dataset. I asked myself, “What are you doing?” I hadn’t implemented all the optimizers yet. I implemented Adam and RMSprop, and I trained on CIFAR10 with the Adam optimizer. I did it at last. DeepPipe2 started learning.
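For readers unfamiliar with Adam, the update rule can be sketched as follows. This is the commonly published formulation with its usual default hyperparameters, written in plain Python for a single scalar parameter; it is not taken from the DeepPipe2 source, and `adam_step` is a hypothetical name.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter.

    w: parameter, grad: gradient at w, m and v: running first and second
    moment estimates, t: 1-based step count. Returns the new (w, m, v).
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


def minimize_quadratic(steps, lr=0.1, target=3.0):
    """Minimize (w - target)^2 from w = 0 with Adam; returns the final w."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (w - target)
        w, m, v = adam_step(w, grad, m, v, t, lr=lr)
    return w


if __name__ == "__main__":
    print(minimize_quadratic(500))
```

Because each step is scaled by the running estimate of the squared gradient, the size of the first update is close to `lr` regardless of the gradient's magnitude. That makes Adam less sensitive to the scale of the gradients than plain SGD.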
Improvement of speed
Finally, DeepPipe2 was working well. But there was a speed issue. The problem was how I used kernels in the CUDA code. I was not able to utilize the power of the GPU. I worked on improving the speed, being careful not to break the correctness of the code. As a result, I was able to make CNN training 4.5 times faster. It was June 9th. I was able to fulfill my promise.
I put a lot of energy into DeepPipe2. I’m a little tired. I will take a break. After that, I plan to take on RNN. I have registered DeepPipe2 on Hex. I hope it helps Elixir users.