Back to basics

I am not an experienced researcher in deep learning, so in this post I just record my thoughts and findings from the past few years, to make sure I don't make the same mistakes again.

I built my first model in Theano two years ago. Back then I was working on Matrix Factorization algorithms to build a recommendation system. The theory behind it is massive and there are many details to cope with. Building a simple MF model from scratch that works for real-world cases is surprisingly difficult. Even with well-known libraries such as lib-MF I still could not manage to produce a decent model. There are tricks in the online-learning algorithm, adjusting the time window, dealing with concept drift, the cold-start problem, etc. Missing any of those details could lead to bad performance in a recommendation system. I was so frustrated at that time.
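The core of matrix factorization itself is only a few lines; the hard part is everything around it. Here is a minimal SGD sketch of that core (the dimensions, learning rate, regularization, and toy ratings below are made-up assumptions for illustration, not the setup from my project):

```python
import numpy as np

# Minimal matrix-factorization sketch trained with SGD.
# All shapes and hyperparameters here are illustrative assumptions; a real
# recommendation system needs much more (online updates, time windows,
# cold-start handling, ...), which is where the difficulty lies.
n_users, n_items, n_factors = 100, 200, 10
lr, reg = 0.01, 0.1

P = np.random.normal(scale=0.1, size=(n_users, n_factors))  # user factors
Q = np.random.normal(scale=0.1, size=(n_items, n_factors))  # item factors

# Observed ratings as (user, item, rating) triples -- toy data only.
ratings = [(0, 3, 5.0), (1, 7, 3.0), (2, 3, 4.0)]

for epoch in range(20):
    for u, i, r in ratings:
        pu, qi = P[u].copy(), Q[i].copy()
        err = r - pu.dot(qi)                 # prediction error
        P[u] += lr * (err * qi - reg * pu)   # gradient step on user factors
        Q[i] += lr * (err * pu - reg * qi)   # gradient step on item factors
```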

After the project was finished, a new course called "deep learning" appeared on the school website and caused a lot of discussion. I was startled by the simplicity and performance of DNNs. For the recommendation system I had only been working with text data, which is human readable. In the course I was dealing with acoustic data, aiming to build a workable speech recognition model. By the end of the semester, almost every student had accomplished the task.

Complex data such as images, video, human language, and speech, which used to be hard to deal with, become simple with the help of deep learning. We can compose simple structures such as convolution and pooling layers, make some reasonable assumptions with physical meaning, and the network just works. All I had to do was swap those components and tune the parameters for my research dataset (with the help of frameworks such as TensorFlow, Keras, Caffe, etc.). Everything seems easy and everyone can do deep learning.
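To give a sense of how little code this takes, here is a minimal Keras sketch of that kind of component stacking (the input shape, layer sizes, and number of classes are illustrative assumptions, not the ones from my coursework):

```python
# Minimal Keras sketch: stack standard components (Conv2D, MaxPooling2D,
# Dense) and let the framework handle the rest. The input shape and layer
# sizes below are illustrative assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# "Tuning" often amounts to swapping these layers or changing the numbers
# until the network works for the dataset at hand.
```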

However, after digging deeper in this field, I found myself unable to figure out why my model was unstable. Was the initialization bad, or was the learning rate messed up? I could only do trial and error because of my lack of fundamental knowledge. I hadn't paid much attention to the theory behind the basic components of the network, such as activation functions and momentum. Things are a lot more complex than they appear, and I should always pay attention to the details.

What I learned from those days is that no matter how easy the technology makes things, I should always go back to basics. Never neglect the details, even when nobody pays attention to them or thinks they are simple or trivial. Recently a great paper, Wasserstein GAN, and some great posts discussing it have been circulating. Its analysis of Generative Adversarial Networks is very thorough. The authors analyzed every detail of GAN training and seemed to solve its instability issues. I don't fully understand the analysis yet, but I think its spirit (analyzing every detail and the fundamental theory) is very admirable.

I think there is no easy way to accomplish great things, and nobody can master something in 3 minutes or in a single course (despite what some advertisements claim…).