ML Day #6 — Reading "The Unreasonable Effectiveness of RNNs":

Interesting Quote:

The concept of attention is the most interesting recent architectural innovation in neural networks.

Thoughts:

Makes me wonder what other models of computation are possible. It feels like neural nets were developed via a brain metaphor (cf. "neural" in neural nets), but that the metaphor was too limited. For one thing, the brain is not a static computer but a dynamic system. For another, it stores information persistently, it has feedback loops, etc. The topologies of neural nets as we have built them probably don't closely resemble the topologies of information flow in the physical brain.

It makes me wonder why it took so long for this revolution in people's thinking to happen. Perhaps people had to work through all the boring details of NNs, backpropagation, etc. and see what those very simple models could do before "experimenting" with other models. Perhaps it's also that we've reached some sort of critical threshold in computational capacity/speed, so that you can just spin up a Docker instance for each experiment.


An experiment I might want to try: build an RNN that constantly feeds some input back into itself (say, a 500x500 pixel image), where both we and the network are modifying that input. At any time we can click a button to indicate "good" or "bad," reinforcing or penalizing the current state-vector relationships.
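A minimal sketch of what I mean, in PyTorch. None of this is from the article: the canvas is shrunk to 32x32 to keep the recurrent cell small, the button press is simulated with a ±1 reward, and "reinforce/penalize" is crudely stood in for by a reward-weighted reconstruction loss.

```python
import torch
import torch.nn as nn

SIZE = 32 * 32  # small stand-in for the 500x500 image

class SelfFeedingRNN(nn.Module):
    """An RNN whose output is the next version of its own input canvas."""
    def __init__(self, hidden=128):
        super().__init__()
        self.cell = nn.GRUCell(SIZE, hidden)
        self.readout = nn.Linear(hidden, SIZE)

    def forward(self, image, h):
        h = self.cell(image, h)
        return torch.sigmoid(self.readout(h)), h

net = SelfFeedingRNN()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
image = torch.rand(1, SIZE)   # the shared canvas
h = torch.zeros(1, 128)       # the network's state

for step in range(100):
    out, h = net(image, h)
    # Both parties modify the canvas: blend the net's output with
    # external edits (random noise standing in for the human here).
    image = (0.9 * out + 0.1 * torch.rand(1, SIZE)).detach()
    reward = 1.0 if step % 10 == 0 else -1.0  # simulated button press
    # Crude reinforcement: pull the output toward the current canvas
    # when "good", push it away when "bad".
    loss = reward * ((out - image) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    h = h.detach()  # truncate backprop through time
```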


I want to play around with policy networks and RNNs as control structures. I'm sure people have already started work on this, but nevertheless it's good to try! I wonder if there's any way for the RNN to add additional layers/structure to itself as its own needs demand. Also, is there any way to apply ideas from genetic algorithms here? Too many ideas!
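As a starting point, a hedged sketch of an RNN used as a policy network, trained with plain REINFORCE. The environment is a dummy: observations and rewards are random stand-ins, and all names are mine, not from the article.

```python
import torch
import torch.nn as nn

class RNNPolicy(nn.Module):
    """A GRU carries the agent's state; a linear head emits action logits."""
    def __init__(self, obs_dim=4, hidden=64, n_actions=2):
        super().__init__()
        self.cell = nn.GRUCell(obs_dim, hidden)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs, h):
        h = self.cell(obs, h)
        return self.head(h), h

policy = RNNPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
h = torch.zeros(1, 64)
log_probs, rewards = [], []
for t in range(20):                  # one dummy episode
    obs = torch.randn(1, 4)          # stand-in for an env observation
    logits, h = policy(obs, h)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    log_probs.append(dist.log_prob(action))
    rewards.append(torch.randn(()))  # stand-in for an env reward

# Undiscounted returns-to-go, then the REINFORCE objective.
returns = torch.stack(rewards).flip(0).cumsum(0).flip(0)
loss = -(torch.stack(log_probs).squeeze() * returns).mean()
opt.zero_grad()
loss.backward()
opt.step()
```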


Random idea: "neurons" which are secretly multiple neurons in a bundle. Recursively defined structures that can grow arbitrarily large. E.g., if a neuron is firing a lot, you could turn it into one of these bundles so that it can learn more heterogeneous behavior!?!?

Further support for this idea comes from the fact that "neurons" are somewhat like abstract generalizations of transistors. A state neuron that takes some set of inputs and produces some set of outputs, but does so in a very absolute way (producing values either very close to 0 or very close to 1), probably needs more "computing power" to produce the variety of outputs required. The point is that, in the overall computation structure, there can be "hot nodes" which are overloaded and which can't offload their computational tasks to other nodes due to the topology of the network. These "hot nodes" might need to be "cooled."
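One concrete way to "cool" a hot node is to split it into a bundle of two, close in spirit to the Net2Net widening trick: give a twin neuron a jittered copy of the hot unit's incoming weights and split the outgoing weights in half, so the layer computes roughly the same function but now has two units free to specialize. A hypothetical PyTorch sketch (all names mine):

```python
import torch
import torch.nn as nn

def split_unit(fc1: nn.Linear, fc2: nn.Linear, unit: int):
    """Replace hidden unit `unit` between fc1 and fc2 with a two-neuron
    bundle, roughly preserving the layer's overall function."""
    h, d_in = fc1.weight.shape
    d_out = fc2.weight.shape[0]
    new_fc1 = nn.Linear(d_in, h + 1)
    new_fc2 = nn.Linear(h + 1, d_out)
    with torch.no_grad():
        new_fc1.weight[:h] = fc1.weight
        new_fc1.bias[:h] = fc1.bias
        # The twin: same incoming weights plus a little noise, so the
        # two copies can diverge and learn heterogeneous behavior.
        new_fc1.weight[h] = fc1.weight[unit] + 0.01 * torch.randn(d_in)
        new_fc1.bias[h] = fc1.bias[unit]
        new_fc2.weight[:, :h] = fc2.weight
        new_fc2.bias.copy_(fc2.bias)
        # Halve the hot unit's outgoing weights and give the other
        # half to the twin.
        new_fc2.weight[:, unit] = fc2.weight[:, unit] / 2
        new_fc2.weight[:, h] = fc2.weight[:, unit] / 2
    return new_fc1, new_fc2

# Usage: the widened network computes (almost) the same function.
fc1, fc2 = nn.Linear(10, 5), nn.Linear(5, 3)
x = torch.randn(2, 10)
before = fc2(torch.relu(fc1(x)))
fc1b, fc2b = split_unit(fc1, fc2, unit=3)
after = fc2b(torch.relu(fc1b(x)))  # ~= before, up to the jitter
```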



Keywords for further study:

Tensors, Torch7