… networks you can fit on a GPU or the longest we can tolerate waiting before we get a result back.” If researchers could figure out how to identify winning configurations from the get-go, it would reduce the size of neural networks by a factor of 10, even 100. The ceiling of possibility would dramatically increase, opening a new world of potential uses.
It could also change the nature of AI applications. If you can train a neural network locally on a device instead of in the cloud, you can improve the speed of the training process and the security of the data. Imagine a machine-learning-based medical device, for example, that could improve itself through use…
Frankle imagines a future where the research community will have an open-source database of all the different configurations they’ve found, with descriptions for what tasks they’re good for. He jokingly calls this “neural network nirvana.” He believes it would dramatically accelerate and democratize AI research by lowering the cost and time of training, and by allowing people without giant data servers to do this work directly on small laptops or even mobile phones.
The winning configuration achieves better performance than the original oversize network before any training whatsoever. In other words, the act of pruning a network to extract a winning configuration is itself an important method of training.
This also explains why observation № 1 holds true. Starting with a larger network is like buying more lottery tickets. You’re not increasing the amount of power that you’re throwing at your deep-learning problem; you’re simply increasing the likelihood that you will have a winning configuration. Once you find the winning configuration, you should be able to reuse it again and again, rather than continue to replay the lottery.
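To put rough numbers on the analogy (a back-of-envelope illustration of the idea, not a calculation from the paper): if each candidate configuration had, say, a one-in-five chance of being a winner, then holding many candidates makes it very likely that at least one of them wins.

```python
# Back-of-envelope illustration of the lottery analogy. The one-in-five figure
# is made up, and treating candidate configurations as independent "tickets"
# is a simplification for the sake of the arithmetic.
p_win = 1 / 5                      # chance that any single ticket is a winner
for n_tickets in (1, 5, 20):
    p_at_least_one = 1 - (1 - p_win) ** n_tickets
    print(n_tickets, round(p_at_least_one, 2))   # 0.2, 0.67, 0.99
```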
By focusing on this idea, the researchers found that if you prune an oversize network after training, you can actually reuse the resultant smaller network to train on new data and preserve high performance — as long as you reset each connection within this downsized network back to its initial strength.
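As a rough sketch of that procedure (assuming PyTorch; the tiny model, random placeholder data, and 80% pruning rate are stand-ins, not the setup used in the actual research), the steps are: train the oversize network, prune its weakest connections, reset the survivors to their initial strengths, and retrain.

```python
# A minimal sketch of the prune-and-reset idea, assuming PyTorch. The model,
# data, and pruning rate below are illustrative stand-ins only.
import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 2))
initial_state = copy.deepcopy(model.state_dict())        # remember the starting weights

x, y = torch.randn(512, 20), torch.randint(0, 2, (512,)) # placeholder data
loss_fn = nn.CrossEntropyLoss()

def train(steps=200):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

train()   # 1) train the oversize network

# 2) prune: keep only the strongest 20% of connections in each weight matrix
masks = {}
for name, p in model.named_parameters():
    if p.dim() > 1:                                       # weight matrices only
        cutoff = p.abs().flatten().quantile(0.8)
        masks[name] = (p.abs() >= cutoff).float()

# 3) reset every surviving connection to its *initial* strength,
#    and delete (zero out) the pruned ones
model.load_state_dict(initial_state)
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in masks:
            p.mul_(masks[name])

# 4) retrain the downsized network, re-applying the masks so that pruned
#    connections stay deleted while the optimizer updates the rest
def retrain(steps=200):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])

retrain()
```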
The consequence is that a neural network usually starts off bigger than it needs to be. Once it’s done training, typically only a fraction of its connections remain strong, while the others end up pretty weak — so weak that you can actually delete, or “prune,” them without affecting the network’s performance.
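A quick way to see this on any trained network is to zero out the weakest connections and compare accuracy before and after. The helper below is hypothetical and written against PyTorch purely for illustration.

```python
# Hypothetical helper (assumes PyTorch): delete the weakest connections of an
# already-trained classifier, in place, and report accuracy before and after.
import torch

@torch.no_grad()
def accuracy_before_and_after_pruning(model, inputs, targets, prune_fraction=0.8):
    def accuracy():
        return (model(inputs).argmax(dim=1) == targets).float().mean().item()

    before = accuracy()
    for p in model.parameters():
        if p.dim() > 1:                                   # weight matrices, not biases
            cutoff = p.abs().flatten().quantile(prune_fraction)
            p.mul_((p.abs() >= cutoff).float())           # "delete" the weak connections
    return before, accuracy()
```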
Sometimes the randomly assigned starting weights land in a configuration that will never train successfully. The larger the network (the more layers and nodes it has), the less likely that is. Whereas a tiny neural network may be trainable in only one of every five initializations, a larger network may be trainable in four of every five. Again, why this happens had been a mystery, but that’s why researchers typically use very large networks for their deep-learning tasks. They want to increase their chances of achieving a successful model.
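That observation is easy to poke at with a toy experiment (again a sketch, assuming PyTorch; the synthetic task, tiny architecture, and 90% accuracy cutoff are arbitrary choices): train the same small network from several random initializations and count how many succeed.

```python
# Toy experiment, assuming PyTorch: train the same small architecture from
# several random initializations and count how many reach a target accuracy.
# The task, architecture, and cutoff are arbitrary illustrative choices.
import torch
import torch.nn as nn

x = torch.randn(512, 20)
y = (x[:, 0] * x[:, 1] > 0).long()    # a simple nonlinear labeling rule

def trains_successfully(seed, steps=300, target_accuracy=0.9):
    torch.manual_seed(seed)            # a fresh random initialization per seed
    net = nn.Sequential(nn.Linear(20, 8), nn.ReLU(), nn.Linear(8, 2))
    opt = torch.optim.SGD(net.parameters(), lr=0.5)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()
    accuracy = (net(x).argmax(dim=1) == y).float().mean().item()
    return accuracy >= target_accuracy

wins = sum(trains_successfully(seed) for seed in range(5))
print(f"{wins} of 5 random initializations trained successfully")
```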