Lottery Tickets and Simplified Parameters
After reading The Lottery Ticket Hypothesis, I had a few ideas and wanted to capture them to revisit as a possible research direction.
To recap, the lottery ticket hypothesis says that a randomly initialized, dense neural network contains a sparse subnetwork that, when trained in isolation from its original initialization, can reach the same accuracy as the full network. Such a subnetwork has won the initialization lottery.
These subnetworks are found by pruning the trained network. The benefits of pruned networks include reduced parameter counts, which in turn mean smaller storage size, lower energy consumption, and faster inference. Crucially, you can't just train the pruned architecture from scratch; to match the original accuracy, the pruned network has to be retrained from the same initialization it started with. (If you randomly reinitialize the starting weights, the subnetwork loses its performance benefits.)
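Concretely, the procedure is roughly: train the dense network, prune the smallest-magnitude weights, rewind the survivors to their original values, and retrain. Here's a minimal sketch in PyTorch, assuming a toy MLP, random placeholder data, and a one-shot 20% pruning rate (the paper actually prunes iteratively over several rounds):

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
init_state = copy.deepcopy(model.state_dict())  # save the original initialization

def train(model, steps=200):
    # Placeholder loop on random data; swap in a real dataset and schedule.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()

train(model)  # 1. train the dense network

# 2. prune: mask out the 20% of weights with the smallest final magnitude
masks = {}
for name, p in model.named_parameters():
    if p.dim() > 1:  # prune weight matrices, leave biases alone
        k = int(0.2 * p.numel())
        threshold = p.abs().flatten().kthvalue(k).values
        masks[name] = (p.abs() > threshold).float()

# 3. rewind the surviving weights to their ORIGINAL initialization
model.load_state_dict(init_state)
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in masks:
            p.mul_(masks[name])
            # zero the gradients of pruned weights so they stay at zero
            p.register_hook(lambda g, m=masks[name]: g * m)

train(model)  # 4. retrain the winning ticket from its original init
```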
In “Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask,” the authors find that pruning criteria other than the standard one (i.e., different masks) also produce well-performing subnetworks, as long as the surviving weights keep their original initial values. This seems to imply that the subnetwork is embedded in the initial state.
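To make "different masks" concrete, here is a sketch of two of the scoring criteria that paper compares, applied to stand-in tensors for one layer's initial and final weights. The names follow the paper ("large final" is the classic lottery-ticket choice, "magnitude increase" is one of the alternatives); the random tensors and the 80% keep-rate are made up for illustration:

```python
import torch

torch.manual_seed(0)
w_init = torch.randn(300, 784)                      # a layer's weights at init
w_final = w_init + 0.1 * torch.randn_like(w_init)   # stand-in for after training

def keep_top(scores, frac=0.8):
    """Binary mask keeping the top `frac` of weights by score."""
    k = int((1 - frac) * scores.numel())
    threshold = scores.flatten().kthvalue(k).values
    return (scores > threshold).float()

mask_large_final = keep_top(w_final.abs())                   # classic LTH criterion
mask_mag_increase = keep_top(w_final.abs() - w_init.abs())   # an alternative

# Either mask gets applied to the ORIGINAL initialization, not the trained weights:
subnet_weights = w_init * mask_large_final
```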
After reading this, a few key questions came to mind: Is there a type of function or method that can be used to initialize a network such that the resulting network consistently contains a lottery ticket subnetwork? Could this function be used to convey a network instead of passing around its parameters? What form would this function and its arguments take? What is the most concise way to represent it?
In other words, is there a way to transfer a model via a function instead of its parameters?
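As a speculative toy version of this, you could represent a network as a tiny "genome" (architecture dimensions plus a random seed) and regenerate the exact initialization on demand, instead of shipping the parameter tensors themselves. Everything here (the genome format, the `grow_network` name) is hypothetical, and seed-based determinism only holds for a fixed PyTorch version and platform:

```python
import torch
import torch.nn as nn

def grow_network(genome):
    """Rebuild the same initialization from a few bytes of 'genome'."""
    torch.manual_seed(genome["seed"])
    layers, dims = [], genome["dims"]
    for d_in, d_out in zip(dims, dims[1:]):
        layers += [nn.Linear(d_in, d_out), nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # drop the trailing ReLU

genome = {"dims": [784, 300, 100, 10], "seed": 42}  # a handful of bytes

net_a = grow_network(genome)
net_b = grow_network(genome)  # "grown" again, e.g., on another machine

# Sanity check: both copies have bit-identical weights.
assert all(torch.equal(p, q)
           for p, q in zip(net_a.parameters(), net_b.parameters()))
```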
I’m thinking of an analog to DNA. Sea turtles, upon hatching, have an instinct to head for the sea, but that’s really due to the setup of their neural network at initialization (via DNA); there are genes that code for this setup. There is nothing about that setup that is objectively good for survival in general, only for that particular environment: if a sea turtle hatched in the desert, that initial neural network wouldn’t help (among other things). Similarly, a duckling’s instinct to imprint (forming an attachment to one of the first moving things it sees after birth, usually its mother) is encoded in its genes and maps to some neurological structure in its brain. To me, the lottery ticket hypothesis is analogous to instinct in young animals with newly initialized neural networks.
Is there some way to save a lottery ticket network as concisely as DNA, with initializations that are good for certain tasks like image classification or speech recognition? I have many questions, but I’m pretty new to this and might just be missing something very basic 🙂