A Task-Optimized Neural Network Replicates Human
Auditory Behavior, Predicts Brain Responses, and
Reveals a Cortical Processing Hierarchy — Read a Paper
Summary — The authors use hierarchical neural networks to model human auditory processing and report several interesting observations. A multitask network is designed for both speech and music recognition, and its behavioral error patterns are compared with those of human volunteers. Task-specific branching after shared layers in the network architecture supports a plausible organization of the auditory system.
Experimental Setup
Two auditory tasks were chosen for the work: (1) word recognition, identifying the spoken word, and (2) music genre prediction, identifying the genre of the music played. To make the model robust, a moderate amount of noise was added to the inputs. Instead of training two separate task-specific models, a single branched network was used, with initial shared layers feeding into task-specific heads.
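The shared-trunk, two-head idea can be sketched as a minimal forward pass. This is only an illustration, not the paper's architecture: the actual model is a deep convolutional network over cochleagrams, and the layer sizes and class counts below are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical sizes; the real network is a CNN with many more units/classes.
D_IN, D_SHARED, N_WORDS, N_GENRES = 256, 128, 100, 10

# Stand-in for the shared early layers.
W_shared = rng.normal(scale=0.05, size=(D_IN, D_SHARED))
# Task-specific heads branching off the shared representation.
W_word = rng.normal(scale=0.05, size=(D_SHARED, N_WORDS))
W_genre = rng.normal(scale=0.05, size=(D_SHARED, N_GENRES))

def forward(x):
    h = relu(x @ W_shared)            # shared computation
    return h @ W_word, h @ W_genre    # per-task branches

word_logits, genre_logits = forward(rng.normal(size=(1, D_IN)))
print(word_logits.shape, genre_logits.shape)  # (1, 100) (1, 10)
```

The point of the shared trunk is that both tasks reuse one early representation, so training signal from either task shapes the same initial layers.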
With this trained model optimized to solve both tasks, the key takeaways are:
Task independent branch points
Varying the layer at which the network branches into task-specific paths produced similar performance trends for both tasks, and a branch point at layer 3 was optimal.
Human vs Network
Human volunteers and the trained network were given the same tasks: recognizing different types of words and identifying the genres of music clips, under varying noise levels. For word recognition, clean signals yielded higher accuracy, and words with a musical tone were easier to recognize. For both tasks there was a clear positive correlation between human and network error patterns.
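The human–network comparison boils down to correlating two error profiles across matched conditions. A minimal sketch with fabricated per-condition error rates (the real study uses many stimuli and noise conditions):

```python
import numpy as np

# Hypothetical error rates, one entry per stimulus/noise condition,
# for humans and for the trained network on the same conditions.
human_err = np.array([0.05, 0.12, 0.30, 0.45, 0.60])
network_err = np.array([0.04, 0.15, 0.28, 0.50, 0.55])

# Pearson correlation between the two error profiles: a high positive
# value means the network fails where humans fail.
r = np.corrcoef(human_err, network_err)[0, 1]
print(round(r, 3))
```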
Voxel responses
To confirm that the behavioral errors observed in the human subjects have matching voxel responses in the corresponding parts of the brain, fMRI was used to record active brain regions during the tasks. The observations were consistent with the hypothesis: there were clear responses in the plausibly attributed regions. Notably, the non-primary auditory cortex showed little response to outlier stimuli, indicating that the apparent hierarchical structure is general enough to filter inputs by salience before computing activations, rather than being a simple speech- or music-selective module.
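Predicting voxel responses from network activations is typically done with regularized linear regression: fit a linear map from a layer's activations to a voxel's measured response, then score the fit by the correlation between predicted and measured responses. A self-contained sketch on simulated data (all numbers here are synthetic, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: layer activations for N sounds, and one voxel's
# response to the same sounds (a noisy linear function of the features).
N_SOUNDS, N_FEATURES = 80, 20
X = rng.normal(size=(N_SOUNDS, N_FEATURES))        # network activations
true_w = rng.normal(size=N_FEATURES)
y = X @ true_w + 0.1 * rng.normal(size=N_SOUNDS)   # simulated voxel response

# Closed-form ridge regression: w = (X^T X + lam*I)^(-1) X^T y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(N_FEATURES), X.T @ y)
pred = X @ w

# Score the fit as the correlation between predicted and measured responses.
r = np.corrcoef(pred, y)[0, 1]
print(round(r, 3))
```

Repeating this per layer and per voxel is what lets one ask which network stage best explains which cortical region.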
Optimizing a single architecture for both tasks with specialized branches, and replicating human behavioral error patterns, suggests that the model can be treated as a quantitative model of human cortical organization. It also hints at a qualitatively different generalization ability of multitask deep learning models.
Hope you enjoyed reading ;)