Cameron Scott White: I would imagine that “damage” in VoIP settings would be quite different from the downsampling done in this post. Damaged or corrupted audio signals are likely to suffer from non-periodic effects, whereas the downsampling done here is periodic and preserves much of the information (it just cuts away the high frequencies). However, the VoIP application I had in mind would be to transmit a downsampled audio signal from a client and reconstruct a high-resolution waveform on the other end.
Arnav Vaid: Indeed, I thought to do this, too. However, there are a few practical issues with using the discrete FFT as a loss function, including (but not limited to): implementing it in TensorFlow would not be trivial, and the window of the FFT in the loss would essentially become another hyperparameter to optimize over. Thus, I stuck with the MSE between the raw waveforms (although I did experiment with other losses using the waveforms).
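To make the trade-off above concrete, here is a minimal sketch in plain Python (not TensorFlow, and not the code used in the post) comparing a waveform MSE with a windowed spectral-magnitude loss. The function names and the `window` and `hop` parameters are hypothetical, chosen only for illustration; the naive DFT is far too slow for training and is there only to show how the window size enters the loss as an extra hyperparameter.

```python
import cmath
import math


def mse(a, b):
    """Mean squared error between two equal-length waveforms."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dft_magnitudes(frame):
    """Magnitudes of a naive O(n^2) DFT of one frame (non-negative bins only)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_loss(a, b, window=64, hop=32):
    """Mean squared difference of DFT magnitudes over sliding frames.

    `window` and `hop` are illustrative assumptions: they are the kind of
    extra hyperparameters a frequency-domain loss would introduce.
    """
    total, count = 0.0, 0
    for start in range(0, len(a) - window + 1, hop):
        mag_a = dft_magnitudes(a[start:start + window])
        mag_b = dft_magnitudes(b[start:start + window])
        total += sum((x - y) ** 2 for x, y in zip(mag_a, mag_b))
        count += len(mag_a)
    return total / count


# Toy signals: a low tone plus a high tone, and the same signal with the
# high tone removed (a stand-in for "cutting away the high frequencies").
n = 256
target = [
    math.sin(2 * math.pi * 4 * t / n) + 0.5 * math.sin(2 * math.pi * 40 * t / n)
    for t in range(n)
]
lowpassed = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]

print("waveform MSE:          ", mse(target, lowpassed))
print("spectral loss (w=64):  ", spectral_loss(target, lowpassed, window=64, hop=32))
print("spectral loss (w=32):  ", spectral_loss(target, lowpassed, window=32, hop=16))
```

The two spectral-loss values differ for the same pair of signals, which is the point about the window: unlike the MSE on raw waveforms, the frequency-domain loss changes with the framing choice.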