Actually, activations in any higher-level hidden layer will give you the same effect.
Gil Keren
Right, but my point was that because the network is trying to reconstruct its own input, the representation it learns has to capture the input as a whole. In, say, a classification task, the learned representation may be great for telling two types of data apart, but poor at capturing other differences in the input that happen to be irrelevant to that classification task.
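To make the contrast concrete, here's a minimal sketch (assuming Keras and a flattened 784-dimensional input; the layer sizes and names are just for illustration, not from the post). The autoencoder's code layer is trained to rebuild every input dimension, while a same-sized hidden layer in a classifier only has to keep whatever separates the classes:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim, code_dim, num_classes = 784, 32, 10  # illustrative sizes

# Autoencoder: the 32-unit code must retain enough information to
# reconstruct all 784 input dimensions, so it is pushed to represent
# the input as a whole.
ae_input = keras.Input(shape=(input_dim,))
code = layers.Dense(code_dim, activation="relu", name="code")(ae_input)
reconstruction = layers.Dense(input_dim, activation="sigmoid")(code)
autoencoder = keras.Model(ae_input, reconstruction)
autoencoder.compile(optimizer="adam", loss="mse")

# Classifier: a hidden layer of the same size only needs to keep the
# features that distinguish the classes; input variation that doesn't
# help predict the label can be thrown away.
clf_input = keras.Input(shape=(input_dim,))
hidden = layers.Dense(code_dim, activation="relu", name="hidden")(clf_input)
probs = layers.Dense(num_classes, activation="softmax")(hidden)
classifier = keras.Model(clf_input, probs)
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# After training each model on the same inputs, the two learned
# representations can be read out from the intermediate layers:
#   ae_codes  = keras.Model(ae_input, code).predict(x)     # reconstruction-driven
#   clf_codes = keras.Model(clf_input, hidden).predict(x)  # label-driven
```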
I’m sorry if that wasn’t super clear — I’ll update the post tonight. Thanks!