…the inputs y_i with probability s_i. This is a rougher choice than the averaging of soft attention. Soft attention is preferred because it can be trained with back-propagation.

From "Memory, attention, sequences" by Eugenio Culurciello

秦伟, Sep 4, 2018

This passage introduces hard attention, which directly selects a single input according to the probabilities. Soft attention instead takes a probability-weighted sum of the inputs. One advantage of this is that it is easy to back-propagate through.
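
To make the contrast concrete, here is a minimal sketch in plain Python. The function names `soft_attention` and `hard_attention`, the toy inputs `y`, and the probabilities `s` are all made up for illustration and do not come from the article. Soft attention is a weighted sum, so the output changes smoothly with the probabilities. Hard attention draws one input at random, a discrete step that gradients cannot flow through directly.

```python
import random


def soft_attention(inputs, probs):
    """Return the probability-weighted sum of the input vectors."""
    dim = len(inputs[0])
    return [sum(p * y[d] for p, y in zip(probs, inputs)) for d in range(dim)]


def hard_attention(inputs, probs, rng):
    """Sample a single input y_i, picking index i with probability s_i."""
    (chosen,) = rng.choices(inputs, weights=probs, k=1)
    return chosen


# Toy data (hypothetical): three 2-d inputs y_i and their probabilities s_i.
y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s = [0.2, 0.3, 0.5]

soft = soft_attention(y, s)                     # a blend of all three inputs
hard = hard_attention(y, s, random.Random(0))   # exactly one of the inputs

# Averaging many hard samples approaches the soft result in expectation.
rng = random.Random(1)
n = 20000
samples = [hard_attention(y, s, rng) for _ in range(n)]
hard_mean = [sum(v[d] for v in samples) / n for d in range(2)]
```

Here `soft` is always the same blend of the inputs, while `hard` returns one of the rows of `y` and varies with the random draw. The two agree only on average.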