#paperoftheweek 5: Multi-Level Factorisation Net for Person Re-Identification

This paper is a great example of applying the Mixture-of-Experts technique to a real-world computer vision problem. It focuses on person re-identification, which in this case is defined as “identifying people from images taken by multiple cameras without overlapping view”. The authors argue that successful person re-identification requires features from the whole range of semantic levels of the input image. Instead of relying on just one or a few pretrained VGG layers, they apply the Mixture-of-Experts idea so that the network learns, without any manual attribute annotation, both the semantic features (factor modules, FMs) and the strength of their influence (factor selection modules, FSMs) at every level. These are then fused together into the final person appearance representation.


“Key to effective person re-identification (Re-ID) is modelling discriminative and view-invariant factors of person appearance at both high and low semantic levels. Recently developed deep Re-ID models either learn a holistic single semantic level feature representation and/or require laborious human annotation of these factors as attributes. We propose Multi-Level Factorisation Net (MLFN), a novel network architecture that factorises the visual appearance of a person into latent discriminative factors at multiple semantic levels without manual annotation. MLFN is composed of multiple stacked blocks. Each block contains multiple factor modules to model latent factors at a specific level, and factor selection modules that dynamically select the factor modules to interpret the content of each input image. The outputs of the factor selection modules also provide a compact latent factor descriptor that is complementary to the conventional deeply learned features. MLFN achieves state-of-the-art results on three Re-ID datasets, as well as compelling results on the general object categorisation CIFAR-100 dataset.”
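To make the block structure from the abstract a bit more concrete, here is a minimal pure-Python sketch of an MLFN-style forward pass. It is not the authors' implementation. The factor modules are toy linear + ReLU "experts" rather than the convolutional blocks used in the paper. The sigmoid gates, the residual connection and the concatenation used to fuse the deep feature with the gate vector are assumptions. All names (`FactorisationBlock`, `MLFNSketch`, `fuse`) are hypothetical and used only for illustration.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_linear(in_dim: int, out_dim: int, rng: random.Random) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a learned layer: a fixed random linear map."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return apply


class FactorisationBlock:
    """One MLFN-style block: K factor modules (FMs) gated by a factor selection module (FSM)."""

    def __init__(self, dim: int, num_factors: int, rng: random.Random) -> None:
        # Each factor module is a toy "expert" (linear + ReLU), not the paper's conv block.
        self.factor_modules = [make_linear(dim, dim, rng) for _ in range(num_factors)]
        # The FSM maps the block input to one selection score per factor module.
        self.selector = make_linear(dim, num_factors, rng)

    def __call__(self, x: Vector) -> Tuple[Vector, Vector]:
        # Assumed sigmoid gating: each factor module is weighted independently in (0, 1).
        gates = [sigmoid(z) for z in self.selector(x)]
        out = list(x)  # assumed residual connection
        for gate, module in zip(gates, self.factor_modules):
            for i, value in enumerate(module(x)):
                out[i] += gate * max(0.0, value)  # ReLU, then weight by the gate
        return out, gates


class MLFNSketch:
    """Stack of blocks; collects every block's gates into a compact factor descriptor."""

    def __init__(self, dim: int, num_blocks: int, num_factors: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.blocks = [FactorisationBlock(dim, num_factors, rng) for _ in range(num_blocks)]

    def __call__(self, x: Vector) -> Tuple[Vector, Vector]:
        factor_signature: Vector = []
        for block in self.blocks:
            x, gates = block(x)
            factor_signature.extend(gates)  # one gate per factor module per level
        return x, factor_signature


def fuse(deep_feature: Vector, factor_signature: Vector) -> Vector:
    # Assumed fusion by concatenation; the paper's fusion step may differ.
    return deep_feature + factor_signature


if __name__ == "__main__":
    model = MLFNSketch(dim=8, num_blocks=4, num_factors=3)
    rng_in = random.Random(1)
    image_feature = [rng_in.uniform(-1.0, 1.0) for _ in range(8)]  # stand-in for a CNN stem output
    deep, signature = model(image_feature)
    representation = fuse(deep, signature)
    print(len(deep), len(signature), len(representation))  # 8 12 20
```

Each block's gate vector says how strongly each factor module is used for this particular input. Concatenating the gates across all blocks therefore gives a small multi-level descriptor (here 4 blocks × 3 factors = 12 values) that sits alongside the deep feature from the last block, which is the complementarity the abstract points to.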

For more details and a good read, check out the paper: