Published in Heuritech

MUTAN: Multimodal Tucker Fusion for Visual Question Answering

Foreword: The author is Hédi Ben-Younes, a former PhD student at LIP6 / Heuritech. Multimodal fusion of text and image information is an important topic at Heuritech, since most media on the internet combines images, videos, and text. The challenging Visual Question Answering task is an excellent benchmark for fusing text and image.

This blog post presents work by Hédi Ben-Younes*, Rémi Cadène*, Matthieu Cord and Nicolas Thome. The paper was accepted at the International Conference on Computer Vision.
