François Phe in data from the trenches · Demystifying Multimodal LLM: Unlocking the Power of Fusion in Language and Vision · 10 min read · Mar 21, 2024
François Phe in data from the trenches · Paying Attention to Text and Images for Visual Question Answering · Attention first originated in translation systems as a way to focus on parts of the input sentence, when generating words of the translated… · 9 min read · Dec 15, 2022