Steve Jones, "Lions, Tigers and Bears — How multimodal models get it wrong": Multimodal model Melee — part 4 (May 28)
Szymon Palucha, "Understanding OpenAI’s CLIP model": CLIP was released by OpenAI in 2021 and has become one of the building blocks in many multimodal AI systems that have been developed since… (Feb 24)
Chris Pinadella, "Apple’s 4M Technology: The Future of AI Innovation": In an era where technology continuously reshapes the landscape of human interaction and capability, Apple has once again positioned itself… (4d ago)
Matthew Gunton in Towards Data Science, "Multimodal Large Language Models & Apple’s MM1": This blog post will go into the architecture and findings behind Apple’s “MM1: Methods, Analysis & Insights from Multimodal LLM… (Apr 13)
Enrico Randellini, "Exploring the Microsoft Phi3 Vision Language model as OCR for document data extraction": Examples of zero-shot OCR applications of the latest version of Microsoft Phi3 vision language model. I show how to extract the data of… (Jun 11)
Faire Data Team in The Craft, "Advancing product categorization with vision language models: The power of fine-tuned LLaVA": Written by Wayne Zhang (Jul 17)
Daniel Warfield in Towards Data Science, "Flamingo — Intuitively and Exhaustively Explained": The Architecture Behind Modern Visual Language Modeling (Feb 16)