Michael X – Medium

Michael X
in
Artificial Intelligence in Plain English

How To Train Multimodal LLMs To Understand And Interact With Text, Image, Video And Audio: Model…

Multimodal instruction models can be evaluated using closed-set and open-set questions, as well as qualitative assessments.

6 min read · Apr 26, 2024

Michael X
in
Artificial Intelligence in Plain English

How To Train Multimodal LLMs To Understand And Interact With Text, Image, Video And Audio: Model…

7 min read · Apr 2, 2024

Michael X
in
Artificial Intelligence in Plain English

How To Train Multimodal LLMs To Understand And Interact With Text, Image, Video And Audio: Model…

Current MLLMs insert visual embeddings from vision experts into the pre-trained language embedding space. Key works are introduced.

9 min read · Mar 21, 2024

Michael X
in
Artificial Intelligence in Plain English

How To Train Multimodal LLMs To Understand And Interact With Text, Image, Video And Audio

A concise introduction to the world of multimodal Large Language Models (LLMs), an overview of their background and how to train them.

7 min read · Mar 8, 2024

Michael X

Generative AI for Innovative Photo & Video Editing & Creation — APPs and Enabling Datasets

In the ever-evolving realm of digital artistry, the emergence of Generative Artificial Intelligence (AI) has unlocked a vast array of…

7 min read · Dec 19, 2023

Michael X
in
Artificial Intelligence in Plain English

Enhancing LLMs With Vision Experts (Part 3)

If you missed out on the previous articles of this series, please read:

10 min read · Nov 22, 2023

Michael X
in
Artificial Intelligence in Plain English

Enhancing LLMs With Vision Experts (Part 2)

If you missed out on the first article of this series, please read:

7 min read · Nov 15, 2023

Michael X
in
Artificial Intelligence in Plain English

Enhancing LLMs With Vision Experts (Part 1)

Abstract

10 min read · Nov 9, 2023

Michael X
in
Artificial Intelligence in Plain English

Autonomous Driving Technology Revolution: From SLAM+DL to BEV+Transformer (Part 3)

If you missed out on the first article of this series, please click:

6 min read · Oct 25, 2023

Michael X
in
Artificial Intelligence in Plain English

Autonomous Driving Technology Revolution: From SLAM+DL to BEV+Transformer (Part 2)

If you missed out on the first article of this series, please click:

11 min read · Oct 18, 2023

Michael X

Co-Founder of maadaa.ai | Data-Centric AI | Open Innovation https://www.linkedin.com/in/michael-zhang-36400a14/
