Run LLaVA v1.6 Locally

Matt Proetsch
NightShift Codes
5 min read · Mar 11, 2024


To set up your environment for running LLMs, read the first post here.

[Image: LLaVA engineers hard at work]

LLaVA (Large Language and Vision Assistant) is a Large Multimodal Model (LMM) that accepts both text and image inputs, much like recent versions of ChatGPT and Google Gemini. LMMs like LLaVA let you upload images and chat about them. LLaVA was initially released in April 2023; the most recent version, v1.6, often referred to as LLaVA-NeXT, was released on Jan 30, 2024 by Haotian Liu et al.
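If you run LLaVA locally behind a server such as Ollama (which ships a `llava` model tag), you can chat about an image over its REST API. Here is a minimal sketch assuming Ollama's default endpoint at `localhost:11434`; the helper names `build_llava_request` and `ask_llava` are mine, not part of any library:

```python
import base64
import json
from urllib import request

# Ollama's default text-generation endpoint (assumes `ollama serve` is running)
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_llava_request(prompt: str, image_path: str) -> dict:
    """Build an Ollama /api/generate payload pairing a text prompt with one image."""
    with open(image_path, "rb") as f:
        # Ollama expects images as base64-encoded strings in the `images` list
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "llava",   # the model tag pulled with `ollama pull llava`
        "prompt": prompt,
        "images": [image_b64],
        "stream": False,    # return a single JSON object instead of a stream
    }

def ask_llava(prompt: str, image_path: str) -> str:
    """Send the prompt + image to the local server and return the model's reply."""
    payload = json.dumps(build_llava_request(prompt, image_path)).encode("utf-8")
    req = request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and the model pulled, something like `ask_llava("What text appears in this image?", "receipt.png")` returns the model's answer as a string.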

[Screenshot: Chatting (OCR + visual reasoning) with LLaVA v1.6]

LLaVA v1.6 is a major improvement over v1.5 and the original LLaVA. As you can see in the screenshot above, LLaVA v1.6 is very capable at optical character recognition (OCR) and visual reasoning, thanks to an improved mix of those tasks in the training data and an increase in supported image resolutions.

[Screenshot: More chatting (world knowledge) with LLaVA v1.6]

The new version of LLaVA also has more world knowledge and better visual conversation capabilities than previous versions of the model.

