Discover DocLLM: The New LLM From JPMorgan For Working with Complex Documents

Hanane Dupouy
5 min read · Jan 5, 2024
Extracted from the research paper

JPMorgan has launched a new language model called DocLLM, specifically designed for working with documents that have complex layouts. The model is a lightweight extension of traditional large language models and is focused on understanding rich documents. Unlike other models that rely on expensive image encoders, DocLLM uses the positions and sizes of text boxes (bounding box information) to understand how the text is arranged on a page, which makes it more efficient at handling documents with varied layouts.
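To make the bounding box idea concrete, here is a minimal sketch of what "text plus position and size" could look like as model input. It is not DocLLM's actual preprocessing: the `TextBox` structure, the helper names, and the choice to normalize coordinates by page size are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """One OCR text span with its pixel bounding box (hypothetical structure)."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def normalize_box(box, page_width, page_height):
    """Scale pixel coordinates by the page size so different pages are comparable."""
    return (
        box.x0 / page_width,
        box.y0 / page_height,
        box.x1 / page_width,
        box.y1 / page_height,
    )


def to_layout_input(boxes, page_width, page_height):
    """Pair each text span with its normalized position and size (no image pixels)."""
    features = []
    for box in boxes:
        x0, y0, x1, y1 = normalize_box(box, page_width, page_height)
        features.append({
            "text": box.text,
            "bbox": (x0, y0, x1, y1),
            "width": x1 - x0,
            "height": y1 - y0,
        })
    return features


# Made-up OCR output for a 1000 x 1000 pixel page (illustrative values only).
page = [
    TextBox("Invoice", 100, 50, 300, 100),
    TextBox("Total:", 100, 800, 200, 850),
    TextBox("$42.00", 600, 800, 800, 850),
]
layout_input = to_layout_input(page, page_width=1000, page_height=1000)
```

In this sketch, "Total:" and "$42.00" share the same vertical coordinates, which is the kind of layout signal that lets a model link a label to its value without ever looking at the page image.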

Here is the paper link: https://arxiv.org/pdf/2401.00908.pdf. It’s a closed-source LLM.

Key Highlights

Here are the key takeaways:

💠DocLLM is a lightweight extension to standard LLMs. It is intrinsically multimodal, capturing both spatial layout and text semantics without using costly image encoders.

💠DocLLM comes in 2 sizes: 1B and 7B
💠DocLLM yields a performance improvement ranging from 15% to 61% for the Llama2-7B model on four out of five previously unseen datasets.
💠DocLLM-7B outperforms GPT-4 and Llama2 on 12 out of 16 datasets, using zero-shot instructions.
💠DocLLM-7B outperforms GPT-4 and Llama2 in KIE (key information extraction) and CLS (document classification) tasks, while it’s behind GPT-4 on…
