Published in MLearning.ai

Document Image Transformer: Introduction, Usage and Deployment

Document Image Transformer

Document Image Transformer (DiT) is a transformer model that can classify a document into a category from nothing more than a picture of it.

For example, feed an image of a document to the model, and it will tell you what kind of document it is.

Tools

We will use HuggingFace and Pinferencia.

Pinferencia makes it easy to serve any model with just three extra lines of code.
HuggingFace makes it easy to use a pre-trained model in just a few lines.

Install Dependencies

HuggingFace

pip install "transformers[pytorch]"

If it doesn't work, please visit Installation (huggingface.co) and check the official documentation.

Pinferencia

pip install "pinferencia[uvicorn]"

If it doesn't work, please visit the Pinferencia installation page (underneathall.app) and check the official documentation.

Example Usage of the Model

import base64
from io import BytesIO
from PIL import Image
from transformers import pipeline

# Load the DiT checkpoint fine-tuned on RVL-CDIP.
classifier = pipeline(model="microsoft/dit-base-finetuned-rvlcdip")

def classify(image_base64_str):
    # Decode the base64 string into a PIL image, then classify it.
    image = Image.open(BytesIO(base64.b64decode(image_base64_str)))
    return classifier(images=image)

To get the base64-encoded string of an image, you can use an online tool such as the Image to Base64 converter (codebeautify.org).
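If you would rather not paste the image into an online tool, the same string can be produced locally with Python's standard library. The following is a minimal sketch; the helper name image_to_base64 and the file name document.jpg are hypothetical and not part of the original article.

```python
import base64
from pathlib import Path


def image_to_base64(path):
    """Read an image file and return its contents as a base64 string."""
    raw_bytes = Path(path).read_bytes()
    return base64.b64encode(raw_bytes).decode("ascii")


# Example with a hypothetical file name:
# image_base64_str = image_to_base64("document.jpg")
```

The returned string can be passed directly to classify.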

classify("/9j/4AAQSkZJRgABAQAASABIAAD/4QB...........")

The output is:

[{'score': 0.8400426506996155, 'label': 'presentation'},
{'score': 0.043046072125434875, 'label': 'advertisement'},
{'score': 0.024246374145150185, 'label': 'questionnaire'},
{'score': 0.014194409362971783, 'label': 'form'},
{'score': 0.013648252934217453, 'label': 'news article'}]

So the model considers our image most likely a presentation, with a score of about 0.84.
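When only the most likely category is needed, the list of predictions can be reduced to a single label. The snippet below is a small sketch using the output shown above, with scores rounded for brevity; the helper name top_prediction is hypothetical.

```python
def top_prediction(predictions):
    """Return the (label, score) pair with the highest score."""
    best = max(predictions, key=lambda item: item["score"])
    return best["label"], best["score"]


# Output copied from the example above, scores rounded for brevity.
predictions = [
    {"score": 0.8400, "label": "presentation"},
    {"score": 0.0430, "label": "advertisement"},
    {"score": 0.0242, "label": "questionnaire"},
    {"score": 0.0142, "label": "form"},
    {"score": 0.0136, "label": "news article"},
]

label, score = top_prediction(predictions)
print(f"{label} ({score:.1%})")  # prints: presentation (84.0%)
```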

Deploy the Model

Create a file app.py:

Run:

uvicorn app:service --reload

Wait for the model to be downloaded. When it has finished, uvicorn reports in the terminal that the application has started.

Call the Service

You can use curl or the interactive API page provided by Pinferencia.
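The service can also be called from Python. The sketch below uses only the standard library and rests on two assumptions that are not stated in this article: that the model was registered under the hypothetical name dit, and that Pinferencia exposes it at its default path /v1/models/<model_name>/predict with the input under a "data" key. Check the interactive API page for the exact URL and request schema of your deployment.

```python
import json
import urllib.request


def build_predict_request(image_base64_str, model_name="dit",
                          base_url="http://127.0.0.1:8000"):
    """Build a POST request for the model's predict endpoint."""
    # Assumed URL layout: /v1/models/<model_name>/predict
    url = f"{base_url}/v1/models/{model_name}/predict"
    body = json.dumps({"data": image_base64_str}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(image_base64_str):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(image_base64_str)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and payload before the service is running.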

Interactive API Page

Open your browser, visit http://127.0.0.1:8000, and use the model's predict API on that page.

The response contains the classification result.

Pinferencia

If you want to know more about Pinferencia, visit the underneathall/pinferencia repository (github.com), a model deployment library in Python.
