(Part 2) Getting Hands-On with MiniCPM-V: Code, Setup, and Demos!
Hey there, AI adventurers! 🤖 This is the second part of our three-part series on MiniCPM-V, where we dive into the technical setup, installation, and usage of these amazing models. If you missed the first part, feel free to check it out on my profile for a high-level overview. Now, let’s get into the fun stuff!
Chat with the Demo on Gradio
Let’s start with something exciting: chatting with MiniCPM-V using Gradio. Gradio provides an easy-to-use interface for creating web-based demos, and since MiniCPM-V is a vision-language model, the demo takes an image plus a question and returns an answer. Here’s how you can set it up:
import gradio as gr
import torch
from transformers import AutoModel, AutoTokenizer
# Load model and tokenizer (trust_remote_code=True is required for MiniCPM-V)
model_name = "openbmb/MiniCPM-Llama3-V-2_5"
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.float16).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Define the chat function: answer a question about an uploaded image
def chat(image, question):
    msgs = [{"role": "user", "content": question}]
    return model.chat(image=image, msgs=msgs, tokenizer=tokenizer)
# Create Gradio interface (image + text in, text out)
iface = gr.Interface(
    fn=chat,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Question")],
    outputs="text",
    title="Chat with MiniCPM-V",
)
iface.launch()
Just run the above code, and voilà! You can upload an image and chat with MiniCPM-V about it right from your browser.
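If you want to show the demo to someone outside your machine, Gradio can generate a temporary public link through its tunneling service; just pass share=True when launching:
iface.launch(share=True)  # creates a temporary public URL for the demo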
Setting Up MiniCPM-V: Installation Guide
Before we can start using MiniCPM-V, we need to install it. Here’s a step-by-step guide to get you started:
1. Install Dependencies: Make sure you have Python and pip installed. Then, install the required packages:
pip install torch transformers gradio pillow accelerate
2. Clone the Repository: Get the latest version of MiniCPM-V from GitHub:
git clone https://github.com/OpenBMB/MiniCPM-V.git
cd MiniCPM-V
3. Install the Requirements: From the project directory, install the repo’s Python requirements (a quick sanity check to verify the setup follows below):
pip install -r requirements.txt
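With the dependencies in place, a minimal sanity check confirms the environment is ready. It only prints library versions and whether a GPU is visible:
import torch, transformers, gradio
# Print library versions and GPU availability to confirm the setup
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("gradio:", gradio.__version__)
print("CUDA available:", torch.cuda.is_available())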
Let’s Talk Inference: Making Predictions with MiniCPM-V
Once you have MiniCPM-V installed, you can start making predictions. Because MiniCPM-V is a vision-language model, a basic inference example pairs an image with a question:
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
# Load the model and tokenizer (trust_remote_code=True is required for MiniCPM-V)
model_name = "openbmb/MiniCPM-Llama3-V-2_5"
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.float16).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Load an image and define your question
image = Image.open("example.jpg").convert("RGB")
question = "Describe what you see in this image."
msgs = [{"role": "user", "content": question}]
# Generate and print the result
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer)
print(answer)
Run this code to see MiniCPM-V in action, describing whatever image you point it at. Pretty neat, right? 😊
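If you want more varied answers, the chat call also accepts sampling arguments, following the usage shown on the official model card; a temperature around 0.7 is a reasonable starting point:
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=True, temperature=0.7)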
Discover the Model Zoo: Exploring Available Models
The MiniCPM-V family includes several model versions with different sizes and capabilities. Here’s how you can load them from the Hugging Face Hub:
from transformers import AutoModel, AutoTokenizer
# Hugging Face repo IDs for the MiniCPM-V family
model_names = ["openbmb/MiniCPM-V", "openbmb/MiniCPM-V-2", "openbmb/MiniCPM-Llama3-V-2_5"]
for model_name in model_names:
    model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    print(f"Loaded {model_name} successfully!")
Multi-Turn Conversations: Chat Like a Pro
Multi-turn conversations make interactions more dynamic. With MiniCPM-V, you simply keep appending to the message list so each new turn sees the full history (this reuses the model and tokenizer loaded in the inference example above):
from PIL import Image
# Keep the running conversation as a list of role/content messages
image = Image.open("example.jpg").convert("RGB")
chat_history = []
def chat(input_text):
    chat_history.append({"role": "user", "content": input_text})
    response = model.chat(image=image, msgs=chat_history, tokenizer=tokenizer)
    # Append the model's reply so the next turn sees the full conversation
    chat_history.append({"role": "assistant", "content": response})
    return response
print(chat("What do you see in this image?"))
print(chat("Can you tell me a joke about it?"))
Speed Up with Multi-GPU Inference
Leverage multiple GPUs to speed up inference. One straightforward option is to let transformers shard the model across all visible GPUs with device_map="auto" (this requires the accelerate package):
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
model_name = "openbmb/MiniCPM-Llama3-V-2_5"
# device_map="auto" spreads the model's layers across every visible GPU
model = AutoModel.from_pretrained(
    model_name, trust_remote_code=True, torch_dtype=torch.float16, device_map="auto"
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": "What landmark is shown in this image?"}]
print(model.chat(image=image, msgs=msgs, tokenizer=tokenizer))
Running Inference on a Mac
Mac users, we’ve got you covered! Here’s how to run inference on your Mac:
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
# Use MPS (Metal Performance Shaders) if available, otherwise fall back to the CPU
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
dtype = torch.float16 if device.type == "mps" else torch.float32
model_name = "openbmb/MiniCPM-Llama3-V-2_5"
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, torch_dtype=dtype).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
image = Image.open("example.jpg").convert("RGB")
question = "Explain what is shown in this image in simple terms."
msgs = [{"role": "user", "content": question}]
print(model.chat(image=image, msgs=msgs, tokenizer=tokenizer))
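One practical tip: if an operator isn’t implemented for MPS yet, PyTorch can fall back to the CPU for that single op. Set this environment variable before launching your script (run_minicpm.py is just a placeholder name here):
PYTORCH_ENABLE_MPS_FALLBACK=1 python run_minicpm.py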
Deploying on Mobile Phones
Deploying MiniCPM-V on mobile phones is exciting! The route is llama.cpp: MiniCPM-Llama3-V-2.5 support lives in OpenBMB’s fork, and you need GGUF conversions of both the language model and the vision projector (the mmproj file). Here’s a basic sketch; the exact binary and flag names depend on your llama.cpp version, so check the repo’s llama.cpp docs for details:
git clone https://github.com/OpenBMB/llama.cpp
cd llama.cpp
make
./minicpmv-cli -m models/MiniCPM-Llama3-V-2_5-Q4_K_M.gguf --mmproj models/mmproj-model-f16.gguf --image demo.jpg -p "What is in this image?"
Advanced Inference with vLLM
vLLM offers an advanced way to handle inference efficiently, with features like continuous batching and paged attention. Here’s an example of the basic API (MiniCPM-V support in vLLM has evolved over releases, and image inputs go through vLLM’s multimodal interface, so check the vLLM docs for your version):
from vllm import LLM, SamplingParams
# trust_remote_code is needed so vLLM can load the MiniCPM-V model definition
llm = LLM(model="openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True)
prompt = "Write a short story about a dragon and a knight."
sampling_params = SamplingParams(top_p=0.9, max_tokens=512)
outputs = llm.generate([prompt], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
Ready for More?
If you’re excited about fine-tuning these models, check out the third part of this series, where we dive deep into fine-tuning MiniCPM-V with detailed code examples. 📘
Don’t forget to check out the first blog for a basic understanding of MiniCPM-V. If you have any questions or comments, feel free to drop them below. Thanks for reading! 😊
For more details about the MiniCPM-V models, visit the MiniCPM-V GitHub repository.