How to run Mistral 7B Model with Chat-UI 💬 on Amazon EC2

⭐️ [Bonus] Run Zephyr-7B-alpha Model with Chat-UI

David Min
7 min read · Oct 19, 2023
Stable Diffusion AI Art (Stable Diffusion XL)
Source: https://mistral.ai/news/announcing-mistral-7b/

In our previous post, we covered how to deploy the open source Llama model with Chat-UI on an Amazon EC2 instance to create your own chatbot in your own Amazon Virtual Private Cloud (VPC).

In this post, we'll look at running the Mistral 7B model with Chat-UI on Amazon EC2.

  • Create an Amazon EC2 Instance
  • Run Mistral 7B Instruct model in TGI container using Docker and AWQ Quantization
  • Install and run Chat-UI

⭐️ [Bonus] We'll also look at running the Zephyr 7B Alpha model from HuggingFaceH4 at the end.

Let's go! 🚀

Mistral-7B-v0.1 is a small, yet powerful model adaptable to many use-cases. Mistral 7B is better than Llama 2 13B on all benchmarks, has natural coding abilities, and 8k sequence length. It's released under Apache 2.0 licence, and we made it easy to deploy on any cloud.

MISTRAL 7B

Fast-deployed and easily customisable. Small, yet powerful for a variety of use cases. Supports English and code, and an 8k context length.

Licence: Apache 2.0

Optimal for: low latency, text summarisation, classification, text completion, code completion

Source: https://mistral.ai/product/

Step 1 — Create an Amazon EC2 Instance

1-A. Create an Amazon EC2 g5.xlarge instance using AWS CloudFormation

  • Region: us-east-1
  • AMI: "ami-0ea5e9630dcf98849" — Deep Learning AMI GPU PyTorch 2.0.1 (Ubuntu 20.04) 20231003
  • Instance: g5.xlarge
  • EBS volume: 512 GB

👉 g5.xlarge (24 GB GPU memory): $1.006 On-Demand price/hr

💁 The Mistral model requires FlashAttention v2.

AWS CloudFormation Template: mistral-7b.yaml

AWSTemplateFormatVersion: '2010-09-09'
Description: EC2 Instance
Parameters:
  KeyName:
    Description: Name of an existing EC2 KeyPair to enable SSH access to the instance
    Type: AWS::EC2::KeyPair::KeyName
    ConstraintDescription: must be the name of an existing EC2 KeyPair.
Mappings:
  RegionToAmiId:
    us-east-1:
      AMI: ami-0ea5e9630dcf98849
Resources:
  SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: !Sub ${AWS::StackName}-sg
      GroupDescription: Security group for EC2 instance
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: g5.xlarge
      ImageId: !FindInMap [RegionToAmiId, !Ref AWS::Region, AMI]
      KeyName: !Ref KeyName
      BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
            VolumeSize: 512
            VolumeType: gp3
      Tags:
        - Key: Name
          Value: mistral-7b-instance
      SecurityGroups:
        - !Ref SecurityGroup
Outputs:
  PublicDNS:
    Description: Public DNS name of the newly created EC2 instance
    Value: !GetAtt [EC2Instance, PublicDnsName]
  PublicIP:
    Description: Public IP address of the newly created EC2 instance
    Value: !GetAtt [EC2Instance, PublicIp]
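
If you prefer the command line to the console walk-through below, the same stack can be created with the AWS CLI. A minimal sketch, assuming your credentials are configured; the stack name and key pair name here are examples:

# Create the stack from the template above (stack/key names are examples)
aws cloudformation create-stack \
  --stack-name mistral-7b \
  --template-body file://mistral-7b.yaml \
  --parameters ParameterKey=KeyName,ParameterValue=us-east-1-key \
  --region us-east-1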

AWS CloudFormation > Create stack

AWS CloudFormation — Step 1 Create stack

Upload your template file and choose Next.

AWS CloudFormation — Step 2 Specify stack details

Specify the Stack name and KeyName, then choose Next.

AWS CloudFormation — Step 3 Configure stack options

Keep the default settings and choose Next.

AWS CloudFormation — Step 4 Review

Review your settings and Submit.
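
Once the stack reaches CREATE_COMPLETE, you can read the PublicDNS and PublicIP outputs from the console's Outputs tab, or query them with the AWS CLI (stack name assumed from the sketch above):

# Fetch the stack outputs (PublicDNS, PublicIP)
aws cloudformation describe-stacks \
  --stack-name mistral-7b \
  --query "Stacks[0].Outputs" \
  --region us-east-1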

1-B. SSH to Amazon EC2 instance

# Terminal 1: SSH to Amazon EC2 instance
ssh -i "us-east-1-key.pem" ubuntu@ec2-###-##-##-###.compute-1.amazonaws.com

# Activate pre-built pytorch environment
source activate pytorch

# Launch Jupyter Lab
jupyter lab

1-C. SSH port forwarding to access Jupyter Lab and Chat-UI

# Terminal 2: SSH local port forwarding to Jupyter Lab and Chat-UI
ssh -i "us-east-1-key.pem" -N -L 8888:localhost:8888 -L 5173:localhost:5173 ubuntu@ec2-###-##-##-###.compute-1.amazonaws.com

1-D. Copy and paste the Jupyter Server URL into your local browser

Jupyter Lab running on Amazon EC2 instance

Step 2 — Run Mistral 7B Instruct model in TGI container using Docker and AWQ Quantization

We will use Docker to run the TGI container with AWQ quantization. 👇

Text Generation Inference (TGI) — The easiest way of getting started is using the official Docker container.

Source: https://huggingface.co/docs/text-generation-inference/quicktour

Running the Mistral 7B Instruct model using Docker

model=TheBloke/Mistral-7B-Instruct-v0.1-AWQ
volume=$PWD/data

docker run --gpus all \
  --shm-size 1g \
  -p 8080:80 \
  -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id $model \
  --quantize awq \
  --max-input-length 8191 \
  --max-total-tokens 8192 \
  --max-batch-prefill-tokens 8191

πŸ’ AWQ (Activation-aware Weight Quantization)

Step 3 — Install and run Chat-UI

Chat UI — Open source codebase powering the HuggingChat app

GitHub repo: https://github.com/huggingface/chat-ui


3-A. Chat-UI Installation

# Clone the repo
git clone https://github.com/huggingface/chat-ui

# Start a Mongo Database
docker run -d -p 27017:27017 --name mongo-chatui mongo:latest

# install nvm & npm
wget https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh
bash install.sh

# Close and reopen your terminal to start using nvm or run the following to use it now:
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion

# install node
nvm install node
npm --version   # prints e.g. 10.2.0

# npm install
cd chat-ui
npm install

👉 You can use the MongoDB Atlas free tier as well.
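
If you do go the Atlas route, MONGODB_URL in the next step simply points at your cluster's connection string instead of localhost (placeholder values here, not real credentials):

# .env.local (Atlas variant): substitute your own user, password, and cluster host
MONGODB_URL="mongodb+srv://<user>:<password>@<cluster>.mongodb.net/"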

3-B. Customize Chat-UI settings

Create a .env.local file in the chat-ui directory with the example settings below. 👇

# .env.local: custom settings
MONGODB_URL=mongodb://localhost:27017/
PUBLIC_APP_NAME="Mistral 7B Instruct Chat UI 💬"
PUBLIC_APP_ASSETS=chatui
PUBLIC_APP_COLOR=yellow

MODELS=`[
  {
    "name": "mistralai/Mistral-7B-Instruct-v0.1",
    "displayName": "mistralai/Mistral-7B-Instruct-v0.1",
    "description": "Mistral 7B is a new Apache 2.0 model, released by Mistral AI, that outperforms Llama 2 13B in benchmarks.",
    "websiteUrl": "https://mistral.ai/news/announcing-mistral-7b/",
    "preprompt": "",
    "chatPromptTemplate": "<s>{{#each messages}}{{#ifUser}}[INST] {{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}}{{content}} [/INST]{{/ifUser}}{{#ifAssistant}}{{content}}</s>{{/ifAssistant}}{{/each}}",
    "parameters": {
      "temperature": 0.1,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 1000,
      "max_new_tokens": 2048,
      "stop": ["</s>"]
    },
    "promptExamples": [
      {
        "title": "Write an email from bullet list",
        "prompt": "As a restaurant owner, write a professional email to the supplier to get these products every week: \n\n- Wine (x10)\n- Eggs (x24)\n- Bread (x12)"
      },
      {
        "title": "Code a snake game",
        "prompt": "Code a basic snake game in python, give explanations for each step."
      },
      {
        "title": "Assist in a task",
        "prompt": "How do I make a delicious lemon cheesecake?"
      }
    ],
    "endpoints": [{
      "url": "http://127.0.0.1:8080"
    }]
  }
]`

3-C. Run Chat-UI

# run Chat-UI
npm run dev

# Open the local URL in your web browser
http://localhost:5173/

Mistral 7B Instruct Chat UI
Mistral 7B Instruct Chat UI (Dark Theme)

Chat-UI WebSearch 2.0

🆕 WebSearch 2.0, now with RAG & sources 👍

You can enable the web search by adding either SERPER_API_KEY (serper.dev) or SERPAPI_KEY (serpapi.com) to your .env.local.

Source: https://github.com/huggingface/chat-ui

SERPER_API_KEY="YOUR_API_KEY_HERE"
# or
SERPAPI_KEY="YOUR_API_KEY_HERE"

⭐️ [Bonus] Run Zephyr-7B-alpha Model with Chat-UI

Source: https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha

Zephyr 7B Alpha Model

As with Mistral, we will use Docker to run the TGI container with AWQ quantization. 👇

model=TheBloke/zephyr-7B-alpha-AWQ
volume=$PWD/data

docker run --gpus all \
  --shm-size 1g \
  -p 8080:80 \
  -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id $model \
  --quantize awq \
  --max-input-length 8191 \
  --max-total-tokens 8192 \
  --max-batch-prefill-tokens 8191
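
As before, you can smoke-test the endpoint before pointing Chat-UI at it. Note that Zephyr uses a different chat template than Mistral; the prompt below mirrors the preprompt and template from the config that follows:

# Smoke test using Zephyr's <|system|>/<|user|>/<|assistant|> template
curl 127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs":"<|system|>You are a friendly chatbot.</s><|user|>Hello! </s><|assistant|>","parameters":{"max_new_tokens":64}}'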

Update your .env.local file with the example settings below. 👇

# .env.local: custom settings
MONGODB_URL=mongodb://localhost:27017/
PUBLIC_APP_NAME="Zephyr 7B Alpha Chat UI 💬"
PUBLIC_APP_ASSETS=chatui
PUBLIC_APP_COLOR=blue

MODELS=`[
  {
    "name": "HuggingFaceH4/zephyr-7b-alpha",
    "displayName": "HuggingFaceH4/zephyr-7b-alpha",
    "description": "Zephyr 7B Alpha",
    "websiteUrl": "https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha",
    "preprompt": "<|system|>You are a friendly chatbot.</s>",
    "chatPromptTemplate": "<s>{{#each messages}}{{#ifUser}}<|user|> {{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}}{{content}} </s>{{/ifUser}}{{#ifAssistant}}{{content}}</s>{{/ifAssistant}}{{/each}}",
    "parameters": {
      "temperature": 0.1,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 1000,
      "max_new_tokens": 2048,
      "stop": ["</s>"]
    },
    "promptExamples": [
      {
        "title": "Fibonacci in Python",
        "prompt": "Write a python function to calculate the nth fibonacci number."
      },
      {
        "title": "JavaScript promises",
        "prompt": "How can I wait for multiple JavaScript promises to fulfill before doing something with their values?"
      },
      {
        "title": "Rust filesystem",
        "prompt": "How can I load a file from disk in Rust?"
      }
    ],
    "endpoints": [{
      "url": "http://127.0.0.1:8080"
    }]
  }
]`

Zephyr 7B Alpha Chat UI
Zephyr 7B Alpha Chat UI (Dark Theme)

Useful Links

  • Mistral 7B announcement: https://mistral.ai/news/announcing-mistral-7b/
  • Text Generation Inference quick tour: https://huggingface.co/docs/text-generation-inference/quicktour
  • Chat-UI GitHub repo: https://github.com/huggingface/chat-ui
  • Zephyr 7B Alpha model card: https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
