How to run Mistral 7B Model with Chat-UI 💬 on Amazon EC2

⭐️ [Bonus] Run Zephyr-7B-alpha Model with Chat-UI

David Min
7 min read · Oct 19, 2023
Stable Diffusion AI Art (Stable Diffusion XL)
Source: https://mistral.ai/news/announcing-mistral-7b/

In our previous post, we covered how to deploy the open source Llama model with Chat-UI on an Amazon EC2 instance to create your own chatbot in your own Amazon Virtual Private Cloud (VPC).

In this post, we'll look at running the Mistral 7B model with Chat-UI on Amazon EC2.

  • Create an Amazon EC2 Instance
  • Run Mistral 7B Instruct model in TGI container using Docker and AWQ Quantization
  • Install and run Chat-UI

⭐️ [Bonus] We'll also look at running the Zephyr 7B Alpha model from HuggingFaceH4 at the end.

Let's go! 🚀

Mistral-7B-v0.1 is a small, yet powerful model adaptable to many use-cases. Mistral 7B is better than Llama 2 13B on all benchmarks, has natural coding abilities, and 8k sequence length. It's released under Apache 2.0 licence, and we made it easy to deploy on any cloud.

MISTRAL 7B

Fast-deployed and easily customisable. Small, yet powerful for a variety of use cases. Supports English and code, and an 8k context length.

Licence: Apache 2.0

Optimal for: low latency, text summarisation, classification, text completion, code completion

Source: https://mistral.ai/product/

Step 1 — Create an Amazon EC2 Instance

1-A. Create an Amazon EC2 g5.xlarge instance using AWS CloudFormation

  • Region: us-east-1
  • AMI: "ami-0ea5e9630dcf98849" — Deep Learning AMI GPU PyTorch 2.0.1 (Ubuntu 20.04) 20231003
  • Instance: g5.xlarge
  • EBS volume: 512 GB

👉 g5.xlarge (24 GB GPU memory): $1.006 On-Demand price/hr

💁 The Mistral model requires FlashAttention v2.

AWS CloudFormation Template: mistral-7b.yaml

AWSTemplateFormatVersion: '2010-09-09'
Description: EC2 Instance
Parameters:
  KeyName:
    Description: Name of an existing EC2 KeyPair to enable SSH access to the instance
    Type: AWS::EC2::KeyPair::KeyName
    ConstraintDescription: must be the name of an existing EC2 KeyPair.
Mappings:
  RegionToAmiId:
    us-east-1:
      AMI: ami-0ea5e9630dcf98849
Resources:
  SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: !Sub ${AWS::StackName}-sg
      GroupDescription: Security group for EC2 instance
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: g5.xlarge
      ImageId: !FindInMap [RegionToAmiId, !Ref AWS::Region, AMI]
      KeyName: !Ref KeyName
      BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
            VolumeSize: 512
            VolumeType: gp3
      Tags:
        - Key: Name
          Value: mistral-7b-instance
      SecurityGroups:
        - !Ref SecurityGroup
Outputs:
  PublicDNS:
    Description: Public DNS name of the newly created EC2 instance
    Value: !GetAtt [EC2Instance, PublicDnsName]
  PublicIP:
    Description: Public IP address of the newly created EC2 instance
    Value: !GetAtt [EC2Instance, PublicIp]
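
If you prefer the command line to the console walk-through below, the same stack can be created with the AWS CLI. A minimal sketch, assuming your credentials are configured; the stack name and key pair name here are examples:

# Create the stack from the template above (stack/key names are examples)
aws cloudformation create-stack \
  --stack-name mistral-7b \
  --template-body file://mistral-7b.yaml \
  --parameters ParameterKey=KeyName,ParameterValue=us-east-1-key \
  --region us-east-1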

AWS CloudFormation > Create stack

AWS CloudFormation — Step 1 Create stack

Upload your template file and choose Next.

AWS CloudFormation — Step 2 Specify stack details

Specify the Stack name and KeyName, then choose Next.

AWS CloudFormation — Step 3 Configure stack options

Keep the default settings and choose Next.

AWS CloudFormation — Step 4 Review

Review your settings and Submit.
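
Once the stack reaches CREATE_COMPLETE, you can read the PublicDNS and PublicIP outputs from the console's Outputs tab, or query them with the AWS CLI (stack name assumed from the sketch above):

# Fetch the stack outputs (PublicDNS, PublicIP)
aws cloudformation describe-stacks \
  --stack-name mistral-7b \
  --query "Stacks[0].Outputs" \
  --region us-east-1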

1-B. SSH to Amazon EC2 instance

# Terminal 1: SSH to Amazon EC2 instance
ssh -i "us-east-1-key.pem" ubuntu@ec2-###-##-##-###.compute-1.amazonaws.com

# Activate pre-built pytorch environment
source activate pytorch

# Launch Jupyter Lab
jupyter lab

1-C. SSH port forwarding to access Jupyter Lab and Chat-UI

# Terminal 2: SSH local port forwarding to Jupyter Lab and Chat-UI
ssh -i "us-east-1-key.pem" -N -L 8888:localhost:8888 -L 5173:localhost:5173 ubuntu@ec2-###-##-##-###.compute-1.amazonaws.com

1-D. Copy and paste the Jupyter Server URL into your local browser

Jupyter Lab running on Amazon EC2 instance

Step 2 — Run Mistral 7B Instruct model in TGI container using Docker and AWQ Quantization

We will use Docker to run the TGI container with AWQ quantization. 👇

Text Generation Inference (TGI) — The easiest way of getting started is using the official Docker container.

Source: https://huggingface.co/docs/text-generation-inference/quicktour

Running the Mistral 7B Instruct model using Docker

model=TheBloke/Mistral-7B-Instruct-v0.1-AWQ
volume=$PWD/data

docker run --gpus all \
  --shm-size 1g \
  -p 8080:80 \
  -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id $model \
  --quantize awq \
  --max-input-length 8191 \
  --max-total-tokens 8192 \
  --max-batch-prefill-tokens 8191

πŸ’ AWQ (Activation-aware Weight Quantization)

Step 3 — Install and run Chat-UI

Chat UI — Open source codebase powering the HuggingChat app

GitHub repo: https://github.com/huggingface/chat-ui


3-A. Chat-UI Installation

# Clone the repo
git clone https://github.com/huggingface/chat-ui

# Start a Mongo Database
docker run -d -p 27017:27017 --name mongo-chatui mongo:latest

# install nvm & npm
wget https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh
bash install.sh

# Close and reopen your terminal to start using nvm or run the following to use it now:
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion

# install node
nvm install node
npm --version   # prints e.g. 10.2.0

# npm install
cd chat-ui
npm install

👉 You can use the MongoDB Atlas free tier as well.
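
If you do go the Atlas route, MONGODB_URL in the next step simply points at your cluster's connection string instead of localhost (placeholder values here, not real credentials):

# .env.local (Atlas variant): substitute your own user, password, and cluster host
MONGODB_URL="mongodb+srv://<user>:<password>@<cluster>.mongodb.net/"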

3-B. Customize Chat-UI settings

Create a .env.local file in the chat-ui directory with the example settings below. 👇

# .env.local: custom settings
MONGODB_URL=mongodb://localhost:27017/
PUBLIC_APP_NAME="Mistral 7B Instruct Chat UI 💬"
PUBLIC_APP_ASSETS=chatui
PUBLIC_APP_COLOR=yellow

MODELS=`[
  {
    "name": "mistralai/Mistral-7B-Instruct-v0.1",
    "displayName": "mistralai/Mistral-7B-Instruct-v0.1",
    "description": "Mistral 7B is a new Apache 2.0 model, released by Mistral AI, that outperforms Llama 2 13B in benchmarks.",
    "websiteUrl": "https://mistral.ai/news/announcing-mistral-7b/",
    "preprompt": "",
    "chatPromptTemplate": "<s>{{#each messages}}{{#ifUser}}[INST] {{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}}{{content}} [/INST]{{/ifUser}}{{#ifAssistant}}{{content}}</s>{{/ifAssistant}}{{/each}}",
    "parameters": {
      "temperature": 0.1,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 1000,
      "max_new_tokens": 2048,
      "stop": ["</s>"]
    },
    "promptExamples": [
      {
        "title": "Write an email from bullet list",
        "prompt": "As a restaurant owner, write a professional email to the supplier to get these products every week: \n\n- Wine (x10)\n- Eggs (x24)\n- Bread (x12)"
      },
      {
        "title": "Code a snake game",
        "prompt": "Code a basic snake game in python, give explanations for each step."
      },
      {
        "title": "Assist in a task",
        "prompt": "How do I make a delicious lemon cheesecake?"
      }
    ],
    "endpoints": [{
      "url": "http://127.0.0.1:8080"
    }]
  }
]`

3-C. Run Chat-UI

# run Chat-UI
npm run dev

# Open the local URL in your web browser
http://localhost:5173/

Mistral 7B Instruct Chat UI
Mistral 7B Instruct Chat UI (Dark Theme)

Chat-UI WebSearch 2.0

🆕 WebSearch 2.0, now with RAG & sources 👍

You can enable the web search by adding either SERPER_API_KEY (serper.dev) or SERPAPI_KEY (serpapi.com) to your .env.local.

Source: https://github.com/huggingface/chat-ui

SERPER_API_KEY="YOUR_API_KEY_HERE"
# or
SERPAPI_KEY="YOUR_API_KEY_HERE"

⭐️ [Bonus] Run Zephyr-7B-alpha Model with Chat-UI

Source: https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha

Zephyr 7B Alpha Model

As with Mistral, we will use Docker to run the TGI container with AWQ quantization. 👇

model=TheBloke/zephyr-7B-alpha-AWQ
volume=$PWD/data

docker run --gpus all \
  --shm-size 1g \
  -p 8080:80 \
  -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id $model \
  --quantize awq \
  --max-input-length 8191 \
  --max-total-tokens 8192 \
  --max-batch-prefill-tokens 8191
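
As before, you can smoke-test the endpoint before pointing Chat-UI at it. Note that Zephyr uses a different chat template than Mistral; the prompt below mirrors the preprompt and template from the config that follows:

# Smoke test using Zephyr's <|system|>/<|user|>/<|assistant|> template
curl 127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs":"<|system|>You are a friendly chatbot.</s><|user|>Hello! </s><|assistant|>","parameters":{"max_new_tokens":64}}'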

Update your .env.local file with the example settings below. 👇

# .env.local: custom settings
MONGODB_URL=mongodb://localhost:27017/
PUBLIC_APP_NAME="Zephyr 7B Alpha Chat UI 💬"
PUBLIC_APP_ASSETS=chatui
PUBLIC_APP_COLOR=blue

MODELS=`[
  {
    "name": "HuggingFaceH4/zephyr-7b-alpha",
    "displayName": "HuggingFaceH4/zephyr-7b-alpha",
    "description": "Zephyr 7B Alpha",
    "websiteUrl": "https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha",
    "preprompt": "<|system|>You are a friendly chatbot.</s>",
    "chatPromptTemplate": "<s>{{#each messages}}{{#ifUser}}<|user|> {{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}}{{content}} </s>{{/ifUser}}{{#ifAssistant}}{{content}}</s>{{/ifAssistant}}{{/each}}",
    "parameters": {
      "temperature": 0.1,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 1000,
      "max_new_tokens": 2048,
      "stop": ["</s>"]
    },
    "promptExamples": [
      {
        "title": "Fibonacci in Python",
        "prompt": "Write a python function to calculate the nth fibonacci number."
      },
      {
        "title": "JavaScript promises",
        "prompt": "How can I wait for multiple JavaScript promises to fulfill before doing something with their values?"
      },
      {
        "title": "Rust filesystem",
        "prompt": "How can I load a file from disk in Rust?"
      }
    ],
    "endpoints": [{
      "url": "http://127.0.0.1:8080"
    }]
  }
]`

Zephyr 7B Alpha Chat UI
Zephyr 7B Alpha Chat UI (Dark Theme)

Useful Links

  • Mistral 7B announcement: https://mistral.ai/news/announcing-mistral-7b/
  • Text Generation Inference quick tour: https://huggingface.co/docs/text-generation-inference/quicktour
  • Chat-UI GitHub repo: https://github.com/huggingface/chat-ui
  • Zephyr 7B Alpha model card: https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
