Mixtral: Generative Sparse Mixture of Experts in DataFlows

Tim Spann
Cloudera
Mar 8, 2024

Mixtral-8x7B-Instruct-v0.1

“The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.”

So when I saw this come out, it seemed pretty interesting and accessible, so I gave it a try. With the proper prompting it seems good. I am not sure if it is better than Google Gemma, Meta Llama 2, or Mistral running on Ollama for my use cases.

URL

https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1

This model can be run through the lightweight serverless REST API or with the transformers library. You can also serve it with vLLM (https://github.com/vllm-project/vllm). The context window can hold up to 32k tokens, and you can enter prompts in English, French, Italian, German, and Spanish.
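If you want to test the serverless endpoint outside of NiFi first, a minimal Python sketch looks like this. The HF_TOKEN environment variable, the sample question, and max_new_tokens are my placeholders; in the dataflow the same POST is made by an InvokeHTTP processor.

# Minimal sketch of the call the flow makes with InvokeHTTP.
# HF_TOKEN is assumed to hold a Hugging Face API token.
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {
    "inputs": "<s>[INST]Write a detailed complete response that appropriately answers the request.[/INST] User: What is Apache NiFi?</s>",
    "parameters": {"max_new_tokens": 512},
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json()[0]["generated_text"])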

There Are Some Guides to Build Your Prompts Optimally

Constructing the prompt is critical to making this work well, so we are building it with Apache NiFi.

Prompt Template

{ 
"inputs":
"<s>[INST]Write a detailed complete response that appropriately
answers the request.[/INST]
[INST]Use this information to enhance your answer:
${context:trim():replaceAll('"',''):replaceAll('\n', '')}[/INST]
User: ${inputs:trim():replaceAll('"',''):replaceAll('\n', '')}</s>"
}
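
For reference, here is a rough Python equivalent of what the Expression Language above does: trim the attributes, strip double quotes and newlines, and assemble the Mixtral instruct prompt. The context and question arguments stand in for the ${context} and ${inputs} flowfile attributes, so the names are illustrative only.

# Rough equivalent of the NiFi Expression Language cleanup and prompt assembly.
def clean(text: str) -> str:
    return text.strip().replace('"', '').replace('\n', '')

def build_payload(context: str, question: str) -> dict:
    prompt = (
        "<s>[INST]Write a detailed complete response that appropriately "
        "answers the request.[/INST]"
        f"[INST]Use this information to enhance your answer: {clean(context)}[/INST] "
        f"User: {clean(question)}</s>"
    )
    return {"inputs": prompt}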

Added a Filter for NSFW

So, as part of our prompt engineering, I added a call to a separate model to filter out NSFW text coming from Slack.
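
As a sketch only, that check might look roughly like the snippet below; the model name, the "NSFW" label, and the threshold are placeholders rather than the flow's actual configuration.

# Sketch of an NSFW check via the Hugging Face Inference API.
# SOME_NSFW_CLASSIFIER, the "NSFW" label, and the 0.5 threshold are placeholders.
import os
import requests

CLASSIFIER_URL = "https://api-inference.huggingface.co/models/SOME_NSFW_CLASSIFIER"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

def is_safe(text: str, threshold: float = 0.5) -> bool:
    scores = requests.post(CLASSIFIER_URL, headers=headers, json={"inputs": text}).json()[0]
    nsfw_score = next((s["score"] for s in scores if s["label"].upper() == "NSFW"), 0.0)
    return nsfw_score < threshold

Messages that fail the check can then be filtered out before the prompt-building step.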

Slack Response Template

===============================================================================================================
HuggingFace ${modelinformation} Results on ${date}:

Question: ${inputs}

Answer:
${generated_text}

=========================================== Data for nerds ====

HF URL: ${invokehttp.request.url}
TXID: ${invokehttp.tx.id}

== Slack Message Meta Data ==

ID: ${messageid} Name: ${messagerealname} [${messageusername}]
Time Zone: ${messageusertz}

== HF ${modelinformation} Meta Data ==

Compute Characters/Time/Type: ${x-compute-characters} / ${x-compute-time}/${x-compute-type}

Generated/Prompt Tokens/Time per Token: ${x-generated-tokens} / ${x-prompt-tokens} : ${x-time-per-token}

Inference Time: ${x-inference-time} // Queue Time: ${x-queue-time}

Request ID/SHA: ${x-request-id} / ${x-sha}

Validation/Total Time: ${x-validation-time} / ${x-total-time}
===============================================================================================================
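
All of the x-* fields above come from the Hugging Face response headers, which InvokeHTTP exposes as flowfile attributes. A quick way to see them outside of NiFi (same placeholder token as before):

# Print the response headers that the Slack template reports.
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

response = requests.post(API_URL, headers=headers,
                         json={"inputs": "<s>[INST]Say hello.[/INST]</s>"})
for name in ("x-compute-time", "x-compute-type", "x-request-id",
             "x-inference-time", "x-queue-time", "x-total-time"):
    print(name, "=", response.headers.get(name))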

We use Pinecone for RAG.
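
The retrieval step is what fills the ${context} attribute used in the prompt template. A rough sketch with the Pinecone Python client is below; the index name, embedding model, and metadata field are my assumptions, not the exact configuration of the flow.

# Sketch of the retrieval that fills ${context}.
# "slack-rag", the embedding model, and the "text" metadata field are placeholders.
import os
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("slack-rag")
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve_context(question: str, top_k: int = 3) -> str:
    vector = embedder.encode(question).tolist()
    results = index.query(vector=vector, top_k=top_k, include_metadata=True)
    return " ".join((match.metadata or {}).get("text", "") for match in results.matches)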

An update from previous articles

Sometimes image processing fails, so let's pass through the original image.

RESOURCES



Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/