Quantized Mixtral: Exploring Performance and Challenges Across Diverse Prompts

Impact of Quantization on Mixtral’s Responses

Ingrid Stevens
10 min read · Jan 8, 2024

In this article, I examine the impact of quantization on the performance of Mixtral, a large language model, through a series of tests. I compare the performance of quantized Mixtral running on my M1 Mac via LM Studio to its unquantized counterpart (served by Vercel via fireworks.ai), as well as to ChatGPT 3.5, on six distinct prompts covering logical reasoning, math, instruction following, and poetry.

If you want to try this yourself, you can run Mixtral on Vercel or locally in LM Studio.

To ensure the validity of my results, I made several assumptions:

(1) I assumed that Vercel, the platform on which I am running the comparison model, is using an unquantized version of Mixtral. Fireworks.ai, the company providing Mixtral to Vercel, offers an unquantized version of the model, so I believe this is a reasonable assumption.

(2) I assumed that Vercel’s Top K Sampling parameter, which limits sampling to the K most probable next tokens, does not differ significantly from the default value of 40. This is the default in LM Studio, the local environment where I am running Mixtral 8x7B Instruct Q3, and I am assuming Vercel has not deviated significantly from it.

(3) I also assumed that Vercel does not use any system prompt when making requests to the Mixtral model. This allows me to test the model’s ability to generate responses independently, without any external influence.
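As an aside, the top-k sampling mentioned in assumption (2) can be sketched in a few lines of Python. The token probabilities below are invented purely for illustration:

```python
import random

def top_k_sample(probs, k=40, rng=random.Random(0)):
    """Keep only the k most probable tokens, renormalize, then sample one."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    tokens = [t for t, _ in top]
    weights = [p / total for _, p in top]
    return rng.choices(tokens, weights=weights)[0]

# Hypothetical next-token distribution:
probs = {"the": 0.4, "a": 0.3, "cat": 0.2, "xyzzy": 0.1}
print(top_k_sample(probs, k=2))  # only "the" or "a" can ever be chosen
```

With k=2 the two least probable tokens are never sampled, which is why a large difference in K between the two setups could have skewed the comparison.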

I will now describe what quantization is, then present the results of my tests. Note: I’ll use the emoji ❌ to denote an incorrect response, and ✅ for a correct one.

What is quantization?

Think of a non-quantized model like a detailed painting with many colors. Now, imagine a quantized version as a simplified sketch using fewer distinct shades. Quantization reduces precision, making the model more lightweight but potentially sacrificing some accuracy. It’s like trading detailed colors for a quicker, simpler outline.

Quantizing a model means converting its weight values from floating-point precision (with many decimals) to lower precision, like integers or smaller floats. It’s like rounding off numbers. This reduces memory usage and speeds up computations but might lose some fine details. It’s a trade-off for efficiency. Imagine going from exact measurements to ballpark figures.
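The rounding analogy can be made concrete with a toy sketch of symmetric 8-bit quantization: each float weight is mapped to a small integer via a per-tensor scale, then reconstructed. The weight values here are made up for illustration (real schemes like Q3_K_M are more elaborate, using grouped scales and fewer bits):

```python
# Toy symmetric int8-style quantization: floats -> small ints -> floats.
weights = [0.217, -1.034, 0.551, 0.003, -0.498]  # invented example weights

scale = max(abs(w) for w in weights) / 127       # one scale for the tensor
quantized = [round(w / scale) for w in weights]  # integers in -127..127
restored = [q * scale for q in quantized]        # what the model computes with

for w, r in zip(weights, restored):
    print(f"{w:+.3f} -> {r:+.3f} (rounding error {abs(w - r):.4f})")
```

The per-weight rounding error is bounded by half the scale step, which is exactly the “fine detail” a quantized model gives up in exchange for smaller memory and faster inference.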

Evaluating Quality of Responses

In the following six prompts, I’ll compare the quality of the different models’ responses. Overall, I found that Mixtral running quantized on my M1 performed comparably to the unquantized version.

Prompt #1:

Linda has 3 sisters
Paul has 4 sisters
Can Linda and Paul be siblings?

Mixtral-8x7B running on Vercel

Linda and Paul cannot be siblings based on the information provided. The number of sisters each of them has is about their own families and not related to each other’s families. Sibling relationships are typically established through shared parents, which is not indicated in this scenario.

Mixtral-8x7B Q3_K_M running on M1

Linda and Paul cannot be siblings because sisters are female and a sibling relationship requires at least one shared parent. Since Paul has four sisters, it implies that he is male, and Linda having three sisters indicates that she is also female. Therefore, they do not share the same parentage or gender, so they cannot be siblings.

ChatGPT 3.5

No, Linda and Paul cannot be siblings based on the information provided. Linda has 3 sisters, and Paul has 4 sisters, indicating that they have different sets of siblings. Without additional information or clarification, it can be inferred that Linda and Paul do not share the same set of parents and are not siblings.

My Answer: Yes, Linda and Paul can be siblings, as Linda and her 3 sisters can make up the 4 sisters that Paul has.

💡Observations: All three models fail to recognize that Linda and Paul could indeed be siblings.
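One family arrangement consistent with both statements can be checked in a few lines of Python (the sister names besides Linda are invented):

```python
# One possible family: Paul (male) plus four girls, one of whom is Linda.
family_girls = {"Linda", "Anna", "Beth", "Cara"}  # hypothetical names

paul_sisters = family_girls                 # Paul's sisters: all four girls
linda_sisters = family_girls - {"Linda"}    # Linda's sisters: the other three

print(f"Paul has {len(paul_sisters)} sisters, "
      f"Linda has {len(linda_sisters)} sisters")  # 4 and 3, as in the prompt
```

Both counts match the prompt, so “Linda and Paul are siblings” is perfectly consistent with the given facts.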

Prompt #2:

Björn is 3 years older than Freya
Freya is Björn’s sister.
Björn is twice Freya’s age.
How old is Freya? How old is Björn?

Mixtral-8x7B running on Vercel

The second statement, “Freya is Björn’s sister,” implies that they share the same parents and thus, their ages are related. However, the first and third statements contradict each other.

If Björn is 3 years older than Freya, then he cannot be twice her age. If Björn is twice Freya’s age, then Freya would be half of Björn’s age, which means Björn would be 4 years older than Freya (not 3).

Therefore, there seems to be a mistake in the information provided and we cannot determine accurate ages for Freya and Björn based on this contradictory information.

Mixtral-8x7B Q3_K_M running on M1

The second statement, “Freya is Björn’s sister,” does not provide any new information to solve the age problem. We already know from the first statement that Björn is three years older than Freya.

The third statement, “Björn is twice Freya’s age,” is inconsistent with the first statement. If Björn is twice as old as Freya, then there is a contradiction because, in that case, Björn would be older than Freya by more than three years.

Therefore, the given information is inconsistent and cannot be used to determine the ages of Björn and Freya.

ChatGPT 3.5

Let’s denote Freya’s age as F and Björn’s age as B.

According to the information given:
1. Björn is 3 years older than Freya: B = F + 3.
2. Björn is twice Freya’s age: B = 2F.

Now, we can set up a system of equations:

Equation 1: B = F + 3
Equation 2: B = 2F

Since both equations equal B, we can set them equal to each other:

F + 3 = 2F

Subtract F from both sides:

3 = F

So, Freya is 3 years old. Now, we can find Björn’s age using one of the original equations. Let’s use Equation 1:

B = F + 3
B = 3 + 3
B = 6

Therefore, Freya is 3 years old, and Björn is 6 years old.

My Answer: Freya is 3, Björn is 6.

💡Observations: Mixtral struggled with this problem, and it appears that the statement “Freya is Björn’s sister.” confused the model. This is supported by the fact that, upon rerunning the problem without that statement, Mixtral produced the correct result.
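The algebra ChatGPT used can also be verified mechanically. This brute-force sketch searches small ages for a pair satisfying both constraints, showing that they are not contradictory (as both Mixtrals claimed):

```python
# Find ages where Björn is both 3 years older than Freya AND twice her age.
solutions = [
    (freya, freya + 3)          # (Freya's age, Björn's age)
    for freya in range(1, 100)
    if freya + 3 == 2 * freya   # "3 years older" meets "twice the age"
]
print(solutions)  # [(3, 6)] -> Freya is 3, Björn is 6
```

A unique solution exists, so the premises are consistent and the quantized and unquantized Mixtral were both wrong to reject the problem.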

Prompt #3:

What is bigger than an scorpion, and has exactly half as many letters?

Mixtral-8x7B running on Vercel

The word “Scorpion” has 8 letters.

The word that is bigger than a scorpion and has exactly half as many letters is “Ant.” Ant has 3 letters, which is half of the number of letters in “Scorpion.” Additionally, an ant is generally larger than a scorpion in terms of population size and cultural significance, even though a single ant may be smaller than a single scorpion.

Mixtral-8x7B Q3_K_M running on M1

The word “scorpion” has 9 letters. Half as many letters would be 9 / 2 = 4.5 letters, but we can’t have half a letter. However, if we round down to 4 letters, we can look for a word that has 4 letters and is bigger than “scorpion” in terms of size or meaning.

One such word is “elephant,” which has 4 letters and is much larger than a scorpion in size and significance. So, an elephant is bigger than a scorpion and has exactly half as many letters (when we round down).

ChatGPT 3.5

The word “spoon” is bigger than “scorpion” and has exactly half as many letters.

My Answer: “Scorpion” has 8 letters. “Boar” is bigger than a scorpion and has exactly half as many letters (4).

💡Observations: All three models failed to provide a correct response, with each model offering different words (“Ant,” “elephant,” and “spoon”) that did not fulfill the criteria outlined in the prompt, highlighting a consistent pattern of misunderstanding in relation to the specified conditions.
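The letter-counting part of this riddle is trivial to check programmatically, which makes the models’ failures striking. A small sketch, using the candidate words from the responses above plus my own answer:

```python
def half_letters(word, candidates):
    """Return the candidates with exactly half as many letters as `word`."""
    target = len(word) // 2
    return [c for c in candidates if len(c) == target]

print(len("scorpion"))  # 8 letters (quantized Mixtral claimed 9)
print(half_letters("scorpion", ["ant", "elephant", "spoon", "boar"]))
# only "boar" has exactly 4 letters
```

Of course, the “bigger than a scorpion” half of the riddle still requires world knowledge the code can’t supply, but none of the models even got the counting right.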

Prompt #4:

I’m at Point Reyes standing at the edge of an oceanside cliff. What direction should i face to see the sun hit the ocean? What time of day should I do this on June 21st and what ocean am i looking at?

Mixtral-8x7B running on Vercel (doesn’t say what time of day)

In the Northern Hemisphere, on June 21st, which is the summer solstice, the sun will be directly over the Tropic of Cancer, which is at a latitude of approximately 23.5 degrees north. Point Reyes is located at a latitude of approximately 38 degrees north.

Therefore, to see the sun hit the ocean, you should face west on June 21st at Point Reyes. This is because during summer solstice in the Northern Hemisphere, the sun sets in the west and rises furthest north of east.

The ocean that you would be looking at from Point Reyes is the Pacific Ocean.

Mixtral-8x7B Q3_K_M running on M1 (doesn’t say what time of day)

On June 21st, the summer solstice, the sun will be at its northernmost point in the sky for the Northern Hemisphere. Therefore, if you are at Point Reyes on June 21st and want to see the sun hit the ocean, you should face west. The Pacific Ocean is to the west of Point Reyes, so you will be looking out at the Pacific Ocean.

It’s worth noting that the exact direction may vary slightly depending on the time of day you choose to observe the sun hitting the ocean, as the sun’s position in the sky changes throughout the day. However, in general, facing west will allow you to see the sun setting over the Pacific Ocean.

ChatGPT 3.5

If you are at Point Reyes standing at the edge of an oceanside cliff and want to see the sun hit the ocean, you should face west. The sun sets in the west, and on June 21st, which is the summer solstice, you would want to be there during sunset.

On June 21st, the summer solstice, the sun reaches its highest point in the sky, and the day is the longest of the year in the Northern Hemisphere. Sunset on the west coast of the United States, including Point Reyes, generally occurs around 8:30 PM to 9:00 PM local time.

Facing west from Point Reyes, you would be looking at the Pacific Ocean. The sun setting over the Pacific can create a beautiful and dramatic scene, casting warm colors across the water and coastline.

My Answer: At Point Reyes, face west at 20:38 PDT to see the sun set over the Pacific ocean.

💡Observations: Both Mixtrals provided incomplete responses by omitting the specific time of day; quantized Mixtral fared worse, as it neglected that we are looking for a sunset, which determines the time of day. ChatGPT 3.5 delivered a comprehensive and accurate answer, noting that you should face west during sunset on June 21st at Point Reyes to witness the sun hitting the Pacific Ocean.

Prompt # 5

Although these models aren’t designed to do math, I thought it was important to test nonetheless:

What is 5 to the power of 8 minus 6 to the power of 8? Double it and add two to the power of 3?

Mixtral-8x7B running on Vercel

First, let’s calculate 5 to the power of 8 minus 6 to the power of 8:

5⁸ = 390,625
6⁸ = 16,777,216
390,625 - 16,777,216 = -16,386,591

Next, let’s double this value:
-16,386,591 × 2 = -32,773,182

Finally, let’s add two to the power of three:
2³ = 8
-32,773,182 + 8 = -32,773,174.

Mixtral-8x7B Q3_K_M running on M1

Let’s break down the expression step by step:

1. Calculate 5⁸ and 6⁸:

5⁸ = 390625
6⁸ = 16777216

2. Subtract 6⁸ from 5⁸:

390625 - 16777216 = -16386591

3. Double the result:

-16386591 × 2 = -32773182

4. Add 2³ to the result:

2³ = 8

-32773182 + 8 = -32773174

So, 5 to the power of 8 minus 6 to the power of 8, doubled and with 2 to the power of 3 added, equals -32773174.

ChatGPT 3.5

Let’s break down the expression step by step:

1. 5⁸ − 6⁸
2. Double the result: 2 × (5⁸ − 6⁸)
3. Add 2³ to the doubled result: 2 × (5⁸ − 6⁸) + 2³

Now, let’s calculate each part:

1. 5⁸ − 6⁸ = 390,625 − 1,679,616 = −1,288,991
2. Double the result: 2 × (−1,288,991) = −2,577,982
3. Add 2³: −2,577,982 + 8 = −2,577,974

So, 2 × (5⁸ − 6⁸) + 2³ = −2,577,974.

My Answer: 2 × (5⁸ − 6⁸) + 2³ = −2,577,974.

💡Observations: Both Mixtrals calculated 6⁸ incorrectly (16,777,216, which is actually 2²⁴, instead of 1,679,616), which threw off the entire calculation. ChatGPT 3.5 successfully broke the problem into steps and presented the correct answer.
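The whole expression is one line of Python, which makes it easy to confirm where the models went wrong:

```python
# Verify each intermediate value and the final result.
print(5**8)   # 390625  (all three models got this right)
print(6**8)   # 1679616 (both Mixtrals claimed 16777216, i.e. 2**24)

result = 2 * (5**8 - 6**8) + 2**3
print(result)  # -2577974, matching ChatGPT 3.5's answer
```

A single wrong intermediate value (6⁸) propagated through every subsequent step of both Mixtral answers, even though their step-by-step structure was otherwise sound.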

Prompt #6:

Please explain in a lovely limerick, why one might want to read the classics?

Mixtral-8x7B running on Vercel

In books of old, with tales so grand,
The classics hold a timeless hand.
With prose that sings and words that dance,
They offer wisdom in a trance.
So read the classics, understand!

Mixtral-8x7B Q3_K_M running on M1

In books of old, there lies delight,
Classics shining, both day and night.
They teach us of the past,
And wisdom that will last,
A joy to read, an absolute sight!

ChatGPT 3.5

In a library so vast and grand,
Classics beckon with wisdom firsthand.
With prose that’s divine,
They stand the test of time,
In their pages, a treasure trove is planned.

My Answer:

To weigh ideas, let’s be wise,
Origins unveil truth that lies.
Memes distill, but can’t apprise,
The richness, depth, where ideas rise.
So, delve into classics, where your insight lies.

The Verdict: Quantized Mixtral is Just Fine.

Upon reviewing the responses to the prompts, it’s evident that quantization did not significantly impact the overall quality of Mixtral’s output. In fact, the quantized version performed similarly to its non-quantized counterpart across various scenarios. Quantization did introduce a subtle error in prompt #3, where Q3_K_M Mixtral miscounted the letters in “scorpion” (claiming 9 rather than 8). But otherwise, performance was roughly equivalent: the limitations observed in Mixtral’s performance, such as the inability to handle complex mathematical computations, were consistent across both quantized and non-quantized variants.

In conclusion, in the context of these specific prompts, quantization had a limited impact on the quality of Mixtral’s responses. The model, whether quantized or not, demonstrated consistent strengths and weaknesses.
