Benchmarking ChatGPT vs Claude vs Mistral

Gaetan Lion
Mar 16, 2024

Introduction

Here I focus solely on the coding assistance capabilities of three AI chatbots. I just learned about Claude and Mistral from The Economist issue of March 9th to 15th, 2024 (see page 60).

Claude is developed by Anthropic, an AI startup funded by Google and Amazon. The Economist reported that Claude outperformed GPT-4 on several tasks, including the bar exam.

Mistral is a new entrant that uses a smaller but strong-performing model. It was developed by a French team.

Given the above, Claude and Mistral are ChatGPT competitors worth a look. I figured I would give them a hands-on test run by challenging them to generate reproducible code in answer to my coding challenge.

The coding challenge

Using R ggplot2, how to generate a density plot with three variables. One has an approximate Normal distribution. One has a Skewness to the left. And, the other has Skewness to the right. And, these variables’ Skewness are symmetric around the Normally distributed variables.

ChatGPT

# Load required libraries
library(ggplot2)

# Set seed for reproducibility
set.seed(123)

# Generate data
n <- 1000

# Normal distribution
normal <- rnorm(n)

# Left skewed distribution
left_skewed <- rchisq(n, df = 3) - 3

# Right skewed distribution
right_skewed <- 3 - rchisq(n, df = 3)

# Combine data into a data frame
data <- data.frame(normal = normal, left_skewed = left_skewed, right_skewed = right_skewed)

# Plot density
ggplot(data, aes(x = value, color = variable, fill = variable)) +
  geom_density(alpha = 0.5) +
  theme_minimal() +
  labs(title = "Density Plot of Three Variables",
       x = "Value",
       y = "Density",
       color = "Variable",
       fill = "Variable") +
  scale_color_manual(values = c("blue", "red", "green")) +
  scale_fill_manual(values = c("blue", "red", "green"))

The code above generated an error. ChatGPT responded to this error as follows:

“Apologies for the oversight. It seems I made a mistake in the aesthetics mapping. Let me correct that for you.” It then generated the code below:

# Load required libraries
library(ggplot2)

# Set seed for reproducibility
set.seed(123)

# Generate data
n <- 1000

# Normal distribution
normal <- rnorm(n)

# Left skewed distribution
left_skewed <- rchisq(n, df = 3) - 3

# Right skewed distribution
right_skewed <- 3 - rchisq(n, df = 3)

# Combine data into a data frame
data <- data.frame(normal = normal, left_skewed = left_skewed, right_skewed = right_skewed)

# Reshape data for ggplot2
data <- reshape2::melt(data)

# Plot density
ggplot(data, aes(x = value, color = variable, fill = variable)) +
  geom_density(alpha = 0.5) +
  theme_minimal() +
  labs(title = "Density Plot of Three Variables",
       x = "Value",
       y = "Density",
       color = "Variable",
       fill = "Variable") +
  scale_color_manual(values = c("blue", "red", "green")) +
  scale_fill_manual(values = c("blue", "red", "green"))

What is interesting is that the root cause was not the aesthetic mapping as such (as the error statement suggested). ChatGPT had forgotten to reshape the data frame from wide to long format, so the value and variable columns referenced in aes() did not exist yet. R error statements are often downright misleading (probably true of Python and other programming languages too).
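To make the wide-versus-long point concrete, here is a minimal sketch using only base R. It uses stack() as a stand-in for the reshape2::melt() call in ChatGPT's corrected code (my substitution, not what ChatGPT wrote), and the small five-row data frame and the names wide and long are made up for illustration.

```r
# Toy wide-format data frame: one column per variable (illustrative only)
set.seed(123)
wide <- data.frame(
  normal       = rnorm(5),
  left_skewed  = rnorm(5),
  right_skewed = rnorm(5)
)

# The wide frame has no "value" or "variable" column,
# so aes(x = value, color = variable) has nothing to map
print(names(wide))

# stack() is a base R alternative to reshape2::melt():
# it returns one row per observation, in columns "values" and "ind"
long <- stack(wide)
names(long) <- c("value", "variable")

# Now both columns referenced in aes() exist
print(names(long))
print(nrow(long))  # 3 variables x 5 rows = 15
```

Once the data is in this long shape, the same ggplot() call that failed on the wide data frame can find both columns.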

Nevertheless, the second time around the R code generated by ChatGPT worked perfectly well. It generated this nice density plot of the three variables as specified.

Claude

Claude generated the following code, and it worked the first time around.

# Install and load the required packages
install.packages("ggplot2")
library(ggplot2)

# Generate sample data
set.seed(123) # for reproducibility

# Normal distribution
normal_data <- rnorm(1000, mean = 0, sd = 1)

# Left-skewed distribution
left_skewed_data <- rnorm(1000, mean = -2, sd = 1.5)

# Right-skewed distribution
right_skewed_data <- rnorm(1000, mean = 2, sd = 1.5)

# Combine the data into a single data frame
data <- data.frame(
  value = c(normal_data, left_skewed_data, right_skewed_data),
  distribution = rep(c("Normal", "Left-skewed", "Right-skewed"),
                     times = c(length(normal_data), length(left_skewed_data), length(right_skewed_data)))
)

# Plot the density curves
ggplot(data, aes(x = value, color = distribution, fill = distribution)) +
  geom_density(alpha = 0.3) +
  labs(title = "Density Plot of Three Distributions",
       x = "Value",
       y = "Density",
       color = "Distribution",
       fill = "Distribution") +
  theme_minimal()

Claude’s code generated the following density plot.

Mistral

Mistral’s code also worked the first time around.

# Load the ggplot2 library
library(ggplot2)

# Set the seed for reproducibility
set.seed(123)

# Create a data frame
df <- data.frame(
  Normal = rnorm(1000),
  Left_Skew = rgamma(1000, shape = 2, scale = 1),
  Right_Skew = rgamma(1000, shape = 0.5, scale = 1)
)

# Melt the data frame into long format
df_long <- reshape2::melt(df, id.vars = NULL)

# Create the density plot
ggplot(df_long, aes(x = value, fill = variable, color = variable)) +
  geom_density(alpha = 0.5) +
  theme_minimal() +
  labs(
    title = "Density plot of Normal, Left Skewed, and Right Skewed data",
    x = "Value",
    y = "Density",
    fill = "Variable",
    color = "Variable"
  ) +
  theme(legend.position = "top")

Mistral’s code generated the following density plot.

Conclusion

In response to a fairly difficult coding challenge, the three AIs responded well. Based on this stand-alone experiment, I sense that ChatGPT does not own this field. The race is on. Others using much more extensive code testing may reach different conclusions. However, I would argue that at the individual or retail level, the three AI chatbots worked well in terms of coding assistance. They did for me. And I need all the coding assistance I can get.

It is surprising that ChatGPT was the only one among the three to commit an explicit and rather basic coding error (forgetting to reshape its data frame from wide to long format).

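On the other hand, ChatGPT performed best at mathematical interpretation. It constructed two skewed distributions that mirror each other around the Normal distribution, using two shifted Chi-square distributions, and this worked just fine. One caveat: in the standard statistical sense, where skew is named after the side of the long tail, its labels are swapped. A Chi-square distribution has its long tail on the right, so rchisq(n, df = 3) - 3 (labeled left_skewed) is skewed to the right, and 3 - rchisq(n, df = 3) (labeled right_skewed) is skewed to the left.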

Claude missed this mathematical point. It simply constructed three Normal distributions, two of which have means (-2 and 2) placed symmetrically around the central one, so none of them is actually skewed.

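Mistral did use a skewed family, the Gamma distribution. But both of its Gamma variables take only positive values and have their long tail on the right, so neither is skewed to the left, and they did not end up being symmetric around the Normal distribution.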
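One way to check these skewness observations numerically is to compute a sample skewness coefficient for each generated variable. The sketch below is my own addition, not part of any chatbot's output: the helper skew() and the list names are hypothetical, and the random draws will not match the ones behind the plots because the generators are called in a different order. Only the sign of each result matters here, with skew named after the side of the long tail.

```r
set.seed(123)
n <- 1000

# Sample skewness: third central moment over sd cubed (helper defined here)
# Positive = long tail on the right, negative = long tail on the left
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Same generating expressions as in the three chatbots' code
samples <- list(
  chatgpt_left_skewed  = rchisq(n, df = 3) - 3,
  chatgpt_right_skewed = 3 - rchisq(n, df = 3),
  claude_left_skewed   = rnorm(n, mean = -2, sd = 1.5),
  claude_right_skewed  = rnorm(n, mean = 2, sd = 1.5),
  mistral_left_skew    = rgamma(n, shape = 2, scale = 1),
  mistral_right_skew   = rgamma(n, shape = 0.5, scale = 1)
)

skews <- sapply(samples, skew)
print(round(skews, 2))
```

With this definition, the two Chi-square based variables should come out with opposite signs, Claude's two Normal variables should be close to zero, and both of Mistral's Gamma variables should be positive.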

Notice that Claude did something pretty innovative code-wise. It put its data into long format in a completely different way. It did not use the melt() function from the R reshape2 library. Instead, it stacked the values with c() and built the matching labels with rep(..., times = ...). I wonder if even advanced R coders are familiar with this coding structure.
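For readers who have not seen this construct, here is a small sketch of how rep() with a times vector behaves. The toy vectors a, b, and d and their lengths are made up for illustration; only the rep() and c() pattern mirrors Claude's code.

```r
# rep() with a vector "times" repeats each element its own number of times
labels <- rep(c("Normal", "Left-skewed", "Right-skewed"), times = c(2, 3, 1))
print(labels)
# "Normal" "Normal" "Left-skewed" "Left-skewed" "Left-skewed" "Right-skewed"

# Toy vectors of different lengths (illustrative values)
a <- c(0.1, 0.2)
b <- c(-1, -2, -3)
d <- 5

# c() stacks the values; the labels line up row by row with them
long <- data.frame(
  value        = c(a, b, d),
  distribution = rep(c("Normal", "Left-skewed", "Right-skewed"),
                     times = c(length(a), length(b), length(d)))
)
print(long)
```

Because the labels are built from the vector lengths, this approach also works when the variables have different numbers of observations, which a wide data frame cannot hold without padding.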

THE END

Gaetan Lion

I am an independent researcher conducting analysis in economics, stock markets, politics, social sciences, environment, health care, and sports.