Can AI(s) code Deep Neural Network models? Only one truly can! Chat GPT could not

Gaetan Lion
Apr 2, 2024


Are we getting closer to coderless coding just like we are getting closer to driverless driving?

I wanted to test this hypothesis by giving a challenging coding exercise to the seven AIs available for free. The table below outlines who they are and their respective corporate investors.

Mistral is an interesting AI. It is led by French scientists who developed a different AI algorithm: Direct Preference Optimization (DPO). The latter allows Mistral to generate competitive model results with much less training data. Mistral’s searches are deemed 3 to 6 times more efficient than they would be otherwise.

The table below focuses on the Big Tech companies and the AIs they are invested in.

Next, let’s move on to the coding challenge. I used the R language instead of Python because it is far easier to set up a DNN model in R. The DNN R coding challenge was tough enough, as we shall soon see.

The coding challenge

Using the R software program, can you code the following:

Create a random multivariate data set with:

  • 5 input variables named X1, X2, X3, X4, X5
  • 1 output Y variable
  • Each variable would have a roughly normal distribution
  • All variables to be standardized
  • Generate 1,000 observations
  • Use a seed (123) to render the data replicable
  • The split between the training and testing data would be 50%/50%

Create a Deep Neural Network model using the above data set

  • It has 2 hidden layers. The first hidden layer has 4 neurons. The second hidden layer has 3 neurons.
  • Use a logistic activation function, also called the sigmoid function
  • Use a linear output for the output neuron
  • Use the backward propagation algorithm

NEXT

While still using the R language, code the following:

  • Plot the neural network without weights.
  • Do a second plot of the neural network with weights.
  • Scatterplot using the train data with Y estimates on the X-axis and Y-actual on the Y-axis.
  • Do a second scatterplot using the test data with Y estimates on the X-axis and Y-actual on the Y-axis.

Three out of the seven AIs failed miserably

Llama

With Llama, I could not even get past browser errors, whether I used Firefox or Safari. And yet, just a couple of weeks ago, Llama worked perfectly well.

Bing

Bing generated code with numerous errors, and it was unable to correct them. At one point, it knowingly wrote erroneous Python code. Also, after 5 follow-up questions to fix errors, Bing would tell me that the conversation had run up against its cap on follow-up questions.

Gemini

Gemini also could not generate a DNN as requested. And Gemini produced the most spectacular “hallucination” of all: it started coding in Chinese! See Gemini’s code and explanation below:

As expected, the Chinese 激活函数 (“activation function”) code generated an error!

Two out of the seven AIs failed smartly

Chat GPT

Chat GPT generated code that made good sense. It ran into technical errors that were pretty challenging. When I continued the chat and asked it to fix the errors, it still could not. But I sensed that Chat GPT was pretty close to the solution.

Perplexity

Exact same comment as for Chat GPT.

One barely made it

Claude

Like all the other AIs, Claude did make several errors. But it was eventually able to fix them and generate the DNN as requested. I am sharing its code below.

# Set seed for reproducibility
set.seed(123)

# Generate random multivariate data
n <- 1000
X1 <- rnorm(n)
X2 <- rnorm(n)
X3 <- rnorm(n)
X4 <- rnorm(n)
X5 <- rnorm(n)
Y <- rnorm(n)

# Standardize variables
X1 <- scale(X1)
X2 <- scale(X2)
X3 <- scale(X3)
X4 <- scale(X4)
X5 <- scale(X5)
Y <- scale(Y)

# Create data frame
data <- data.frame(X1, X2, X3, X4, X5, Y)

# Split data into training and testing sets (50%/50%)
train_indices <- sample(1:nrow(data), size = nrow(data) * 0.5)
train_data <- data[train_indices, ]
test_data <- data[-train_indices, ]

# Load required libraries
library(neuralnet)

# Create a formula for the neural network
formula <- Y ~ X1 + X2 + X3 + X4 + X5

# Create a Deep Neural Network model. I had to correct the code by adding the stepmax argument
model <- neuralnet(formula,
data = train_data,
hidden = c(4, 3),
act.fct = "logistic",
linear.output = TRUE,
learningrate = 0.02,
stepmax = 3000000)

# Print the model summary
print(model)

# Plot the neural network without weights. I had to add show.weights = FALSE
plot(model, rep = "best", radius = 0.1, edge.root = TRUE, show.weights=FALSE)

# Plot the neural network with weights
plot(model, rep = "best", radius = 0.1, edge.root = TRUE, edge.weight = TRUE)

# Scatterplot using the train data with Y estimates on the X-axis and Y-actual on the Y-axis
train_predictions <- compute(model, train_data[, -ncol(train_data)])
plot(train_predictions$net.result, train_data$Y, xlab = "Y estimates", ylab = "Y-actual", main = "Train Data")
abline(a = 0, b = 1, col = "red")

# Scatterplot using the test data with Y estimates on the X-axis and Y-actual on the Y-axis
test_predictions <- compute(model, test_data[, -ncol(test_data)])
plot(test_predictions$net.result, test_data$Y, xlab = "Y estimates", ylab = "Y-actual", main = "Test Data")
abline(a = 0, b = 1, col = "red")

This Claude R script was very slow, and the DNN did not always converge. Also, for some reason I was unable to calculate the R Square for the train-data and test-data predictions vs. actuals within R. I had to export the respective outputs and calculate those in Excel.
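For what it is worth, here is a minimal sketch of how those R Square values could be computed directly within the R session, assuming the model converged and reusing the train_predictions and test_predictions objects from the script above:

# My own sketch (not Claude’s code): R Square from the predictions computed above
train_r2 <- cor(train_predictions$net.result, train_data$Y)^2
test_r2 <- cor(test_predictions$net.result, test_data$Y)^2
print(c(train_r2, test_r2))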

If you use Claude’s code above, you may run into trouble.

However, when it worked, the graphical output was nice as shown below.

Notice the very low R Square on the train data, and the R Square very close to 0.00 on the test data. Given that all inputs were random, we are not concerned about the fit of this model. Here I was just testing Claude’s capability to follow English instructions and generate a DNN model. It passed the test, but it was not clear sailing.

The clear winner is Mistral

I don’t know if it is because of its DPO algorithm, but the code Mistral generated was a lot faster and more robust than Claude’s. Also, when the model was completed, I could readily compute the R Square values within the R script, something I was not able to do when using Claude.

Here is Mistral’s code:

library(neuralnet)
library(caret)

# Set seed for reproducibility
set.seed(123)

# Generate data
data_df <- as.data.frame(scale(data.frame(replicate(5, rnorm(1000)))))
names(data_df) <- c("X1_var", "X2_var", "X3_var", "X4_var", "X5_var")

# Generate output variable Y as a linear combination of inputs plus noise
data_df$Y_var <- with(data_df, 0.2*X1_var + 0.3*X2_var + 0.2*X3_var + 0.1*X4_var + 0.1*X5_var + rnorm(1000))

# Split data into training and testing sets
set.seed(123)
splitIndex <- createDataPartition(data_df$Y_var, p = .5, list = FALSE, times = 1)
train_df <- data_df[ splitIndex,]
test_df <- data_df[-splitIndex,]

# Define the neural network model
nn <- neuralnet(Y_var ~ X1_var + X2_var + X3_var + X4_var + X5_var,
data = as.matrix(train_df),
hidden = c(4,3),
act.fct = "logistic",
linear.output = TRUE,
stepmax = 1e6)

# Print the model
print(nn)

# Plot the neural network without weights. I had to correct weight= to show.weights=
plot(nn, rep="best", show.weights=FALSE)

# Plot the neural network with weights
plot(nn, rep="best", weight=TRUE)

# Predict Y values for the training data
train_pred <- compute(nn, as.matrix(train_df[, -ncol(train_df)]))$net.result

# Create scatterplot for train data
plot(train_pred, train_df$Y_var, xlab="Y Estimates", ylab="Y Actual", main="Scatterplot for Train Data")
abline(0, 1, col="red", lty=2)

# Predict Y values for the test data
test_pred <- compute(nn, as.matrix(test_df[, -ncol(test_df)]))$net.result

# Create scatterplot for test data
plot(test_pred, test_df$Y_var, xlab="Y Estimates", ylab="Y Actual", main="Scatterplot for Test Data")
abline(0, 1, col="red", lty=2)

# Calculate R Square by myself
cor(train_pred,train_df$Y_var)^2
cor(test_pred,test_df$Y_var)^2

If you rerun this Mistral code, it should work fine.
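As a side note, newer versions of the neuralnet package also offer a predict() method for fitted networks. If I am not mistaken, the compute() calls could then be replaced with something like the sketch below; this is my own variation, not part of Mistral’s output:

# Hedged alternative to compute(), assuming a recent neuralnet version with predict.nn
train_pred <- predict(nn, train_df)
test_pred <- predict(nn, test_df)
cor(train_pred, train_df$Y_var)^2
cor(test_pred, test_df$Y_var)^2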

Mistral’s graphical output was just as elegant as Claude’s.

Comparing Mistral vs. Claude

There is a huge difference in performance between Mistral and Claude.

Mistral steadily conducted 100,554 steps in only 24 seconds!

Claude took several minutes, or at times never finished (you had to stop it), to conduct 40,742 steps. Thus, Claude often could not actually generate the DNN model.
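For readers who want to check these figures themselves, here is a minimal sketch of how the run time and step count can be captured in R. The variable names come from Mistral’s script above; the timing wrapper is mine, not something either AI generated:

# Wrap the training call to measure wall-clock time, then read the step count
timing <- system.time(
nn <- neuralnet(Y_var ~ X1_var + X2_var + X3_var + X4_var + X5_var,
data = train_df, hidden = c(4, 3), act.fct = "logistic",
linear.output = TRUE, stepmax = 1e6)
)
print(timing["elapsed"]) # elapsed seconds
print(nn$result.matrix["steps", ]) # number of training steps taken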

As shown, Mistral needed far fewer lines of code to generate the data, and slightly fewer to generate the DNN model.

Nevertheless, I really liked Claude’s coding style. Its code and explanations are a bit clearer than Mistral’s.

In summary, for most coding challenges I run into, I will probably use Claude over Mistral. If Claude takes too long to run something, then I’ll switch to Mistral.

It is not all about coding…

The above is a specific coding test. It does not represent a ranking of the mentioned AIs on all counts. They are all good to excellent at different things. Just to cover a few use cases below:

Writing an essay, translating into a different language

For such tasks, Chat GPT is great. Friends of mine have generated outstanding essays using Chat GPT on various topics such as the housing crisis or the immigration crisis. And they asked Chat GPT to use different writing styles, such as Hunter S. Thompson’s or Mark Twain’s. Of course, you have to feed Chat GPT the relevant content information. But then just watch the astonishing results!

I have frequently used Chat GPT to translate my writing from English to French (the latter is my mother tongue). At first, I thought Chat GPT’s translation was a bit weird; that was until I realized that Chat GPT’s French is 10 times better than mine!

Research assistant, medical research, etc.

On those counts Perplexity is great. This is for three main reasons:

  1. It is current. It does not rely solely on pretrained data, which is obsolete by definition. It also uses web crawlers to search for current information on the Internet.
  2. It references and sources everything, so you can go to the sources to verify the validity of its assertions. Of course, some sources are less reliable than others. But it typically generates 5 to 15 sources to work with, so you can very quickly evaluate which pieces of information are reliable.
  3. It automatically generates related questions and answers to your original question. And these follow-up questions are often as insightful as your original ones.

Bing is pretty good on the above counts too. But, based on my firsthand experience, Perplexity is much better. It gives more thorough answers. And, it writes a lot faster.

Additionally, Perplexity is often much better than AIs dedicated to extracting information from research papers, such as Consensus AI. I compared search results between the two, and there was no comparison. Consensus AI would offer a couple of sentences to explain the search results. Meanwhile, Perplexity generated a four-paragraph narrative, as if it had been written by a university professor.

Complex multi-step processes

Again, Perplexity is formidable. I have asked Perplexity pretty complex questions such as:

  1. How should a local water agency plan to shore up its water supply?
  2. How should a bank improve its credit risk management to improve its asset quality?

The answers were invariably surprisingly good.

THE END


Gaetan Lion

I am an independent researcher conducting analysis in economics, stock markets, politics, social sciences, environment, health care, and sports.