Decoding Machine Learning Results with ChatGPT: A Step Towards AI Democratization!

Alexandre t'Kint
5 min read · Feb 7, 2023

Artificial Intelligence has taken the world by storm, revolutionizing the way we live, work and communicate. However, for many people, the concept of AI remains shrouded in mystery, often perceived as too complex to understand. But what if we told you that you can now decode the complex world of machine learning and AI with just a few taps on your keyboard? In this blog we demystify the key concepts of AI and Machine Learning, making them accessible to anyone with a curious mind.

With the help of ChatGPT, a state-of-the-art language model developed by OpenAI, we will delve into the exciting world of AI, breaking down complex ideas into simple, easy-to-understand language. Join us on this journey to learn about the latest advancements in AI and understand how you can utilize these technologies to create a better future. Get ready to embrace the future of AI and walk Towards AI Democratization!

Let’s go! We will create a simple Random Forest model to predict house prices in California, then evaluate the model’s accuracy using popular quality metrics. Finally, we will use a GPT-3 model to turn those results into plain language.

Table of contents:

  • Create a simple Random Forest Model which predicts the house prices in California
  • Calculate the most popular model quality metrics
  • Create a function that calls a GPT-3 model
  • Compute the results

1. Create a simple Random Forest Model which predicts the house prices in California

This code is building a machine-learning model to predict housing prices in California. We start by importing libraries such as numpy, pandas, and shap. The California Housing Prices dataset is then loaded and split into training and testing sets. The code then creates a default instance of the random forest regressor (a type of machine learning algorithm) and fits the model to the training data. Finally, the code makes predictions on the test data using the trained model.

import shap
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# California Housing Prices
dataset = fetch_california_housing(as_frame = True)
X = dataset['data']
y = dataset['target']

# Split the dataset in train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

# Prepares a default instance of the random forest regressor
model = RandomForestRegressor()

# Fits the model on the data
model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = model.predict(X_test)
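
To make the hold-out idea behind test_size = 0.2 concrete, here is a minimal sketch in plain Python. The helper simple_train_test_split is a hypothetical name used only for illustration and is not how scikit-learn implements train_test_split; it just shows that 20% of the shuffled rows are set aside for testing.

```python
import random

def simple_train_test_split(rows, test_size=0.2, seed=0):
    """Shuffle rows and hold out a fraction of them as a test set."""
    shuffled = list(rows)
    # A seeded generator makes the shuffle repeatable between runs
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    test = shuffled[:n_test]
    train = shuffled[n_test:]
    return train, test

rows = list(range(10))
train_rows, test_rows = simple_train_test_split(rows, test_size=0.2)
print(len(train_rows), len(test_rows))  # 8 2
```

The model only ever sees the training rows, so its error on the held-out rows gives a fairer picture of how it behaves on new data.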

2. Calculate the most popular model quality metrics

This code is evaluating the accuracy of the machine learning model built in the previous code snippet. It uses three popular metrics for regression models: Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). The code first calculates the values of these metrics by comparing the actual target values (y_test) with the predicted values (y_pred). Finally, the code creates a human-readable string for each metric with its value.

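from math import sqrt
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Compute metrics by comparing actual and predicted values
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = sqrt(mse)

# Build a short, rounded, human-readable string for each metric
MSE = f"MSE: {round(mse, 3)}"
MAE = f"MAE: {round(mae, 3)}"
RMSE = f"RMSE: {round(rmse, 3)}"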
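
To see what these three numbers measure, the sketch below computes them by hand on four made-up values. The helper regression_metrics and the toy data are illustrative assumptions, not part of the original code. According to scikit-learn's dataset description, the California housing target is expressed in units of $100,000, so an MAE of 0.5 would mean predictions are off by about $50,000 on average.

```python
from math import sqrt

def regression_metrics(actual, predicted):
    """Return (MSE, MAE, RMSE) for two equally long lists of numbers."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)   # average squared error
    mae = sum(abs(e) for e in errors) / len(errors)  # average absolute error
    rmse = sqrt(mse)                                 # back in the target's units
    return mse, mae, rmse

# Toy values, made up for illustration only
actual = [2.0, 1.0, 3.0, 4.0]
predicted = [2.5, 1.0, 2.0, 4.5]

toy_mse, toy_mae, toy_rmse = regression_metrics(actual, predicted)
print(f"MSE: {round(toy_mse, 3)}")    # MSE: 0.375
print(f"MAE: {round(toy_mae, 3)}")    # MAE: 0.5
print(f"RMSE: {round(toy_rmse, 3)}")  # RMSE: 0.612
```

Because MSE squares each error, the single miss of 1.0 dominates it, which is why RMSE (0.612) comes out higher than MAE (0.5) here.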

This code uses the shap library to explain the predictions made by the machine learning model. A shap.Explainer is built from the model's predict function, with X_test supplied as background data. The explainer is then used to calculate the SHAP values, which measure how much each feature contributes to each individual prediction. The mean of the absolute SHAP values is calculated per feature and stored in the vals variable. Finally, a table is created to display the features and their corresponding SHAP values, sorted by SHAP value in descending order and rounded to three decimals.

# Fit the explainer
explainer = shap.Explainer(model.predict, X_test)

# Calculates the SHAP values - It takes some time
shap_values = explainer(X_test)

# Get the raw values and average them
vals = np.abs(shap_values.values).mean(0)

# Create a simple table
shap_table = pd.DataFrame({
    'Features': list(X_test.columns),
    'Shap value': list(vals),
})

# Sort the table, as GPT-3 is not that good with numbers
shap_table = shap_table.sort_values(by=["Shap value"], ascending=False)

# Round
shap_table = shap_table.round(3)
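
The aggregation step (absolute value, then the mean per feature, then sort) can be sketched without shap at all. The helper mean_abs_by_feature and the contribution values below are made up for illustration and are not real SHAP output; only the feature names MedInc and HouseAge come from the California housing dataset.

```python
def mean_abs_by_feature(values, feature_names):
    """Average the absolute per-row contributions for each feature column."""
    n_rows = len(values)
    scores = {}
    for col, name in enumerate(feature_names):
        scores[name] = sum(abs(row[col]) for row in values) / n_rows
    # Largest average contribution first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up contributions for three predictions and two features
toy_values = [
    [0.4, -0.1],
    [-0.6, 0.2],
    [0.2, -0.3],
]

ranking = mean_abs_by_feature(toy_values, ["MedInc", "HouseAge"])
for name, score in ranking:
    print(name, round(score, 3))
# MedInc 0.4
# HouseAge 0.2
```

Taking the absolute value first matters: the MedInc contributions would otherwise cancel out to zero even though the feature moves every prediction.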

3. Create a function that calls a GPT-3 model

This code imports the “openai” library and defines a function named “trigger_gpt”. The function takes one argument “prompt”, which is a string that serves as the input for the AI model. In the function, an API key is specified to access the OpenAI API. Then, the model to be used is specified as “text-davinci-003”. Finally, the function returns the response generated by the AI model.

import openai

def trigger_gpt(prompt):
    # Note: openai.Completion is the legacy interface of the openai package (versions before 1.0)
    # Set your API key
    openai.api_key = "YOUR_KEY"

    # Set the model to use
    model_engine = "text-davinci-003"

    # Generate a response
    completion = openai.Completion.create(
        engine=model_engine,
        prompt=prompt,
        max_tokens=1024,
        temperature=0.5,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0)

    # Return the text of the first generated completion
    return str(completion.choices[0].text)
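
If you would rather not paste the key into the source file, one common option is to read it from an environment variable. The helper load_api_key below is a hypothetical name introduced for this sketch; OPENAI_API_KEY is the variable name conventionally used with the openai package.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from an environment variable instead of hardcoding it."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message rather than sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable before calling the API")
    return key
```

Inside trigger_gpt, the assignment would then read openai.api_key = load_api_key(), which keeps the key out of notebooks and version control.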

4. Compute the results!

This code creates two prompts that ask the OpenAI GPT-3 language model to explain two sets of results in a way that is easy for non-technical business people to understand. The first prompt covers the SHAP values, passed in through shap_table, and the second covers the model quality metrics, passed in through the MSE, MAE, and RMSE strings. Both prompts are sent to the trigger_gpt function.

# Create the prompts
prompt1 = f"Interpret the following SHAP values in a comprehensible way for business people, without making it technical: {shap_table}"
prompt2 = f"Briefly interpret the following model quality metrics in a comprehensible way for business people, without making it technical, focusing on how the model is performing: {MSE}, {MAE}, {RMSE}"

# Send both prompts to the model and display the results
print("Conclusions on the model and its quality metrics: " + trigger_gpt(prompt1) + trigger_gpt(prompt2))
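
Since the prompt text is all the model sees, it helps to inspect it before sending anything. Here is a minimal sketch of a prompt builder; the helper build_metrics_prompt and the metric values are placeholders made up for illustration, not results from the model above.

```python
def build_metrics_prompt(metrics):
    """Join named metrics into one plain-language instruction for the model."""
    listed = ", ".join(f"{name}: {round(value, 3)}" for name, value in metrics.items())
    return (
        "Briefly explain to business people, without technical jargon, "
        f"how a model with these quality metrics is performing: {listed}"
    )

# Placeholder numbers, not actual results
example_prompt = build_metrics_prompt({"MSE": 0.2551, "MAE": 0.3279, "RMSE": 0.5051})
print(example_prompt)
```

Printing the prompt first costs nothing and makes it easy to spot stray brackets, quotes, or unrounded numbers before they reach the API.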

Voila! One step closer towards AI Democratization!

*This blog builds further and is an implementation of ideas demonstrated here: AutoML + GPT-3: a Match made in Heaven for Data Science Success

Check out how you can build out a LinkedIn Post Generator Web App in my next blog:

Enjoy!
Alexandre t’Kint
