Let ChatGPT Teach You How to Plot with Python and Matplotlib

Sina Nazeri
The Power of AI
10 min readJan 18, 2023

--

Want to learn to plot in Python? ChatGPT will help you.

This project asks ChatGPT to teach you how to generate Python code for creating plots and visualizations. Simply provide a brief description of the plot you want to create, and ChatGPT will generate the necessary code using the popular Matplotlib library. This tool allows even users with no programming experience to develop professional-grade plots for their data. Whether you’re a researcher, data scientist, or student, ChatGPT makes it easy to create beautiful and informative plots in Python. (Generated by ChatGPT with minor changes).

The project in JupyterNotebook is available free in Cognitive Class.

The link to ChatGPT by openAI

you need to sign up to use the service

1. Let’s ask about our specific area of interest which we want to learn.

Asking :”teach me how to plot in python”
The result:

There are many ways to plot data in Python. Here is a simple example using Matplotlib, a popular data visualization library in Python:

# using the provided by chatGPT to run in the notebook
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
# Create a figure and an axes
fig, ax = plt.subplots()
# Plot the data
ax.plot(x, y)
# Add a title
ax.set_title('My Plot')
# Show the plot
plt.show()

This will create a simple line plot of the data in x and y. There are many options and customization options available for creating plots in Matplotlib. For example, you can change the appearance of the lines, change the axis labels, add a legend, etc.

Here is some more information on Matplotlib and other plotting libraries in Python:
The Matplotlib documentation:
https://matplotlib.org/stable/index.html
A tutorial on Matplotlib:
https://matplotlib.org/stable/tutorials/index.html
Seaborn, a data visualization library based on Matplotlib:
https://seaborn.pydata.org/
Plotly, a library for creating interactive plots:
https://plotly.com/python/

As you can see, ChatGPT provides introductory text and code blocks (with a copy icon for convenience), and concludes the conversation with additional useful information.

2. You can ask follow-up questions

By simplely asking “show me more examples of plots in python”. In fact, chatGPT remembers your previous conversations.
The result:

Here are a few more examples of plots that you can create in Python using the Matplotlib library: Scatter plot

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the data
ax.scatter(x, y)

# Add a title
ax.set_title('My Scatter Plot')

# Show the plot
plt.show()

Bar plot

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the data
ax.bar(x, y)

# Add a title
ax.set_title('My Bar Plot')

# Show the plot
plt.show()

Pie chart

import matplotlib.pyplot as plt

# Sample data
labels = ['Apples', 'Bananas', 'Oranges']
sizes = [30, 45, 25]

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the data
ax.pie(sizes, labels=labels)

# Add a title
ax.set_title('My Pie Chart')

# Show the plot
plt.show()

Keep in mind that these are just a few examples, and there are many other types of plots that you can create in Python using Matplotlib. You can find more information and examples in the Matplotlib documentation: https://matplotlib.org/stable/index.html.

Let's plot from a sample dataset

3. Suggest some sample datasets

The intial stage you need to load data. We can ask ChatGPT to suggest some sample datasets by asking Could you tell me sample dataset for beginners in python?”

The result:

Sure! Here are a few datasets that might be useful for beginners in Python:

Iris dataset: This is a small dataset that contains information about different species of iris flowers, including the sepal and petal length and width. It is often used as a beginner’s dataset for learning machine learning and data analysis techniques. You can load the Iris dataset in Python using the sklearn.datasets module.

Boston Housing dataset: This dataset contains information about housing prices in the Boston area, including the median value of homes, crime rate, and other factors. It is often used as a beginner’s dataset for learning regression techniques. You can load the Boston Housing dataset in Python using the sklearn.datasets module.

MNIST dataset: This is a dataset of handwritten digits, with 60,000 training examples and 10,000 test examples. It is often used as a beginner’s dataset for learning image classification techniques. You can load the MNIST dataset in Python using the keras.datasets module.

From these suggested datasets, we select Iris dataset to continue to work with as a sample.

3.1. Load the iris dataset as a sample to work with

We need a proper sample in tabular format, so we asked “load iris data with table looking format”
The result:

To load the Iris dataset in a table-like format in Python, you can use the Pandas library. Here is an example of how you can load the Iris dataset and display it in a table-like format:

import pandas as pd
from sklearn import datasets

# Load the Iris dataset
iris = datasets.load_iris()

# Create a DataFrame with the features and target values
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Display the DataFrame
print(df)
#--> or we can call the dataset to displace it
df

This will create a Pandas DataFrame with the features (sepal length, sepal width, petal length, and petal width) of the Iris dataset as columns, and the target values (the species of iris) as an additional column.

You can then use the usual Pandas functions to manipulate and analyze the data in the DataFrame. For example, you can use the head() function to display the first few rows of the DataFrame, or use the describe() function to get summary statistics for the numerical columns.

4. Plot the dataset

Conveniently, we can ask follow-up questions; here we asked “plot it”
The result:

To plot the Iris dataset using Matplotlib, you can use the following code:

# --> Trying of the suggestions from ChatGPT
# --> we removed the redundant lines (e.g. loading libraries and importing data) from the suggestion

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the data, coloring the points by the target values
scatter = ax.scatter(df['sepal length (cm)'], df['sepal width (cm)'], c=df['target'])

# Add a legend
legend = ax.legend(*scatter.legend_elements(), title='Species')

# Add a title
ax.set_title('Iris dataset')

# Show the plot
plt.show()

4.1. Plot-specific column

We select and ask to plot a specific column in the dataset. For example, asking “show me the distribution of the column ‘sepal length (cm)”

The result:

To show the distribution of a column in a Pandas DataFrame, you can use the hist() function of the DataFrame. For example, to show the distribution of the ‘sepal length (cm)’ column in the Iris dataset, you can use the following code:

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the distribution of the 'sepal length (cm)' column
df['sepal length (cm)'].hist(ax=ax)

# Add a title
ax.set_title('Distribution of sepal length')

# Show the plot
plt.show()

As you may notice, the plot does not have labels. Therefore, we can ask “redo it with proper axis labels”.
The result:

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the distribution of the 'sepal length (cm)' column
df['sepal length (cm)'].hist(ax=ax)

# Add axis labels
ax.set_xlabel('Sepal length (cm)')
ax.set_ylabel('Frequency')

# Add a title
ax.set_title('Distribution of sepal length')

# Show the plot
plt.show()

4.2. More complex plot

Another useful plot to show the distribution is using boxplot which enables us to visualize standard deviation and identify outliers.
Here we ask for a boxplot of all the columns showing in the single plot by
asking “show me boxplot of distributions of all columns in same plot”.
The result:

To show the distribution of all columns in a Pandas DataFrame as boxplots on the same plot, you can use the boxplot() function of the DataFrame.
For example, to show the distributions of all columns in the Iris dataset as boxplots, you can use the following code:

import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

# Load the Iris dataset
iris = datasets.load_iris()

# Create a DataFrame with the features
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Add a column with the target values
df['target'] = iris.target

# Create a figure and an axes
fig, ax = plt.subplots()

# Plot the boxplots of all columns
df.boxplot(ax=ax)

# Add a title
ax.set_title('Distributions of all columns')

# Show the plot
plt.show()

5. Using chatGPT for your own project

ChatGPT is an excellent tool for working on your project. It is important to frame the question in a way that addresses your problem. Keep in mind that dividing your project into smaller blocks and addressing them individually is the key to addressing the project.

To ask a programming question from ChatGPT, you can simply provide a clear and concise description of your problem, along with any relevant code or error messages. You should also specify the programming language and any libraries or frameworks that you are using.

5.1. How to properly formulate a question (aka. prompt) for ChatGPT

So, let's ask ChatGPT what it thinks.

When formulating a prompt for ChatGPT, it’s important to keep a few things in mind in order to get the most out of the model:

Be specific and clear about what you want the model to do. For example, instead of asking “What do you know about this topic?”, ask “Can you provide a brief overview of the history of the topic?”

Provide context if necessary. If you’re asking a question about a specific event or person, give some background information so that the model can understand the context in which you’re asking the question.

Be concise. Long and complex prompts can be difficult for the model to understand and may result in less accurate or relevant responses.

Avoid using overly technical language or jargon that the model may not be familiar with.

Avoid asking multiple questions in one prompt, it is better to ask one question at a time.

Here are a few examples of good prompts for ChatGPT:
“Can you explain the main difference between supervised and unsupervised learning?”
“What are the main components of a neural network and how do they work?”
“Please write a short story about a robot who can read minds”
On the other hand, some examples of poor prompts would be:
“everything”
“What can you tell me?”
“what you know”
By following these guidelines, you can help ensure that you get the most accurate and relevant responses from the model when using ChatGPT.

5.2. Keep in mind as a data scientist

ChatGPT is not perfect; the example provided is from sklearn. In the real world, you need to load your own data from your resource, using Pandas (the example below shows loading iris from local storage).

In general, as a practicing data scientist, you will frequently encounter the multifaceted process of data collection, preparation, model construction, and interpretation. These stages each have their own unique requirements in terms of skills and resources. For instance, you may utilize pandas to load and manipulate your datasets. It’s important to keep in mind that the examples provided by ChatGPT are intended to serve as a foundational learning aid, and may not be suitable for handling more complex or substantial datasets.

5.3. Don’t panic chat GPT is not going to replace data scientist (yet!)

So, let's ask ChatGPT what it thinks.

There are several reasons why ChatGPT and other large language models are not going to replace data scientists:

Data scientists have a much broader skill set: Data scientists are trained in a wide range of skills, including data analysis, machine learning, programming, statistics, and more. ChatGPT and other large language models are only capable of generating text based on a given prompt, and they do not have the other skills that data scientists possess.

Data scientists can interpret and explain the results of their work: Data scientists are trained to interpret and explain the results of their work, which is critical for decision-making and communication with stakeholders. ChatGPT and other large language models are not capable of interpreting or explaining the results of their work.

Data scientists can work with a wide range of data: Data scientists are trained to work with a wide range of data, including structured and unstructured data, and they can use a variety of tools and techniques to extract insights from the data. ChatGPT and other large language models are only capable of generating text based on a given prompt, and they are not able to work with other types of data.

Data scientists can work with a wide range of stakeholders: Data scientists are trained to work with a wide range of stakeholders, including business leaders, subject matter experts, and technical experts, and they can communicate complex concepts in a clear and understandable way. ChatGPT and other large language models are not able to work with stakeholders or communicate complex concepts.

--

--

Sina Nazeri
The Power of AI

Data Scientist at IBM with broad ML skills: Classification, Clustering, CV, NLP, Generative AI. Strong academic background & research/work experience.