Practical Prompting: A Comparison Between ChatGPT And Perplexity AI

Tom Welsh
Dec 11, 2023


Photo by KOMMERS on Unsplash

Having read John Loewen, PhD's great article on Prompting ChatGPT for Python Code Generation: An Effective Framework, I thought I would put it to the test with a comparison of ChatGPT and Perplexity AI. This is a long article with copious amounts of commentary from both ChatGPT and Perplexity AI.

A Bit Of Background

In the interests of openness, I would first like to offer this disclaimer:
I am interested in the Generative AI field and have been using various GPT-based models over the past year. I have paid subscriptions to Perplexity AI and, more recently, ChatGPT. I also use GitHub Copilot in some of the projects I work on. I do not write paid reviews, and this is my own impartial comparison of the two services.

The Challenge

For the purposes of this article I wish to carry out some rudimentary data analysis on a dataset from Kaggle. I have chosen “Massive Missile Attacks on Ukraine” as my dataset. I use an augmented version of this at the end of this article.

I will use the same prompts for both ChatGPT and Perplexity AI. The returned output will then be fed into two Jupyter notebooks and the results compared.

With this in mind I have created a new folder in Jupyter Notebook containing two notebooks: ChatGPT.ipynb and Perplexity AI.ipynb.

Let’s Begin

Get the dataset

Kaggle provides a large and varied collection of datasets, everything from the Top 10 Million Passwords to Zoo Animal Classification. From this massive corpus I have downloaded the Massive Missile Attacks on Ukraine dataset. From this dataset I have extracted the missile_attacks_daily.xlsx file and placed it in my notebook folder along with my two notebooks.

Jupyter folder — Image by author

Suggested Steps

In John's article, he indicated 5 steps for effective prompting in ChatGPT.

The steps were as follows:

  1. Clearly Define Your Task
  2. Break Down Complex Tasks
  3. Leverage Python Libraries
  4. Use Appropriate Language constructs
  5. Ask for explanations

Reading John's article, you can see that these 5 steps are intertwined. When breaking down complex tasks, we want to clearly define each task using appropriate language and leverage Python libraries whenever there is a need. I always ask for explanations of the generated code; as John states, it is also a good learning mechanism.

I would also add two more items to John's list. These would be:

  1. Ask the AI to add comments
  2. Ask the AI to generate Python docstrings

For both of these comparisons I am starting with a new thread on both ChatGPT and Perplexity AI.

Let’s dive in

Using the above steps, I have defined our first prompt as follows:

Our Prompt

Using a jupyter notebook, I wish to analyse a dataset that is stored in an 
excel spreadsheet.

This spreadsheet is located in the current notebook
directory. The data consists of 10 columns in the spreadsheet.
The column headers are time_start, time_end, model, launch_place, target,
target_details, carrier, launched, destroyed, source

I will want to use Plotly as my graphing tool when I wish to visualize my
findings. Please provide comments in the code and provide an explanation of
what the generated code does

ChatGPT

From our initial prompting, ChatGPT comes back with the following information.

“To analyze and visualize your dataset in a Jupyter notebook using Plotly, you will need to follow a series of steps. Below is an outline of the process, accompanied by comments in the code to explain each step:

  1. Import Libraries: Load necessary Python libraries for handling data and plotting.
  2. Read the Excel File: Use pandas to read the data from the Excel spreadsheet.
  3. Data Inspection: Briefly inspect the data to understand its structure and content.
  4. Data Analysis: Perform any required data manipulation or analysis.
  5. Data Visualization: Use Plotly to create visualizations based on your findings.”

ChatGPT then writes the code for us, which can be easily copied into our Jupyter notebook.

# Step 1: Import Libraries
import pandas as pd
import plotly.express as px

# Step 2: Read the Excel File
# Replace 'your_spreadsheet.xlsx' with the actual file name
df = pd.read_excel('your_spreadsheet.xlsx')

# Step 3: Data Inspection
# Display the first few rows of the dataframe to understand its structure
print(df.head())

# Step 4: Data Analysis
# Perform your analysis here. This might include grouping, filtering, or calculating statistics.
# Example: df_grouped = df.groupby('column_name').mean()

# Step 5: Data Visualization
# Create a plot with Plotly. This is an example of a bar chart.
# Replace 'column_x' and 'column_y' with the column names you want to plot.
# Example: fig = px.bar(df, x='column_x', y='column_y')

# To display the plot in a Jupyter notebook, use:
# fig.show()

# Note: The above examples are placeholders. You'll need to modify the analysis and plotting code based on your specific dataset and the insights you wish to extract.

And finally we get an explanation of the code and a reminder that this is a general template and we will need to modify it for our specific dataset.
“Explanation of the Code

  1. Import Libraries: This part imports pandas, which is a powerful library for data manipulation, and plotly.express, a high-level API for creating interactive charts.
  2. Read the Excel File: Here, we use pandas to read the Excel file. You should replace 'your_spreadsheet.xlsx' with the actual name of your Excel file.
  3. Data Inspection: This code prints the first few rows of the dataframe. It’s a quick way to get a glimpse of your data, including column names and values.
  4. Data Analysis: In this step, you would perform the analysis relevant to your data. This might include grouping data, filtering rows, or calculating statistical measures. The example code is commented out as it will vary greatly depending on your specific analysis needs.
  5. Data Visualization: Here, you’ll create visualizations using Plotly. The example shows how to create a bar chart, but you should replace this with the type of chart that best suits your data and what you’re trying to convey.

Remember, you’ll need to adapt the analysis and visualization steps to fit your specific dataset and objectives. The provided code is a general template to get you started.”

Perplexity AI

From our initial prompting, Perplexity AI comes back with the following information.

Perplexity AI cites its sources for the information it gathered

Perplexity Cited Sources — Image returned in Perplexity Prompt

“Answer

To analyze an Excel spreadsheet using a Jupyter notebook, you can use the pandas library to load the data and Plotly for visualization. Here's a step-by-step guide:

  1. Import necessary libraries: Import pandas for data manipulation and Plotly for data visualization.
import pandas as pd
import plotly.express as px

2. Load the Excel file: Use pandas’ read_excel function to load the Excel file into a DataFrame. The file should be in the same directory as your Jupyter notebook.

df = pd.read_excel('filename.xlsx')  # replace 'filename.xlsx' with your file name

3. Inspect the data: Use the head function to view the first few rows of the DataFrame.

df.head()

4. Data Analysis: Perform your data analysis. This could involve cleaning the data, handling missing values, creating new columns, etc.

5. Data Visualization: Use Plotly to create interactive plots. For example, to create a bar plot of the ‘model’ column:

fig = px.bar(df, x='model')
fig.show()

Replace ‘model’ with the column you want to visualize. You can create different types of plots (scatter, line, histogram, etc.) depending on your analysis needs.

Remember to replace ‘filename.xlsx’ with the actual name of your Excel file. The code assumes that the Excel file is in the same directory as your Jupyter notebook. If it’s in a different directory, you’ll need to specify the full path to the file.

This code will load your Excel data into a pandas DataFrame, which is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). You can then use pandas’ data manipulation features to analyze the data, and Plotly’s visualization features to create interactive plots(1)(2)(3)”

Initial Analysis of our First prompt

ChatGPT provides a breakdown of the tasks at hand and then provides a single block of code to carry them out.

It completed the task by providing a fairly comprehensive explanation of the code.

Perplexity AI starts with a list of the sources its findings are based on. I find this quite useful. Perplexity AI then broke down the task into its constituent parts and provided separate code blocks for each section. The final paragraph explains what you have been provided with.

Now to the Notebooks

ChatGPT and Perplexity AI both provide the same suggestions for the first two points, as you would expect.
On Point 3, ChatGPT suggested print(df.head()) to view our data, whereas Perplexity AI used the more common df.head(). It seems that ChatGPT, even though it had already acknowledged that we were using Jupyter, did not provide a Jupyter-friendly version of df.head().

print(df.head()) as suggested by ChatGPT — Image by author
Much more readable, Jupyter-friendly df.head() suggested by Perplexity AI — Image by author

This is a simple issue, and not a major show stopper.
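As an aside, ChatGPT's print(df.head()) can be made Jupyter-friendly without dropping the explicit call: IPython's display() renders the same rich HTML table that a bare df.head() does. A minimal sketch, using a toy frame in place of the real dataset:

```python
import pandas as pd
from IPython.display import display

# Toy frame standing in for the missile dataset (columns are a subset)
df = pd.DataFrame({"model": ["X-101", "Kalibr"], "launched": [5, 3]})

# display() renders the rich HTML table in a notebook; print() would
# fall back to the plain-text representation ChatGPT produced
display(df.head())
```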
ChatGPT then provides an example of some possible data analysis by grouping some data together and returning a mean() value. This didn't actually work, as the data had not been cleaned or normalized yet. Perplexity AI did not suggest any data analysis steps to carry out.
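For reference, the reason ChatGPT's suggested df.groupby('column_name').mean() falls over on the raw data is the presence of non-numeric columns. Selecting the numeric columns explicitly works even before cleanup; a sketch on toy rows mimicking the dataset's schema (values are illustrative):

```python
import pandas as pd

# Illustrative rows with the same column names as the dataset
df = pd.DataFrame({
    "model": ["Shahed-136/131", "X-101", "X-101"],
    "launched": [10, 4, 6],
    "destroyed": [8, 2, 5],
})

# Naming the numeric columns explicitly avoids aggregating the
# text columns that made the generated mean() call fail
totals = df.groupby("model")[["launched", "destroyed"]].sum()
print(totals.loc["X-101", "launched"])  # 10
```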

Finally, our two contenders provide us with a graph to display.
ChatGPT opted for a Plotly bar chart to display x and y values. I modified it as we have not actually looked at the data yet. I used the same bar chart declaration as used by Perplexity AI and we get the chart below.

Plotly Bar Chart showing various models from our dataset — Image by author

That’s the end of our first prompt’s output. We should now concentrate on specific tasks, the first of which should be a bit of data cleanup.

Data Cleanup of our opening spreadsheet

As we all know, if you put ‘garbage in’ to a system, you will inevitably get ‘garbage out’.

Let’s have our AI assistants help us analyse and clean our data.

Our Prompt

Analyse our dataframe df, and describe what you find. I am particularly 
interested in data types, blank cells and incomprehensible data.

provide python code to clean up the data frame.

Please provide comments in the code and provide an explanation of
what the generated code does

Both systems came up with similar conclusions to our prompt.

ChatGPT

“To analyze the dataframe df for data types, blank cells, and incomprehensible data, and then clean it up, we'll follow these steps:

  1. Data Inspection: Identify data types and look for any anomalies.
  2. Missing Data Handling: Deal with blank or missing cells.
  3. Data Cleaning: Address any incomprehensible or incorrect data.

Below is the Python code for these steps, with comments explaining each part:

# Step 1: Data Inspection
# Print the data types of each column and the first few rows of the dataframe
print("Data Types:\n", df.dtypes)
print("\nFirst Few Rows:\n", df.head())

# Check for any missing values in each column
print("\nMissing Values:\n", df.isnull().sum())

# Step 2: Missing Data Handling
# Fill missing values or drop rows/columns with missing values
# Example: df.fillna(0, inplace=True) # Fills missing values with 0
# Example: df.dropna(inplace=True) # Drops any rows with missing values

# Step 3: Data Cleaning
# Correct or remove incomprehensible data
# This step is highly specific to your data. You might need to replace values, drop erroneous rows, etc.
# Example: df['column_name'] = df['column_name'].replace('old_value', 'new_value')
# Example: df = df[df['column_name'] > 0] # Keeps rows where values in 'column_name' are greater than 0

# Note: The above code contains examples and might need modification based on the specific issues in your data.

Explanation of the Code

  1. Data Inspection:
  • df.dtypes prints the data types of each column. This helps in understanding what kind of data each column holds (e.g., numeric, string, datetime).
  • df.head() displays the first few rows of the dataframe, providing a quick glance at the actual data.
  • df.isnull().sum() calculates the number of missing values in each column, highlighting any columns with a significant number of blanks.

2. Missing Data Handling:

  • The code provided includes two examples: one that fills missing values with a default value (e.g., 0) and another that drops rows with any missing values. The appropriate method depends on the context of your data and analysis needs.

3. Data Cleaning:

  • This step is highly data-specific. The examples provided include replacing values in a column and filtering the dataframe based on a condition. The actual operations would depend on the types of anomalies found in your data.

Remember, the effectiveness of data cleaning greatly depends on the specific characteristics and requirements of your dataset. The provided code should be adapted based on the findings from your initial data inspection.”

Perplexity AI

Perplexity AI always cites its sources — Image by author

To analyze the DataFrame df, we can use various methods provided by pandas. Here's a step-by-step guide:

  1. Inspect the data types of each column: Use the dtypes attribute to get the data type of each column.
print(df.dtypes)

2. Check for missing values: Use the isnull method followed by sum to get the number of missing values in each column.

print(df.isnull().sum())

3. Check for incomprehensible data: This step depends on the nature of your data. For example, you might want to check for values that don’t make sense in the context of a particular column.

To clean up the DataFrame:

  1. Handle missing values: Depending on the nature of your data, you might want to fill missing values with a specific value, the mean, median, or mode of the column, or you might want to drop rows or columns with missing values. Here’s how to fill missing values with the mean of the column:
df.fillna(df.mean(), inplace=True)

2. Convert data types: If a column has the wrong data type, you can convert it using the astype method. For example, to convert a column to datetime:

df['time_start'] = pd.to_datetime(df['time_start'])
df['time_end'] = pd.to_datetime(df['time_end'])

3. Handle incomprehensible data: Again, this depends on the nature of your data. You might want to replace certain values, drop certain rows or columns, etc.

Remember to replace ‘time_start’ and ‘time_end’ with the actual column names you want to convert.

This code will help you understand the structure of your DataFrame, including the data types of each column and the presence of any missing values. It will also clean up your DataFrame by handling missing values and converting data types as necessary”

Perplexity AI also suggests possible related content you could view — Image by author

Initial analysis of our second prompt

Both ChatGPT and Perplexity AI provide very similar output for this section. I think I prefer the ChatGPT output here, as it actually comments out the code that may change your data, pointing out that there are decisions to be made.

On the point of data types and missing data, the output from both programs is similar. ChatGPT just added a text header to its output, whilst Perplexity did not.

Similar data from both AIs, primarily as this is Pandas output — Image by author
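One caveat worth noting on Perplexity AI's df.fillna(df.mean(), inplace=True) suggestion: on recent pandas versions df.mean() raises a TypeError when the frame contains text columns, because non-numeric columns are no longer silently skipped. A hedged sketch of a numeric-only variant, on a toy frame:

```python
import pandas as pd

# Toy frame with a text column and a numeric column containing a gap
df = pd.DataFrame({
    "model": ["X-101", None, "Kalibr"],
    "launched": [4.0, None, 6.0],
})

# Computing means over the numeric columns only sidesteps the
# TypeError that df.mean() raises on mixed-type frames in newer pandas
numeric_cols = df.select_dtypes("number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())
print(df["launched"].tolist())  # [4.0, 5.0, 6.0]
```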

Comparing Launched versus destroyed models

Let’s see if ChatGPT and Perplexity AI can make a bar chart of models launched and destroyed.

Our Prompt

Using our dataframe df, and using Plotly as our charting library, I want you 
to plot on a grouped bar graph the launched and destroyed columns.

Ignore any reference to Shahed-136/131 in our charts.

Provide Python code that can be used in a jupyter notebook to display these
charts

Please provide comments in the code and provide an explanation of
what the generated code does

This time ChatGPT decided to use the more complex, but ultimately more configurable, Plotly graph_objects library, whereas Perplexity AI elected to stick with plotly.express and use melt to pull the columns together.
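For readers unfamiliar with melt: it reshapes a wide frame into long form, producing one row per (id, variable) pair, which is the shape px.bar expects when colouring by variable. A toy illustration (values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "model": ["X-101", "Kalibr"],
    "launched": [4, 6],
    "destroyed": [2, 5],
})

# melt folds the two wide columns into 'variable'/'value' pairs,
# giving one row per model per measure
df_melt = df.melt(id_vars="model", value_vars=["launched", "destroyed"])
print(df_melt.shape)  # (4, 3)
```

px.bar(df_melt, x='model', y='value', color='variable', barmode='group') then draws one coloured bar per measure, exactly as in Perplexity AI's answer.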

ChatGPT

To create a grouped bar graph using Plotly to display the ‘launched’ and ‘destroyed’ columns from your dataframe df, while excluding any references to "Shahed-136/131", the following steps will be implemented in the code:

  1. Import Plotly: Import the necessary Plotly library for plotting.
  2. Filter Data: Exclude data related to “Shahed-136/131” from the dataframe.
  3. Prepare Data for Plotting: Group the data by a suitable column (e.g., model, launch_place, etc.) for the ‘launched’ and ‘destroyed’ columns.
  4. Create Grouped Bar Chart: Use Plotly’s bar graph object to create the chart.
  5. Display the Chart: Render the chart in the Jupyter Notebook.

Here’s the Python code with comments for clarity:

# Step 1: Import Plotly
import plotly.graph_objects as go

# Step 2: Filter Data
# Exclude rows where 'model' is 'Shahed-136/131'
df_filtered = df[df['model'] != 'Shahed-136/131']

# Step 3: Prepare Data for Plotting
# Group the data by a suitable category (e.g., 'model')
grouped_data = df_filtered.groupby('model')[['launched', 'destroyed']].sum().reset_index()

# Step 4: Create Grouped Bar Chart
fig = go.Figure()

# Adding 'launched' data to the chart
fig.add_trace(go.Bar(
    x=grouped_data['model'],
    y=grouped_data['launched'],
    name='Launched',
    marker_color='blue'  # Color for the 'Launched' bar
))

# Adding 'destroyed' data to the chart
fig.add_trace(go.Bar(
    x=grouped_data['model'],
    y=grouped_data['destroyed'],
    name='Destroyed',
    marker_color='red'  # Color for the 'Destroyed' bar
))

# Customize chart layout
fig.update_layout(
    title='Launched vs Destroyed by Model (Excluding Shahed-136/131)',
    xaxis_tickangle=-45,
    xaxis_title='Model',
    yaxis_title='Count',
    barmode='group',
    legend_title='Type'
)

# Step 5: Display the Chart
fig.show()

# Note: This code assumes that the 'model' column is the best for categorization. If another column is more appropriate for your data, adjust the 'groupby' method accordingly.

Explanation of the Code

  1. Import Plotly: The plotly.graph_objects module is used for creating complex plots.
  2. Filter Data: The dataframe is filtered to exclude rows where the ‘model’ column has the value “Shahed-136/131”.
  3. Prepare Data for Plotting: The filtered data is then grouped by the ‘model’ column (or another suitable category), and the sum of ‘launched’ and ‘destroyed’ for each group is calculated.
  4. Create Grouped Bar Chart:
  • A figure object is initialized.
  • Two bar traces (one for ‘launched’ and one for ‘destroyed’) are added to the figure, with different colors for clarity.
  • The layout of the figure is updated to include a title, axis labels, tick angle for better readability, and set the bars in a grouped layout.

5. Display the Chart: The fig.show() command renders the chart in the Jupyter Notebook.

This code will generate a grouped bar chart that visualizes the ‘launched’ and ‘destroyed’ counts for each model, excluding any data related to “Shahed-136/131”.

Perplexity AI

As always Perplexity starts off by providing its sources

Perplexity sources for its answer — Image by author

Answer

To create a grouped bar chart excluding ‘Shahed-136/131’ from the ‘model’ column, you can filter the DataFrame before creating the chart. Here’s how you can do it:

import plotly.express as px

# Filter the DataFrame to exclude 'Shahed-136/131'
df_filtered = df[df['model'] != 'Shahed-136/131']

# Create a DataFrame that 'melts' the filtered DataFrame to have 'launched' and 'destroyed' in the same column
df_melt = df_filtered.melt(id_vars='model', value_vars=['launched', 'destroyed'])

# Create the grouped bar chart
fig = px.bar(df_melt, x='model', y='value', color='variable', barmode='group', title='Launched vs Destroyed')

# Display the figure
fig.show()

Then, it creates a grouped bar chart using the px.bar function. The x parameter is set to 'model', the y parameter is set to 'value', and the color parameter is set to 'variable'. The barmode parameter is set to 'group' to create a grouped bar chart.

Finally, it displays the figure using the show method.

This will create a grouped bar chart where each group represents a model (excluding ‘Shahed-136/131’), and each bar within a group represents the number of launched or destroyed items for that mode(1)(2)

Initial analysis of our third prompt

Even though they used different libraries, ChatGPT and Perplexity AI produced code with similar results.

ChatGPT code generated Graph using Plotly Graphic Objects

Launched vs Destroyed — Created by ChatGPT code — Image by author

Perplexity AI code generated Graph using Plotly Express

Launched vs Destroyed — Created by Perplexity AI code — Image by author

I definitely like the crispness of the graph_objects chart, but I know deep down that I could have modified the Plotly Express code to get a similar result. That’s not the point of this comparison, though; it’s about seeing what comes out when you don’t modify too much.

Let’s map it

Finally, we will map some data on a Plotly scatter_mapbox plot. We will map launch and target locations from our dataset. Bear in mind this is an incomplete dataset, and not all rows have launch and target data.

DISCLAIMER

I have modified the original dataset so that launch_place and target now have a single value and not multiple locations.
This makes the data no longer valid for assessing missile/UAV strikes on Ukraine. The dataset we will be moving forward with is called ‘missile_attacks_daily_augmented.xlsx’.

Our Prompt

Create a Plotly scatter_mapbox plot map mapping the launch_place and target on
the map.
The launch_place should be denoted by the colour blue
The target should be denoted by the colour red.
The hover_text in the hover_box on the plotly scatter_mapbox should show the
number of models that were launched from that site and the model.
I have a mapbox_api token which I will provide when you have generated the
code.
Add a CONSTANT in the code called MAP_BOX_TOKEN which I will add the api key
to.
launch_place coordinates are in l_lat and l_long
target coordinates are in the t_lat and t_long columns

This was a good test. The first issue both ChatGPT and Perplexity AI came across was the need to have latitude and longitude coordinates for each location in our target and launch_place columns.

Time to fix that now.

To do that we will have to append latitude and longitude co-ordinates to our dataframe. This also means we will have to take a closer look at the launch_place and target columns in our dataframe to see what unique values are there.
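Before geocoding, a quick look at the distinct values tells us how many lookups will actually be needed; a sketch with illustrative place names (not the real dataset):

```python
import pandas as pd

# Illustrative place names; the real dataset's values will differ
df = pd.DataFrame({
    "launch_place": ["Caspian Sea", "Caspian Sea", "Belgorod"],
    "target": ["Kyiv", "Odesa", "Kyiv"],
})

# unique() lists each distinct location once; nunique() just counts them
print(sorted(df["launch_place"].unique()))  # ['Belgorod', 'Caspian Sea']
print(df["target"].nunique())  # 2
```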

Our Prompt

Create a pandas dataframe from the excel file 
'missile_attacks_daily_augmented.xlsx' located in the notebook directory the
excel file has 10 columns - time_start, time_end, model, launch_place,target,
target_details,carrier, launched, destroyed, source. We need to add a latitude
and longitude column for both launch_place called l_lat, l_long, and target
called t_lat and t_long

ChatGPT

“I’ve provided a template code for creating a DataFrame from an Excel file and adding latitude and longitude columns. Here’s the code:

import pandas as pd

# Dummy file name (to be replaced by the user with the actual file name)
file_path = 'test_excel.xlsx'

# Template code to load the file and add latitude and longitude columns
try:
    df = pd.read_excel(file_path)

    # Adding latitude and longitude columns for both launch_place and target
    # Initialize these columns with default values (e.g., None or suitable default values)
    df['l_lat'] = None   # Latitude for launch_place
    df['l_long'] = None  # Longitude for launch_place
    df['t_lat'] = None   # Latitude for target
    df['t_long'] = None  # Longitude for target

    # Display the first few rows to verify the new columns
    print("DataFrame with new columns for latitude and longitude:")
    print(df.head())
except FileNotFoundError:
    print(f"File '{file_path}' not found. Please ensure the correct file is located in the notebook directory and try again.")

# Note: This is a template code. The user needs to replace 'test_excel.xlsx' with the actual file name.
# The latitude and longitude values are initialized with None and should be populated with actual data as needed.

Before running this code in your Jupyter Notebook, replace 'test_excel.xlsx' with the actual filename of your Excel file. The new columns l_lat, l_long, t_lat, and t_long are initialized with None. You will need to populate these columns with the actual latitude and longitude values according to your dataset's needs”

Perplexity AI

As always, Perplexity AI starts with its sources — Image by author

To create a pandas DataFrame from an Excel file and add latitude and longitude columns for both ‘launch_place’ and ‘target’, you can use the read_excel function provided by pandas. Here's how you can do it:

import pandas as pd

# Read the Excel file into a DataFrame
df = pd.read_excel('missile_attacks_daily_augmented.xlsx')

# Add latitude and longitude columns for launch_place and target
# Replace 'launch_place_lat', 'launch_place_lon', 'target_lat', and 'target_lon' with the actual data
df['l_lat'] = df['launch_place_lat']
df['l_long'] = df['launch_place_lon']
df['t_lat'] = df['target_lat']
df['t_long'] = df['target_lon']

# Display the DataFrame
print(df)

This code first reads the Excel file ‘missile_attacks_daily_augmented.xlsx’ into a DataFrame using the read_excel function. It then adds four new columns to the DataFrame: 'l_lat' and 'l_long' for the latitude and longitude of the launch_place, and 't_lat' and 't_long' for the latitude and longitude of the target. The values for these new columns are taken from the 'launch_place_lat', 'launch_place_lon', 'target_lat', and 'target_lon' columns of the DataFrame. Finally, it prints the DataFrame.

Please replace ‘launch_place_lat’, ‘launch_place_long’, ‘target_lat’, and ‘target_long’ with the actual column names for the latitude and longitude of the launch_place and target in your DataFrame

Analysis or creating new columns in our dataframe

Running the ChatGPT provided code resulted in our 4 lat and long columns being added to the dataframe. All set with the None type, but ready to be augmented with correct Lat and Long details in the next steps.

Unfortunately, the Perplexity AI answer fails in the notebook with a Python KeyError: 'launch_place_lat', as we have not actually added the lats and longs yet. This should be corrected in the next section.
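The minimal fix is the one ChatGPT's version already uses: create the four columns before anything reads them. A sketch on a one-row toy frame:

```python
import pandas as pd

df = pd.DataFrame({"launch_place": ["Belgorod"], "target": ["Kyiv"]})

# Initialising the columns directly avoids the KeyError raised by
# reading 'launch_place_lat' and friends before they exist
for col in ["l_lat", "l_long", "t_lat", "t_long"]:
    df[col] = None

print(df.columns.tolist())
```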

Generating Latitude and Longitude for launch_place and target

We will now update our dataframe with latitude and longitude coordinates for the launch_place and target.

Create a function that will scan our dataframe row by row and, looking at the
location of the launch_place, it will update the row's l_lat and l_long co-ordinates
respectively.
It should also do this for the target and update the t_lat and t_long co-ordinates
respectively

ChatGPT

“The code provided includes the integration of the geopy.geocoders library to dynamically obtain latitude and longitude coordinates for each launch_place and target in your DataFrame. However, the execution of the code resulted in a FileNotFoundError because the specified Excel file ('your_excel_file.xlsx') is not present in the current environment.

Here’s a summary of how the code works:

  1. Initialize Geocoder: The Nominatim geocoder from geopy.geocoders is initialized with a user agent (in this case, "geoapiExercises").
  2. Define a Function for Coordinates: The function get_coordinates takes a place name as input and uses the geocoder to fetch the latitude and longitude. If the location is found, it returns these values; otherwise, it returns None.
  3. Load Data and Initialize Columns: The code attempts to load the data from an Excel file and initializes new columns (l_lat, l_long, t_lat, t_long) for the latitude and longitude of the launch places and targets.
  4. Iterate and Update Coordinates: For each row in the DataFrame, the code updates the new columns with the coordinates obtained from the get_coordinates function.

import pandas as pd
from geopy.geocoders import Nominatim

# Replace this with the actual file name
file_path = 'your_excel_file.xlsx'

# Initialize geocoder
geolocator = Nominatim(user_agent="geoapiExercises")

# Function to get latitude and longitude for a given place
def get_coordinates(place_name):
    location = geolocator.geocode(place_name)
    if location:
        return location.latitude, location.longitude
    else:
        return None, None

# Load the data
try:
    df = pd.read_excel(file_path)

    # Initialize new columns for coordinates
    df['l_lat'] = None
    df['l_long'] = None
    df['t_lat'] = None
    df['t_long'] = None

    # Update coordinates for each row
    for index, row in df.iterrows():
        # Update launch_place coordinates
        l_lat, l_long = get_coordinates(row['launch_place'])
        df.at[index, 'l_lat'] = l_lat
        df.at[index, 'l_long'] = l_long

        # Update target coordinates
        t_lat, t_long = get_coordinates(row['target'])
        df.at[index, 't_lat'] = t_lat
        df.at[index, 't_long'] = t_long

    # Display the first few rows to verify the updated DataFrame
    print(df.head())

except FileNotFoundError:
    print(f"File '{file_path}' not found. Please ensure the correct file is located in the notebook directory and try again.")

To use this code:

  1. Replace 'your_excel_file.xlsx' with the actual path to your Excel file.
  2. Ensure that geopy is installed in your environment. If not, you can install it using pip: !pip install geopy.
  3. Run the code in your Jupyter Notebook.

If you have the Excel file ready to be processed, you can upload it here, and I’ll execute the code accordingly”

Perplexity AI

Perplexity citing its sources — Image by author

To update the ‘l_lat’, ‘l_long’, ‘t_lat’, and ‘t_long’ columns with the latitude and longitude of the ‘launch_place’ and ‘target’, you would typically use a geocoding service or API that can convert place names to coordinates. Here’s an example using the geopy library’s Nominatim geocoder:

from geopy.geocoders import Nominatim

# Initialize the geocoder
geolocator = Nominatim(user_agent="geoapiExercises")

# Define a function to get latitude and longitude from a place name
def get_lat_lon(place):
    location = geolocator.geocode(place)
    if location is not None:
        return location.latitude, location.longitude
    else:
        return None, None

# Update the 'l_lat' and 'l_long' columns with the latitude and longitude of the 'launch_place'
df['l_lat'], df['l_long'] = zip(*df['launch_place'].apply(get_lat_lon))

# Update the 't_lat' and 't_long' columns with the latitude and longitude of the 'target'
df['t_lat'], df['t_long'] = zip(*df['target'].apply(get_lat_lon))

This code first initializes a Nominatim geocoder. It then defines a function get_lat_lon that takes a place name and returns its latitude and longitude using the geocoder's geocode method. If the geocoder can't find the place, it returns None for both latitude and longitude.

Next, it applies this function to the ‘launch_place’ column to get the latitude and longitude, and assigns the results to the ‘l_lat’ and ‘l_long’ columns. It does the same for the ‘target’ column, assigning the results to the ‘t_lat’ and ‘t_long’ columns.

Please note that the accuracy of the coordinates depends on the accuracy of the geocoding service, and that geocoding large amounts of data may be subject to usage limits or require an API key(1)(2)(3)
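Given those usage limits, one mitigation (not part of either system's output, just my own sketch) is to cache lookups so each unique place name is geocoded only once — the dataset repeats the same launch sites across many rows. The sketch below stubs out the geocoder so it runs offline; in practice `fake_geocode` would be replaced by geopy's `geolocator.geocode`.

```python
from functools import lru_cache

# Stub standing in for geopy's geocoder; the real service is slow and
# rate-limited, so repeated place names benefit from caching
calls = {'count': 0}

def fake_geocode(place):
    calls['count'] += 1
    coords = {'Belgorod': (50.6, 36.6)}
    return coords.get(place)

@lru_cache(maxsize=None)
def get_lat_lon_cached(place):
    result = fake_geocode(place)
    return result if result is not None else (None, None)

# Two rows sharing the same launch_place trigger only one lookup
first = get_lat_lon_cached('Belgorod')
second = get_lat_lon_cached('Belgorod')
print(first, calls['count'])
```

The same `lru_cache` decorator can wrap the `get_lat_lon` function either system generated, with no other changes.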

Analysis of ChatGPT and Perplexity Output

Both ChatGPT and Perplexity AI provided working code that updates our lat and long data for launch_place and target. Examining both outputs, the two systems used a similar method to generate this data.
Prompting effectively really helped here and enabled us to get it right on the first pass.

Having run the code, Pandas provides us with the following updated dataframe

Pandas dataframe showing completed columns for lat and long — Image by author
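Before moving on to charting, it is worth checking how many places the geocoder failed to resolve, since those rows come back as NaN and will silently drop off the map. A minimal sketch (the sample rows here are hypothetical, not from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the geocoded dataframe
df = pd.DataFrame({
    'launch_place': ['Belgorod', 'unknown site'],
    'l_lat': [50.6, np.nan],
    'l_long': [36.6, np.nan],
})

# Rows where the geocoder returned None end up as NaN; count them
missing = int(df['l_lat'].isna().sum())
print(missing)
```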

Now all that is left is to make that scatter_mapbox plot.
Time to revisit our original prompt on this subject.

Our Prompt

Create a Plotly scatter_mapbox plot map mapping the launch_place and target on
the map.
The launch_place should be denoted by the colour blue
The target should be denoted by the colour red.
The hover_text in the hover_box on the plotly scatter_mapbox should show the
number of models that were launched from that site and the model.
I have a mapbox_api token which I will provide when you have generated the
code.
Add a CONSTANT in the code called MAP_BOX_TOKEN which I will add the api key
to.
launch_place coordinates are in l_lat and l_long
target coordinates are in the t_lat and t_long columns

ChatGPT

To create a Plotly scatter_mapbox plot that maps launch_place and target locations from your DataFrame, with different colors for each and hover text showing details about the models launched from those sites, you can follow these steps:

  1. Import Plotly: Import necessary libraries for plotting.
  2. Set Up Mapbox Access Token: Define a constant for the Mapbox access token.
  3. Prepare Data for Mapping: Separate the data for launch_place and target and prepare the hover text.
  4. Create Scatter Mapbox Plot: Build the map with two sets of markers — one for launch_place and another for target.
  5. Display the Map: Render the map in your Jupyter Notebook.

Here’s the Python code for these steps:

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# Replace this with the actual file name
file_path = 'your_excel_file.xlsx'

# Mapbox Access Token
MAPBOX_TOKEN = 'your_mapbox_api_token' # Replace with your Mapbox API token

# Load the data
df = pd.read_excel(file_path)

# Set the access token
px.set_mapbox_access_token(MAPBOX_TOKEN)

# Prepare Data for Mapping
# Assuming your DataFrame has the coordinates in 'l_lat', 'l_long', 't_lat', 't_long' columns
launch_places = df[['launch_place', 'l_lat', 'l_long', 'model']].drop_duplicates()
targets = df[['target', 't_lat', 't_long', 'model']].drop_duplicates()

# Create hover text
launch_places['hover_text'] = launch_places['model'] + " - " + launch_places['launch_place']
targets['hover_text'] = targets['model'] + " - " + targets['target']

# Create Scatter Mapbox Plot
fig = go.Figure()

# Add launch_place markers
fig.add_trace(go.Scattermapbox(
    lat=launch_places['l_lat'],
    lon=launch_places['l_long'],
    mode='markers',
    marker=go.scattermapbox.Marker(size=9, color='blue'),
    text=launch_places['hover_text'],
    hoverinfo='text'
))

# Add target markers
fig.add_trace(go.Scattermapbox(
    lat=targets['t_lat'],
    lon=targets['t_long'],
    mode='markers',
    marker=go.scattermapbox.Marker(size=9, color='red'),
    text=targets['hover_text'],
    hoverinfo='text'
))

# Customize map layout
fig.update_layout(
    mapbox_style="light",
    mapbox_zoom=3,
    mapbox_center={"lat": df['l_lat'].mean(), "lon": df['l_long'].mean()}
)

# Display the Map
fig.show()

Explanation:

  1. Import Plotly: The required libraries for data handling and visualization are imported.
  2. Set Up Mapbox Access Token: A placeholder for the Mapbox access token is defined. You’ll need to replace 'your_mapbox_api_token' with your actual token.
  3. Prepare Data for Mapping: The DataFrame is separated into two parts, one for launch_place and another for target, with their respective coordinates.
  4. Create Scatter Mapbox Plot:
  • Two types of markers are added: one set for launch_place (in blue) and another for target (in red).
  • The hover text is set to display the model and the place name.
  5. Customize and Display Map: The map’s layout is customized, and it is displayed using fig.show().

Ensure you have the required plotly and pandas libraries installed in your Python environment. Replace 'your_excel_file.xlsx' and 'your_mapbox_api_token' with the actual file path and your Mapbox API token before running the code.
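One thing worth noting before running this: ChatGPT's hover_text shows the model and place name but not the number launched, which the prompt asked for. A hedged pandas sketch of one way to aggregate those counts (column names assumed from the dataset, values invented for illustration):

```python
import pandas as pd

# Hypothetical miniature of the missile-attacks dataframe
df = pd.DataFrame({
    'launch_place': ['Belgorod', 'Belgorod', 'Sea of Azov'],
    'model': ['Shahed-136', 'Shahed-136', 'Kalibr'],
    'launched': [10, 5, 4],
})

# Sum launches per site and model, then build the hover text string
counts = df.groupby(['launch_place', 'model'], as_index=False)['launched'].sum()
counts['hover_text'] = (counts['model'] + ' - ' + counts['launch_place']
                        + ': ' + counts['launched'].astype(str) + ' launched')
print(counts['hover_text'].tolist())
```

The resulting `counts` frame could then feed the `Scattermapbox` traces in place of the per-row `launch_places` frame.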

Perplexity AI

Perplexity AI citing Sources — Image by author

To create a Plotly scatter_mapbox plot mapping the launch_place and target on the map using the latitude and longitude columns (‘l_lat’, ‘l_long’, ‘t_lat’, and ‘t_long’), you can follow these steps:

  1. Import necessary libraries: Import Plotly Express for creating the scatter_mapbox plot, and set the Mapbox access token.

import plotly.express as px

MAP_BOX_TOKEN = 'your_mapbox_token'
px.set_mapbox_access_token(MAP_BOX_TOKEN)

2. Create the scatter_mapbox plot: Create two scatter_mapbox plots, one for launch_place and one for target, and combine them using the add_trace method.

# Create a scatter_mapbox plot for launch_place
fig_launch = px.scatter_mapbox(df, lat='l_lat', lon='l_long', color_discrete_sequence=['blue'], hover_name='model', hover_data=['launched'])

# Create a scatter_mapbox plot for target
fig_target = px.scatter_mapbox(df, lat='t_lat', lon='t_long', color_discrete_sequence=['red'], hover_name='model', hover_data=['destroyed'])

# Combine the two plots
fig = fig_launch.add_trace(fig_target.data[0])

# Update the layout and display the figure
fig.update_layout(mapbox_style='carto-positron', title='Launch Places and Targets')
fig.show()

This code creates two scatter_mapbox plots, one for launch_place (blue) and one for target (red), using the latitude and longitude columns (‘l_lat’, ‘l_long’, ‘t_lat’, and ‘t_long’). The hover text shows the model and the number of launched or destroyed items for each site. The combined plot is displayed using the show method.

Perplexity Ai displaying related content — Image by Author

Analysis of Mapbox code provided by ChatGPT and Perplexity AI

The code provided by ChatGPT did not display any content when fig.show() was run. Though all of the code ran in Jupyter, there was only a blank image with a legend. No actual map was output.

The code provided by Perplexity ran and displayed the mapbox as expected. I modified one element and added a zoom argument to set a default zoom level.

Perplexity AI generated code produces this map — Image by author

Yes, there are outliers that have definitely been assigned the wrong latitude and longitude, but that is a minor data cleansing issue and not one for this article.
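For completeness, one simple way such outliers could be dropped is with a bounding-box filter. The box limits and sample rows below are my own approximations, not taken from the dataset:

```python
import pandas as pd

# Hypothetical geocoded rows, one with a clearly wrong location
df = pd.DataFrame({
    'target': ['Kyiv', 'Odesa', 'mismatched place'],
    't_lat': [50.45, 46.48, -33.9],
    't_long': [30.52, 30.72, 18.4],
})

# Rough bounding box around Ukraine (limits are an approximation)
LAT_MIN, LAT_MAX = 43.0, 57.0
LON_MIN, LON_MAX = 22.0, 45.0

# Keep only rows whose coordinates fall inside the box
in_box = df['t_lat'].between(LAT_MIN, LAT_MAX) & df['t_long'].between(LON_MIN, LON_MAX)
clean = df[in_box]
print(clean['target'].tolist())
```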

Conclusion

I found this a very interesting exercise comparing my two generative AI systems, ChatGPT (GPT-4) and Perplexity AI (GPT-4). I had been meaning to do this for a while and John's article spurred me on.

The code generated by both systems was functional and there were no major tweaks added after the code was generated.

Using John's framework definitely reduced the iterative approach I previously used. This made the task much easier and we were getting working code, with visible results, every time. Granted, some of the results were not as expected, but we saved a large amount of time by having working code.

When it comes to charting, I like to use Plotly. ChatGPT seems to prefer Plotly graph objects, whereas Perplexity AI consistently stuck with Plotly Express. In the majority of cases I also prefer graph objects, as they provide more control over the output, but for “quick and dirty” charts to visualize what we are looking at, Plotly Express can be very valuable.

I look forward to continuing to explore the world of Generative AI, and the benefits and challenges it can bring.


Tom Welsh

Python aficionado, passionate about data analysis & visualization. Tech-savvy, Cybersecurity & Risk professional by day, Linux & IT infrastructure by night.