Chatting with Data (for Non-Coders and Coders with No Time)

Cody Sandahl
7 min read · May 22, 2024

A data analyst stares at complicated charts on a screen and thinks, “I don’t have time for this!”
(image generated by Leonardo AI, modified by Cody Sandahl)

I am a data-driven person. I know many data-driven people (would you like that statement graphed in a chart or analyzed in a table?). But I often find myself with mere minutes to answer a question that would take a significant time investment to answer properly. Wouldn’t it be great if we could just retrieve the raw data, describe the analysis we want, and get an answer or chart without having to dive into esoteric platforms or code?

That, my friends, is the very essence of chatting with data. By connecting a dataset (or preferably multiple related datasets) with a generative AI chat interface, we can ask questions in a natural language format that would be accessible to project managers, business analysts, health data managers, and marketers.

Audience: Non-coders (or coders with no time) who understand their data, but who do not want to dive into code to answer their questions.

  • Project managers
  • Business analysts
  • Health data managers
  • Marketers

TL;DR Summary

UPDATE: OpenAI released an updated version of ChatGPT Data Analyst in May 2024. I am planning to do a comparison with the new version soon!

(icons generated by ChatGPT)

Overall Winner: ChatGPT Data Analyst

ChatGPT Data Analyst is incredibly flexible and reliable, creating accurate results without crashing or getting stuck on code errors. Since many people and organizations already have ChatGPT subscriptions, and since this capability is already included in all paid plans, this is the most powerful and accessible option available for our audience.

Interesting Up-and-Comer: Julius AI

Julius AI is a fascinating tool that is nipping at the heels of ChatGPT Data Analyst. It is starting to layer on some useful features that ChatGPT does not have, but it needs to learn to recover from code errors to be as reliable as ChatGPT. The additional cost of a Julius AI subscription will be difficult to justify for many individuals and organizations who may already have a subscription to OpenAI, Microsoft, or Google AI tools.

Buggy (but Powerful): DataChat

DataChat has added interesting features such as built-in connectors and live dashboard-style charts that set it apart from the other tools. There are still far too many errors, empty results, and crashes for it to be useful for people who aren’t professional data analysts. DataChat’s pricing also jumps straight from Free to Enterprise with no plans in between.

Methodology

Dataset (Download Dataset)

Business-to-Business (B2B) sales pipeline data from a fictitious company that sells computer hardware. Data is split between multiple CSV files with relationships between them (ex: product ID to product info in a different table). Original Source: Maven Analytics
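
For the curious, the relationship between files can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inline CSV text, the column names (`product_id`, `product_name`, `sales_price`, `deal_stage`), and the helper names are hypothetical stand-ins, not the actual schema of the dataset.

```python
import csv
import io

# Hypothetical stand-ins for two related CSV files.
# Column names and values are assumptions made up for illustration.
PIPELINE_CSV = """opportunity_id,product_id,deal_stage
1,P1,Won
2,P2,Lost
3,P1,Engaging
"""

PRODUCTS_CSV = """product_id,product_name,sales_price
P1,Laptop,1200
P2,Monitor,300
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on(left, right, key):
    """Left join: copy the matching right-hand columns onto each left row."""
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key], {})  # empty dict if no match is found
        joined.append({**match, **row})
    return joined


opportunities = join_on(load_rows(PIPELINE_CSV), load_rows(PRODUCTS_CSV), "product_id")

for row in opportunities:
    print(row["opportunity_id"], row.get("product_name"), row["deal_stage"])
```

This is roughly the kind of lookup a chat tool has to work out on its own when you upload related files.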

Data Questions

  • Make a graph of how each sales team is performing compared to the rest
  • Make a graph of percentage of opportunities won by each sales team
  • Can you identify any quarter-over-quarter trends?
  • Do any products have better win rates?

Tools Tested (Download the Chats as PDF)

  • PowerBI (to check the accuracy of the AI results) — Download PowerBI Report
  • ChatGPT with the Data Analyst GPT
  • DataChat
  • Julius AI

NOTE: These other tools were tested but either lacked capability or reliability, or focused on coding instead of a chat or visual interface: Jeda.ai, Polymer, and Google AI Studio.

Comparison Points

  • Accuracy of results
  • Usefulness of charts
  • Ease of use for our target audience
  • Data security/privacy for analysis of sensitive datasets
  • Price

Why PowerBI?

The results from the AI data analyst will always come with incredible confidence. It’s like a colleague who always “knows” they’re right, even though they have a reputation for making mistakes. So to guard against that, we’re going to use PowerBI to confirm the accuracy of the AI reports. While all of the AI tools generated reliable results, they did not always use the exact calculation I wanted. If I did not have the PowerBI report, I probably would not have caught these issues.

The top issue concerned the definition of “win rate” or “win percentage.” Every AI tool decided to calculate the win rate by taking total wins over total opportunities. That sounds logical. But our dataset also has in-progress opportunities, and I did not want those to count in the calculation of win rate. The AI tools were not wrong per se, but they gave me a different result than I wanted, and it took an understanding of the dataset to see the issue.
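
To make the difference concrete, here is a minimal Python sketch of the two calculations. The stage labels (`Won`, `Lost`, `Engaging`, `Prospecting`) and the sample values are assumptions made up for illustration; they are not taken from the dataset or from code produced by any of the tools.

```python
# Hypothetical deal stages for one sales team (assumed labels, made-up data).
stages = ["Won", "Won", "Lost", "Engaging", "Prospecting"]

CLOSED_STAGES = ("Won", "Lost")  # assumption: everything else is in progress


def win_rate_all(stages):
    """Wins divided by ALL opportunities, including in-progress ones."""
    wins = sum(1 for s in stages if s == "Won")
    return wins / len(stages) if stages else 0.0


def win_rate_closed(stages):
    """Wins divided by closed opportunities only (Won + Lost)."""
    wins = sum(1 for s in stages if s == "Won")
    closed = sum(1 for s in stages if s in CLOSED_STAGES)
    return wins / closed if closed else 0.0


print(f"Wins / all opportunities:    {win_rate_all(stages):.1%}")     # 40.0%
print(f"Wins / closed opportunities: {win_rate_closed(stages):.1%}")  # 66.7%
```

Same data, two defensible answers. Stating which denominator you want in the prompt is one way to steer the tool toward the calculation you have in mind.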

The more you know your data, the more you will get out of the AI tools!

Download PowerBI Report

Comparing the Charts

Sales Team Performance

Charts for Sales Team Performance

Quarterly Results

Charts for Quarterly Results

Products

Charts for Product Performance

ChatGPT Data Analyst Pros and Cons

Download the ChatGPT Data Analyst chat

BONUS! See how well ChatGPT converted the charts to Streamlit and Plotly.

Pros

  • The Data Analyst GPT has very robust capabilities
  • If you or your organization already have a paid ChatGPT subscription, you get this capability without paying more
  • Shares a good amount of detail of the steps it took if you want to reproduce or check the logic
  • Ability to provide some analysis to get you started with interpretation (unlike most other tools)
  • Able to recover from coding errors by re-trying and re-phrasing (without user intervention)
  • Able to handle multiple datasets at the same time
  • Worked with and without a data dictionary (it can infer the relationships in many cases)
  • Ability to view the Python code and analysis details

Cons

  • No live connections — just static file uploads
  • Requires an API, Team, or Enterprise subscription for proper privacy with sensitive data

Data Security/Privacy

https://openai.com/enterprise-privacy/

https://openai.com/policies/privacy-policy/

Summary: Team, Enterprise, and API subscriptions have a solid data policy. Free and Plus subscriptions are not safe for sensitive data.

Julius AI Pros and Cons

Download the Julius AI chat

Pros

  • Julius gives you the Python code which you can run and modify locally (with R code generation in beta). This is an interesting feature, as it would allow a non-technical user to explore data and provide a developer with a great start for future modification or automation.
  • Julius provides interesting contextual suggestions for further analysis.
  • Julius can analyze multiple datasets at the same time (but I did not test this)
  • Ability to customize model (ex: ChatGPT or Claude) and add custom instructions to the AI
  • Ability to link to a live Google Sheet instead of uploading a static file

(image: Julius AI’s in-line suggested analysis)

Cons

  • Expensive per-user ($20–70/person/month)
  • Was unable to recover once it had an error in the Python code

Data Security/Privacy

https://julius.ai/docs/privacy-policy

Julius AI’s Security Policy

DataChat Pros and Cons

Download the DataChat chat

Pros

  • Built-in data connections (not just static files)
  • Able to work with multiple datasets at the same time
  • Live dashboard charts instead of static images

(image: DataChat’s built-in connections)

Cons

  • Error-prone system
  • Advanced analysis can overwhelm the system when multiple steps are needed
  • Unable to change much of the formatting or even sorting of the charts
  • There are only options for Free and Enterprise — nothing in between

Data Security/Privacy

https://datachat.ai/terms/

License of Customer Data — In order to use the Service, Customer hereby grants to DataChat a license to access and use Customer Data to provide the Service to Customer and fulfill its obligations under this Agreement, including without limitation, accessing, storing, recording, transmitting, reproducing, maintaining, displaying and otherwise using, manipulating and/or modifying the Customer Data as necessary to provide the Service. DataChat shall also have the right to collect and analyze data and other information relating to the provision, use and performance of various aspects of the Service and related systems and technologies (including, without limitation, information concerning Customer Data and data derived therefrom), and DataChat will be free (during and after the Term hereof) to (i) use such information and data to improve and enhance the Service, and for other development, diagnostic and corrective purposes in connection with the Service and other DataChat offerings, and (ii) disclose such data solely in aggregated or other de-identified form in connection with its business. All such aggregated data collected, used, and disclosed by DataChat shall include only anonymous, non-personally identifiable information.

Summary: This is a worrying approach to customer data for an enterprise product. I would not trust these terms of use with sensitive data.

Cody Sandahl

I make things that help and delight people - including AI, laser cutting, 3D printing, and anything else that strikes my fancy.