Will Generative AI make data scientists obsolete?

Arik Friedman
Data at Atlassian
6 min read · Jul 30, 2023
Generated with Dall-E 2

Read on to find out:

  • Would ChatGPT and its Code Interpreter plugin pass Atlassian’s Data Science interview?
  • How can data scientists leverage generative Artificial Intelligence (AI)?

I have worked as a Data Scientist at Atlassian for nearly eight years. During this time, I have followed the progress in the Deep Learning and Generative AI space, but mostly from afar: my work mainly involves problems best tackled with more traditional statistical tools.

When OpenAI announced ChatGPT, they shifted the paradigm in this field in two ways. First, ChatGPT represented an extraordinary leap in model quality, bringing the field to the attention of the broader public. Second, with their APIs, OpenAI moved Large Language Models from the exclusive domain of Data Science (DS) into a more accessible engineering challenge. ChatGPT has code generation capabilities, but OpenAI also developed the Code Interpreter plugin, which enhances ChatGPT by allowing it to execute the code it generates. The plugin also supports file uploads to augment the instructions to ChatGPT. Combine data files with code execution capabilities, and you’ve got yourself a machine capable of conducting its own data analysis! Generative AI turned from a cool but adjacent field into something that could apply directly to my work.

At Atlassian, not only are we looking into ways to bring the promise of Artificial Intelligence into the hands of our customers, but we’re also assessing how these new capabilities can help us improve our ways of working. Recently, OpenAI announced that their Code Interpreter plugin will become available to all ChatGPT Plus users, meaning I could get immediate access to the tool by signing up for ChatGPT Plus (thank you, Atlassian self-learning budget). Which led me to:

The challenge: will ChatGPT Code Interpreter pass the Data Science interview process?

Given the goal of understanding the applicability of ChatGPT to our Data Science work, the interview challenge provides several benefits:

✔ An existing non-sensitive dataset I can upload to ChatGPT
✔ It represents our internal data and use cases
✔ We have established evaluation benchmarks

Note that our interview processes change over time and across geographies, so what you see below might not represent what you may encounter as a Data Science candidate.

Would we hire ChatGPT Code Interpreter as a data scientist?

I ran the ChatGPT code interpreter through three tests.

Test 1: the take-home assignment: prescriptive, simple tasks

I presented ChatGPT with two questions that required simple analysis. It misinterpreted the first task and needed clarification to get it correct. In the second task, its initial code generated a division-by-zero error, but it debugged the problem and fixed it on its own. Sadly, ChatGPT does not sense-check its responses beyond runtime errors: its answer was very clearly inconsistent with the data, but that didn’t raise any alarms. I also realized how difficult it would have been to spot the problem if the error weren’t so obvious: the code looked right. When I pointed out the inconsistency, ChatGPT got the solution right, but this wouldn’t have happened without a human in the loop.
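For illustration, the failure mode ChatGPT hit and then patched on its own looks roughly like this. This is a hypothetical metric calculation, not the actual interview code:

```python
# Hypothetical example: a per-cohort conversion rate. A naive
# conversions / visitors crashes on an empty cohort; the guard below is
# the kind of fix ChatGPT added after seeing the ZeroDivisionError.

def conversion_rate(conversions, visitors):
    """Return conversions/visitors, guarding the zero-visitor case."""
    if visitors == 0:
        return 0.0  # empty cohort: report 0% rather than crashing
    return conversions / visitors

# Hypothetical cohort data: cohort "B" has no visitors at all.
cohorts = {"A": (30, 120), "B": (0, 0)}
rates = {name: conversion_rate(c, v) for name, (c, v) in cohorts.items()}
print(rates)  # {'A': 0.25, 'B': 0.0}
```

The guard makes the code run, but it silently encodes a modeling choice (an empty cohort counts as 0%), which is exactly the kind of decision that still needs a human to sense-check.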

Test 2: Scenario questions

In technical interviews, we often present hypothetical business scenarios to assess the application of technical skills and analytical thinking within a concrete business context. ChatGPT was very good at covering technical aspects and considerations when presented with such scenarios, but those responses were generic and context-free. In contrast, real-world problems require local context, and we’d typically expect candidates to ask clarifying questions, understand the context before diving into solutions, and adapt their responses to that context; this is something that Large Language Models (LLMs) are (currently?) not very good at.

Test 3: an older take-home assignment with more open-ended questions

In the past, we had a longer take-home assignment, which was less prescriptive than the current version, and contained some open-ended questions.

For the more basic questions, ChatGPT gave off the vibe of a lazy candidate, offering very shallow exploratory analysis. For example, one of the questions involved plotting a chart of Daily Active Users (DAU). GPT-4 did that successfully but declared “a few sharp drops to zero active users,” though none appeared in the chart. It noted the downward trend towards the end of December yet didn’t go as far as to explain why. (Holidays were a typical explanation from actual candidates.)

ChatGPT’s response to a prompt asking to plot and assess a trend of the number of daily active users (DAU) for the uploaded dataset
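The underlying computation behind such a chart is simple: DAU is the number of distinct users seen on each calendar day. A minimal stdlib sketch, using hypothetical event data rather than the interview dataset (in practice you would plot the resulting series):

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical (user_id, timestamp) event log; not the interview dataset.
events = [
    ("u1", "2022-12-01T09:15:00"),
    ("u2", "2022-12-01T11:30:00"),
    ("u1", "2022-12-01T17:45:00"),  # same user twice in one day: counted once
    ("u1", "2022-12-02T08:00:00"),
]

# DAU = number of distinct users active on each calendar day.
active_users = defaultdict(set)
for user, ts in events:
    day = datetime.fromisoformat(ts).date()
    active_users[day].add(user)

dau = {day: len(users) for day, users in sorted(active_users.items())}
print(dau)  # {date(2022, 12, 1): 2, date(2022, 12, 2): 1}
```

The distinct-user deduplication (the `set` per day) is the one detail worth sense-checking in generated code: counting raw events instead of unique users is a classic silent DAU bug.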

Some of the tasks were deliberately open-ended, giving candidates an opportunity to express their creativity, show technical depth, and demonstrate their business sense. Yet GPT-4 wasn’t inclined to explore that space; even with direct nudges, it didn’t go very far and needed concrete instructions to guide its work.

There was a “mind-blown” moment for me, though. One of the advanced questions required identifying patterns of features being used together. GPT-4 suggested market basket analysis (something we’ve seen candidates do as well). This can be time-consuming, but the LLM generated the code in a breeze. For an analyst, this is a huge time saver. There was one caveat: it didn’t actually solve the problem, due to technical issues (possibly because the dataset size came close to the plugin’s upload file size limit).

A snippet from ChatGPT’s Market Basket analysis attempt. When things go wrong, ChatGPT apologizes a lot!

Overall assessment: no hire. While ChatGPT displayed impressive capabilities that will impact data scientists’ work, its current shortcomings collectively led to the “no hire” decision: it didn’t understand context, didn’t ask clarifying questions, misinterpreted even simple tasks, and failed to sense-check its responses.

LLMs can’t replace data scientists — at least not yet (we’ll need to keep revisiting that, as the pace of progress is high). Alongside other LLMs, ChatGPT Code Interpreter is in the danger zone where it can provide compelling responses even when they’re incorrect. It can be a very powerful tool for people who know what they’re doing, can ask the right questions, and guide it in the right way. People who lack the prerequisite knowledge and the know-how needed to evaluate its responses critically will still be able to make use of it, but there’s no guarantee that they’ll get the right answers, or that they’ll be able to spot it when they don’t.

Integrating GPT into the Data Science work

The Code Interpreter might not be a viable option for some: for many organizations, uploading (potentially sensitive) data to a third party is a no-go, which puts the Code Interpreter plugin out of bounds. Beyond privacy issues, data scientists often work with large amounts of data, so data access at scale and resource limitations would be an impediment even without the privacy hurdles. In the long run, given moves such as Databricks acquiring MosaicML or Snowflake partnering with Nvidia, I expect such capabilities to eventually emerge within data scientists’ dedicated working environments, so we need to give it some time.

In the meantime, we can get a lot of value from GPT-4’s capabilities, even without plugins such as the Code Interpreter. We are still exploring this space, but here are some things our data scientists have already been looking into:

  • Natural language queries: Atlassian’s announcement of Atlassian Intelligence highlighted forthcoming features such as taking natural language queries and converting them to JQL in Jira or SQL in Atlassian Analytics. Beyond that, some of our data scientists are exploring how similar capabilities can be used for our own Data Science work. Let the LLM write those pesky regular expressions for you!
  • Tap into knowledge bases: AI-powered Virtual agents can use existing knowledge bases and make them more accessible. We plugged our internal data science Confluence space into one of our Slack channels to provide another way to engage with documentation and past work.
  • GPT as a data analysis co-pilot: we are still experimenting with GPT as an assistant that supports us in our data science work. The exact way you phrase your instructions (prompts) to the model will affect the quality of the responses you get, and we are on that learning path.
  • Parsing existing data models and pipelines: GPT can help you parse, and get up to speed faster with, a complicated query or code base that someone else (or you 😅 ) wrote.
  • Debugging assistant: Give it the query and the error you got, and let it solve the problem for you.
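As a concrete example of the “pesky regular expressions” point: a pattern for pulling Jira issue keys out of free text is exactly the kind of thing an LLM drafts in seconds. This is an illustrative pattern I wrote for this post, not a product feature:

```python
import re

# Jira issue keys look like PROJ-123: an uppercase project key followed
# by a dash and an issue number. Illustrative pattern an LLM might draft.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

text = "Fixed in DATA-42; see also JRA-9 and the follow-up DATA-107."
print(ISSUE_KEY.findall(text))  # ['DATA-42', 'JRA-9', 'DATA-107']
```

Even here the human-in-the-loop caveat applies: you still want to eyeball the pattern against a few real examples (lowercase keys, keys embedded in URLs) before trusting it in a pipeline.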

Are there other ways you’ve been using GPT for your data science work? Please share!


Principal Data Scientist at Atlassian. Studying how data from products like Jira and Bitbucket can help us understand how software teams work.