Using ChatGPT3 as a Data Engineer

Abhijit Menon
4 min read · Dec 24, 2022


The world has gone pretty crazy about ChatGPT. I was skeptical about it being useful for day-to-day data engineering use cases, but that has quickly changed over the last couple of weeks. I have not only started using it, I now check ChatGPT first before Google.

One of the major issues I had with getting started with ChatGPT was not knowing where to begin or what I could use it for. Here are some of the really helpful use cases I found that might kickstart some of you into using it regularly.

Building boilerplate code

Sometimes you know what you need to do, but it would be so much more helpful to have a starting point or some boilerplate code to build on top of. You can now just explain to ChatGPT what you are trying to do and it will set up boilerplate code for you to use. I find this super cool and useful.

(the code goes on, I have just truncated it)

Add comments to your code

Yes, it is good practice to add comments to your code, but sometimes you’re just in the flow and you don’t stop to add them.

One cool thing I found with ChatGPT is its ability to add comments to your code.
Here’s an example of a script without any comments that I’d written a long time ago to scrape text from PDFs. I checked whether ChatGPT would be able to add comments to it, and it did a much better job than I probably would have.

PS: You can also have it explain completely unfamiliar code to you if you are trying to understand someone else’s work that does not have comments in it.

Debug basic errors on your scripts

Do you hate hunting for that one pesky little comma you missed in your SQL that is causing a syntax error? Well, ChatGPT has you covered. My input query below had a missing comma at the end of one of the columns, and ChatGPT helped figure out what was wrong and what needed to change.
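To make this class of error concrete, here is a minimal sketch that reproduces a missing-comma syntax error using Python's built-in sqlite3 module. The orders table, its columns, and both queries are hypothetical and are not the query from this article; other databases will word the error differently.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'acme', 9.5)")

# The comma after "customer AS customer_name" is missing here.
broken_query = """
SELECT id,
       customer AS customer_name
       total
FROM orders
"""

# Same query with the comma restored.
fixed_query = """
SELECT id,
       customer AS customer_name,
       total
FROM orders
"""

try:
    conn.execute(broken_query)
    error_message = None
except sqlite3.OperationalError as exc:
    # SQLite raises OperationalError for statements it cannot parse.
    error_message = str(exc)

print("broken query error:", error_message)

rows = conn.execute(fixed_query).fetchall()
print("fixed query rows:", rows)
```

The broken query together with its error message is the kind of input this use case describes. Note that without the AS alias, SQL would read `customer total` as a column alias and raise no error at all, which is part of why this slip is easy to miss.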

Convert code from a different language to the language you want to develop in

A lot of times you’re looking for a solution and you find exactly what you need… in a different language. Or there is a code base in your company that you’re trying to migrate to a different language. Given that the syntax, the libraries, and the formatting can be so different from one language to another, porting the code you need can be a real pain.
ChatGPT does a great job of helping you make this transition.

Just query:

Convert this code to {insert language you want to convert to} —
{Insert original code here}

This should give you at least a starting point for the transition.
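If you find yourself sending this kind of query often, the template can be filled in with a few lines of Python. This is only a sketch: `build_conversion_prompt` is a hypothetical helper name, it uses a colon as the separator, and it only builds the text you would paste into ChatGPT (it does not call any API).

```python
def build_conversion_prompt(target_language: str, source_code: str) -> str:
    """Fill in the 'Convert this code to ...' template with a language and code."""
    # Strip surrounding blank lines so the prompt stays compact.
    return f"Convert this code to {target_language}:\n{source_code.strip()}"


# A short fragment of the Python Glue script, used as sample input.
python_snippet = """
sc = SparkContext()
glueContext = GlueContext(sc)
"""

prompt = build_conversion_prompt("Scala", python_snippet)
print(prompt)
```

Keeping the template in one place makes it easy to reuse the same wording across many files during a migration.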

Example query to ChatGPT:

Convert to scala —

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

inputGDF = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://test-bucket-glue/testing-csv"]},
    format="csv",
)

outputGDF = glueContext.write_dynamic_frame.from_options(
    frame=inputGDF,
    connection_type="s3",
    connection_options={"path": "s3://test-bucket-glue/testing-output"},
    format="parquet",
)

Response:

Note: I wouldn’t take the code as the absolute truth. Definitely test it out and make sure everything works. ChatGPT helps set up some boilerplate code for you to work on.

Process Flow Chart Creator

Sometimes when you’re setting up a process and want a quick flow chart to represent it, you could build a fancy one yourself on Draw.io, or you could explain the process to ChatGPT and have it come up with a quick diagram for you (in text format, though).
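For a sense of what a text-format flow chart looks like, here is a small hand-written sketch that draws a linear process as boxes joined by arrows. It is not ChatGPT output; `render_flow` is a hypothetical helper, and the three steps are my own summary of the Glue job from the conversion example.

```python
def render_flow(steps):
    """Render a linear process as a text flow chart, one box per step."""
    # Inner box width: longest label plus one space of padding on each side.
    width = max(len(step) for step in steps) + 2
    border = "+" + "-" * width + "+"
    lines = []
    for index, step in enumerate(steps):
        lines.append(border)
        lines.append("|" + step.center(width) + "|")
        lines.append(border)
        # Draw a downward arrow between boxes, but not after the last one.
        if index < len(steps) - 1:
            lines.append("|".center(width + 2))
            lines.append("v".center(width + 2))
    return "\n".join(lines)


steps = ["Read CSV from S3", "Convert to Parquet", "Write Parquet to S3"]
chart = render_flow(steps)
print(chart)
```

A plain-text chart like this pastes cleanly into a README, a code comment, or a pull request description, which is where a quick diagram tends to be most useful.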

All of these use cases are in addition to the basic questions and answers you can do with ChatGPT, just like you would with Google/StackOverflow.
I’d love to know some of the other ways everyone is using ChatGPT. I genuinely think that having this as a tool will make life as a developer easier and help us focus more on creativity and building solutions rather than on the boring, monotonous, repetitive code work that can be automated.
