A Conversation with ChatGPT on Data Engineering

Getting a perspective on how OpenAI's ChatGPT understands data engineering

Koushik Thota
Towards Data Engineering
Dec 9, 2022


What is ChatGPT in the first place?

According to Wikipedia, ChatGPT is a prototype artificial intelligence chatbot developed by OpenAI that focuses on usability and dialogue. It uses a large language model based on the GPT-3.5 architecture.

ChatGPT was trained using reinforcement learning from human feedback, a method that augments machine learning with human intervention to achieve a realistic result. During the training process, human trainers played the roles of both the user and the AI assistant.

How did ChatGPT respond to questions on Data Engineering?

Let ChatGPT do the talking…

Q. Do you think data engineering is different from software engineering?

Q. What is the future of Data Engineering?

Q. How would you design a large-scale data pipeline which can process millions of transactions every second?

Q. How would you handle missing data in a data pipeline?
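As an illustration only (this is not ChatGPT's answer), here is a minimal Python sketch of two common tactics for missing data: records that lack a required field are routed to a quarantine list instead of being silently dropped, and missing optional fields are filled with defaults. The field names, default values, and the `handle_missing` function are hypothetical.

```python
from typing import Any

# Hypothetical schema: fields every record must have, and defaults for optional ones.
REQUIRED_FIELDS = ("transaction_id", "amount")
OPTIONAL_DEFAULTS = {"currency": "USD", "channel": "unknown"}


def handle_missing(
    records: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split records into (clean, quarantined).

    A record missing a required field (absent or None) is quarantined for later
    inspection; missing or None optional fields are filled with defaults.
    """
    clean: list[dict[str, Any]] = []
    quarantined: list[dict[str, Any]] = []
    for record in records:
        if any(record.get(field) is None for field in REQUIRED_FIELDS):
            quarantined.append(record)
            continue
        # Copy so the caller's input is not mutated.
        filled = dict(record)
        for field, default in OPTIONAL_DEFAULTS.items():
            if filled.get(field) is None:
                filled[field] = default
        clean.append(filled)
    return clean, quarantined


if __name__ == "__main__":
    batch = [
        {"transaction_id": "t1", "amount": 10.0, "currency": "EUR"},
        {"transaction_id": "t2", "amount": None},
        {"transaction_id": "t3", "amount": 5.5, "channel": None},
    ]
    good, bad = handle_missing(batch)
    print(good)
    print(bad)
```

Quarantining rather than dropping keeps bad records auditable; whether to impute, drop, or quarantine depends on how downstream consumers use each field.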

Q. Write code to read data from a Kafka stream and dump it to S3 using Apache Spark Structured Streaming.

Q. Do you think Apache Spark will be obsolete in the near future?

Q. What are some optimizations for Apache Spark in data pipelines?

Q. What tools and technologies would you use to design a real-time supply chain analytics data pipeline?

Q. What do you think about newer data modeling design patterns like data mesh, data vault, the data lakehouse, and the data lake? Do you think they will replace the traditional data warehouse architecture?

Q. Can you write a data quality library to perform data quality checks on a Spark DataFrame consisting of IoT data?
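For a rough idea of what such a library might look like (not ChatGPT's answer), the sketch below models each data quality rule as a named predicate and reports how many rows fail each one, plus a duplicate-key count. It uses plain Python dictionaries instead of a Spark DataFrame so it runs without a cluster; the same rules could be expressed as DataFrame filters. The field names (`device_id`, `ts`, `temperature_c`) and the valid temperature range are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class Check:
    """A named data quality rule: `predicate` returns True when a row passes."""
    name: str
    predicate: Callable[[Row], bool]


def run_checks(rows: list[Row], checks: list[Check]) -> dict[str, int]:
    """Return the number of failing rows per check (0 means the check passed)."""
    return {c.name: sum(1 for row in rows if not c.predicate(row)) for c in checks}


def duplicate_keys(rows: list[Row], key_fields: tuple[str, ...]) -> int:
    """Count rows whose key repeats the key of an earlier row."""
    seen: set[tuple[Any, ...]] = set()
    dupes = 0
    for row in rows:
        key = tuple(row.get(f) for f in key_fields)
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return dupes


# Hypothetical IoT rules: field names and the -40..125 °C range are assumptions.
IOT_CHECKS = [
    Check("device_id_not_null", lambda r: r.get("device_id") is not None),
    Check("timestamp_not_null", lambda r: r.get("ts") is not None),
    Check(
        "temperature_in_range",
        lambda r: r.get("temperature_c") is not None
        and -40.0 <= r["temperature_c"] <= 125.0,
    ),
]

if __name__ == "__main__":
    readings = [
        {"device_id": "d1", "ts": 1, "temperature_c": 21.5},
        {"device_id": None, "ts": 2, "temperature_c": 22.0},
        {"device_id": "d2", "ts": 3, "temperature_c": 300.0},
        {"device_id": "d1", "ts": 1, "temperature_c": 21.5},
    ]
    print(run_checks(readings, IOT_CHECKS))
    print(duplicate_keys(readings, ("device_id", "ts")))
```

Keeping rules as data (a list of `Check` objects) makes it easy to add device-specific checks without touching the runner.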

Q. Write a SQL query that uses recursion to get all the dates in a range.
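One standard way to do this (shown for illustration, not as ChatGPT's answer) is a recursive common table expression: the anchor row is the start date, and each recursive step adds one day until the end date is reached. The query below uses SQLite's `DATE()` function and is run through Python's built-in `sqlite3` module so it is self-contained; other engines use different date arithmetic syntax. The `dates_between` helper is hypothetical.

```python
import sqlite3
from contextlib import closing

# Anchor: the start date (only if start <= end). Recursive step: add one day
# while the previous date is still before the end date. SQLite dialect.
DATE_RANGE_SQL = """
WITH RECURSIVE dates(d) AS (
    SELECT DATE(:start) WHERE DATE(:start) <= DATE(:end)
    UNION ALL
    SELECT DATE(d, '+1 day') FROM dates WHERE d < DATE(:end)
)
SELECT d FROM dates ORDER BY d;
"""


def dates_between(start: str, end: str) -> list[str]:
    """Return every ISO-8601 date from start to end, inclusive."""
    with closing(sqlite3.connect(":memory:")) as conn:
        rows = conn.execute(DATE_RANGE_SQL, {"start": start, "end": end}).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(dates_between("2022-12-09", "2022-12-12"))
```

The `WHERE d < DATE(:end)` condition in the recursive member is what terminates the recursion; without it the CTE would generate dates forever.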

Q. What are some questions you would ask to hire a data engineer?

Conclusion and Impressions

Judging by the answers it produced, ChatGPT is ridiculously impressive, and I literally ran out of ideas for questions to ask. I enjoyed the conversation and was fascinated by how well it remembered context. What will it become in future iterations?

Please comment with the questions you want me to ask next. I will write a Part 2 based on your responses.

Clap if you like this story, and follow me on LinkedIn and Medium for more interesting stuff. Cheers!!!

Check out my other stories as well. 😊
