How To Convert Excel Files to CSV Using Bash?

Data-Sleek
4 min readApr 21, 2023
Is ChatGPT the best programer on earth?

Continuing my journey with ChatGPT, I had to figure out a way to convert Excel files with xlsx extension to CSV files in order to ingest them into Singlestore via pipeline. Excel files are stored in a S3 buckets, so the script has to detect if there are new files in the bucket, download the file, convert the file then upload the converted file back to S3.

There is AWS Glue and Lambda which I’m sure could take care of this but my knowledge in that area can be compared to the amount of air in space.

I won’t be giving the final solution, just the skeleton which i hope can help one of you one day.

Detecting A New File in S3

Not only ChatGPT writes the code for you but it makes recommendation on how to use it. What you should install, where you can find information. Crazy !

This script uses the AWS CLI, so ensure you have it installed and configured before running the script. You can find instructions on how to install and configure the AWS CLI here: https://aws.amazon.com/cli/.

#!/bin/bash

# Variables
S3_BUCKET_NAME=”your-s3-bucket-name”
LAST_CHECKED_FILE=”last_checked_file.txt”
NEW_FILES_LIST=”new_files_list.txt”

# Function to get the latest file timestamp
get_latest_timestamp() {
if [ -f “${LAST_CHECKED_FILE}” ]; then
cat “${LAST_CHECKED_FILE}”
else
echo “1970–01–01T00:00:00Z”
fi
}

# Check for new files
check_new_files() {
local last_timestamp
last_timestamp=”$(get_latest_timestamp)”

aws s3api list-objects-v2 — bucket “${S3_BUCKET_NAME}” — query “Contents[?LastModified > ‘${last_timestamp}’].{Key: Key, LastModified: LastModified}” — output text | awk -v OFS=’\t’ ‘{print $1, $2}’ > “${NEW_FILES_LIST}”

if [ -s “${NEW_FILES_LIST}” ]; then
echo “New files found in the S3 bucket:”
cat “${NEW_FILES_LIST}”
else
echo “No new files found in the S3 bucket.”
fi
}

# Update the last checked file timestamp
update_last_checked_file() {
local latest_file_timestamp
latest_file_timestamp=”$(aws s3api list-objects-v2 — bucket “${S3_BUCKET_NAME}” — query ‘sort_by(Contents, &LastModified)[-1].LastModified’ — output text)”

if [ “${latest_file_timestamp}” != “None” ]; then
echo “${latest_file_timestamp}” > “${LAST_CHECKED_FILE}”
fi
}

# Main script execution
check_new_files
update_last_checked_file

I find it outstanding that ChatGPT knows about S3 and can figure out which technology to use (AWS CLI) in order to interact with S3. Will ChatGPT be able to program itself, program machines, software? Are we on our way to Judgment Day?

Now that a new file can be detected, the next step is to download that file on a server and run the conversion script.

Convert XLSX To CSV

Again, ChatGPT shows that it can figure out what technology to use and make a suggestion to the problem.

I can provide you with a bash script that will help you convert Excel files with a .xlsx extension to CSV. To do this, we’ll use a command-line tool called in2csv which is part of the csvkit package. You'll need to have csvkit installed on your system. You can install it using pip:

The script is clean, well written and can be easily incorporated in the script above that detect a new file and process it immediately.

#!/bin/bash

# Check if a directory is provided as an argument
if [ “$#” -ne 1 ]; then
echo “Usage: $0 <directory>”
exit 1
fi

dir=$1

# Check if the provided directory exists
if [ ! -d “$dir” ]; then
echo “Error: Directory ‘$dir’ does not exist.”
exit 1
fi

# Loop through all .xlsx files in the directory
for file in “$dir”/*.xlsx; do
# Check if there are any .xlsx files in the directory
if [ ! -e “$file” ]; then
echo “Error: No .xlsx files found in the directory.”
exit 1
fi

# Extract the filename without extension
filename=$(basename — “$file”)
extension=”${filename##*.}”
filename=”${filename%.*}”

# Convert the .xlsx file to a .csv file
in2csv “$file” > “${dir}/${filename}.csv”

# Print the conversion status
echo “Converted ‘$file’ to ‘${dir}/${filename}.csv’”
done

echo “All .xlsx files in ‘$dir’ have been converted to CSV.”

Convert XLXS File to CSV in Bash Conclusion

What benefits does ChatGTP provide me with programming?

ChatGPT Code is Prestine

The code is formatted for you, it’s commented like a great programer would do. And it works. It evens tell you which variable to edit to enter actual values. Pretty amazing.

Provide A Learning Experience

By looking at the code used, I am learning. Learning about commands, technics, and logic flows ; something I would not have done right or spent long time researching. In the end ChatGTP benefits me but also my client by saving time for the implementation.

Give Advice On Which Tech To Use

Another great way to learn and become better at coding.

I think the best way to learn is to practice. Yes having some foundation is important, however, you can learn a lot by looking at others, how they do things. True I cannot ask ChatGPT why it decided to write the code this way, maybe I should try. But that’s for another day.

--

--

Data-Sleek

Data Sleek is a Data Consulting company that will help you maximize the value of your data and gives you actionable insights about your business.