Reading a CSV File from an Amazon S3 Bucket Using the csv Module in Python

Lakshmi K
2 min read · Feb 21, 2022


Sometimes we need to read a CSV file directly from an Amazon S3 bucket. There are several ways to do this; one of the most common is to use Python's csv module together with the boto3 client.

Import csv and boto3 at the top of the Python file:

import csv

import boto3

Then the code looks like this:

s3 = boto3.client(
    's3',
    aws_access_key_id='XYZACCESSKEY',
    aws_secret_access_key='XYZSECRETKEY',
    region_name='us-east-1'
)  #1

obj = s3.get_object(Bucket='bucket-name', Key='myreadcsvfile.csv')  #2
data = obj['Body'].read().decode('utf-8').splitlines()  #3
records = csv.reader(data)  #4
headers = next(records)  #5
print('headers: %s' % (headers))
for eachRecord in records:  #6
    print(eachRecord)

#1 - Create an S3 client object with the access key, secret key, and region (assuming the reader already knows what an access key and a secret key are).

#2 - Get the object from the bucket by passing the bucket name and the key (file name) of the CSV file.

In some cases the CSV file is not at the top level of the S3 bucket but sits inside folders and subfolders. In that scenario, line #2 should change as below:

obj = s3.get_object(Bucket='bucket-name', Key='folder/subfolder/myreadcsvfile.csv')

#3 - With line #2 we have a handle on the CSV file object, and now we need to read it. The data comes back as bytes, so we use the decode() function to convert it into a readable string.

Then we use the splitlines() function to split the text so that each row becomes one record.
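
To see what step #3 does without touching S3, here is a minimal sketch that runs the same decode() and splitlines() calls on an in-memory bytes value. The sample payload is made up for illustration; it only stands in for what obj['Body'].read() would return.

```python
# Hypothetical payload standing in for obj['Body'].read(); not real S3 data.
raw_bytes = b"id,name,age\r\n1,Jack,24\r\n2,Stark,29\r\n"

# decode() turns the bytes into a str using the given encoding.
text = raw_bytes.decode('utf-8')

# splitlines() breaks the text at line boundaries and drops the line endings.
lines = text.splitlines()

print(lines)
```

Because splitlines() drops the line endings, a quoted field that itself contains a line break would not survive this step intact. If your files can contain such fields, wrapping the decoded text in io.StringIO(text) and passing that to csv.reader is one alternative to consider.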

#4 - Now we pass the lines from #3 to csv.reader(data), which parses each line into a list of values.

With this we almost have the data; we just need to separate the headers from the actual data.

#5 - next(records) returns the first row, which holds the headers of the CSV file.
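
Steps #4 and #5 can be tried on their own with a few in-memory lines. This is a small sketch that assumes the first row is the header row; the sample rows are hypothetical.

```python
import csv

# Hypothetical lines, shaped like the output of splitlines() in step #3.
lines = ['id,name,age', '1,Jack,24', '2,Stark,29']

records = csv.reader(lines)   # an iterator that yields one list per row
headers = next(records)       # consumes the first row only
remaining = list(records)     # everything after the header row

print(headers)
print(remaining)
```

Note that next(records) raises StopIteration if the file is empty; next(records, None) returns None instead, which may be easier to handle.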

#6 - Using a for loop, we iterate through the remaining records and print each row of the CSV file.
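
If you prefer to keep these steps in one place, they could be wrapped in a small function. The sketch below is only one possible shape: the name read_csv_rows and its parameters are my own choices for illustration, and it expects to be handed a client object that behaves like the boto3 S3 client created in #1.

```python
import csv


def read_csv_rows(s3_client, bucket, key, encoding='utf-8'):
    """Return (headers, rows) for a CSV object. Hypothetical helper name."""
    # Same call as line #2: fetch the object from the bucket.
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    # Same as line #3: bytes -> str -> list of lines.
    lines = obj['Body'].read().decode(encoding).splitlines()
    records = csv.reader(lines)
    # Falls back to an empty list if the file has no rows at all.
    headers = next(records, [])
    return headers, list(records)


# Possible usage, assuming s3 is the client created in #1:
# headers, rows = read_csv_rows(s3, 'bucket-name', 'myreadcsvfile.csv')
```

Passing the client in as an argument, rather than creating it inside the function, keeps the credentials out of the helper and makes it easy to try with a stand-in object.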

After getting the data, we don’t want the headers and the values to sit in separate places; we want combined data that says which value belongs to which header. Let’s do that now. Declare one list variable before the for loop:

csvData = []
headerCount = len(headers)

and change the for loop like this:

for eachRecord in records:
    tmp = {}
    for count in range(0, headerCount):
        tmp[headers[count]] = eachRecord[count]
    csvData.append(tmp)
print(csvData)

Now csvData contains the data in the format below:

[{'id': '1', 'name': 'Jack', 'age': '24'}, {'id': '2', 'name': 'Stark', 'age': '29'}]

Note: I formatted the data this way because it was my requirement; the formatting can be changed to suit your own requirements.
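
As one example of a different route to the same format, the csv module's DictReader can build the list of dictionaries without the nested loop. This is a minimal sketch on hypothetical in-memory data, not the approach used above; it assumes the first row holds the headers.

```python
import csv
import io

# Hypothetical decoded text, standing in for obj['Body'].read().decode('utf-8').
text = "id,name,age\n1,Jack,24\n2,Stark,29\n"

# DictReader uses the first row as keys and yields one dict per remaining row.
csvData = list(csv.DictReader(io.StringIO(text)))

print(csvData)
```

Here io.StringIO wraps the decoded string so that it can be read like a file, which is what DictReader iterates over.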

Hope this helped! Happy coding and reading.
