Upload files to S3 with relative paths using Python, boto3, and the pathlib module

nischal shakya
2 min readMar 27, 2023


Let’s say we have directories and subdirectories as shown in the diagram below.

Directories and subdirectories of s3 folder.

There may arise a case where all the directories, subdirectories, and files have to be uploaded to an S3 bucket while preserving their relative paths. This can be done easily with boto3 and Python's pathlib module. boto3 is the AWS SDK for Python, and pathlib is part of the Python standard library.
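The trick that preserves the layout is pathlib's relative_to, which strips the local root off an absolute path, leaving exactly the key the object should get in S3. A minimal illustration (the file path here is hypothetical):

```python
from pathlib import Path

# Hypothetical local root and file, mirroring the article's layout.
root = Path('/home/admin/Downloads/s3')
local_file = root / 'reports' / '2023' / 'summary.csv'

# relative_to strips the root prefix; as_posix() guarantees '/' separators,
# which is what S3 object keys use.
key = local_file.relative_to(root).as_posix()
print(key)  # reports/2023/summary.csv
```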

Prerequisites

  • Install and configure the AWS CLI.
  • Install boto3 (pip install boto3).

It can be divided into 4 steps.

  • Read all the directories from the path from which the files, directories, and subdirectories will be uploaded to the S3 bucket. Let’s say the directory path is /home/admin/Downloads/s3.
from pathlib import Path

def read_files_from_location():
    file_paths = Path('/home/admin/Downloads/s3')
    # Keep only the top-level directories; plain files are skipped.
    return [entry.name for entry in file_paths.iterdir() if entry.is_dir()]
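To see what this returns, here is the same iterdir filter run against a temporary directory (the folder names are made up for illustration) — only the directory names come back, and loose files are ignored:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'invoices').mkdir()
    (root / 'images').mkdir()
    (root / 'notes.txt').write_text('not a directory')

    # Same filter as read_files_from_location: keep directories only.
    dirs = sorted(entry.name for entry in root.iterdir() if entry.is_dir())
    print(dirs)  # ['images', 'invoices']
```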
  • Use a dictionary to store each folder name as the key and the metadata of its files and subdirectories as a list of values.
import mimetypes
from pathlib import Path

def map_path_to_file(directories):
    base_path = Path('/home/admin/Downloads/s3')
    file_dicts = {}

    for d in directories:
        # rglob('*') walks the folder recursively, including all subdirectories.
        for file_path in (base_path / d).rglob('*'):
            if not file_path.is_file():
                continue

            file_metadata_dict = {
                'full_file_name': file_path.name,
                'file_name': file_path.stem,
                'file_extension': file_path.suffix,
                'absolute_path': str(file_path.parent),
                'parent_directory': file_path.parent.name,
                'content_type': mimetypes.guess_type(file_path.name)[0],
                'file_size': file_path.stat().st_size,
                'relative_path': str(file_path.parent.relative_to(base_path)),
            }
            # setdefault creates the list on the first file of each folder.
            file_dicts.setdefault(d, []).append(file_metadata_dict)

    return file_dicts
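Run against a single file, the metadata entry comes out like this (a self-contained sketch against a temporary directory, so the names and values are illustrative):

```python
import mimetypes
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'docs').mkdir()
    sample = root / 'docs' / 'readme.txt'
    sample.write_text('hello')

    meta = {
        'full_file_name': sample.name,                          # 'readme.txt'
        'file_name': sample.stem,                               # 'readme'
        'file_extension': sample.suffix,                        # '.txt'
        'absolute_path': str(sample.parent),
        'parent_directory': sample.parent.name,                 # 'docs'
        'content_type': mimetypes.guess_type(sample.name)[0],   # 'text/plain'
        'file_size': sample.stat().st_size,                     # 5 bytes
        'relative_path': str(sample.parent.relative_to(root)),  # 'docs'
    }
```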
  • Iterate through the dictionary, open each file in binary mode, build the source and destination paths, and upload to the S3 bucket.
import boto3
from botocore.exceptions import ClientError

def upload_to_s3(file_dicts):
    # Credentials can also be picked up automatically from the AWS CLI configuration.
    s3_client = boto3.client('s3', aws_access_key_id='{aws_access_key_id}',
                             aws_secret_access_key='{aws_secret_access_key}')

    for folder, files in file_dicts.items():
        for file in files:
            source_path = '/'.join([file['absolute_path'], file['full_file_name']])
            # The relative path becomes the object key, preserving the layout in S3.
            s3_destination_path = '/'.join([file['relative_path'], file['full_file_name']])
            try:
                with open(source_path, 'rb') as f:
                    s3_client.upload_fileobj(f, '{bucket_name}', s3_destination_path)
            except (ClientError, FileNotFoundError) as e:
                print(e)
  • Execute the methods in sequence.
directory = read_files_from_location()
file_dicts_response = map_path_to_file(directory)
upload_to_s3(file_dicts_response)
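If the per-folder metadata isn’t needed, the steps above can also be collapsed into a single pass with rglob. This is only a sketch: the bucket name is a placeholder, and the upload loop is commented out because it needs AWS credentials.

```python
from pathlib import Path

def iter_s3_keys(root):
    """Yield (local_file, s3_key) pairs that preserve the relative layout."""
    root = Path(root)
    for path in sorted(root.rglob('*')):
        if path.is_file():
            yield path, path.relative_to(root).as_posix()

# Upload loop (requires boto3 and configured credentials; bucket name is a placeholder):
# import boto3
# s3_client = boto3.client('s3')
# for local_file, key in iter_s3_keys('/home/admin/Downloads/s3'):
#     s3_client.upload_file(str(local_file), '{bucket_name}', key)
```

upload_file handles the open/read itself, so the explicit upload_fileobj call from the earlier step is not needed in this variant.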
  • Verify the output using the AWS CLI.
aws s3 ls s3://{bucketname} --recursive --human-readable --summarize
  • The directories, subdirectories, and files should now be in the AWS S3 bucket, as shown in the figure below.

That’s it. Happy learning.
