How to recursively upload a folder to Box using Python

Rui Barbosa
Published in Box Developer Blog
5 min read · Jul 5, 2023

Image by Freepik

In this article, learn how to recursively upload files to Box using Python, while taking advantage of Box API features such as the preflight check, upload, update contents, and the chunked uploader.

This question from a developer on StackOverflow made me realize that there is no built-in method in the Python SDK to upload the entire content of a folder. However, the Box CLI does implement such a method, so I thought it would be an interesting exercise to do it in Python, hopefully the pythonic way.

Recursive uploads with Python? Sure, but with a couple of Box twists

Not only is playing with recursion fun, but the Box API also has some very useful features related to uploads:

  • folder.preflight_check checks whether the file can actually be uploaded
  • folder.upload uploads a new file
  • file.update_contents uploads a new version of an existing file
  • folder.get_chunked_uploader creates a multithreaded upload of a new file
  • file.get_chunked_uploader creates a multithreaded upload of a new version

To learn more about the multiple ways you can upload content into Box, check out my previous article:

Upload vs Update

To decide whether we need to upload or update a file, we run a preflight check to verify that the file can be accepted. If this call fails with the error item_name_in_use, the file already exists, so we perform an update instead of an upload.

Something like this:

import os

from boxsdk import Client
from boxsdk.exception import BoxAPIException


def file_upload(client: Client, file_path: str, folder_id: str):
    """upload a file to box"""

    file_size = os.path.getsize(file_path)
    file_name = os.path.basename(file_path)

    # get_folder_by_id and get_file_by_id are helper functions not shown here
    folder = get_folder_by_id(client, folder_id).get()
    file_id = None

    try:
        # raises BoxAPIException if the file cannot be accepted
        folder.preflight_check(file_size, file_name)
    except BoxAPIException as err:
        if err.code == "item_name_in_use":
            # file exists, let's get its id
            file_id = err.context_info["conflicts"]["id"]
        else:
            raise err

    if file_id is not None:
        file = get_file_by_id(client, file_id)
        # update the file (new version)
        file = file.update_contents(file_path)
    else:
        # upload new file
        file = client.folder(folder_id).upload(file_path, file_name)

    return file

To take it up to eleven, you could compare the Box SHA1 of the file with a local SHA1 and decide to execute or completely skip the file update.
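The SHA1 comparison mentioned above could be sketched roughly as follows. This is a minimal sketch and not part of the original example: `local_sha1` and `needs_update` are hypothetical helper names, and the remote digest is assumed to come from the `sha1` field of the Box file object (for example, something like `client.file(file_id).get().sha1`).

```python
import hashlib


def local_sha1(file_path: str, block_size: int = 1024 * 1024) -> str:
    """Return the hex SHA1 digest of a local file, read block by block."""
    digest = hashlib.sha1()
    with open(file_path, "rb") as stream:
        # read in blocks so large files are never loaded fully into memory
        for block in iter(lambda: stream.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_update(file_path: str, remote_sha1: str) -> bool:
    """True when the local content differs from the digest reported by Box."""
    return local_sha1(file_path) != remote_sha1.lower()
```

With helpers like these, the update branch could call `needs_update(file_path, remote_sha1)` first and skip `update_contents` when it returns False.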

Normal vs Chunked Upload

Box chunked uploads create multiple upload threads and execute the upload concurrently.

The number of threads can be set by the developer; it defaults to 5.

A file must be at least 20 megabytes in size to be eligible for a chunked upload.

This method would look like this:

import os

from boxsdk import Client
from boxsdk.config import API
from boxsdk.exception import BoxAPIException


def file_upload_chunked(
    client: Client, file_path: str, folder_id: str, upload_threads: int = 5
):
    """upload a file to box using the chunked uploader"""

    # number of concurrent upload threads used by the SDK
    API.CHUNK_UPLOAD_THREADS = upload_threads

    file_size = os.path.getsize(file_path)
    file_name = os.path.basename(file_path)

    # get_folder_by_id and get_file_by_id are helper functions not shown here
    folder = get_folder_by_id(client, folder_id)
    file_id = None

    try:
        folder.preflight_check(file_size, file_name)
    except BoxAPIException as err:
        if err.code == "item_name_in_use":
            # file exists, let's get its id
            file_id = err.context_info["conflicts"]["id"]
        else:
            raise err

    if file_id is not None:
        file = get_file_by_id(client, file_id)
        # get a chunked uploader to update the file (new version)
        uploader = file.get_chunked_uploader(file_path, file_name)
    else:
        # get a chunked uploader to upload a new file
        uploader = folder.get_chunked_uploader(file_path, file_name)

    try:
        file = uploader.start()
    except BoxAPIException as err:
        if err.code == "upload_session_not_found":
            file = uploader.resume()
        else:
            raise err

    return file

Again, we do a preflight_check to verify whether the file already exists, and either upload or update the file. The size check happens before this method is called.

What about duplicate folders?

Of course, we also need to check whether a folder already exists in Box in order to decide if we must create it before we traverse it.

def create_box_folder(
    client: Client, folder_name: str, parent_folder: Folder
) -> Folder:
    """create a folder in box"""

    try:
        folder = parent_folder.create_subfolder(folder_name)
    except BoxAPIException as err:
        if err.code == "item_name_in_use":
            # folder exists, get it from the id reported in the error
            folder_id = err.context_info["conflicts"][0]["id"]
            folder = client.folder(folder_id).get()
        else:
            raise err

    return folder

In the example above, we always try to create the folder and check whether the specific item_name_in_use error comes back. If so, we get the folder object from the folder_id identified in the error. Otherwise, the create folder method returns the newly created folder object.
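Note that the snippets in this article read the conflict differently: the file preflight check indexes `conflicts` directly as a single object, while the folder creation error indexes it as a list. A small helper could hide that difference. This is only a sketch based on the shapes used in the code above, and `conflict_id` is a hypothetical name, not an SDK function.

```python
from typing import Any, Mapping


def conflict_id(context_info: Mapping[str, Any]) -> str:
    """Return the id of the conflicting item from an item_name_in_use error."""
    conflicts = context_info["conflicts"]
    # Folder creation reports a list of conflicts, while a file preflight
    # check reports a single object. Normalize to the first (or only) entry.
    if isinstance(conflicts, list):
        conflicts = conflicts[0]
    return conflicts["id"]
```

Both `except` branches could then call `conflict_id(err.context_info)` instead of indexing the structure by hand.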

Recursive uploads

Now we just need to put together a method to recursively upload a complete local folder:

import pathlib


def folder_upload(
    client: Client,
    box_base_folder: Folder,
    local_folder_path: str,
    min_file_size: int = 1024 * 1024 * 20,
) -> Folder:
    """upload a folder to box"""

    local_folder = pathlib.Path(local_folder_path)

    for item in local_folder.iterdir():
        if item.is_dir():
            new_box_folder = create_box_folder(client, item.name, box_base_folder)
            print(f" Folder {item}")
            # recurse into the subfolder, keeping the same size threshold
            folder_upload(client, new_box_folder, str(item), min_file_size)
        else:
            if item.stat().st_size < min_file_size:
                file_upload(client, str(item), box_base_folder.id)
            else:
                file_upload_chunked(client, str(item), box_base_folder.id)
            print(f" \tUploaded {item}")

    return box_base_folder

Notice we are using pathlib to create an iterator that goes through the files and folders in the directory.

Then if it detects a folder, it creates it (or just gets the existing folder object) and calls itself recursively.

If it is a file, it checks its size to determine if it is eligible for a chunked or normal upload. We can set this value higher than 20 megabytes, but not lower.

Finally, the specific upload method will figure out if the file needs to be uploaded (new file) or updated (new version).
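To see these decisions without touching Box at all, the same traversal can be run as a dry run. The sketch below is an illustration under assumptions, not part of the original example: `plan_upload` and `CHUNKED_MIN_SIZE` are hypothetical names, and entries are sorted only to make the output deterministic.

```python
import pathlib
from typing import Iterator, Tuple

CHUNKED_MIN_SIZE = 1024 * 1024 * 20  # 20 MB, the threshold used in this article


def plan_upload(
    local_folder_path: str, min_file_size: int = CHUNKED_MIN_SIZE
) -> Iterator[Tuple[str, pathlib.Path]]:
    """Yield (action, path) pairs describing what a recursive upload would do."""
    for item in sorted(pathlib.Path(local_folder_path).iterdir()):
        if item.is_dir():
            yield ("folder", item)
            # recurse into the subfolder, keeping the same size threshold
            yield from plan_upload(str(item), min_file_size)
        elif item.stat().st_size < min_file_size:
            yield ("upload", item)
        else:
            yield ("chunked", item)
```

Iterating over `plan_upload("path/to/folder")` and printing each pair gives a quick preview of which files would go through the normal upload and which through the chunked uploader.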

Putting it all together

Here is an example of what a main function could look like:

def main():
    """main app demo"""
    settings = get_settings()

    # check if the sample folders exist and create them if not
    sample_folder = check_sample_folders(settings.sample_folder_base_dir)

    # get a client
    service_client = box_client_get(settings.jwt_config_path)

    # get a client as user
    client = box_client_as_user_get(service_client, settings.as_user_id)

    # create a demo upload folder in root if it does not exist
    items = [
        item
        for item in client.folder("0").get_items()
        if (item.name == settings.default_upload_folder
            and item.type == "folder")
    ]
    if len(items) == 0:
        demo_folder = client.folder("0").create_subfolder(
            settings.default_upload_folder
        )
    else:
        demo_folder = items[0].get()

    print("Box Python SDK - Upload Folder Demo")
    print("=" * 40)
    print(f" Uploading folder {sample_folder}")
    print("-" * 40)
    folder_upload(client, demo_folder, settings.sample_folder_base_dir)


if __name__ == "__main__":
    main()
    print("=" * 40)
    print("All done")

Check out the full working example on this GitHub repo.
