How to recursively upload a folder to Box using Python
In this article, learn how to recursively upload files to Box using Python, while taking advantage of Box API features such as pre-flight check, upload, update content, and chunked uploader.
This question from a developer on StackOverflow made me realize that there is no built-in method in the Python SDK to upload the entire contents of a folder. However, the Box CLI does implement one, so I thought it would be an interesting exercise to do it in Python, hopefully the Pythonic way.
Recursive uploads with Python? Sure, but with a couple of Box twists
Not only is playing with recursion fun, but the Box API also has some very useful upload-related features:
- folder.preflight_check: checks whether the file can actually be uploaded
- folder.upload: uploads a new file
- file.update_contents: uploads a new version of an existing file
- folder.get_chunked_uploader: creates a multithreaded upload of a new file
- file.get_chunked_uploader: creates a multithreaded upload of a new version of an existing file
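Choosing among these methods comes down to two facts: whether a file with the same name already exists in the target folder (which preflight_check reveals) and whether the file is large enough for a chunked upload (Box requires at least 20 MB). The sketch below encodes that decision as a pure function. pick_upload_method and CHUNKED_MIN_SIZE are hypothetical names, and the function only returns the name of the SDK method rather than calling Box.

```python
# Hypothetical helper: map the two facts the upload code checks
# (does the file already exist in Box? is it large?) to an SDK method name.
CHUNKED_MIN_SIZE = 20 * 1024 * 1024  # 20 MB, the minimum for chunked uploads


def pick_upload_method(
    exists: bool, size: int, min_chunked_size: int = CHUNKED_MIN_SIZE
) -> str:
    """Return the name of the Box SDK method to use for a single file."""
    if min_chunked_size < CHUNKED_MIN_SIZE:
        raise ValueError("chunked uploads require files of at least 20 MB")
    chunked = size >= min_chunked_size
    if exists:
        # a file with this name is already there: upload a new version
        return "file.get_chunked_uploader" if chunked else "file.update_contents"
    # no conflict: upload a brand new file into the folder
    return "folder.get_chunked_uploader" if chunked else "folder.upload"


print(pick_upload_method(exists=False, size=1024))
print(pick_upload_method(exists=True, size=50 * 1024 * 1024))
```

The first call selects folder.upload and the second file.get_chunked_uploader.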
To learn more about the multiple ways you can upload content into Box, check out my previous article:
Upload vs Update
To decide whether we need to upload or update a file, we run a preflight check to verify whether the file can be accepted. If this call returns the error item_name_in_use, the file already exists and we perform an update instead of an upload.
Something like this:
```python
import os

from boxsdk import Client
from boxsdk.exception import BoxAPIException

# get_folder_by_id() and get_file_by_id() are helpers defined in the
# full example repo (not shown here).


def file_upload(client: Client, file_path: str, folder_id: str):
    """upload a file to box, or a new version if it already exists"""
    file_size = os.path.getsize(file_path)
    file_name = os.path.basename(file_path)
    folder = get_folder_by_id(client, folder_id).get()
    file_id = None
    try:
        # ask Box whether a file with this name and size would be accepted
        folder.preflight_check(file_size, file_name)
    except BoxAPIException as err:
        if err.code == "item_name_in_use":
            # file exists, let's get its id
            file_id = err.context_info["conflicts"]["id"]
        else:
            raise
    if file_id is not None:
        file = get_file_by_id(client, file_id)
        # update the file (new version)
        file = file.update_contents(file_path)
    else:
        # upload new file
        file = client.folder(folder_id).upload(file_path, file_name)
    return file
```
To take it up to eleven, you could compare the SHA1 of the file in Box with a SHA1 computed locally and skip the update entirely when they match.
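A minimal sketch of that comparison, assuming the Box SHA1 comes from the sha1 field of the Box file object (for example box_file.sha1 after a .get() call). local_sha1 and needs_update are hypothetical helpers; the local digest is computed in blocks so large files never have to fit in memory.

```python
import hashlib


def local_sha1(file_path: str, block_size: int = 1024 * 1024) -> str:
    """Compute the SHA1 hex digest of a local file, reading it in blocks."""
    sha1 = hashlib.sha1()
    with open(file_path, "rb") as stream:
        # read until stream.read() returns b"" (end of file)
        for block in iter(lambda: stream.read(block_size), b""):
            sha1.update(block)
    return sha1.hexdigest()


def needs_update(file_path: str, box_sha1: str) -> bool:
    """True when the local content differs from the copy in Box.

    box_sha1 is assumed to be the hex digest Box reports for the file.
    """
    return local_sha1(file_path) != box_sha1.lower()
```

Inside file_upload, the update branch could then call needs_update(file_path, box_file.sha1) and only call update_contents when it returns True.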
Normal vs Chunked Upload
The Box chunked uploader splits the file into parts and uploads them concurrently from multiple threads.
The number of threads can be set by the developer (API.CHUNK_UPLOAD_THREADS in the SDK configuration) and defaults to 5.
The minimum file size for a chunked upload is 20 megabytes.
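To get a feel for the numbers, here is a back-of-the-envelope sketch. Box decides the part size when the upload session is created, so the 8 MB part size below is only an assumed value, and chunk_plan is a hypothetical helper that does the arithmetic; it is not something the SDK exposes. The per-thread figure assumes parts are spread evenly across threads.

```python
import math

MIN_CHUNKED_SIZE = 20 * 1024 * 1024  # smallest file accepted for a chunked upload


def chunk_plan(file_size: int, part_size: int, threads: int = 5) -> dict:
    """Estimate how many parts a chunked upload needs and roughly how many
    parts each thread handles if the work is spread evenly (hypothetical)."""
    if file_size < MIN_CHUNKED_SIZE:
        raise ValueError("file too small for a chunked upload")
    total_parts = math.ceil(file_size / part_size)
    parts_per_thread = math.ceil(total_parts / threads)
    return {"total_parts": total_parts, "parts_per_thread": parts_per_thread}


# 100 MB file, assuming the upload session reports an 8 MB part size
print(chunk_plan(100 * 1024 * 1024, 8 * 1024 * 1024))
```

For a 100 MB file this gives 13 parts (100 / 8, rounded up), or about 3 parts per thread with the default 5 threads.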
This method would look like this:
```python
import os

from boxsdk import Client
from boxsdk.config import API
from boxsdk.exception import BoxAPIException


def file_upload_chunked(
    client: Client, file_path: str, folder_id: str, upload_threads: int = 5
):
    """upload a large file to box using the chunked uploader"""
    # number of threads the SDK uses to upload parts concurrently
    API.CHUNK_UPLOAD_THREADS = upload_threads
    file_size = os.path.getsize(file_path)
    file_name = os.path.basename(file_path)
    folder = get_folder_by_id(client, folder_id)
    file_id = None
    try:
        folder.preflight_check(file_size, file_name)
    except BoxAPIException as err:
        if err.code == "item_name_in_use":
            file_id = err.context_info["conflicts"]["id"]
        else:
            raise
    if file_id is not None:
        file = get_file_by_id(client, file_id)
        # get a chunked uploader to update the file (new version);
        # File.get_chunked_uploader only needs the path (its second
        # parameter is a rename flag, not a file name)
        uploader = file.get_chunked_uploader(file_path)
    else:
        # get a chunked uploader to upload a new file
        uploader = folder.get_chunked_uploader(file_path, file_name)
    try:
        file = uploader.start()
    except BoxAPIException as err:
        if err.code == "upload_session_not_found":
            file = uploader.resume()
        else:
            raise
    return file
```
Again, we run a preflight_check to verify whether the file already exists, and then either upload or update it. The size check is done by the caller, before this method is invoked.
What about duplicate folders?
Of course, we also need to check whether a folder already exists in Box, to decide whether to create it before we traverse it.
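```python
from boxsdk import Client
from boxsdk.exception import BoxAPIException
from boxsdk.object.folder import Folder


def create_box_folder(
    client: Client, folder_name: str, parent_folder: Folder
) -> Folder:
    """create a folder in box, or return the existing one"""
    try:
        folder = parent_folder.create_subfolder(folder_name)
    except BoxAPIException as err:
        if err.code == "item_name_in_use":
            # folder conflicts come back as a list, take the first entry
            folder_id = err.context_info["conflicts"][0]["id"]
            folder = client.folder(folder_id).get()
        else:
            raise
    return folder
```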
In the example above, we always try to create the folder and check whether the item_name_in_use error comes back. If it does, we get the folder object using the folder_id identified in the error. Otherwise, create_subfolder returns the newly created folder object.
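Note that the two conflict handlers index the error differently: the file preflight code reads context_info["conflicts"]["id"], while the folder code reads context_info["conflicts"][0]["id"]. If you want a single place to handle both, a small hypothetical helper like conflict_id below accepts either shape; it assumes the error payloads look the way the code in this article indexes them.

```python
from typing import Any, Mapping


def conflict_id(context_info: Mapping[str, Any]) -> str:
    """Extract the id of the conflicting item from a Box error's context_info.

    Hypothetical helper: accepts either a single conflict object (as read by
    the file preflight code) or a list of them (as read by the folder code).
    """
    conflicts = context_info["conflicts"]
    if isinstance(conflicts, list):
        # folder creation conflicts: take the first entry
        conflicts = conflicts[0]
    return conflicts["id"]
```

In create_box_folder this would read folder_id = conflict_id(err.context_info).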
Recursive uploads
Now we just need to put together a method to recursively upload a complete local folder:
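```python
import pathlib

from boxsdk import Client
from boxsdk.object.folder import Folder


def folder_upload(
    client: Client,
    box_base_folder: Folder,
    local_folder_path: str,
    min_file_size: int = 1024 * 1024 * 20,
) -> Folder:
    """upload a folder to box, recursing into subfolders"""
    local_folder = pathlib.Path(local_folder_path)
    for item in local_folder.iterdir():
        if item.is_dir():
            # create (or reuse) the matching Box folder, then recurse into it
            new_box_folder = create_box_folder(client, item.name, box_base_folder)
            print(f" Folder {item}")
            # pass min_file_size down so subfolders use the same threshold
            folder_upload(client, new_box_folder, str(item), min_file_size)
        else:
            # small files use a normal upload, large ones the chunked uploader
            if item.stat().st_size < min_file_size:
                file_upload(client, str(item), box_base_folder.id)
            else:
                file_upload_chunked(client, str(item), box_base_folder.id)
            print(f" \tUploaded {item}")
    return box_base_folder
```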
Notice that we use pathlib to iterate over the files and folders in the directory.
If the method detects a folder, it creates it in Box (or just gets the existing folder object) and calls itself recursively.
If it is a file, it checks its size to determine whether it is eligible for a chunked or a normal upload. We can set min_file_size higher than 20 megabytes, but not lower.
Finally, the specific upload method figures out whether the file needs to be uploaded (new file) or updated (new version).
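To check the traversal logic without touching Box, the same walk can be done as a dry run. plan_folder_upload is a hypothetical helper that mirrors folder_upload but only records which of the article's functions would be called for each item. Unlike the original loop, it sorts directory entries so the output is deterministic, and it rejects thresholds below 20 MB.

```python
import pathlib
from typing import List, Tuple

MIN_CHUNKED_SIZE = 1024 * 1024 * 20  # same default as folder_upload


def plan_folder_upload(
    local_folder_path, min_file_size: int = MIN_CHUNKED_SIZE
) -> List[Tuple[str, str]]:
    """Walk a local folder like folder_upload does, but only record the
    action that would be taken for each item (hypothetical dry run)."""
    if min_file_size < MIN_CHUNKED_SIZE:
        raise ValueError("min_file_size cannot be lower than 20 MB")
    plan = []
    for item in sorted(pathlib.Path(local_folder_path).iterdir()):
        if item.is_dir():
            plan.append((item.name, "create_box_folder"))
            # recurse, prefixing child paths with this folder's name
            for child, action in plan_folder_upload(item, min_file_size):
                plan.append((f"{item.name}/{child}", action))
        elif item.stat().st_size < min_file_size:
            plan.append((item.name, "file_upload"))
        else:
            plan.append((item.name, "file_upload_chunked"))
    return plan
```

Printing plan_folder_upload("some/local/folder") before a real run is a cheap way to see which files will go through the chunked uploader.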
Putting it all together
Here is an example of what a main() function, called from the __main__ block, could look like:
```python
def main():
    """main app demo"""
    # get_settings, check_sample_folders, box_client_get and
    # box_client_as_user_get are helpers from the full example repo
    settings = get_settings()
    # check if sample folder exist and create them if not
    sample_folder = check_sample_folders(settings.sample_folder_base_dir)
    # get a client
    service_client = box_client_get(settings.jwt_config_path)
    # get a client as user
    client = box_client_as_user_get(service_client, settings.as_user_id)
    # create a demo upload folder in root if not exists
    existing = [
        item
        for item in client.folder("0").get_items()
        if (item.name == settings.default_upload_folder
            and item.type == "folder")
    ]
    if len(existing) == 0:
        demo_folder = client.folder("0").create_subfolder(
            settings.default_upload_folder
        )
    else:
        demo_folder = existing[0].get()
    print("Box Python SDK - Upload Folder Demo")
    print("=" * 40)
    print(f" Uploading folder {sample_folder}")
    print("-" * 40)
    folder_upload(client, demo_folder, settings.sample_folder_base_dir)


if __name__ == "__main__":
    main()
    print("=" * 40)
    print("All done")
```
Check out the full working example on this GitHub repo.
References
- 3 Ways to upload content to Box using Python
- Box upload guide
- Box Python SDK: Uploading a file
- Box API: Uploads