3 Ways to upload content to Box using Python

Rui Barbosa
Box Developer Blog
May 30, 2023

In this article, we’ll discuss three ways developers can upload content to the Box cloud: normal upload, chunked upload, and manual upload. Each method has its own benefits and use cases, and we’ll explore how to implement each of them with the Box Python SDK. By the end of this article, you’ll have a better understanding of the upload methods available to you and which one best fits your needs.

Normal Upload

Uploading a file using the Box Python SDK can be as easy as:

```python
uploaded_file = client.folder(folder_id).upload(file_path, file_name)
```

Under the hood, the SDK opens the file in binary read mode and passes it to the upload_stream method, which you can also call directly:

```python
with open(file_path, "rb") as file_stream:
    uploaded_file = client.folder(folder_id).upload_stream(
        file_stream,
        file_name,
    )
```

As you can see, we are using the upload methods of the folder object, which means we are uploading a new file to the specified folder.

However, if a file with the same name already exists in that folder, the upload fails with an error.

To find out before uploading whether the file will be accepted, perform a preflight check:

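```python
import os

from boxsdk.exception import BoxAPIException

file_size = os.path.getsize(file_path)
file_name = os.path.basename(file_path)

# get_folder_by_id (not shown here) returns a Folder object for folder_id
folder = get_folder_by_id(client, folder_id).get()
file_id = None
try:
    folder.preflight_check(file_size, file_name)
except BoxAPIException as err:
    if err.code == "item_name_in_use":
        # A file with this name already exists; remember its id
        file_id = err.context_info["conflicts"]["id"]
    else:
        raise
```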

If the name is already taken, preflight_check raises a BoxAPIException with the code item_name_in_use, and the exception's context_info includes the id of the conflicting file.
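If you need this check in several places, the conflict handling can be pulled into a small helper. The sketch below is ours, not part of the SDK: `conflicting_file_id` is a hypothetical name, and it assumes `context_info["conflicts"]` is a single object with an `id`, as in the code above. The caller would pass `err.code` and `err.context_info`, and re-raise the exception when the helper returns None.

```python
from typing import Optional


def conflicting_file_id(code: str, context_info: Optional[dict]) -> Optional[str]:
    """Return the id of the existing file if a preflight check failed on a name conflict.

    Hypothetical helper mirroring the except branch above. Returns None for any
    other error code, so the caller knows to re-raise.
    """
    if code != "item_name_in_use" or not context_info:
        return None
    conflicts = context_info.get("conflicts")
    # Assumption: a file name conflict reports one conflicting item as a dict.
    if isinstance(conflicts, dict):
        return conflicts.get("id")
    return None


# Made-up error payload, for illustration only.
print(conflicting_file_id("item_name_in_use", {"conflicts": {"type": "file", "id": "12345"}}))  # 12345
```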

New file vs New version

If the file already exists, you can upload a new version of it with file.update_contents instead. Continuing the example above, you can automatically choose between uploading a new file and uploading a new version of the existing one:

```python
# continued from above
if file_id is not None:
    file = get_file_by_id(client, file_id)
    file = file.update_contents(file_path)
else:
    file = client.folder(folder_id).upload(file_path, file_name)
```

The same pattern applies to the other upload flavors: you upload to a folder to create a new file, and you update the contents of a file to upload a new version.
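The decision itself can be written as a small, SDK-agnostic function. This is a sketch under our own assumptions: `upload_or_new_version` is a hypothetical helper, and the two callables stand in for a folder-level upload (such as folder.upload) and a file-level one (such as file.update_contents), so the logic can be exercised without a Box connection.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def upload_or_new_version(
    existing_file_id: Optional[str],
    upload_new: Callable[[], T],
    upload_version: Callable[[str], T],
) -> T:
    """Create a new file when there is no conflict, otherwise upload a new version.

    upload_new would wrap a folder call, upload_version a file call; both are
    injected, so this helper has no Box dependency.
    """
    if existing_file_id is not None:
        return upload_version(existing_file_id)
    return upload_new()


print(upload_or_new_version(None, lambda: "new file", lambda fid: "new version of " + fid))  # new file
print(upload_or_new_version("42", lambda: "new file", lambda fid: "new version of " + fid))  # new version of 42
```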

Chunked Upload

The chunked upload takes advantage of multi-threading to upload several parts of the file simultaneously.

The SDK defaults to 5 threads for chunked uploads. For optimal upload speeds, we recommend chunked upload for files larger than 50 megabytes; the minimum file size for a chunked upload is 20 megabytes.

This approach will help you efficiently upload large files without affecting the performance of your application.
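Those size rules can be captured in a small helper that picks a method. This is a hypothetical sketch: `choose_upload_method` is our name, we assume binary megabytes (1 MB = 1,048,576 bytes), and the 20 MB and 50 MB thresholds come from the paragraph above.

```python
MB = 1024 * 1024               # assumption: binary megabytes
CHUNKED_MINIMUM = 20 * MB      # smallest file a chunked upload accepts
CHUNKED_RECOMMENDED = 50 * MB  # above this, chunked upload is recommended


def choose_upload_method(file_size: int) -> str:
    """Return "chunked" or "normal" for a file of file_size bytes."""
    if file_size < CHUNKED_MINIMUM:
        return "normal"   # chunked upload is not available for small files
    if file_size > CHUNKED_RECOMMENDED:
        return "chunked"  # recommended for large files
    return "normal"       # 20 to 50 MB: either works; keep the simpler call


print(choose_upload_method(100 * MB))  # chunked
```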

More threads do not necessarily mean a faster upload; the best value depends on your system, your connection, and the file size.
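To see why the thread count is a tuning knob rather than a guaranteed speed-up, here is a standard-library sketch of the general idea behind a multi-threaded chunked upload. It is not the SDK's implementation: `fake_upload_part` is a hypothetical stand-in for the network call, and the sizes are arbitrary. The data is split into fixed-size parts, a pool of `threads` workers sends them concurrently, and the results are collected in part order.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(data: bytes, part_size: int):
    """Yield (offset, part) pairs that together cover the whole payload."""
    for offset in range(0, len(data), part_size):
        yield offset, data[offset:offset + part_size]


def fake_upload_part(offset: int, part: bytes) -> dict:
    """Hypothetical stand-in for the HTTP request that uploads one part."""
    return {"offset": offset, "size": len(part)}


def upload_in_parallel(data: bytes, part_size: int, threads: int) -> list:
    """Send all parts through a pool of `threads` worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(fake_upload_part, offset, part)
            for offset, part in split_into_parts(data, part_size)
        ]
        # Parts may finish in any order; reading the futures in submission
        # order keeps the results in part order.
        return [future.result() for future in futures]


results = upload_in_parallel(b"x" * 10_000, part_size=4096, threads=3)
print([r["size"] for r in results])  # [4096, 4096, 1808]
```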

Here we set the number of threads to 3 through API.CHUNK_UPLOAD_THREADS, and we also run a preflight check to see if the file will be accepted:

```python
import os

from boxsdk.config import API
from boxsdk.exception import BoxAPIException

upload_threads = 3
API.CHUNK_UPLOAD_THREADS = upload_threads

file_size = os.path.getsize(file_path)
file_name = os.path.basename(file_path)

folder = get_folder_by_id(client, folder_id)
file_id = None

try:
    folder.preflight_check(file_size, file_name)
except BoxAPIException as err:
    if err.code == "item_name_in_use":
        file_id = err.context_info["conflicts"]["id"]
    else:
        raise

if file_id is not None:
    # New version of an existing file. File.get_chunked_uploader takes
    # rename_file (a bool) as its second argument, not a file name.
    file = get_file_by_id(client, file_id)
    uploader = file.get_chunked_uploader(file_path)
else:
    # New file in the folder
    uploader = folder.get_chunked_uploader(file_path, file_name)
```

Notice that the get_chunked_uploader method exists on both folders and files: on a folder it uploads a new file, and on a file it uploads a new version.

If an error occurs during the upload, you can call the ChunkedUploader object's .resume() method to continue the upload or its .abort() method to cancel it.

Continuing our example from above:

```python
# continued from above
try:
    file = uploader.start()
except BoxAPIException:
    file = uploader.resume()
```
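The snippet above tries resume() only once. If you want a bounded number of attempts and a clean abort when they run out, a pattern like the following may help. It is a sketch, not SDK code: `upload_with_resume`, `max_resumes` and `FlakyUploader` are hypothetical names, and the helper relies only on the start(), resume() and abort() methods described above. In real code you would probably pass `retry_on=(BoxAPIException,)`.

```python
def upload_with_resume(uploader, max_resumes=3, retry_on=(Exception,)):
    """Call start(); on failure call resume() up to max_resumes times, then abort()."""
    try:
        return uploader.start()
    except retry_on as err:
        last_error = err
    for _ in range(max_resumes):
        try:
            return uploader.resume()
        except retry_on as err:
            last_error = err
    uploader.abort()  # give up and discard the upload session
    raise last_error


class FlakyUploader:
    """Fake uploader for the example: fails `failures` times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.aborted = False

    def _attempt(self):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated network error")
        return "file-object"

    def start(self):
        return self._attempt()

    def resume(self):
        return self._attempt()

    def abort(self):
        self.aborted = True


print(upload_with_resume(FlakyUploader(failures=2)))  # file-object
```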

Manual Upload

The manual upload gives you full control over the content upload. In essence, you create an upload session, upload the content in chunks, and then commit the session to get your file object.

The same preflight check applies here:

```python
import os

from boxsdk.exception import BoxAPIException

file_size = os.path.getsize(file_path)
file_name = os.path.basename(file_path)

folder = get_folder_by_id(client, folder_id)
file_id = None

try:
    folder.preflight_check(file_size, file_name)
except BoxAPIException as err:
    if err.code == "item_name_in_use":
        file_id = err.context_info["conflicts"]["id"]
    else:
        raise

if file_id is not None:
    # Session for a new version of the existing file
    file = get_file_by_id(client, file_id)
    upload_session = file.create_upload_session(file_size, file_name)
else:
    # Session for a new file in the folder
    upload_session = folder.create_upload_session(file_size, file_name)
```

The UploadSession object returned by Box already tells us how many parts to upload (total_parts) and how big each part is (part_size), so we just use these values to split the file into chunks and upload them:
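The part layout follows from those two values: every part except the last is exactly part_size bytes long, and part number n starts at offset n * part_size, which is the offset the loop below passes to upload_part_bytes. The sketch below computes the inclusive byte range of each part; `part_ranges` is a hypothetical helper, and the 8 MiB part size is only an assumption for the example, since Box picks the real part_size for each session.

```python
def part_ranges(file_size: int, part_size: int) -> list:
    """Return the inclusive (first_byte, last_byte) range of every part."""
    total_parts = (file_size + part_size - 1) // part_size  # ceiling division
    ranges = []
    for part_num in range(total_parts):
        offset = part_num * part_size  # same offset the upload loop uses
        last_byte = min(offset + part_size, file_size) - 1
        ranges.append((offset, last_byte))
    return ranges


MiB = 1024 * 1024
ranges = part_ranges(100 * MiB, 8 * MiB)  # assumed 8 MiB parts for a 100 MiB file
print(len(ranges))  # 13
print(ranges[-1])   # (100663296, 104857599)
```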

```python
import hashlib

file_sha1 = hashlib.sha1()
parts = []

with open(file_path, "rb") as file_stream:
    for part_num in range(upload_session.total_parts):
        copied_length = 0
        chunk = b""
        while copied_length < upload_session.part_size:
            bytes_read = file_stream.read(upload_session.part_size - copied_length)
            if bytes_read is None:
                # stream returns None when no bytes are ready currently
                # but there are potentially more bytes in the stream
                # to be read.
                continue
            if len(bytes_read) == 0:
                # stream is exhausted.
                break
            chunk += bytes_read
            copied_length += len(bytes_read)

        # this part starts at byte part_num * part_size of the file
        uploaded_part = upload_session.upload_part_bytes(
            chunk, part_num * upload_session.part_size, file_size
        )
        parts.append(uploaded_part)
        # the SHA1 of the whole file is built up part by part
        file_sha1.update(chunk)
```
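The inner while loop guards against streams that return partial reads or None. For a regular file on disk opened in "rb" mode (a buffered, blocking stream), read(n) returns fewer than n bytes only at the end of the file, so the chunking can be written as a simpler generator. This is a sketch under that assumption; `iter_parts` is a hypothetical name, and io.BytesIO stands in for the file so the example runs without Box.

```python
import io


def iter_parts(stream, part_size: int):
    """Yield (offset, chunk) pairs read from a blocking binary stream.

    Assumes read(n) only returns fewer than n bytes at the end of the stream,
    which holds for regular files opened with "rb" and for io.BytesIO.
    """
    offset = 0
    while True:
        chunk = stream.read(part_size)
        if not chunk:  # b"" means the stream is exhausted
            return
        yield offset, chunk
        offset += len(chunk)


data = bytes(range(256)) * 100  # 25,600 bytes of sample content
part_layout = [
    (offset, len(chunk))  # a real upload would call upload_part_bytes here
    for offset, chunk in iter_parts(io.BytesIO(data), part_size=10_000)
]
print(part_layout)  # [(0, 10000), (10000, 10000), (20000, 5600)]
```

In the manual upload, each (offset, chunk) pair would go to upload_session.upload_part_bytes(chunk, offset, file_size), and each chunk would also be fed to file_sha1.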

Once all the parts are uploaded, we can commit the session, which returns the uploaded file object.

Notice that we need to keep track of the parts, and we also need to send the SHA1 of the complete file; Box validates that the SHA1 matches what it has received:

```python
content_sha1 = file_sha1.digest()
uploaded_file = upload_session.commit(content_sha1=content_sha1, parts=parts)
```
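Because SHA1 can be computed incrementally, feeding the parts to one hash object in order gives the same digest as hashing the whole file at once, which is why the loop above never needs the complete file in memory. commit receives the raw 20-byte digest; as far as we can tell, the SDK base64-encodes it for the Digest header that Box's commit endpoint expects, so you should not need to encode it yourself. `sha1_of_parts` is a hypothetical helper used only for illustration.

```python
import base64
import hashlib


def sha1_of_parts(parts) -> bytes:
    """Feed each part to one SHA1 object, as the upload loop does with file_sha1."""
    digest = hashlib.sha1()
    for part in parts:
        digest.update(part)
    return digest.digest()


data = b"some file content"
incremental = sha1_of_parts([data[:5], data[5:11], data[11:]])
one_shot = hashlib.sha1(data).digest()
print(incremental == one_shot)  # True
print(len(incremental))         # 20 (raw bytes, the form passed to commit)

# Base64 form of a SHA1 digest, shown for the empty payload:
print(base64.b64encode(hashlib.sha1(b"").digest()).decode("ascii"))  # 2jmj7l5rSw0yVb/vlWAYkK/YBwk=
```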

We've created a simple demo of these 3 ways to upload content into Box; you can check out the full working sample in this GitHub repo.

The output of the script:

```text
Box Python SDK - Upload Demo
========================================
Normal upload sample_files/file-1MB.bin in 2.1 seconds.
----------------------------------------
Normal upload sample_files/file-20MB.bin in 11.9 seconds.
Chunked upload sample_files/file-20MB.bin, 2 threads in 13.9 seconds.
----------------------------------------
Normal upload sample_files/file-100MB.bin in 26.2 seconds.
Chunked upload sample_files/file-100MB.bin, 5 threads in 32.9 seconds.
----------------------------------------
Manual upload sample_files/file-100MB.bin
Uploading [#############] in 49.9 seconds.
----------------------------------------
Manual upload sample_files/file-500MB.bin
Uploading [################] in 181.9 seconds.
========================================
All done
```
