Microsoft SharePoint Data Transfer Using Python

Warda Rahim
4 min readDec 2, 2022

--

Data scientists work in close collaboration with people across a range of departments in a company. Often stakeholders are a non-technical audience. Therefore, onboarding them onto a cloud platform and making them go through your Jupyter Notebooks does not make sense. This is the reason the output is often shared in the form of dashboards or pdf documents emailed to stakeholders. Now imagine a scenario where stakeholders provide data scientists with a bunch of csv files on a weekly basis which the data scientists then clean, preprocess and integrate into their workflow. This would involve a lot of manual work where the files are being sent over email, downloaded by data scientists, uploaded to their cloud platform and then used there. If, however, Microsoft SharePoint is a major piece of collaboration within the organisation, the stakeholders can upload their files there, which can then be read directly using Python and integrated into an automated ETL pipeline.

In this article, we will see how we can use Office365-REST-Python-Client library to transfer data directly to and from Microsoft SharePoint Online. The client implements the Sharepoint REST API.

Installation

It can be installed using:

pip install Office365-REST-Python-Client

Authentication

Microsoft SharePoint Online source connection uses app-based authentication (Client ID and Client Secret) instead of user-based authentication (Username and Password). Follow the instructions here to create app-based credentials for SharePoint Online.

For authentication, we would then use our access credentials (CLIENT_ID and CLIENT_SECRET) which can be stored in a .env file at project level.

CLIENT_ID = “<CLIENTID>”
CLIENT_SECRET = “<CLIENT_SECRET>”
SITE_URL = “https://<CompanyUrl>.sharepoint.com/sites/<site-name>”
from office365.sharepoint.client_context import ClientContext
from office365.runtime.auth.client_credential import ClientCredential

context = ClientContext(SITE_URL).with_credentials(
ClientCredential(CLIENT_ID, CLIENT_SECRET)
)

The ClientContext approach would authenticate you on your SharePoint server.

Data Transfer Using Python

After authentication, you can upload or download a file directly to and from SharePoint.

Download a file

  • To download a file from SharePoint, you can use:
with open(download_path, "wb") as local_file: 
myfile = (context.web.get_file_by_server_relative_path(remote_path)
.download(local_file)
.execute_query()
)

where remote_path is SharePoint URL of the file you want to download, for example “/sites/team/Shared Documents/data/refreshed_data.csv”. And download_path is the path on your machine where you want to download the file.

Upload a file

  • Similarly, you can upload a file to SharePoint Online from your local directory using:
target_folder = context.web.get_folder_by_server_relative_url(
f"Shared Documents/{dir_name}"
)

with open(local_file_path, "rb") as content_file:
file_content = content_file.read()
target_folder.upload_file(os.path.basename(local_file_path), file_content).execute_query()

dir_name is the name of the SharePoint folder where you want to upload your file. And local_file_path is the path of the file you want to upload to SharePoint.

Read a File

Instead of downloading and uploading files from and to SharePoint, you can read and write files directly as well.

  • The code below will read a file directly from SharePoint and create a pandas DataFrame:
from office365.sharepoint.files.file import File
import pandas as pd
import io
import errno

response = File.open_binary(context, '/'.join([folder_relative_path, file_name]))
df = pd.read_csv(io.BytesIO(response.content))
if df.shape[0] == 0:
raise FileNotFoundError(f'Either {file_name} does not exist or path to the file is wrong.')
else:
return df

Here we are downloading the file to memory and then using pandas to read it. folder_relative_path is the SharePoint folder relative path, for example “/sites/team/Shared Documents/data”. And file_name is the name of the SharePoint file you want to read. The above code without if/else statements will not output an error if the file does not exist.

Write to SharePoint

  • Similarly, you can write a pandas DataFrame directly to SharePoint using:
import pandas as pd
import io

target_folder = context.web.get_folder_by_server_relative_url(
f"Shared Documents/{dir_name}"
)

buffer = io.BytesIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
file_content = buffer.read()
target_folder.upload_file(file_name, file_content).execute_query()

Here we first create a buffer object and then write the DataFrame directly to the buffer. The seventh line (buffer.seek(0))is to go to the start of the stream. This step is to seek back to the start of the file after writing the initial in memory file. Finally, we read the data in the buffer object and upload it to a target folder on SharePoint. dir_name is the SharePoint folder name where the data is uploaded, df is the pandas DataFrame that you want to write to SharePoint, and file_name is the SharePoint filename with the uploaded data.

The article mentioned some of the functions that you can use to transfer data directly to SharePoint Online. You can find the code here as well.

References

Hey 👋 if you found this article useful, please support by buying me a coffee here. Thank you 😀

--

--