Box Skills + OpenAI API TL;DR Summarization Tutorial
In the weeks since its release, OpenAI’s ChatGPT has taken the world by storm. The New York Times referred to the product as “the best artificial intelligence chatbot ever released to the general public.”
You can try out the service here or see the example below of the service describing quantum computing. Pretty cool huh?
ChatGPT uses OpenAI’s API to craft responses. According to their documentation, “The OpenAI API can be applied to virtually any task that involves understanding or generating natural language or code.” Today, I’m going to create TL;DR (too long didn’t read) summarizations of content uploaded to Box — the leading content cloud for document management, collaboration and e-signature.
I’m sure you have experienced that dread when you are given a long document to read or presentation to click through — with the help of Box Skills and OpenAI, that issue can be solved quickly.
If you are asking yourself, what are Box Skills, head on over to my quick start tutorial on the topic here, but the TL;DR (see what I did there?) is that Box Skills provides developers a framework to store machine learning and artificial intelligence data as metadata directly on the content in Box.
This is the end result we are working toward. As shown below, a summary of a five-paragraph essay appears in the Box Skills tab on the right, giving you a clear picture of what the document is about without having to read the entire essay. The Skill can be used on any content type with text: docx, pdf, pptx, etc.
And, here is a diagram of the solution.
Whenever content is uploaded, moved or copied into the folder configured for the Box Skill, a serverless function’s invocation URL is called. This triggers a block of code that extracts the text from the file, sends that text to the OpenAI API, and writes the summary data back onto the file’s Skill card in Box. In this example, I’m going to use Google Cloud Platform to host my serverless function, but any cloud service will work.
Let’s jump into the tutorial!
Set Up the Box Skill
Create a Box developer account (optional but recommended)
If you don’t have a Box enterprise account, you can sign up for a free developer account here. I recommend using the developer account for the tutorial instead of using your production environment.
Please note that you cannot sign up with an email address that is already tied to another Box account, because every email address must be unique across all of Box.
Create the Box Skill
Navigate to the Developer Console, and click Create New App.
Select Box Custom Skill.
Give the application a name, and click Create App.
After creating the application, you will see the screen below. The red box marks the field for the URL that should receive the Box Skills payload. We will add this URL later on.
In the security keys tab, you will find two keys that can be used to verify that Box is the service that called the serverless function.
Since this Box Skill will be used for content that has text, add in the file extension types seen below. NOTE — This may need to be done at the end of the tutorial.
Enable/Authorize the Box Skill (completed by admin)
Just like other application types, an administrator of your Box instance will need to enable and authorize the Box Skill in the skills section of the admin console. You will need to provide the admin with the client id of the application, which is found in the Box Skill configuration screen. If you set up a developer account at the beginning of this tutorial, the user account created will be the administrator of the instance.
You will also need to provide the name(s) and owner of the folder(s) whose content should trigger the Skill. If you haven’t set up a folder for the Skill to monitor yet, do that before requesting authorization from your admin.
On the Skills Admin Console screen, click Add Skill.
Enter the Client ID of the Skill, and click Next.
Select whether the skill should run for all content or a subset of folders.
For (a) specific folder(s), filter the pop-up by user and folder name. Check the folder(s) for which the Skill should be triggered.
Confirm selections, and click Enable.
Create an OpenAI account
Go to the OpenAI API developer site and sign up for a new account — if you don’t already have one.
You can view various code samples in multiple languages under the examples tab.
Create an API key
Under your account, click View API keys.
Click create new secret key, and copy/save the value for use later on. You will not be able to see the key again once you close the popup.
Set Up GCP Account
This tutorial will be using the awesome Serverless Framework to deploy our code.
Before continuing, you will need to set up a GCP account with a billing method attached. I won’t be going over all of those steps here, but you can find them on the Serverless website. Make sure to complete all the steps, including creating a project and enabling the APIs, creating a service account, and downloading a JSON key file.
Save the JSON key file for the next step.
Deploy GCP Function
Download or fork the code for the GCP function, and place it wherever you normally keep code projects on your computer.
Open the project in a code editor like Visual Studio Code.
In the editor’s terminal, confirm you have Node v10.0.0 or higher.
Drag the JSON key file that was downloaded from the step above into the .gcloud folder and rename it serverless.json.
Update the serverless.yml file to have the configuration and naming information for your GCP account. Also, make sure to add the Box security keys shown in the screenshot below.
The Box primary and secondary keys come from the developer console’s security keys section I mentioned a few sections up. It is important to use these keys to make sure only Box can run the serverless function’s code.
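To make the role of the two keys concrete, here is a minimal sketch of the kind of check they enable. It assumes Box signs Skill payloads the way its documentation describes for webhooks: an HMAC-SHA256 over the raw request body followed by the delivery timestamp, base64-encoded and sent in the box-signature-primary and box-signature-secondary headers. The function names are mine for illustration, and the code in the repo may handle this differently, for example through the Box SDK.

```javascript
const crypto = require('crypto');

// Compute the signature Box would have produced for this body and timestamp.
// Assumption: HMAC-SHA256 over body bytes, then timestamp bytes, base64 output.
function computeSignature(rawBody, timestamp, key) {
  return crypto
    .createHmac('sha256', key)
    .update(rawBody)
    .update(timestamp)
    .digest('base64');
}

// Constant-time string comparison that tolerates a missing header value.
function safeEqual(a, b) {
  const bufA = Buffer.from(a || '');
  const bufB = Buffer.from(b || '');
  return bufA.length === bufB.length && crypto.timingSafeEqual(bufA, bufB);
}

// Returns true when either key reproduces the matching signature header and
// the message is recent. The 10 minute window is an assumption taken from the
// webhook guidance, not something this tutorial's code is known to enforce.
function isFromBox(rawBody, headers, primaryKey, secondaryKey,
                   maxAgeMs = 10 * 60 * 1000, now = Date.now()) {
  const timestamp = headers['box-delivery-timestamp'];
  if (!timestamp) return false;

  const age = now - Date.parse(timestamp);
  if (Number.isNaN(age) || age > maxAgeMs) return false;

  const primaryOk = safeEqual(
    computeSignature(rawBody, timestamp, primaryKey),
    headers['box-signature-primary']
  );
  const secondaryOk = safeEqual(
    computeSignature(rawBody, timestamp, secondaryKey),
    headers['box-signature-secondary']
  );
  return primaryOk || secondaryOk;
}
```

Accepting a match on either key is what lets you rotate one key in the developer console without breaking requests that are already in flight.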
In the VS Code terminal, run npm install, followed by sls deploy. Deployment can take several minutes, especially the first time. After it completes, you will get back an invocation URL. Copy and paste that into the correct field under the configuration tab for the application.
Visit the GCP console to see that your serverless function is active. You also need to add a permission to the function so that Box can call it. Click permissions > add. Type “allUsers” in the new principals box, select the Cloud Functions Invoker role, and click Save.
Upload a file to the Box folder configured by the administrator. Open the file in Box to see the TL;DR Summary added to a Box Skills transcript card.
You can also check the logs in the serverless function for verification that the process completed successfully.
In the above tutorial, I just show deploying the code as written — but if you are interested in the text extraction, tokenization, or customization technical details, feel free to keep reading below about how the GCP function works.
For reference, this is the index.js file.
The OpenAI API cannot accept a file directly, so the text has to be sent in with the request. Thankfully, all content with text automatically has a text representation created once it is uploaded to Box. This is why we set up extension restrictions for the Box Skill at the beginning.
Please note that Box will only create the text representation for files like Word, PowerPoint, etc. So, photo or video files without text will not have this representation created.
In the code, there is a 5 second buffer, since the representation sometimes takes a few moments to generate — especially for longer text. If for some reason the text representation is still not ready, the code returns a 400, which will then trigger the Box Skill automatic exponential backoff retry process.
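The readiness check described above could look roughly like the sketch below. It assumes the file object was fetched from Box with fields=representations and an x-rep-hints header of [extracted_text], and that each representation entry carries a status.state and a content.url_template, as the Box representations documentation describes. The helper names are hypothetical and are not taken from index.js.

```javascript
// Hypothetical helper: pause so Box has time to generate the representation.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Find the extracted-text entry on a Box file object, or null if absent.
function findTextRepresentation(file) {
  const entries = (file.representations && file.representations.entries) || [];
  return entries.find((entry) => entry.representation === 'extracted_text') || null;
}

// Decide how the function should answer Box for this file.
// A 400 tells Box the attempt failed, so its retry process can try again later.
function planResponse(file) {
  const rep = findTextRepresentation(file);
  const state = rep && rep.status ? rep.status.state : 'none';

  if (state !== 'success') {
    return { statusCode: 400, reason: `text representation not ready (${state})` };
  }

  // The URL template ends in {+asset_path}; for extracted text the
  // placeholder is replaced with an empty string.
  const textUrl = rep.content.url_template.replace('{+asset_path}', '');
  return { statusCode: 200, textUrl };
}

// Usage inside an async handler (getFile is a hypothetical fetch helper):
//   await wait(5000);
//   const plan = planResponse(await getFile(fileId));
//   if (plan.statusCode !== 200) return res.status(400).send(plan.reason);
```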
OpenAI operates and bills based on tokens. You can read more about them in their documentation, but essentially 1 token = approximately 4 characters. They have a tool that can be used to count the tokens of a block of text.
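As a back-of-the-envelope illustration of that rule of thumb, the sketch below estimates a token count from the character count alone. It is only an approximation for quick sizing; the real count depends on the tokenizer, and the function name is my own.

```javascript
// Rough estimate only: assumes about 4 characters per token on average.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Example: a 16,000 character document comes out at about 4,000 tokens.
const sampleEstimate = estimateTokens('a'.repeat(16000));
console.log(sampleEstimate);
```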
Tokens are important because each of their models has a token limit. In the code, the text-davinci-003 model is used. It is the most expensive but performs the best at actually creating a good summary. It also has the highest token limit: about 4,000 tokens.
The tokens in the text you send plus the tokens in the summary the service sends back must together stay below that limit.
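Here is a small sketch of that budget arithmetic. It assumes the 4,000 token figure quoted in this article and the parameter names of OpenAI's completions endpoint (model, prompt, max_tokens, temperature). The "Tl;dr" suffix follows OpenAI's published summarization example, and the temperature value is only illustrative. The actual prompt and settings in index.js may differ.

```javascript
const MODEL_TOKEN_LIMIT = 4000; // figure quoted in the article

// Tokens left over for the summary once the prompt has been counted.
function completionBudget(promptTokens, limit = MODEL_TOKEN_LIMIT) {
  return Math.max(0, limit - promptTokens);
}

// Assumed prompt format: the document text followed by a "Tl;dr" cue.
function buildPrompt(text) {
  return `${text}\n\nTl;dr`;
}

// Build a request body. promptTokens must be the token count of the full prompt.
function buildCompletionRequest(prompt, promptTokens) {
  return {
    model: 'text-davinci-003',
    prompt,
    max_tokens: completionBudget(promptTokens),
    temperature: 0.7, // illustrative value, not taken from the article
  };
}
```

With a 3,800 token prompt, this leaves 200 tokens for the summary itself.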
In this example, the code uses the GPT-3 encoder npm package to count how many tokens the request text contains. If the count exceeds 3800, the text is cut off at the 3800-token mark.
3800 was chosen because it seemed to perform best when sending in large blocks of text. As the OpenAI API improves, the limit will presumably increase, allowing longer text to be sent in and fuller summaries to be returned.
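The truncation step can be sketched as follows. The function takes any encoder object with encode and decode methods, which is the shape the gpt-3-encoder package exposes. To keep the snippet runnable without installing anything, it ships with a stand-in encoder that treats every character as one token, which is not how real tokenization works. The names here are illustrative and not copied from index.js.

```javascript
const MAX_PROMPT_TOKENS = 3800;

// Cut text down to at most maxTokens tokens using the supplied encoder.
// `encoder` needs encode(text) -> number[] and decode(tokens) -> string,
// for example: const encoder = require('gpt-3-encoder');
function truncateToTokenLimit(text, encoder, maxTokens = MAX_PROMPT_TOKENS) {
  const tokens = encoder.encode(text);
  if (tokens.length <= maxTokens) {
    return { text, tokenCount: tokens.length, truncated: false };
  }
  const kept = tokens.slice(0, maxTokens);
  return { text: encoder.decode(kept), tokenCount: kept.length, truncated: true };
}

// Stand-in encoder for demonstration only: one token per character.
const charEncoder = {
  encode: (text) => Array.from(text).map((ch) => ch.charCodeAt(0)),
  decode: (tokens) => String.fromCharCode(...tokens),
};

// Example with a tiny limit so the cut is easy to see.
const demo = truncateToTokenLimit('hello world', charEncoder, 5);
console.log(demo.text, demo.truncated);
```

Cutting on token boundaries rather than character counts keeps the prompt size exact, at the cost of possibly ending the text mid-sentence.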
Box Skills Card Name Customization
By default, the Box Skills kit transcript card title says “Transcript.” I simply edited line 371 in the skills-kit-2.0.js file to make the card say “TL;DR Summary”.
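For context, the title lives inside the card object that gets written to the file's metadata. The sketch below shows roughly what a transcript card with a custom title might look like, assuming the card shape described in the Box Skills documentation (skill_card_type, skill_card_title, entries). The function, the title code, and the IDs are hypothetical. This is not the skills-kit's own code, which may also attach timing information to each entry.

```javascript
// Build a transcript-style Skill card carrying a single summary entry.
// Assumption: field names follow the documented Box Skills card format.
function buildSummaryCard(summary, skillId, invocationId, title = 'TL;DR Summary') {
  return {
    type: 'skill_card',
    skill_card_type: 'transcript',
    skill_card_title: {
      code: 'skills_tldr_summary', // hypothetical identifier
      message: title,              // the text shown as the card heading
    },
    skill: { type: 'service', id: skillId },
    invocation: { type: 'skill_invocation', id: invocationId },
    entries: [{ text: summary }],
  };
}

// Example with placeholder IDs.
const card = buildSummaryCard('A short summary.', 'my-skill', 'invocation-123');
console.log(card.skill_card_title.message);
```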
Thanks for checking out my tutorial using Box Skills with OpenAI. Stay tuned for more Skills content coming soon!
…And as always, feel free to post questions on our developer forum.