Box Skills: Pieces of the Puzzle + Tutorial
Box Skills were introduced in 2017, and since then over 1,000 skills have been used across various industries to bring intelligence to Box content. Just how easy is it to set up a skill? In this blog, I will show you!
Box Skills Refresher
Before diving into how to set up a Box Skill, I want to give an overview of what they are and which parts make up a Box Skills implementation.
If you refer to the architecture patterns section of our developer site, the below diagram appears. It represents a classic example of a Box Skill — in this case, a floorplan identification solution.
- The flow is triggered from a floorplan file being uploaded, moved, or copied into the folder configured for the Box Skill.
- The file is sent to a Google Cloud Function via the invocation URL set up in the Box Skill configuration steps. We show GCP in this tutorial, but any cloud provider that offers serverless functions can be used.
- The function runs custom-written code to parse the Box Skill payload, verify security keys, download a local copy of the file, send the file to a machine learning provider (also GCP in this example), and write metadata discovered by the machine learning service back to the original file in Box.
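To make the first step of that function more concrete, here is a minimal sketch of parsing the payload Box sends. The field names (type, source.id, token.read, token.write) are my assumptions about the shape of the skill invocation payload, and the sample values are made up, so check them against a real payload in your function logs before relying on them.

```javascript
// Sketch: pull the pieces a Skill handler needs out of the Box event payload.
// Field names are assumptions about the skill_invocation payload shape.
function parseSkillPayload(rawBody) {
  const payload = typeof rawBody === 'string' ? JSON.parse(rawBody) : rawBody;

  if (!payload || payload.type !== 'skill_invocation') {
    throw new Error('Not a Box Skill invocation payload');
  }

  const source = payload.source || {};
  const token = payload.token || {};
  if (!source.id || !token.read || !token.write) {
    throw new Error('Payload is missing the file id or access tokens');
  }

  return {
    invocationId: payload.id,
    fileId: source.id,
    fileName: source.name,
    readToken: token.read.access_token, // used to download the file
    writeToken: token.write.access_token, // used to write metadata back
  };
}

// Example with a trimmed, made-up payload.
const example = parseSkillPayload(JSON.stringify({
  type: 'skill_invocation',
  id: 'invocation-123',
  source: { type: 'file', id: '456', name: 'floorplan.pdf' },
  token: {
    read: { access_token: 'read-token' },
    write: { access_token: 'write-token' },
  },
}));
console.log(example.fileId, example.fileName); // 456 floorplan.pdf
```

Failing fast on an unexpected payload keeps the later steps (download, machine learning call, metadata write) from running with missing values.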
It is important to note that most of the above pieces don’t actually run on Box infrastructure. Referring to the above diagram, everything in the first dotted box lives in Box: the content, the metadata template, and configuration information. The logic or processing of the Skill happens in the second dotted box. The machine learning occurs in the third dotted box.
The second and third parts aren’t licensed or billed through Box Platform, so they typically require working with that particular service’s sales and enablement teams. They do not have to be the same provider. For example, you can have an Azure Function call IBM Watson.
Box adds value to the flow because an action on the content itself triggers the process. This keeps developers from having to migrate or duplicate content to collect insight. The flow also ends with searchable metadata directly within the platform in seconds, allowing users to find the information they need faster.
In the demo below, I’m going to show you how to set up the flow in the diagram above. At the end, I should be able to upload a floorplan and have the number of bedrooms in it written back as metadata on the file. That way, I can search for all floorplans with a certain number of bedrooms in the future.
Let’s jump into the tutorial!
Box Skills Administrative Setup
Create a Box developer account (optional but recommended)
If you don’t have a Box enterprise account, you can sign up for a free developer account here. I recommend using the developer account for the tutorial instead of using your production environment.
Please note that you cannot sign up with an email address that is already associated with a Box account, because email addresses must be unique across all of Box.
Create the Box Skill (completed by developer)
Navigate to the Developer Console, and click Create New App.
Select Box Custom Skill.
Give the application a name, and click Create App.
After creating the application, you will see the below screen. The red box is where you will put the URL where you would like the Box Skills payload to go. We will add this URL later on.
In the security keys tab, you will find two keys that can be used to verify that Box is the service that called the serverless function.
Enable/Authorize the Box Skill (completed by admin)
Just like other application types, an administrator of your Box instance will need to enable and authorize the Box Skill in the skills section of the admin console. You will need to provide the admin with the client id of the application, which is found in the Box Skill configuration screen.
You will also need to provide the folder name(s) and owner of the content for which you want the Skill to be triggered. If you haven’t set up a folder for the Skill to monitor yet, you will want to do that prior to requesting authorization from your admin.
On the Skills Admin Console screen, click Add Skill.
Enter the Client ID of the Skill, and click Next.
Select whether the skill should run for all content or a subset of folders.
For specific folders, filter the pop-up by user or folder name. Check each folder for which the Skill should be triggered.
Confirm selections, and click Enable.
Create a Box Metadata Template (optional — completed by admin)
This step is optional, since metadata or skills data can be written to a file without a template. However, creating a metadata template will allow users to easily search by the results the machine learning service provides.
In the Admin Console, click Content > Metadata > Create New.
Configure the metadata template by giving it a name and adding attributes, each of which should have a name, format, and description. Then click Save.
Serverless Function Setup
This example will use our Serverless Box Custom Skills starter repository to speed up development.
Click the above link and download the repository to your computer.
Unzip the downloaded folder. Find the box-custom-skills-starter-gcp folder within the downloaded repository and rename it to something that matches your use case. Also, feel free to move the folder to your typical projects directory. If you would rather use AWS or Azure, you will find hello world directories in the same zip folder.
Open the folder in a code editor like Visual Studio Code.
In the editor’s terminal, confirm you have Node v10.0.0 or higher.
Before continuing, you will need to set up a GCP account with a billing method attached. I won’t be going over all of those steps here, but you can find them on the Serverless website. Make sure to complete all the steps, including creating a project, enabling the APIs, creating a service account, and downloading a JSON key file.
Drag the JSON key file that was downloaded from the step above into the .gcloud folder.
Rename the file as serverless.json.
Update the package.json file to have the naming information for your use case.
Update the serverless.yml file to have the configuration and naming information for your GCP account and use case. Also, make sure to add the Box security keys shown in the screens below.
The Box primary and secondary keys come from the developer console’s security keys section I mentioned a few sections up. It is important to use these keys to make sure only Box can run the serverless function’s code.
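As a rough illustration of what using these keys can look like, here is a sketch of a signature check. It assumes Box signs Skill requests the same way its webhook signatures are documented: a base64-encoded HMAC-SHA256 of the request body followed by the delivery timestamp, sent in the box-signature-primary and box-signature-secondary headers. The function names and sample values are mine, not from the starter repository.

```javascript
const crypto = require('crypto');

// Assumed scheme: base64(HMAC-SHA256(key, body + timestamp)).
function computeSignature(body, timestamp, key) {
  return crypto
    .createHmac('sha256', key)
    .update(body, 'utf8')
    .update(timestamp, 'utf8')
    .digest('base64');
}

// Constant-time comparison so the check does not leak where strings differ.
function safeEqual(a, b) {
  const bufA = Buffer.from(String(a));
  const bufB = Buffer.from(String(b));
  return bufA.length === bufB.length && crypto.timingSafeEqual(bufA, bufB);
}

// Returns true when either key reproduces its matching signature header.
function isFromBox(body, headers, primaryKey, secondaryKey) {
  const timestamp = headers['box-delivery-timestamp'];
  if (!timestamp) return false;

  const primaryOk = safeEqual(
    headers['box-signature-primary'] || '',
    computeSignature(body, timestamp, primaryKey)
  );
  const secondaryOk = safeEqual(
    headers['box-signature-secondary'] || '',
    computeSignature(body, timestamp, secondaryKey)
  );
  return primaryOk || secondaryOk;
}

// Example with made-up keys and a made-up timestamp.
const sampleBody = '{"type":"skill_invocation"}';
const sampleTimestamp = '2020-01-01T00:00:00-07:00';
const sampleHeaders = {
  'box-delivery-timestamp': sampleTimestamp,
  'box-signature-primary': computeSignature(sampleBody, sampleTimestamp, 'primary-key'),
};
console.log(isFromBox(sampleBody, sampleHeaders, 'primary-key', 'secondary-key')); // true
console.log(isFromBox(sampleBody + 'x', sampleHeaders, 'primary-key', 'secondary-key')); // false
```

The check has to run against the raw request body exactly as Box sent it, since re-serializing parsed JSON can change the bytes and break the signature. Accepting either key lets you rotate one key at a time without downtime.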
In a terminal, run npm install.
Then, run npx serverless deploy.
Deployment can take several minutes, especially the first time. After it completes, you will get back an invocation URL.
Copy and paste that into the developer console application you created earlier.
Visit the GCP console to see that your serverless function is active.
You also need to add a permission to the function so that Box can call it. Click permissions > add.
Type “allUsers” in the new principals box, select the Cloud Functions Invoker role, and click Save.
Upload a file to the Box folder configured by the administrator.
Open the file in Box to see the message “Hello World” attached as metadata.
You can also check the logs in the serverless function for verification.
GCP Document AI Setup
Now that the Box Skill and serverless function are working, the last piece is setting up a machine learning provider and editing our code to use it. Google’s Document AI is perfect for this use case, because it will pull OCR text from the uploaded floorplans without needing to train a custom machine learning model. We can send the text it finds back to the serverless function set up above, parse the results, and apply them back to the file as searchable metadata.
Before jumping into the setup steps, visit the try-it section to see the typical response you will receive.
Notice the OCR text and JSON tabs. In the JSON response you can see a text field. This is where the OCR text data is placed for you to parse.
We are going to use the how-to guides section of the Document AI documentation to configure the API for the GCP project created earlier. Be aware that we don’t need to set up another service account or download a new JSON key file, since we already did that earlier in this tutorial.
Enable the Document AI API by clicking here and turning it on for the GCP project we set up earlier in this tutorial.
Go to the processors page and click Explore Processors.
Select the document OCR processor type.
Give the processor a name in the side pop-up window and click Create.
Find the prediction endpoint for the processor. This is what we will use to get information about the floorplans once we’ve edited our code.
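For a sense of how that endpoint is put together, here is a small sketch that builds the REST process URL and request body from a project id, location, and processor id, and reads the text field out of a response. The ids below are placeholders and the helper names are hypothetical. Treat the endpoint shown in your GCP console as the source of truth.

```javascript
// Build the REST "process" URL and request body for a Document AI processor.
function buildProcessRequest(config, fileBuffer, mimeType) {
  const { projectId, location, processorId } = config;
  const name = `projects/${projectId}/locations/${location}/processors/${processorId}`;
  return {
    name,
    url: `https://${location}-documentai.googleapis.com/v1/${name}:process`,
    body: {
      rawDocument: {
        // Document AI expects the file bytes as a base64 string.
        content: fileBuffer.toString('base64'),
        mimeType,
      },
    },
  };
}

// Pull the OCR text out of a process response; returns '' if it is absent.
function extractText(response) {
  return (response && response.document && response.document.text) || '';
}

// Example with placeholder ids and a tiny stand-in for the file contents.
const processRequest = buildProcessRequest(
  { projectId: 'my-project', location: 'us', processorId: 'abc123' },
  Buffer.from('hello'),
  'application/pdf'
);
console.log(processRequest.url);
// https://us-documentai.googleapis.com/v1/projects/my-project/locations/us/processors/abc123:process
console.log(extractText({ document: { text: 'BEDROOM 1' } })); // BEDROOM 1
```

The same name string (projects/.../locations/.../processors/...) is what the @google-cloud/documentai client expects in its processDocument request, so the helper is useful whether you call the REST endpoint directly or go through the client library.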
Now, we are going to edit the code to use this new API.
In the terminal, run the following:
npm install axios @google-cloud/documentai
In the serverless.yml file, we need to add some more variables: gcp_project_id, gcp_location, gcp_processor_id, box_metadata_key, box_api_endpoint, and bedroom_list. You can find the new section to copy below.
The project id, location, and processor id come from the GCP configuration we did earlier. Adding them as variables lets us access them in the code and change them in the future without editing code.
The metadata key is the key for the metadata template we set up earlier. You can find this in the admin console under content > metadata.
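To show where the metadata key and API endpoint end up, here is a sketch that builds the request for writing values to a file under an enterprise metadata template. The output is shaped like an axios request config. It assumes box_api_endpoint is set to https://api.box.com/2.0, and the template key floorplan and the bedrooms attribute are hypothetical names for illustration.

```javascript
// Build an axios-style request config for creating a metadata instance on a file.
function buildMetadataRequest(options) {
  const { apiEndpoint, fileId, templateKey, accessToken, values } = options;
  return {
    method: 'post',
    // "enterprise" is the scope for templates created in your own admin console.
    url: `${apiEndpoint}/files/${fileId}/metadata/enterprise/${templateKey}`,
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'Content-Type': 'application/json',
    },
    data: values,
  };
}

// Example with made-up values; "floorplan" and "bedrooms" are hypothetical keys.
const metadataRequest = buildMetadataRequest({
  apiEndpoint: 'https://api.box.com/2.0',
  fileId: '456',
  templateKey: 'floorplan',
  accessToken: 'write-token',
  values: { bedrooms: 3 },
});
console.log(metadataRequest.url);
// https://api.box.com/2.0/files/456/metadata/enterprise/floorplan
```

Passing this object to axios would send the request. If the file already has an instance of the template, Box responds with a conflict, and the values need to be updated rather than created.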
The bedroom list is a comma-separated string of all the names our floorplans use for bedrooms. This will help us parse the OCR text into a searchable integer. For this and any other lists (bathrooms, porches, etc.), there can be no spaces between the terms.
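Here is one way the bedroom list could be turned into a count. This is a minimal sketch of the idea, not the code from index_reference.js: it counts whole-word, case-insensitive occurrences of any label in the OCR text. The sample text and labels are made up.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Count how many times any bedroom label appears in the OCR text.
function countBedrooms(ocrText, bedroomList) {
  const terms = bedroomList
    .split(',')
    .map((term) => term.trim())
    .filter((term) => term.length > 0)
    // Try longer labels first so a label is preferred over its own prefix.
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  if (terms.length === 0) return 0;

  // \b keeps "BED" from matching inside an unrelated longer word.
  const pattern = new RegExp(`\\b(?:${terms.join('|')})\\b`, 'gi');
  const matches = ocrText.match(pattern);
  return matches ? matches.length : 0;
}

// Example with made-up OCR output from a floorplan.
const sampleOcrText = 'MASTER BEDROOM 14x12\nKITCHEN\nBEDROOM 2 10x10\nBED 3 10x11\nBATH';
console.log(countBedrooms(sampleOcrText, 'MASTER BEDROOM,BEDROOM,BED')); // 3
```

Because matches never overlap, "MASTER BEDROOM" counts once even though "BEDROOM" is also in the list. Real floorplans may label rooms in ways this simple match misses, so expect to tune the list for your documents.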
These will all make more sense once we change the index.js file.
For index.js, we need to make several changes. To make it easier, the repo you downloaded earlier includes a full example of the finished file, called index_reference.js, that you can copy from. Keep in mind that this is just an example; you can expand its functionality depending on your real-world use case. Comments above each section of the code describe what that section does.
Simply copy the contents of index_reference.js and use them to replace the contents of index.js.
Now, you can run npx serverless deploy, and then we can use the skill!
Using the Box Skill
At this point, all of the pieces for a Box Skill to work have been set up: Box Skill creation, Box Skill authorization, metadata template creation, and the serverless function/machine learning setup. We can now use the skill for our actual use case!
Upload a file to the Box folder configured by the administrator.
Click into the uploaded file to see that metadata has been applied to it.
Box takes approximately ten minutes to index metadata. After that, the metadata can be used as search criteria for content within the Box web application.
Click the search bar > Metadata > Pick a template from the dropdown > type in search criteria > hit Enter.
Any files that match the criteria will be returned.
Box Skills Tutorial Complete
In this blog post, I’ve shown how you can use some simple boilerplate code and an out-of-the-box machine learning API to gather and use valuable data from the content stored in Box.
In this tutorial we used floorplans, but you could use any document you want: resumes, invoices, applications, etc. The possibilities are endless thanks to the many ready-to-use machine learning APIs out there. Plus, as you can see from the tutorial, setting up the flow is not that hard!
Huge thank you to Marley!
We hope you enjoyed this tutorial, and feel free to reach out to us on the developer forum for support, or via Box Pulse to make suggestions on how to improve Box Skills.