Introducing the Box Skills Kit

Box Developers, published in Box Developer Blog, Oct 11, 2017


Today at BoxWorks 2017, we introduced Box Skills, a framework for bringing machine learning to your content in Box. Box Skills leverage powerful technologies from providers like Google Cloud, Microsoft Azure and IBM Watson to process and extract valuable insights from files stored in Box.

You can use the Box Skills Kit to build Custom Skills for Box. Custom Skills let you apply machine learning technologies to the content and processes unique to your business. To start building a Custom Skill, see the Box Skills developer documentation.

The Image Intelligence Skill automatically labels images and extracts any text, then stores the results as metadata on the file in Box.

What is a Skill?

Box Skills are functions that take a file in Box, pass it to a machine learning provider for processing, and then structure the output from the machine learning algorithm so it can be stored as metadata on that file in Box. Skills can be deployed and executed on any cloud infrastructure, but are typically built on serverless platforms like AWS Lambda, Google Cloud Functions or Microsoft Azure Functions. Whenever a skill is triggered, it uses a short-lived API token to download the file from Box, has the machine learning provider analyze the content, and writes the results back to the file in Box as metadata.
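The three steps a skill performs can be sketched as a single function. This is a minimal Python sketch, not the Box Skills Kit API: the `download`, `analyze` and `write_metadata` callables are hypothetical stand-ins for the Box download call, the machine learning provider and the Box metadata call. They are passed in as parameters so the flow runs without any network access.

```python
from typing import Callable, Dict, List

# Hypothetical signatures; real Box and provider calls are not shown here.
Downloader = Callable[[str, str], bytes]           # (file_id, token) -> file bytes
Analyzer = Callable[[bytes], List[str]]            # file bytes -> labels from an ML provider
MetadataWriter = Callable[[str, str, Dict], None]  # (file_id, token, metadata) -> None


def run_skill(file_id: str, token: str,
              download: Downloader,
              analyze: Analyzer,
              write_metadata: MetadataWriter) -> Dict:
    """Download a file, analyze it, and write the structured result back as metadata."""
    content = download(file_id, token)              # 1. fetch the file with the short-lived token
    labels = analyze(content)                       # 2. hand the bytes to the ML provider
    metadata = {"keywords": sorted(set(labels))}    # 3. structure the raw output (dedupe + sort)
    write_metadata(file_id, token, metadata)        # 4. store it on the file in Box
    return metadata


# Usage with in-memory fakes in place of Box and the provider.
store: Dict[str, Dict] = {}
result = run_skill(
    "12345", "short-lived-token",
    download=lambda fid, tok: b"fake image bytes",
    analyze=lambda data: ["dog", "beach", "dog"],
    write_metadata=lambda fid, tok, md: store.update({fid: md}),
)
print(result)  # {'keywords': ['beach', 'dog']}
```

Injecting the dependencies keeps the skill logic independent of any single provider, which matches the framework's aim of letting any machine learning technology plug in.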

Using the Box Skills Kit, you can build any type of Custom Skill you want, including skills that:

  • Leverage any third-party machine learning technology, or even your own, to process a file in Box. Box Skills is designed as an extensible framework that allows any machine learning technology to plug into Box.
  • Train a skill to work with a unique set of data specific to your business, like automatically recognizing and labeling images of products that your company produces.
  • Chain together multiple skills to solve more complicated business problems, like transcribing audio recordings of customer support calls and then running sentiment analysis on the transcripts to identify the sentiment of each part of the conversation over the course of the recording. You can also chain together machine learning algorithms from multiple providers into a single Custom Skill.
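Chaining, as in the support-call example above, amounts to feeding one step's output into the next. The sketch below assumes two hypothetical steps: a `transcribe` function that returns hard-coded timed segments and a toy keyword-based `score_sentiment` function, standing in for real speech-to-text and sentiment providers.

```python
from functools import reduce
from typing import Any, Callable, List

Step = Callable[[Any], Any]


def chain(*steps: Step) -> Step:
    """Compose steps left to right: the output of each step is the input of the next."""
    def run(value: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, value)
    return run


def transcribe(audio: bytes) -> List[dict]:
    # Hypothetical provider stand-in: a real one would analyze the audio bytes.
    return [
        {"start": 0.0, "end": 4.5, "text": "thanks for calling, how can I help"},
        {"start": 4.5, "end": 9.0, "text": "my order arrived broken and I am upset"},
    ]


def score_sentiment(segments: List[dict]) -> List[dict]:
    # Toy scorer: a segment is "negative" if it contains any word from this set.
    negative = {"broken", "upset", "angry"}
    return [
        {**seg, "sentiment": "negative" if negative & set(seg["text"].split()) else "neutral"}
        for seg in segments
    ]


support_call_skill = chain(transcribe, score_sentiment)
for seg in support_call_skill(b"fake audio"):
    print(seg["start"], seg["end"], seg["sentiment"])
# 0.0 4.5 neutral
# 4.5 9.0 negative
```

Because each step only sees the previous step's output, either provider could be swapped out without touching the other.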

What is the Box Skills Kit?

The Box Skills Kit is a set of APIs, developer tools, documentation and sample code for building Custom Skills for Box. The Box Skills framework allows you to process files in Box with any machine learning technology to add structure and extract additional insights. The machine learning output is then stored as metadata on the file, where it is automatically indexed for search and displayed to users in a new metadata-driven “cards” interface when they preview the file in Box.

Once a Skill has been authorized for a Box enterprise, a Box admin configures rules that determine when the Skill should execute. For example, a rule could trigger an image recognition Skill whenever an image file is uploaded to a certain folder in Box.
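In Box, an admin configures these rules rather than writing code, but the matching logic a rule expresses can be illustrated. In this sketch the `TriggerRule` class, the keys of the event dictionary and the folder ID are all hypothetical; `FILE.UPLOADED` is borrowed from Box's webhook trigger names purely as an example.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class TriggerRule:
    """Hypothetical rule: run a skill when a matching file is uploaded to a given folder."""
    event_type: str
    folder_id: str
    extensions: FrozenSet[str] = field(default_factory=frozenset)

    def matches(self, event: dict) -> bool:
        if event.get("type") != self.event_type:
            return False
        if event.get("folder_id") != self.folder_id:
            return False
        name = event.get("file_name", "").lower()
        # An empty extension set means "any file type".
        return not self.extensions or any(name.endswith(ext) for ext in self.extensions)


image_rule = TriggerRule("FILE.UPLOADED", "98765", frozenset({".jpg", ".jpeg", ".png"}))
print(image_rule.matches({"type": "FILE.UPLOADED", "folder_id": "98765", "file_name": "Product.PNG"}))  # True
print(image_rule.matches({"type": "FILE.UPLOADED", "folder_id": "98765", "file_name": "notes.docx"}))   # False
```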

Box Skills relies on an underlying engine that executes skills whenever a defined event occurs. When the trigger event occurs (such as a file upload), an advanced webhook is sent to a predefined serverless function; its payload contains information about the file to be processed and a short-lived API token. The webhook also contains a message signature to verify that it comes from Box. The function downloads the file using the API token and passes it to the machine learning provider for analysis. If the provider processes the file asynchronously, the function can terminate while the analysis runs; once processing is complete, the skill writes the results back to Box as metadata using the same API token.
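Verifying the message signature typically means recomputing an HMAC over the request and comparing it with the value Box sent. The sketch below assumes a scheme like the one Box documents for its V2 webhooks (a Base64-encoded HMAC-SHA256 of the raw body followed by the delivery timestamp, keyed with a signing key). The key, timestamp format and exact construction used for Skills webhooks are assumptions to check against the Box Skills documentation.

```python
import base64
import hashlib
import hmac


def compute_signature(key: str, body: bytes, timestamp: str) -> str:
    """Base64-encoded HMAC-SHA256 over the raw body followed by the delivery timestamp."""
    mac = hmac.new(key.encode("utf-8"), body + timestamp.encode("utf-8"), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode("ascii")


def verify_signature(key: str, body: bytes, timestamp: str, signature: str) -> bool:
    """Recompute the expected signature and compare it in constant time."""
    expected = compute_signature(key, body, timestamp)
    return hmac.compare_digest(expected, signature)


# Hypothetical key, body and timestamp for illustration only.
body = b'{"source": {"id": "12345", "type": "file"}}'
timestamp = "2017-10-11T10:00:00-07:00"
sig = compute_signature("example-signing-key", body, timestamp)
print(verify_signature("example-signing-key", body, timestamp, sig))         # True
print(verify_signature("example-signing-key", body + b" ", timestamp, sig))  # False
```

`hmac.compare_digest` is used instead of `==` so the comparison does not leak, through timing, how many leading characters matched.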

The Box Skills framework uses serverless functions to pass files to machine learning providers and write the output as metadata on the files.

This engine takes care of the hard work: building your own integrations with machine learning providers, authenticating and verifying requests, writing and structuring the metadata, handling retries, and managing synchronous and asynchronous skills. All you need to do is create the skill and define the rules for when it should be triggered.

The metadata written to the file is valuable for many reasons. First, it is automatically indexed for search, so users can search for keywords (such as specific product names) or apply filters to find specific content in Box. Second, the metadata can be accessed through the API, which enables any number of use cases. Third, it can be displayed alongside a file preview to give users further context and insight.
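Box indexes Skills metadata automatically, so you do not build an index yourself. Still, a toy inverted index shows why keyword metadata makes files findable. The file IDs and keywords below are made up, and this is an illustration of the idea, not how Box search is implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_index(file_keywords: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each lower-cased keyword to the set of file IDs whose metadata contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for file_id, keywords in file_keywords.items():
        for kw in keywords:
            index[kw.lower()].add(file_id)
    return index


def search(index: Dict[str, Set[str]], *terms: str) -> Set[str]:
    """Return file IDs whose metadata contains every search term (AND semantics)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


skill_output = {
    "101": ["Widget Pro", "warehouse"],
    "102": ["Widget Pro", "retail"],
    "103": ["Gadget Mini", "retail"],
}
idx = build_index(skill_output)
print(sorted(search(idx, "widget pro")))            # ['101', '102']
print(sorted(search(idx, "widget pro", "retail")))  # ['102']
```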

The output of a Skill can take several forms: keyword tags, such as a series of product names or key topics mentioned; a timeline of when specific events occurred, such as when a certain individual appears in a video; or a text transcript, such as the transcription of an audio file. To make it easy for you to display this information to users, we’re also introducing a series of pre-designed, metadata-driven “cards”.
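Structuring raw provider output into one of these forms is part of the skill's job. As a hypothetical example, assume a video provider returns `(label, start, end)` detections in seconds. The sketch groups them by label and merges overlapping ranges, producing the kind of "who appears when" data a timeline would show. The input shape and the `TimeRange` class are assumptions, not a provider or Box format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TimeRange:
    start: float  # seconds from the start of the media file
    end: float


def group_appearances(detections: List[Tuple[str, float, float]]) -> Dict[str, List[TimeRange]]:
    """Group (label, start, end) detections by label, merging overlapping or touching ranges."""
    by_label: Dict[str, List[TimeRange]] = defaultdict(list)
    # Sorting by label, then start time, lets each range be compared with the previous one.
    for label, start, end in sorted(detections, key=lambda d: (d[0], d[1])):
        ranges = by_label[label]
        if ranges and start <= ranges[-1].end:
            ranges[-1].end = max(ranges[-1].end, end)  # extend the current appearance
        else:
            ranges.append(TimeRange(start, end))       # start a new appearance
    return dict(by_label)


raw = [("Alice", 0.0, 5.0), ("Bob", 3.0, 8.0), ("Alice", 4.0, 9.0), ("Alice", 20.0, 25.0)]
for label, ranges in group_appearances(raw).items():
    print(label, [(r.start, r.end) for r in ranges])
# Alice [(0.0, 9.0), (20.0, 25.0)]
# Bob [(3.0, 8.0)]
```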

The “cards” interface displays the extracted metadata in the sidebar whenever a user previews a file that has been processed by a Skill. There are three types of cards:

  • Keyword Card: displays a list of keywords associated with a document or transcription. For audio and video files, the keyword card also uses time ranges to show when each topic appears in the recording.
  • Timeline Card: displays a list of images or words associated with particular time ranges of a media file.
  • Transcript Card: displays the transcribed text of a media file with associated time references.

When you create your Skill, you can define which “cards” to use to visualize the output of your Skills in the Box preview experience. This makes it easy for end users to see additional context about a file and navigate to specific time ranges based on topic, transcript, or even speaker.
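Navigating by transcript comes down to mapping a search phrase to the time references stored with each transcribed segment. This sketch assumes a hypothetical `Segment` shape (start and end in seconds plus the text); the actual card metadata schema is defined in the Box Skills documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One hypothetical transcript entry: a time range in seconds plus its text."""
    start: float
    end: float
    text: str


def find_phrase(segments: List[Segment], phrase: str) -> List[float]:
    """Return the start times of segments whose text contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [seg.start for seg in segments if needle in seg.text.lower()]


transcript = [
    Segment(0.0, 4.0, "Welcome to the quarterly review."),
    Segment(4.0, 9.5, "Pricing changes take effect in March."),
    Segment(9.5, 15.0, "Questions about pricing go to sales."),
]
print(find_phrase(transcript, "pricing"))  # [4.0, 9.5]
```

A preview player could then seek to any of the returned start times, which is the kind of navigation the transcript card enables.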

We’re incredibly excited about the possibilities of Box Skills and the intersection of machine learning and content management. To get started, check out our developer documentation.
