Sneak peek at Box AI API

Published in Box Developer Blog, Oct 17, 2023

You may have heard during our recent BoxWorks product keynote that we’re making the Box Platform even better with all-new Box AI APIs. Soon you will be able to use Box AI APIs within your custom apps to meet your business’s unique needs and extend the power of your enterprise content, all while maintaining the content security and compliance requirements you’re used to with Box.

Meanwhile, we’re working hard behind the scenes to bring you the best functionality. In fact, we held an internal hackathon to observe how developers would use the Box AI API and to see what cool use cases they could come up with.

Let’s take a sneak peek at the Box AI API, including some winning entries from our internal hackathon. We’ll also look at sample apps, and how we’ve extended the existing Python SDK.

Disclaimer: The Box AI API documentation may evolve and we will introduce additional developer resources for Box AI as general availability approaches.

Endpoints

Essentially, we had two endpoints to work with: /2.0/ai/ask and /2.0/ai/text_gen.

These represent two modes of interacting with the Box AI API: ask for question answering and text_gen for conversational text generation.

From an API perspective, the difference between the two is that with the ask endpoint you don't send the conversation history. It is intended for a single question and answer over some content.

With the text_gen endpoint, on the other hand, you send the conversation history, so Box AI stays aware of the entire conversation and can build on the previous questions and answers.
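To make the two endpoints concrete, here is a minimal Python sketch that builds (but does not send) a request for either one, using only the standard library. The base URL, paths, and headers mirror the curl examples in this post; the helper name build_box_ai_request and the token value are assumptions made for illustration.

```python
import json
import urllib.request

# Base URL taken from the curl examples in this post.
BOX_API_BASE = "https://api.box.com/2.0"


def build_box_ai_request(endpoint: str, token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for "ai/ask" or "ai/text_gen" without sending it.

    Hypothetical helper: the name and signature are not part of any Box SDK.
    """
    if endpoint not in ("ai/ask", "ai/text_gen"):
        raise ValueError(f"unknown Box AI endpoint: {endpoint}")
    return urllib.request.Request(
        url=f"{BOX_API_BASE}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen() would perform the call. Keeping construction separate from sending makes the request easy to inspect and test.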

Mechanics

To interact with the Box AI API, you typically send content as well as a prompt about the content.

The content can be a specific file, a set of files, a text representation, or even a snippet of the content such as a single phrase.

The prompt typically represents a question, but you can use anything and see how Box AI reacts.
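As a rough illustration of pairing content with a prompt, the sketch below builds a single-item question body in Python. The field names (prompt, items, type, id, content, mode, config, is_streamed) are taken from the curl examples later in this post; the helper names make_file_item and make_single_item_question are hypothetical.

```python
import json


def make_file_item(file_id: str, snippet: str = "") -> dict:
    """Describe one file. An empty snippet means "use the whole file"."""
    return {"type": "file", "id": file_id, "content": snippet}


def make_single_item_question(prompt: str, item: dict, is_streamed: bool = False) -> dict:
    """Combine a prompt and one content item into an /ai/ask request body."""
    return {
        "prompt": prompt,
        "items": [item],
        "mode": "single_item_qa",
        "config": {"is_streamed": is_streamed},
    }


# Whole document versus a snippet of it, as in the examples in this post.
whole_file = make_single_item_question("summarize document", make_file_item("1282567002306"))
snippet_only = make_single_item_question(
    "do I need to know how to swim to go diving?",
    make_file_item(
        "1282567002306",
        "YOU MUST BE ABLE TO SWIM TO PARTICIPATE IN ANY IN WATER ACTIVITIES.",
    ),
)
print(json.dumps(whole_file, indent=4))
```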

Direct vs streamed answers

The AI may take some time to construct an answer, so you can either wait for the complete answer or receive it as a stream of words until the answer is complete. Streaming gives the user something to read right away instead of a long wait with no feedback.

Examples

Let me show you some examples so we can see how requests are made. For context, I'm sending content related to a diving trip waiver.

Single item QA

curl --location 'https://api.box.com/2.0/ai/ask' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer Gj...PE' \
--data '{
    "prompt": "summarize document",
    "items": [
        {
            "type": "file",
            "id": "1282567002306",
            "content": ""
        }
    ],
    "mode": "single_item_qa",
    "config": {
        "is_streamed": false
    }
}'

Resulting in:

{
    "answer": "The document provided is a liability waiver for participants 
               engaging in water activities, specifically scuba diving. 
              It states that individuals must be able to swim and be in good 
              physical condition to participate. 
              The purpose of signing the document is to exempt and release 
              the dive center, its employees, agents, and dive boats from 
              any liabilities arising from their acts or omissions.
              ...
              - Full name needs to be provided as well as signature.
              Please let me know how I can further assist you based on this 
              information!",
    "created_at": "2023-10-04T07:35:06.154290294-07:00",
    "completion_reason": "done"
}

Now instead of sending the entire document let's just send a snippet:

curl --location 'https://api.box.com/2.0/ai/ask' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer Gj...PE' \
--data '{
    "prompt": "do I need to know how to swim to go diving?",
    "items": [
        {
            "type": "file",
            "id": "1282567002306",
            "content": "YOU MUST BE ABLE TO SWIM TO PARTICIPATE IN ANY IN WATER ACTIVITIES."
        }
    ],
    "mode": "single_item_qa",
    "config": {
        "is_streamed": false
    }
}'

Resulting in:

{
    "answer": "According to the document you provided, it states that 
    \"YOU MUST BE ABLE TO SWIM TO PARTICIPATE IN ANY IN WATER ACTIVITIES.\" 
    Therefore, based on this information, it can be inferred that knowing 
    how to swim is a requirement for participating in diving or any other 
    water activities.",
    "created_at": "2023-10-04T07:45:23.701917249-07:00",
    "completion_reason": "done"
}

The same example but using a stream:

curl --location 'https://api.box.com/2.0/ai/ask' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer Gj...PE' \
--data '{
    "prompt": "do I need to know how to swim to go diving?",
    "items": [
        {
            "type": "file",
            "id": "1282567002306",
            "content": "YOU MUST BE ABLE TO SWIM TO PARTICIPATE IN ANY IN WATER ACTIVITIES."
        }
    ],
    "mode": "single_item_qa",
    "config": {
        "is_streamed": true
    }
}'

Results in:

{"answer":"According","created_at":"2023-10-04T07:47:32.567377685-07:00"}
{"answer":" to","created_at":"2023-10-04T07:47:32.584226294-07:00"}
{"answer":" the","created_at":"2023-10-04T07:47:32.634503092-07:00"}
{"answer":" document","created_at":"2023-10-04T07:47:32.671469331-07:00"}
{"answer":" you","created_at":"2023-10-04T07:47:32.68524942-07:00"}
...
{"answer":" any","created_at":"2023-10-04T07:47:34.467767413-07:00"}
{"answer":" other","created_at":"2023-10-04T07:47:34.509386552-07:00"}
{"answer":" water","created_at":"2023-10-04T07:47:34.529145131-07:00"}
{"answer":" activities","created_at":"2023-10-04T07:47:34.565221139-07:00"}
{"answer":".","created_at":"2023-10-04T07:47:34.590461678-07:00"}
{"answer":"","created_at":"2023-10-04T07:47:34.653314326-07:00","completion_reason":"done"}
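The streamed output above is one JSON object per line, so the full answer can be rebuilt by concatenating the answer fields. Here is a minimal sketch of that idea; the function name iter_answer_chunks is hypothetical, and treating any completion_reason as the end of the stream is an assumption based on the sample output.

```python
import json
from typing import Iterable, Iterator


def iter_answer_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragment carried by each streamed JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between chunks
        chunk = json.loads(line)
        if chunk["answer"]:
            yield chunk["answer"]
        # Assumption: a chunk with a completion_reason is the last one.
        if chunk.get("completion_reason") is not None:
            break


# A shortened copy of the streamed response shown above.
sample = [
    '{"answer":"According","created_at":"2023-10-04T07:47:32.567377685-07:00"}',
    '{"answer":" to","created_at":"2023-10-04T07:47:32.584226294-07:00"}',
    '{"answer":" the","created_at":"2023-10-04T07:47:32.634503092-07:00"}',
    '{"answer":"","created_at":"2023-10-04T07:47:34.653314326-07:00","completion_reason":"done"}',
]
full_answer = "".join(iter_answer_chunks(sample))
```

In an interactive app you would print or render each fragment as it arrives instead of joining them at the end.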

Multiple item QA

Imagine now that you have multiple pieces of content with information that can contribute to the answer. This mode allows you to send multiple pieces of content at the same time.

You just need to specify multiple items or, alternatively, items and content.

For example, suppose the diving waiver had an extra document related to boat trips.

curl --location 'https://api.box.com/2.0/ai/ask' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer Gj...PE' \
--data '{
    "prompt": "do I need to know how to swim?",
    "items": [
        {
            "type": "file",
            "id": "1282567002306",
            "content": ""
        },
        {
            "type": "file",
            "id": "1282565377545",
            "content": ""
        }
    ],
    "mode": "multiple_item_qa",
    "config": {
        "is_streamed": false
    }
}'
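The same request body can be assembled in Python from a list of file IDs. This is only a sketch: the function name build_qa_question is hypothetical, and picking the mode purely from the number of items is an assumption that matches the examples in this post, not documented behavior.

```python
from typing import Sequence


def build_qa_question(prompt: str, file_ids: Sequence[str], is_streamed: bool = False) -> dict:
    """Build an /ai/ask body for one or more files."""
    if not file_ids:
        raise ValueError("at least one file id is required")
    items = [{"type": "file", "id": file_id, "content": ""} for file_id in file_ids]
    # Assumption: one item means single_item_qa, more means multiple_item_qa.
    mode = "single_item_qa" if len(items) == 1 else "multiple_item_qa"
    return {
        "prompt": prompt,
        "items": items,
        "mode": mode,
        "config": {"is_streamed": is_streamed},
    }


# The diving waiver plus the boat trip document from the example above.
question = build_qa_question(
    "do I need to know how to swim?",
    ["1282567002306", "1282565377545"],
)
```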

Text generation

Text generation works the same way, but you send the history of the conversation, which includes the prompts and answers. This allows the Box AI API to build on previous answers and prompts.

Example of a first request to text_gen:

curl --location 'https://api.box.com/2.0/ai/text_gen' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer Gj...PE' \
--data '{
    "prompt": "do I need to know how to swim?",
    "items": [
        {
            "type": "file",
            "id": "1282567002306",
            "content": ""
        }
    ],
    "config": {
        "is_streamed": false
    }
}'

The subsequent request includes the conversation history:

curl --location 'https://api.box.com/2.0/ai/text_gen' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer Gj...PE' \
--data '{
    "prompt": "do I need to know how to swim?",
    "items": [
        {
            "type": "file",
            "id": "1282567002306",
            "content": ""
        }
    ],
    "dialogue_history": [
        {
         "prompt": "do I need to know how to swim?",
         "answer": "According to the document you provided, it states that \"YOU MUST BE ABLE TO SWIM TO PARTICIPATE IN ANY IN WATER ACTIVITIES.\" Therefore, based on this information, it can be inferred that knowing how to swim is a requirement for participating in diving or any other water activities.",
         "created_at": "2023-10-04T07:45:23.701917249-07:00",
         "completion_reason": "done"
        }
    ],
    "config": {
        "is_streamed": false
    }
}'
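Since the client is responsible for sending the history back on every call, it helps to keep it in one place. The sketch below shows one possible way to do that in Python. The class TextGenConversation and its methods are hypothetical; the dialogue_history entry fields follow the request shown above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextGenConversation:
    """Accumulates prompts and answers for successive /ai/text_gen calls."""

    items: List[dict]
    history: List[dict] = field(default_factory=list)

    def next_request(self, prompt: str) -> dict:
        """Build the body for the next call, including any history so far."""
        body = {
            "prompt": prompt,
            "items": self.items,
            "config": {"is_streamed": False},
        }
        if self.history:
            body["dialogue_history"] = list(self.history)
        return body

    def record(self, prompt: str, response: dict) -> None:
        """Store a prompt and the API response it produced."""
        self.history.append(
            {
                "prompt": prompt,
                "answer": response["answer"],
                "created_at": response["created_at"],
                "completion_reason": response.get("completion_reason"),
            }
        )
```

A typical loop would call next_request(), send the body to the endpoint, and pass the parsed response to record() before asking the next question.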

Python SDK

At the time of Box’s internal AI hackathon, the Python SDK didn't include these endpoints. However, this is one reason we keep the SDKs open source.

It turned out to be very easy to extend the Python SDK, so I thought this might be interesting for the community.

For example, a regular /ai/ask endpoint call looks like this. Notice how easy it is to use the generic session.post() method, which handles everything for you:

import json

def _get_ai_api_response(self, prompt: str, ai_question: AIQuestion) -> AIAnswer:
    # Serialize the question and ask for one complete (non-streamed) answer.
    ai_question_json = ai_question.to_json()
    ai_question_json["config"] = {"is_streamed": False}
    data = json.dumps(ai_question_json)

    # session.post() is the SDK's generic request method.
    url = self.get_url("ai/ask")
    box_response = self._session.post(url, data=data, expect_json_response=True)

    response = box_response.json()
    response_object = self.translator.translate(
        session=self._session,
        response_object=response,
    )

    return AIAnswer(
        answer=response_object["answer"],
        created_at=response_object["created_at"],
        completion_reason=response_object["completion_reason"],
        prompt=prompt,
    )
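The snippet above relies on AIQuestion and AIAnswer, which are not shown in this post. As a rough idea of what such containers could look like, here is a hypothetical sketch using dataclasses. The field names are inferred from how the objects are used above and from the request bodies earlier in the post; the actual classes in the repo may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AIQuestion:
    """Hypothetical request container; to_json() returns a plain dict."""

    prompt: str
    items: List[dict]
    mode: str = "single_item_qa"

    def to_json(self) -> dict:
        return {"prompt": self.prompt, "items": self.items, "mode": self.mode}


@dataclass
class AIAnswer:
    """Hypothetical response container matching the fields used above."""

    answer: str
    created_at: str
    completion_reason: Optional[str]
    prompt: str
```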

The streamed version returns an iterator. It still uses session.post(), but this time it does not expect a complete JSON answer. Instead, it yields chunks of data, one per line of the response.

import json
from typing import Iterator

def _get_ai_api_response_streamed(self, prompt: str, ai_question: AIQuestion) -> Iterator[AIAnswer]:
    # Same request as the direct version, but ask the API to stream the answer.
    ai_question_json = ai_question.to_json()
    ai_question_json["config"] = {"is_streamed": True}
    data = json.dumps(ai_question_json)

    url = self.get_url("ai/ask")
    box_response = self._session.post(url, data=data, expect_json_response=False)

    # Each non-empty line of the response body is one JSON-encoded chunk.
    for chunk in box_response.network_response.request_response.iter_lines():
        if chunk:
            response_object = self.translator.translate(
                session=self._session,
                response_object=json.loads(chunk),
            )

            # Intermediate chunks have no completion_reason, hence .get().
            yield AIAnswer(
                answer=response_object["answer"],
                created_at=response_object["created_at"],
                completion_reason=response_object.get("completion_reason"),
                prompt=prompt,
            )

If you want to take a deeper look into the code, check out this GitHub repo. And here’s a list of sample apps:

Here is a video showcase:

The hackathon winners

Boxers Jake Dolgenos and Brad Rosenfield were the proud winners of this internal hackathon! They submitted not one but two super interesting and creative projects that showcase what you can do with the Box AI API.

RFP automation

This project takes a request for proposal (RFP) spreadsheet and, based on answers and documents from previous RFPs, automatically completes the answers to the RFP questions.

Intelligent redaction

This project takes a document, scans it for PII and sensitive information, and automatically redacts that content.

Check out more details in our forum post: Box AI API Documentation

Thoughts? Comments? Feedback?

Drop us a line on our community forum.