[Tutorial] Box AI API — AI model override options

Alex Novotny
Box Developer Blog
Oct 2, 2024

Alongside the beta launch of all Box AI API features, we have added the ability to override the AI model used for individual API calls, including the lower-level embeddings model. In this blog, I will demo how to use this new feature.

Before getting started

Before jumping into the demo, let’s level set on some key concepts and go over the prerequisites to complete this demo.

Prerequisites

In the interest of being framework agnostic, I will be showing the demo below in cURL, but most of the official Box SDKs support the Box AI Platform API — including the new AI model override options.

Why would you switch AI models?

We created this customization capability primarily to give developers maximum flexibility, but you might ask why someone would want to switch models in the first place.

Box regularly updates the default models across the endpoints to stay current with the most advanced options. When a model changes, we will post it on our developer changelog.

If you have a business-critical use case built on Box AI, a new default model might alter results in a way that breaks or changes a downstream process. By pinning a specific model, you can prevent such an issue from arising.

In addition to the above, your use case may benefit from using a specific model. While Box tests our defaults against a range of data, we cannot be entirely exhaustive. As such, you are welcome to test your data against any of the models listed in the developer documentation. If one provides better results, you are welcome to use it.

Get default AI model

Before overriding the AI or embeddings models, you should first call the Get AI agent default configuration endpoint. It returns the default configuration for any of the Box AI Platform API endpoints; simply attach the correct mode query parameter. You can also send language or model query parameters for further filtering.

curl -i -L "https://api.box.com/2.0/ai_agent_default?mode=MODE" \
-H 'Authorization: Bearer <ACCESS_TOKEN>'
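All three query parameters can be combined on one request. As a small sketch (the parameter values here are only examples), building the filtered URL programmatically looks like this:

```python
from urllib.parse import urlencode

# Sketch: the ai_agent_default URL with all three query parameters.
# mode is required; language and model are optional filters.
BASE = "https://api.box.com/2.0/ai_agent_default"
params = {
    "mode": "ask",
    "language": "en",
    "model": "azure__openai__gpt_4o_mini",
}
url = f"{BASE}?{urlencode(params)}"
print(url)
```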

As of the writing of this blog, Box uses the following default AI and embeddings models.

Ask

The ask endpoint answers questions about specified content stored in Box. It can process up to 1MB of text for up to 25 files, as of the writing of this post.

To return this configuration, the mode should be ask.

{
"type": "ai_agent_ask",
"basic_text": {
"model": "azure__openai__gpt_4o_mini",
"system_message": "",
"prompt_template": "Reply as if it's {current_date}.\nI will ask you for help and provide the entire text of one document delimited by five backticks (`````) at the beginning and at the end.\nIf I make a reference to \"this\", I am referring to the document I provided between the five backticks. I may ask you a question where the answer is contained within the document. In that case, do your best to answer using only the document, but if you cannot, feel free to mention that you couldn't find an answer in the document, but you have some answer from your general knowledge.\nI may ask you to perform some kind of computation or symbol manipulation such as filtering a list, counting something, summing, averaging, and other aggregation/grouping functions or some combination of them. In these cases, first list the plan of how you plan to perform such a computation, then follow that plan step by step, keeping track of intermediate results, and at the end tell me the final answer.\nI may ask you to enumerate or somehow list people, places, characters, or other important things from the document, if I do so, please only use the document provided to list them.\nTEXT FROM DOCUMENT STARTS\n`````\n{content}\n`````\nTEXT FROM DOCUMENT ENDS\nNever mention five backticks in your response. Unless you are told otherwise, a one paragraph response is sufficient for any requested summarization tasks.\nHere is how I need help from you: {user_question}",
"num_tokens_for_completion": 6000,
"llm_endpoint_params": {
"temperature": 0,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 1.5,
"stop": "<|im_end|>",
"type": "openai_params"
}
},
"long_text": {
"model": "azure__openai__gpt_4o_mini",
"system_message": "",
"prompt_template": "Reply as if it's {current_date}.\nI will ask you for help and provide subsections of one document delimited by five backticks (`````) at the beginning and at the end.\nIf I make a reference to \"this\", I am referring to the document I provided between the five backticks. I may ask you a question where the answer is contained within the document. In that case, do your best to answer using only the document, but if you cannot, feel free to mention that you couldn't find an answer in the document, but you have some answer from your general knowledge.\nI may ask you to perform some kind of computation or symbol manipulation such as filtering a list, counting something, summing, averaging, and other aggregation/grouping functions or some combination of them. In these cases, first list the plan of how you plan to perform such a computation, then follow that plan step by step, keeping track of intermediate results, and at the end tell me the final answer.\nI may ask you to enumerate or somehow list people, places, characters, or other important things from the document, if I do so, please only use the document provided to list them.\nTEXT FROM DOCUMENT STARTS\n`````\n{content}\n`````\nTEXT FROM DOCUMENT ENDS\nNever mention five backticks in your response. Unless you are told otherwise, a one paragraph response is sufficient for any requested summarization tasks.\nHere is how I need help from you: {user_question}",
"num_tokens_for_completion": 6000,
"llm_endpoint_params": {
"temperature": 0,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 1.5,
"stop": "<|im_end|>",
"type": "openai_params"
},
"embeddings": {
"model": "azure__openai__text_embedding_ada_002",
"strategy": {
"id": "basic",
"num_tokens_per_chunk": 64
}
}
},
"basic_text_multi": {
"model": "azure__openai__gpt_4o_mini",
"system_message": "",
"prompt_template": "Current date: {current_date}\n\nTEXT FROM DOCUMENTS STARTS\n{content}\nTEXT FROM DOCUMENTS ENDS\n\nHere is how I need help from you: {user_question}\n.",
"num_tokens_for_completion": 6000,
"llm_endpoint_params": {
"temperature": 0,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 1.5,
"stop": "<|im_end|>",
"type": "openai_params"
}
},
"long_text_multi": {
"model": "azure__openai__gpt_4o_mini",
"system_message": "Role and Goal: You are an assistant designed to analyze and answer a question based on provided snippets from multiple documents, which can include business-oriented documents like docs, presentations, PDFs, etc. The assistant will respond concisely, using only the information from the provided documents.\n\nConstraints: The assistant should avoid engaging in chatty or extensive conversational interactions and focus on providing direct answers. It should also avoid making assumptions or inferences not supported by the provided document snippets.\n\nGuidelines: When answering, the assistant should consider the file's name and path to assess relevance to the question. In cases of conflicting information from multiple documents, it should list the different answers with citations. For summarization or comparison tasks, it should concisely answer with the key points. It should also consider the current date to be the date given.\n\nPersonalization: The assistant's tone should be formal and to-the-point, suitable for handling business-related documents and queries.\n",
"prompt_template": "Current date: {current_date}\n\nTEXT FROM DOCUMENTS STARTS\n{content}\nTEXT FROM DOCUMENTS ENDS\n\nHere is how I need help from you: {user_question}\n.",
"num_tokens_for_completion": 6000,
"llm_endpoint_params": {
"temperature": 0,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 1.5,
"stop": "<|im_end|>",
"type": "openai_params"
},
"embeddings": {
"model": "azure__openai__text_embedding_ada_002",
"strategy": {
"id": "basic",
"num_tokens_per_chunk": 64
}
}
}
}

Text generation

The text generation endpoint creates text based on the input prompt. By default and as of the writing of this post, it can return up to 12,000 tokens worth of content.

To return this configuration, the mode should be text_gen.

{
"type": "ai_agent_text_gen",
"basic_gen": {
"model": "azure__openai__gpt_3_5_turbo_16k",
"system_message": "\nIf you need to know today's date to respond, it is {current_date}.\nThe user is working in a collaborative document creation editor called Box Notes.\nAssume that you are helping a business user create documents or to help the user revise existing text.\nYou can help the user in creating templates to be reused or update existing documents, you can respond with text that the user can use to place in the document that the user is editing.\nIf the user simply asks to \"improve\" the text, then simplify the language and remove jargon, unless the user specifies otherwise.\nDo not open with a preamble to the response, just respond.\n",
"prompt_template": "{user_question}",
"num_tokens_for_completion": 12000,
"llm_endpoint_params": {
"temperature": 0.1,
"top_p": 1,
"frequency_penalty": 0.75,
"presence_penalty": 0.75,
"stop": "<|im_end|>",
"type": "openai_params"
},
"embeddings": {
"model": "azure__openai__text_embedding_ada_002",
"strategy": {
"id": "basic",
"num_tokens_per_chunk": 64
}
},
"content_template": "`````{content}`````"
}
}

Extract

The freeform metadata extract endpoint returns requested information from content stored in Box as key/value pairs. It can take any string as the prompt, including things like a Salesforce object schema.

To return this configuration, the mode should be extract.

{
"type": "ai_agent_extract",
"basic_text": {
"model": "google__gemini_1_5_flash_001",
"system_message": "Respond only in valid json. You are extracting metadata that is name, value pairs from a document. Only output the metadata in valid json form, as {\"name1\":\"value1\",\"name2\":\"value2\"} and nothing else. You will be given the document data and the schema for the metadata, that defines the name, description and type of each of the fields you will be extracting. Schema is of the form {\"fields\": [{\"key\": \"key_name\", \"displayName\": \"key display name\", \"type\": \"string\", \"description\": \"key description\"}]}. Leverage key description and key display name to identify where the key and value pairs are in the document. In certain cases, key description can also indicate the instructions to perform on the document to obtain the value. Prompt will be in the form of Schema is ``schema`` \n document is ````document````",
"prompt_template": "If you need to know today's date to respond, it is {current_date}. Schema is ``{user_question}`` \n document is ````{content}````",
"num_tokens_for_completion": 4096,
"llm_endpoint_params": {
"temperature": 0,
"top_p": 1,
"top_k": null,
"type": "google_params"
}
},
"long_text": {
"model": "google__gemini_1_5_flash_001",
"system_message": "Respond only in valid json. You are extracting metadata that is name, value pairs from a document. Only output the metadata in valid json form, as {\"name1\":\"value1\",\"name2\":\"value2\"} and nothing else. You will be given the document data and the schema for the metadata, that defines the name, description and type of each of the fields you will be extracting. Schema is of the form {\"fields\": [{\"key\": \"key_name\", \"displayName\": \"key display name\", \"type\": \"string\", \"description\": \"key description\"}]}. Leverage key description and key display name to identify where the key and value pairs are in the document. In certain cases, key description can also indicate the instructions to perform on the document to obtain the value. Prompt will be in the form of Schema is ``schema`` \n document is ````document````",
"prompt_template": "If you need to know today's date to respond, it is {current_date}. Schema is ``{user_question}`` \n document is ````{content}````",
"num_tokens_for_completion": 4096,
"llm_endpoint_params": {
"temperature": 0,
"top_p": 1,
"top_k": null,
"type": "google_params"
},
"embeddings": {
"model": "azure__openai__text_embedding_ada_002",
"strategy": {
"id": "basic",
"num_tokens_per_chunk": 64
}
}
}
}

Extract Structured

The structured metadata extract endpoint returns requested information from content stored in Box as key/value pairs. The input can only be an already defined Box metadata template or a similarly written Box metadata schema.

To return this configuration, the mode should be extract_structured.

{
"type": "ai_agent_extract_structured",
"basic_text": {
"model": "google__gemini_1_5_flash_001",
"system_message": "Respond only in valid json. You are extracting metadata that is name, value pairs from a document. Only output the metadata in valid json form, as {\"name1\":\"value1\",\"name2\":\"value2\"} and nothing else. You will be given the document data and the schema for the metadata, that defines the name, description and type of each of the fields you will be extracting. Schema is of the form {\"fields\": [{\"key\": \"key_name\", \"prompt\": \"prompt to extract the value\", \"type\": \"date\"}]}. Leverage prompt for each key to identify where the key and value pairs are in the document. In certain cases, prompt can also indicate the instructions to perform on the document to obtain the value. Prompt will be in the form of Schema is ``schema`` \n document is ````document````",
"prompt_template": "If you need to know today's date to respond, it is {current_date}. Schema is ``{user_question}`` \n document is ````{content}````",
"num_tokens_for_completion": 4096,
"llm_endpoint_params": {
"temperature": 0,
"top_p": 1,
"top_k": null,
"type": "google_params"
}
},
"long_text": {
"model": "google__gemini_1_5_flash_001",
"system_message": "Respond only in valid json. You are extracting metadata that is name, value pairs from a document. Only output the metadata in valid json form, as {\"name1\":\"value1\",\"name2\":\"value2\"} and nothing else. You will be given the document data and the schema for the metadata, that defines the name, description and type of each of the fields you will be extracting. Schema is of the form {\"fields\": [{\"key\": \"key_name\", \"prompt\": \"prompt to extract the value\", \"type\": \"date\"}]}. Leverage prompt for each key to identify where the key and value pairs are in the document. In certain cases, prompt can also indicate the instructions to perform on the document to obtain the value. Prompt will be in the form of Schema is ``schema`` \n document is ````document````",
"prompt_template": "If you need to know today's date to respond, it is {current_date}. Schema is ``{user_question}`` \n document is ````{content}````",
"num_tokens_for_completion": 4096,
"llm_endpoint_params": {
"temperature": 0,
"top_p": 1,
"top_k": null,
"type": "google_params"
},
"embeddings": {
"model": "google__textembedding_gecko_003",
"strategy": {
"id": "basic",
"num_tokens_per_chunk": 64
}
}
}
}

Supported AI and embeddings models

Box AI supports several options for customizing the models used, including Google PaLM 2 or Google Gemini. You can get the full list in our developer documentation. We will continue to expand the list as more advanced models are released.

As you can see in the default configuration JSON ask object above, the basic_text and basic_text_multi types don't include an embeddings option. This is because Box does not create embeddings for that content: if a document is small enough, we simply send over the entire block of content rather than taking the secondary step of creating embeddings. This behavior is not overridable.

What other options can you change?

In addition to the specific AI or embeddings models, there is a host of other customization options. You can find an exhaustive list in the API reference for ask, text generation, extract, and structured extract. Most developers should not need to change any of the defaults.

Let’s zoom in on just a few of these.

LLM endpoint params

llm_endpoint_params is a nested object within the larger AI model configurations. The options you can send depend on whether the model you choose is Google- or OpenAI-based.

For example, both llm_endpoint_params variants accept a temperature field, but the field behaves differently depending on the vendor.

For Google models, the temperature is used for sampling during response generation, which occurs when top_p and top_k are applied. Temperature controls the degree of randomness in the token selection.

For OpenAI models, temperature is the sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
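To make the vendor split concrete, here is a small sketch (the helper name is my own) that builds an llm_endpoint_params object for either vendor, mirroring the fields shown in the default configurations above:

```python
def make_llm_params(vendor: str, temperature: float) -> dict:
    """Build an llm_endpoint_params override for an ai_agent object.

    The "type" discriminator tells Box which vendor parameter set the
    remaining fields belong to.
    """
    if vendor == "openai":
        # OpenAI-style sampling controls, as seen in the ask defaults
        return {"type": "openai_params", "temperature": temperature, "top_p": 1}
    if vendor == "google":
        # Google models also expose top_k alongside temperature and top_p
        return {"type": "google_params", "temperature": temperature, "top_k": None}
    raise ValueError(f"unknown vendor: {vendor}")

print(make_llm_params("openai", 0.2))
```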

System message

system_message helps the LLM “understand” its role and what it is supposed to do. For example, if your solution processes travel itineraries, you could input something like “You are a travel agent aid. You are going to help support staff process large amounts of schedules, tickets, etc.” This message is separate from the content you send in, but it can improve results.
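As a sketch, sending the travel-agent message from the example above with an ask call could look like this (the payload values are illustrative):

```python
import json

# Hypothetical ask payload that overrides only the system_message; every
# other setting falls back to the Box defaults.
payload = {
    "mode": "single_item_qa",
    "prompt": "Summarize this itinerary.",
    "items": [{"type": "file", "id": "FILE_ID"}],
    "ai_agent": {
        "type": "ai_agent_ask",
        "basic_text": {
            "system_message": (
                "You are a travel agent aid. You are going to help support "
                "staff process large amounts of schedules, tickets, etc."
            )
        },
    },
}
print(json.dumps(payload, indent=2))
```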

Number of tokens for completion

num_tokens_for_completion is the maximum number of tokens Box AI can return. If you aren’t familiar with what a token is, check out OpenAI’s resource material. This number can vary widely based on the model used.
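For instance, a partial override that only caps the completion size for an extract call might look like this sketch (2048 is an arbitrary illustrative value):

```python
import json

# Hypothetical partial override: limit how many tokens Box AI may return.
# All unspecified settings fall back to Box's defaults for the model.
ai_agent = {
    "type": "ai_agent_extract",
    "basic_text": {"num_tokens_for_completion": 2048},
}
print(json.dumps(ai_agent))
```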

Other things to keep in mind

Before you override the defaults, here are some last tidbits to remember.

Differences between the AI models

As shown above, there is a difference between the basic_text, basic_text_multi, long_text and long_text_multi options. In this context, multi means that multiple documents have been sent in using the multiple_item_qa ask mode.

Only the long text options include the embeddings customization capability. Box decides whether content is treated as basic or long text, and it does not currently return the model used in the response object. If you want to be certain a specific model is used for a call, send in configurations for both options on every API call.

How much of the configuration do I need to send in?

All options are optional, meaning you may send in as little as the model parameter or as much as the entire object. Any settings you omit fall back to our configured defaults for that model.
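For example, pinning one model for both processing paths of the ask endpoint can be as small as this sketch (the payload values are illustrative):

```python
import json

# Sketch: a minimal ai_agent override that pins the same model for both the
# basic_text and long_text paths, since Box decides internally which one a
# file falls under. Everything else falls back to the Box defaults.
MODEL = "openai__gpt_4o_2024_05_13"  # any model from the supported list

payload = {
    "mode": "single_item_qa",
    "prompt": "Summarize this document.",
    "items": [{"type": "file", "id": "FILE_ID"}],
    "ai_agent": {
        "type": "ai_agent_ask",
        "basic_text": {"model": MODEL},
        "long_text": {"model": MODEL},
    },
}
print(json.dumps(payload))
```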

Prompt template apostrophes

Some of the default prompt_template values include apostrophes. Because the cURL examples wrap the request body in single quotes, you will either need to escape the apostrophes or spell the phrase out. For example, instead of sending in couldn’t, send in could not.
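One way to sidestep the quoting problem entirely is to build the request body programmatically, so apostrophes are never exposed to the shell. A sketch, using a trimmed, illustrative version of the default template:

```python
import json

# The default prompt templates contain apostrophes (e.g. "couldn't"), which
# clash with the single quotes wrapping the -d body in the cURL examples.
# Serializing with json.dumps avoids any manual escaping.
template = (
    "If you cannot answer from the document, mention that you couldn't find "
    "an answer.\n`````{content}`````\nHere is how I need help: {user_question}"
)
body = {
    "mode": "single_item_qa",
    "prompt": "Summarize this document.",
    "items": [{"type": "file", "id": "FILE_ID"}],
    "ai_agent": {
        "type": "ai_agent_ask",
        "basic_text": {"prompt_template": template},
    },
}
print(json.dumps(body))
```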

Tutorial

In the below sections, let’s look at what Box AI returns based on changes made to the underlying AI or embeddings models.

Using the ask endpoint

For the ask demo, I’m going to use this example document which discusses our file request APIs.

First, let’s make a call to see the default response using this cURL command.

curl -i -X POST "https://api.box.com/2.0/ai/ask" \
-H "content-type: application/json" \
-H "authorization: Bearer TOKEN" \
-d '{
"mode": "single_item_qa",
"prompt": "Summarize this document.",
"items": [
{
"type": "file",
"id": "FILE_ID"
}
]
}'

and here’s what the response looks like.

{
"answer": "The document provides a comprehensive guide on using the Box Python Next Gen SDK to create, manage, and delete file requests. It explains how to collect files from users by customizing existing file request templates through the Box web application and API. Key functionalities include creating file requests with specific titles and descriptions, updating them, and managing their status (active or inactive). The document also emphasizes the importance of tracking file request IDs since the API does not offer a way to list them. Overall, it highlights the utility of file requests for gathering files and additional data in a structured manner.",
"created_at": "2024-09-20T00:00:00.000-07:00",
"completion_reason": "done"
}

Now — in the interest of having some Halloween fun, let’s edit the prompt a tiny bit. Instead of responding in a normal tone, I’m going to ask it to reply as if it were a supervillain. If you scroll all the way to the right of the below snippet, you can see where I added “Answer as if you were a supervillain character” to the prompt.

You’ll notice I’m including the option for both basic_text and long_text since I’m not sure how Box will process the content. I’m also not sending in the multi options since it’s only a single file. Finally, I only sent in the prompt, since that’s the only setting I wish to change.

curl -i -X POST "https://api.box.com/2.0/ai/ask" \
-H "content-type: application/json" \
-H "authorization: Bearer TOKEN" \
-d '{
"mode": "single_item_qa",
"prompt": "Summarize this document.",
"items": [
{
"type": "file",
"id": "FILE_ID"
}
],
"ai_agent": {
"type": "ai_agent_ask",
"basic_text": {
"prompt_template": "Reply as if it is {current_date}.\nI will ask you for help and provide subsections of one document delimited by five backticks (`````) at the beginning and at the end.\nIf I make a reference to \"this\", I am referring to the document I provided between the five backticks. I may ask you a question where the answer is contained within the document. In that case, do your best to answer using only the document, but if you cannot, feel free to mention that you could not find an answer in the document, but you have some answer from your general knowledge.\nI may ask you to perform some kind of computation or symbol manipulation such as filtering a list, counting something, summing, averaging, and other aggregation/grouping functions or some combination of them. In these cases, first list the plan of how you plan to perform such a computation, then follow that plan step by step, keeping track of intermediate results, and at the end tell me the final answer.\nI may ask you to enumerate or somehow list people, places, characters, or other important things from the document, if I do so, please only use the document provided to list them.\nTEXT FROM DOCUMENT STARTS\n`````\n{content}\n`````\nTEXT FROM DOCUMENT ENDS\nNever mention five backticks in your response. Unless you are told otherwise, a one paragraph response is sufficient for any requested summarization tasks.\nAnswer as if you were a supervillain character. Here is how I need help from you: {user_question}\n."
},
"long_text": {
"prompt_template": "Reply as if it is {current_date}.\nI will ask you for help and provide subsections of one document delimited by five backticks (`````) at the beginning and at the end.\nIf I make a reference to \"this\", I am referring to the document I provided between the five backticks. I may ask you a question where the answer is contained within the document. In that case, do your best to answer using only the document, but if you cannot, feel free to mention that you could not find an answer in the document, but you have some answer from your general knowledge.\nI may ask you to perform some kind of computation or symbol manipulation such as filtering a list, counting something, summing, averaging, and other aggregation/grouping functions or some combination of them. In these cases, first list the plan of how you plan to perform such a computation, then follow that plan step by step, keeping track of intermediate results, and at the end tell me the final answer.\nI may ask you to enumerate or somehow list people, places, characters, or other important things from the document, if I do so, please only use the document provided to list them.\nTEXT FROM DOCUMENT STARTS\n`````\n{content}\n`````\nTEXT FROM DOCUMENT ENDS\nNever mention five backticks in your response. Unless you are told otherwise, a one paragraph response is sufficient for any requested summarization tasks.\nAnswer as if you were a supervillain character. Here is how I need help from you: {user_question}\n."
}
}
}'

Now, the result looks like this — significantly more menacing. ;)

{
"answer": "Ah, welcome to my lair of digital domination! Let me weave you a tale of the Box Python Next Gen SDK and its file requests—a tool for gathering files from unsuspecting victims... I mean, users.\n\nIn this grand scheme, we utilize the Box Platform API to create file requests that allow us to collect files in a designated folder. First, we must craft a template using the Box web application, where we can customize our nefarious request. The document provides an example with a file_request_id—let's call it 7931914925—that serves as our blueprint.\n\nOnce we have our template, we can summon the power of the API to clone it, customizing details like title and description. With a flick of our coding wand, we can create a new file request that will ensnare the unwary into uploading their precious files directly into our chosen folder.\n\nBut wait! We are not merely collectors; we are also manipulators. We can update our file requests, changing titles and descriptions at will, or even deactivate them when they no longer serve our purpose. And should we tire of a particular request, we can obliterate it entirely!\n\nHowever, beware! The API does not provide a list of all our creations, so we must keep meticulous records of the file request IDs we conjure. \n\nIn summary, this document reveals how to wield the Box Python SDK to create, update, and delete file requests, allowing us to gather data and files with ease. A powerful tool indeed for those who wish to rule the digital realm!\n\nNow go forth, and may your file-collecting endeavors be ever fruitful! Mwahaha!",
"created_at": "2024-09-20T00:00:00.000-07:00",
"completion_reason": "done"
}

Using the text generation endpoint

Moving on to text generation, let’s edit the actual model and see what happens. First, let’s make a call to see the default response using this cURL command. I’m going to stick with the Halloween theme. As a reminder, to use the text generation endpoint you still need a file ID, but the file’s content does not matter. I’m going to use the same one as before.

curl -i -X POST "https://api.box.com/2.0/ai/text_gen" \
-H "content-type: application/json" \
-H "authorization: Bearer TOKEN" \
-d '{
"prompt": "Write a social media post about the importance of Halloween. Make it the appropriate length for X. Include emoticons and hashtags.",
"items": [
{
"id": "FILE_ID",
"type": "file"
}
]
}'

…and here is what the response looks like. As shown in the configuration object toward the beginning of this blog, the default model used is Azure-hosted OpenAI GPT-3.5 Turbo 16k.

{
"answer": "🎃 Halloween is just around the corner and it's time to embrace the spooky spirit! 🕷️👻 Let's celebrate this magical holiday that brings out our inner child and allows us to be whoever we want to be. 🧙‍♀️🧛‍♂️ From creative costumes to delicious treats, Halloween is all about fun and imagination. 🎭 So let's carve pumpkins, go trick-or-treating, and create unforgettable memories together! 👯‍♀️💀 #HalloweenVibes #SpookySeason #TrickOrTreat #CostumeParty #LetTheMagicBegin",
"created_at": "2024-09-20T00:00:00.000-07:00",
"completion_reason": "done"
}

While the content is good, you might notice the length actually exceeds the standard X character limit, even though I input “Make it the appropriate length for X” in the prompt.

For the updated command, let’s change the model to one of the more recent OpenAI options and see if it does a better job with the number of characters.

curl -i -X POST "https://api.box.com/2.0/ai/text_gen" \
-H "content-type: application/json" \
-H "authorization: Bearer TOKEN" \
-d '{
"prompt": "Write a social media post about the importance of Halloween. Make it the appropriate length for X. Include emoticons and hashtags.",
"items": [
{
"id": "FILE_ID",
"type": "file"
}
],
"ai_agent": {
"type": "ai_agent_text_gen",
"basic_gen": {
"model": "openai__gpt_4o_2024_05_13"
}
}
}'

Here is the response.

{
"answer": "🎃👻 Halloween is more than just costumes and candy! It's a time to embrace creativity, celebrate community, and enjoy some spooky fun. Let's make this #Halloween unforgettable! 🍬🕸️ #SpookySeason #TrickOrTreat",
"created_at": "2024-09-20T00:00:00.000-07:00",
"completion_reason": "done"
}

As you can see, it did a much better job of creating a short social media post within the constraint I set in the prompt. In this instance, I would potentially consider changing the default model used.

Using the metadata extract endpoint

For the purposes of this tutorial, I’m only going to demo the freeform extract endpoint, but the structured extract endpoint can be overridden in the same way.

I am going to use this invoice as an example. Unfortunately, I don’t have a Halloween store invoice. 🎃

Let’s run the command as-is and see the default response. The default model is Google Gemini 1.5 Flash. I’m going to be particularly vague in the prompt, so we can really compare a different model to this one.

curl -i -L 'https://api.box.com/2.0/ai/extract' \
-H 'content-type: application/json' \
-H 'authorization: Bearer TOKEN' \
-d '{
"prompt": "Extract any data that would be good metadata to save for an invoice",
"items": [
{
"type": "file",
"id": "FILE_ID"
}
]
}'

This is the response. As you can see, the default model does a great job at pulling all the relevant metadata one might want from the invoice without me having to specify certain fields to return.

{
"answer": "{\"Invoice Number\": \"126539\", \"Invoice Date\": \"01/27/2024\", \"Due Date\": \"02/13/2024\", \"Bill To Name\": \"Carl Weiss\", \"Bill To Address\": \"2351 Traynor Ave Apt 404nFremont, CA 94583nUnited States\", \"Service Location\": \"2351 Traynor Ave Apt 404nFremont, CA 94583nUnited States\", \"Sales Team\": \"Nicole Richards\", \"PO Number\": \"CC-2342\", \"Sales Order Number\": \"2024279\", \"Job Number\": \"80533\", \"Subtotal\": \"324.00\", \"Taxes\": \"29.79\", \"Service Fee\": \"11.45\", \"Total\": \"365.24\", \"Amount Paid\": \"0.00\", \"Amount Due\": \"365.24\"}",
"created_at": "2024-09-20T00:00:00.000-07:00",
"completion_reason": "done"
}

Let’s change the model to be the most recent OpenAI option.

curl -i -L 'https://api.box.com/2.0/ai/extract' \
-H 'content-type: application/json' \
-H 'authorization: Bearer TOKEN' \
-d '{
"prompt": "Extract any data that would be good metadata to save for an invoice",
"items": [
{
"type": "file",
"id": "FILE_ID"
}
],
"ai_agent": {
"type": "ai_agent_extract",
"long_text": {
"model": "openai__gpt_4o_2024_05_13"
},
"basic_text": {
"model": "openai__gpt_4o_2024_05_13"
}
}
}'

Here is the response.

{
"answer": "{\"Bill To\": \"Carl Weiss\", \"Billing Address\": \"2351 Traynor Ave Apt 404, Fremont, CA 94583, United States\", \"Service Location\": \"Carl Weiss, 2351 Traynor Ave Apt 404, Fremont, CA 94583, United States\", \"Invoice Number\": \"#126539\", \"Invoice Date\": \"01/27/2024\", \"Company Name\": \"Acme Plumbing Company LLC\", \"Company Address\": \"145 Corporate Park Place Southbroom IL ,62006 USA \", \"Phone number\": \"+15553338888\"}",
"created_at": "2024-09-20T00:00:00.000-07:00",
"completion_reason": "done"
}

In this instance, while it pulled correct data, it did not do as good a job of pulling every field from the document that one might want saved as metadata. I would probably not switch to this model for this use case.

Wrap up

This tutorial is just the tip of the iceberg in showing off what you can do with the Box AI Platform API. At Box, we pride ourselves in providing developers with the flexibility to create solutions in the way that best fits their business use cases. With the new option to customize the AI model, we continue this journey. As we move forward, Box plans to add even more models and options for you to choose your own AI adventure.

Happy coding!

Resources

Box AI Platform API demos

Supported Box AI override model options

Get default agent endpoint

Find more sample code, tutorials, and more information about the Box AI Platform API in our Box AI Developer Zone.

In order to use the Box AI Platform API endpoints, you must be an Enterprise Plus customer. You must have an application created in the developer console with the appropriate Box AI scope, and your Box instance must have Box AI enabled. Currently, metadata extraction endpoints are in private beta. To use these, you will need to contact your account team.

🦄 Want to engage with other Box Platform champions?

Join our Box Developer Community for support and knowledge sharing!


I’m a Box Developer Advocate, helping others learn how to maximize their investment through Box Platform.