Searching content in the Box Platform

Rui Barbosa
Published in Box Developer Blog
7 min read · Oct 26, 2023

In any content management system, especially one holding unstructured content, search is fundamental to helping users find what they are looking for.

However, many developers working with Box get unexpected results when using search. In this article we'll explore the different aspects of search from a developer's perspective.

Concepts

The Box API provides a way to find content in Box using full-text search queries. Support for the Box search API is available in all our supported SDKs and the CLI.

Box is not a file system. Developers often expect search to behave like a typical file system search, using paths, wildcards, and file or folder names.

Search in Box is an indexed database search. It indexes name, description, tags, comments, and content up to the first 10k bytes. Every time files or folders are created, updated, or deleted, the index is updated asynchronously.

This means that the search index is not always up to date, and it may take a few minutes to update.
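Because indexing is asynchronous, code that searches right after an upload may need to wait and try again. Here is a minimal sketch of a retry helper. The name `search_with_retry`, the number of attempts, and the delay are illustrative assumptions and not part of the Box SDK. It accepts any callable that runs a search, so it is not tied to a particular client.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def search_with_retry(
    search_fn: Callable[[], Iterable[T]],
    attempts: int = 5,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Call search_fn until it returns something or attempts run out.

    Hypothetical helper: the defaults are guesses, tune them to your needs.
    """
    results: List[T] = []
    for attempt in range(1, attempts + 1):
        results = list(search_fn())
        if results:
            return results
        if attempt < attempts:
            # Give the search index some time to catch up.
            sleep(delay_seconds)
    return results
```

You could pass it a lambda wrapping the search call used later in this article. Keep in mind that an empty result can also mean there is genuinely nothing to find, so a bounded number of attempts is safer than waiting forever.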

Let's get started

Consider the following tree structure in the Box app:

- workshops
    - search
        - apple
            - apple1.txt
            - apple2.txt
            - apple3.txt
        - apple banana
            - apple.txt
            - banana.txt
        - apple pineapple banana
            - apple.txt
            - banana.txt
            - pineapple.txt
        - banana
            - banana.txt
        - banana apple
            - apple.txt
            - banana.txt
        - pineapple
            - pineapple.txt

And this Python snippet to get things started with some simple print methods and a Box client:

""" Searching Box exercises"""
import logging
from typing import Iterable

from boxsdk.object.item import Item

from utils.config import AppConfig
from utils.box_client import get_client

logging.basicConfig(level=logging.INFO)
logging.getLogger("boxsdk").setLevel(logging.CRITICAL)

conf = AppConfig()


def print_box_item(box_item: Item):
    """Basic print of a Box Item attributes"""
    print(f"Type: {box_item.type} ID: {box_item.id} Name: {box_item.name}, ")


def print_search_results(items: Iterable["Item"]):
    """Print search results"""
    print("--- Search Results ---")
    for item in items:
        print_box_item(item)
    print("--- End Search Results ---")


if __name__ == "__main__":
    client = get_client(conf)

Simple search

Here is a sample search method. What happens if we just search for apple?

def simple_search(query: str) -> Iterable["Item"]:
    """Search by query in any Box content"""

    return client.search().query(query=query)


if __name__ == "__main__":
    client = get_client(conf)

    # Simple Search
    search_results = simple_search("apple")
    print_search_results(search_results)

Resulting in:

--- Search Results ---
--- End Search Results ---

Often the initial reaction from developers to this exercise is that, since no results are returned, search does not work.

However, remember that indexing is an asynchronous process, and it may take a few minutes for the index to update. If we just uploaded the sample files, an empty result is expected.

A few minutes later…

--- Search Results ---
Type: folder ID: 208850093677 Name: apple banana,
Type: folder ID: 208858841669 Name: apple,
Type: folder ID: 208856751058 Name: apple pineapple banana,
Type: folder ID: 208848037313 Name: banana apple,
Type: file ID: 1220477661707 Name: apple.txt,
Type: file ID: 1220476606374 Name: apple.txt,
Type: file ID: 1220478477851 Name: apple.txt,
Type: file ID: 1220478610566 Name: apple1.txt,
Type: file ID: 1220479548719 Name: apple2.txt,
Type: file ID: 1220481540143 Name: apple3.txt,
--- End Search Results ---

Notice that the search:

  • returned both files and folders
  • matched items with apple in the name
  • included apple1, apple2, and apple3
  • did not include pineapple

Let's try the same simple search, but this time using apple banana:

# Expanded Search
search_results = simple_search("apple banana")
print_search_results(search_results)

Resulting in:

--- Search Results ---
Type: folder ID: 231320711952 Name: apple banana
Type: folder ID: 231318527838 Name: apple pineapple banana
Type: folder ID: 231320108594 Name: banana apple
Type: folder ID: 231319410565 Name: banana
Type: folder ID: 231318889313 Name: apple
Type: file ID: 1337960845864 Name: banana.txt
Type: file ID: 1337971324252 Name: banana.txt
Type: file ID: 1337959496665 Name: banana.txt
Type: file ID: 1337968972110 Name: apple.txt
Type: file ID: 1337956847194 Name: banana.txt
Type: file ID: 1337966423041 Name: apple.txt
Type: file ID: 1337967294253 Name: apple.txt
Type: file ID: 1337963451641 Name: apple1.txt
Type: file ID: 1337967213245 Name: apple2.txt
Type: file ID: 1337962062207 Name: apple3.txt
--- End Search Results ---

Notice we have expanded our search. Now it is returning anything with apple or banana or both.

Exact(ish) search

We can wrap the query in quotes to turn off this match-any behavior and get a more exact string search:

# "Exact" Search
search_results = simple_search('"apple banana"')
print_search_results(search_results)

Resulting in:

--- Search Results ---
Type: folder ID: 231320711952 Name: apple banana
--- End Search Results ---
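Nesting double quotes inside a Python string by hand is easy to get wrong, so a tiny helper can do it for you. This is only a sketch; `exact_phrase` is a hypothetical name, and stripping any quotes already present in the input is my own design choice.

```python
def exact_phrase(phrase: str) -> str:
    """Wrap a phrase in double quotes so it is searched as one unit.

    Hypothetical helper: existing double quotes are removed first so the
    phrase is never quoted twice.
    """
    cleaned = phrase.replace('"', "").strip()
    if not cleaned:
        raise ValueError("phrase must contain at least one character")
    return f'"{cleaned}"'


print(exact_phrase("apple banana"))  # "apple banana"
```

The result can be passed straight to the search method, for example `simple_search(exact_phrase("apple banana"))`.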

Using search operators

We can use the search operators AND, OR, and NOT to refine what we are looking for. For example:

  • apple NOT banana should return items with “apple” but not “banana”
  • apple AND pineapple should return items with both “apple” and “pineapple”
  • pineapple OR banana should return items with “pineapple” or “banana”
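The operator examples above are plain query strings, so they can be assembled with a few small functions. The sketch below is one way to do it. The helper names are hypothetical, and quoting terms that contain spaces so they stay together as a phrase is an assumption on my part that you should verify against your own content.

```python
from typing import Iterable


def _term(term: str) -> str:
    """Quote a term that contains spaces so it stays a single phrase."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def all_of(terms: Iterable[str]) -> str:
    """Build a query matching items that contain every term."""
    return " AND ".join(_term(t) for t in terms)


def any_of(terms: Iterable[str]) -> str:
    """Build a query matching items that contain at least one term."""
    return " OR ".join(_term(t) for t in terms)


def without(wanted: str, unwanted: str) -> str:
    """Build a query matching items with `wanted` but not `unwanted`."""
    return f"{_term(wanted)} NOT {_term(unwanted)}"


print(without("apple", "banana"))      # apple NOT banana
print(all_of(["apple", "pineapple"]))  # apple AND pineapple
print(any_of(["pineapple", "banana"])) # pineapple OR banana
```

Each function returns a string you can hand to `simple_search`, which keeps the operators in upper case and in one place.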

Unexpected search results

Did you know that the plural of banana without the b is actually pineapple in 6 different languages?

Let’s search for ananas:

# More Searches
search_results = simple_search('ananas')
print_search_results(search_results)

Results in:

--- Search Results ---
Type: file ID: 1337971411200 Name: pineapple.txt
Type: file ID: 1337965525302 Name: pineapple.txt
--- End Search Results ---

Where did the ananas come from?

Remember that the search doesn’t look only at the name, but also at the description, tags, comments, and content.

pineapple.txt has the word ananas in the description and content.

Ananas exists in both the content and the file description

Specifying where to search

Let's modify the search method to accept a parameter that allows the developer to specify in which attributes the search should be performed.

def simple_search(query: str, content_types: Iterable[str] = None) -> Iterable["Item"]:
    """Search by query in any Box content"""

    return client.search().query(query=query, content_types=content_types)

Now try searching for ananas again, but only in the name:

# Search only in name
search_results = simple_search(
    "ananas",
    content_types=[
        "name",
    ],
)
print_search_results(search_results)

Note: In Python a string is an Iterable of characters. Make sure you pass the content_types as a list.

You get an empty result, because ananas does not appear in the name of any file.

--- Search Results ---
--- End Search Results ---
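The note about strings being iterables of characters is worth a quick demonstration. Here is a small sketch of a guard you could put in front of the search call. `as_list` is a hypothetical helper name, not something the SDK provides.

```python
from typing import Iterable, List, Optional, Union


def as_list(value: Union[str, Iterable[str], None]) -> Optional[List[str]]:
    """Return value as a list, treating a bare string as a single item."""
    if value is None:
        return None
    if isinstance(value, str):
        return [value]
    return list(value)


# Iterating over a bare string yields its characters:
print(list("name"))                      # ['n', 'a', 'm', 'e']
# The guard keeps the string whole:
print(as_list("name"))                   # ['name']
print(as_list(("name", "description")))  # ['name', 'description']
```

Calling `as_list(content_types)` inside the search method would make it tolerant of callers who pass a single string by mistake.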

Let's get the ananas back by including the description in the search:

# Search in name and description
search_results = simple_search(
    "ananas",
    content_types=[
        "name",
        "description",
    ],
)
print_search_results(search_results)

Resulting in:

--- Search Results ---
Type: file ID: 1337965525302 Name: pineapple.txt
Type: file ID: 1337971411200 Name: pineapple.txt
--- End Search Results ---

Specifying what to return

So far we haven't specified the type of item to be returned by the search, so we may get files or folders depending on what matches. However, we can restrict the results to only files or only folders.

Let's modify the search method to accept a result_type parameter:

def simple_search(
    query: str, content_types: Iterable[str] = None, result_type: str = None
) -> Iterable["Item"]:
    """Search by query in any Box content"""

    return client.search().query(
        query=query, content_types=content_types, result_type=result_type
    )

And search for apple but only have folders returned:

# Search for folders only
search_results = simple_search("apple", result_type="folder")
print_search_results(search_results)

Resulting in:

--- Search Results ---
Type: folder ID: 231320711952 Name: apple banana
Type: folder ID: 231318889313 Name: apple
Type: folder ID: 231320108594 Name: banana apple
Type: folder ID: 231318527838 Name: apple pineapple banana
--- End Search Results ---
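If you need both files and folders but want to handle them separately, an alternative to running two searches is to run one and group the results by their type attribute on the client side. The sketch below assumes only that each result exposes `type`, as the items printed in this article do. `FakeItem` is a stand-in I made up so the example runs without a Box account.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class FakeItem:
    """Stand-in for a Box item with only the attributes used here."""

    type: str
    id: str
    name: str


def group_by_type(items: Iterable[Any]) -> Dict[str, List[Any]]:
    """Group search results into lists keyed by item.type."""
    grouped: Dict[str, List[Any]] = defaultdict(list)
    for item in items:
        grouped[item.type].append(item)
    return dict(grouped)


results = [
    FakeItem("folder", "1", "apple"),
    FakeItem("file", "2", "apple.txt"),
    FakeItem("folder", "3", "apple banana"),
]
grouped = group_by_type(results)
print([item.name for item in grouped["folder"]])  # ['apple', 'apple banana']
print([item.name for item in grouped["file"]])    # ['apple.txt']
```

Filtering with result_type on the server is still the better choice when you only need one kind of item, since less data comes back.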

Specifying the search location

We can also specify the search location. Until now we have searched our entire content. Let's modify the search method to accept this new parameter:

def simple_search(
    query: str,
    content_types: Iterable[str] = None,
    result_type: str = None,
    ancestor_folders: Iterable["Folder"] = None,
) -> Iterable["Item"]:
    """Search by query in any Box content"""

    return client.search().query(
        query=query,
        content_types=content_types,
        result_type=result_type,
        ancestor_folders=ancestor_folders,
    )

In the sample content we have a banana.txt file in every folder that has banana in its name.

Let’s search for banana but print the parent folder name:

# Search banana
search_results = simple_search("banana")

print("--- Search Results ---")
for item in search_results:
    print(
        f"Type: {item.type} ID: {item.id} Name: {item.name} Folder: {item.parent.name}"
    )
print("--- End Search Results ---")

Resulting in:

--- Search Results ---
Type: folder ID: 231319410565 Name: banana Folder: search
Type: folder ID: 231320711952 Name: apple banana Folder: search
Type: folder ID: 231320108594 Name: banana apple Folder: search
Type: folder ID: 231318527838 Name: apple pineapple banana Folder: search
Type: file ID: 1337959496665 Name: banana.txt Folder: apple pineapple banana
Type: file ID: 1337971324252 Name: banana.txt Folder: banana
Type: file ID: 1337956847194 Name: banana.txt Folder: banana apple
Type: file ID: 1337960845864 Name: banana.txt Folder: apple banana
--- End Search Results ---

Let's modify our search to look for banana only in the banana apple and apple banana folders, returning only files:

The folder IDs are specific to your Box account, so make sure you use your own.

# Ancestor Search
folder_apple_banana = client.folder("231320711952")
folder_banana_apple = client.folder("231320108594")
search_results = simple_search(
    "banana",
    ancestor_folders=[folder_apple_banana, folder_banana_apple],
    result_type="file",
)

print("--- Search Results ---")
for item in search_results:
    print(f"Type: {item.type} ID: {item.id} Name: {item.name} Folder: {item.parent.name}")
print("--- End Search Results ---")

Resulting in files found only in the specified folders:

--- Search Results ---
Type: file ID: 1337960845864 Name: banana.txt Folder: apple banana
Type: file ID: 1337956847194 Name: banana.txt Folder: banana apple
--- End Search Results ---
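Since folder IDs differ from account to account, you may prefer to keep them out of the code. Here is a rough sketch that parses a comma-separated string of IDs (from a config file or environment variable, for instance) and turns each one into a folder object through `client.folder()`, the same call used above. The helper names and the comma-separated format are my own assumptions.

```python
from typing import Any, Iterable, List


def parse_folder_ids(raw: str) -> List[str]:
    """Split a comma-separated string of folder IDs, dropping blanks."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def folders_from_ids(client: Any, folder_ids: Iterable[str]) -> List[Any]:
    """Build one folder object per ID using client.folder()."""
    return [client.folder(folder_id) for folder_id in folder_ids]


print(parse_folder_ids("231320711952, 231320108594"))
# ['231320711952', '231320108594']
```

The list returned by `folders_from_ids(client, parse_folder_ids(raw))` can then be passed as the ancestor_folders argument.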

There are many more parameters you can use to refine your search.

Try them out and see what you can find:

  • file_extensions
  • created_at_range
  • updated_at_range
  • size_range
  • trash_content
  • sort
  • direction
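As a starting point for experimenting with a few of these, here is a sketch that assembles keyword arguments for the search call. It reflects my understanding that the Python SDK takes file_extensions as a list of strings, created_at_range as a pair of timestamp strings, and size_range as a pair of byte counts. Please confirm the exact shapes in the SDK documentation before relying on it. `build_search_kwargs` and `rfc3339` are hypothetical helper names.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional, Tuple


def rfc3339(moment: datetime) -> str:
    """Format a datetime as an RFC 3339 string, assuming UTC if naive."""
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.isoformat(timespec="seconds")


def build_search_kwargs(
    query: str,
    file_extensions: Optional[Iterable[str]] = None,
    created_between: Optional[Tuple[datetime, datetime]] = None,
    size_between: Optional[Tuple[int, int]] = None,
) -> Dict[str, Any]:
    """Collect only the search parameters that were actually provided."""
    kwargs: Dict[str, Any] = {"query": query}
    if file_extensions:
        kwargs["file_extensions"] = list(file_extensions)
    if created_between:
        start, end = created_between
        # Assumed shape: (lower_bound, upper_bound) as timestamp strings.
        kwargs["created_at_range"] = (rfc3339(start), rfc3339(end))
    if size_between:
        # Assumed shape: (min_bytes, max_bytes).
        kwargs["size_range"] = tuple(size_between)
    return kwargs


print(build_search_kwargs("apple", file_extensions=["txt"]))
# {'query': 'apple', 'file_extensions': ['txt']}
```

With a client at hand, the result would be used as `client.search().query(**kwargs)`.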

Final thoughts

Although powerful, the search API was primarily designed to help users find content in Box, and may not be suited for all use cases:

  • Box is not a file system, so it doesn’t have paths.
  • It is an indexed search, so it may take a few minutes for the content to be indexed.
  • It indexes names, description, tags, comments, and content, often giving unexpected results to developers.

Thoughts? Comments? Feedback?

Drop us a line on our community forum.
