Searching content in the Box Platform
In any content management system, especially one with unstructured content, search is fundamental to helping users find what they are looking for.
However, many developers working with Box get unexpected results when using search. In this article, we explore the different aspects of searching from a developer's perspective.
Concepts
The Box API provides a way to find content in Box using full-text search queries. Support for the Box search API is available in all our supported SDKs and the CLI.
Box is not a file system. Developers often expect search to behave like a typical file system search, using paths, wildcards, and file or folder names.
Search in Box is an indexed database search. It indexes name, description, tags, comments, and content up to the first 10k bytes. Every time files or folders are created, updated, or deleted, the index is updated asynchronously.
This means that the search index is not always up to date, and it may take a few minutes to update.
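Because indexing is asynchronous, code that uploads content and searches for it right away may need to retry. Below is a minimal sketch of such a retry loop. The helper name wait_for_search_results and its parameters are hypothetical (they are not part of the Box SDK), and the default attempt count and delay are arbitrary placeholders.

```python
import time
from typing import Callable, Iterable, List


def wait_for_search_results(
    search_fn: Callable[[], Iterable],
    attempts: int = 10,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List:
    """Call search_fn until it returns at least one item or attempts run out.

    search_fn is any zero-argument callable that performs the search,
    for example: lambda: client.search().query(query="apple")
    """
    for attempt in range(attempts):
        results = list(search_fn())
        if results:
            return results
        # The index has not caught up yet: wait before trying again,
        # but do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return []
```

The sleep function is passed in as a parameter so the loop can be exercised in tests without actually waiting.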
Let's get started
Consider the following tree structure in the Box app:
- workshops
  - search
    - apple
      - apple1.txt
      - apple2.txt
      - apple3.txt
    - apple banana
      - apple.txt
      - banana.txt
    - apple pineapple banana
      - apple.txt
      - banana.txt
      - pineapple.txt
    - banana
      - banana.txt
    - banana apple
      - apple.txt
      - banana.txt
    - pineapple
      - pineapple.txt
And this Python snippet to get things started with some simple print methods and a Box client:
""" Searching Box exercises"""
import logging
from typing import Iterable

from boxsdk.object.item import Item

from utils.config import AppConfig
from utils.box_client import get_client

logging.basicConfig(level=logging.INFO)
logging.getLogger("boxsdk").setLevel(logging.CRITICAL)

conf = AppConfig()


def print_box_item(box_item: Item):
    """Basic print of a Box Item attributes"""
    print(f"Type: {box_item.type} ID: {box_item.id} Name: {box_item.name}, ")


def print_search_results(items: Iterable["Item"]):
    """Print search results"""
    print("--- Search Results ---")
    for item in items:
        print_box_item(item)
    print("--- End Search Results ---")


if __name__ == "__main__":
    client = get_client(conf)
Simple search
Here is a sample search method. What happens if we just search for apple?
def simple_search(query: str) -> Iterable["Item"]:
    """Search by query in any Box content"""
    return client.search().query(query=query)


if __name__ == "__main__":
    client = get_client(conf)

    # Simple Search
    search_results = simple_search("apple")
    print_search_results(search_results)
Resulting in:
--- Search Results ---
--- End Search Results ---
Developers often react to this exercise by concluding that, because no results are returned, search does not work.
However, remember that indexing is an asynchronous process, and the index may take a few minutes to update. If we just uploaded the sample files, this is expected.
A few minutes later…
--- Search Results ---
Type: folder ID: 208850093677 Name: apple banana,
Type: folder ID: 208858841669 Name: apple,
Type: folder ID: 208856751058 Name: apple pineapple banana,
Type: folder ID: 208848037313 Name: banana apple,
Type: file ID: 1220477661707 Name: apple.txt,
Type: file ID: 1220476606374 Name: apple.txt,
Type: file ID: 1220478477851 Name: apple.txt,
Type: file ID: 1220478610566 Name: apple1.txt,
Type: file ID: 1220479548719 Name: apple2.txt,
Type: file ID: 1220481540143 Name: apple3.txt,
--- End Search Results ---
Notice that the search:
- returned both files and folders
- matched items with apple in the name, including apple1, apple2, and apple3
- did not match pineapple
Let's try the same simple search, but this time using apple banana:
# Expanded Search
search_results = simple_search("apple banana")
print_search_results(search_results)
Resulting in:
--- Search Results ---
Type: folder ID: 231320711952 Name: apple banana
Type: folder ID: 231318527838 Name: apple pineapple banana
Type: folder ID: 231320108594 Name: banana apple
Type: folder ID: 231319410565 Name: banana
Type: folder ID: 231318889313 Name: apple
Type: file ID: 1337960845864 Name: banana.txt
Type: file ID: 1337971324252 Name: banana.txt
Type: file ID: 1337959496665 Name: banana.txt
Type: file ID: 1337968972110 Name: apple.txt
Type: file ID: 1337956847194 Name: banana.txt
Type: file ID: 1337966423041 Name: apple.txt
Type: file ID: 1337967294253 Name: apple.txt
Type: file ID: 1337963451641 Name: apple1.txt
Type: file ID: 1337967213245 Name: apple2.txt
Type: file ID: 1337962062207 Name: apple3.txt
--- End Search Results ---
Notice we have expanded our search. Now it is returning anything with apple or banana or both.
Exact(ish) search
We can wrap the query in quotes to turn off this match-any-word behavior and get a more exact string search:
# "Exact" Search
search_results = simple_search('"apple banana"')
print_search_results(search_results)
Resulting in:
--- Search Results ---
Type: folder ID: 231320711952 Name: apple banana
--- End Search Results ---
Using search operators
We can use the search operators AND, OR, and NOT to refine what we are looking for. For example:
- apple NOT banana should return items with “apple” but not “banana”
- apple AND pineapple should return items with both “apple” and “pineapple”
- pineapple OR banana should return items with “pineapple” or “banana”
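The quoted phrase and the operators above are plain text inside the query string, so they can be composed with small helpers before being passed to simple_search. This is only a sketch: the helper names phrase, all_of, any_of, and without are hypothetical, and no escaping of quotes inside the text is attempted.

```python
def phrase(text: str) -> str:
    """Wrap text in double quotes so it is searched as an exact phrase."""
    return f'"{text}"'


def all_of(*terms: str) -> str:
    """Join terms with AND: every term must match."""
    return " AND ".join(terms)


def any_of(*terms: str) -> str:
    """Join terms with OR: at least one term must match."""
    return " OR ".join(terms)


def without(term: str, excluded: str) -> str:
    """Match term, excluding items that also match excluded."""
    return f"{term} NOT {excluded}"


print(without("apple", "banana"))        # apple NOT banana
print(all_of("apple", "pineapple"))      # apple AND pineapple
print(any_of("pineapple", "banana"))     # pineapple OR banana
print(phrase("apple banana"))            # "apple banana"
```

Any of these strings can then be used as the query, for example simple_search(without("apple", "banana")).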
Unexpected search results
Did you know that the plural of banana without the b is actually pineapple in 6 different languages?
Let’s search for ananas:
# More Searches
search_results = simple_search('ananas')
print_search_results(search_results)
Results in:
--- Search Results ---
Type: file ID: 1337971411200 Name: pineapple.txt
Type: file ID: 1337965525302 Name: pineapple.txt
--- End Search Results ---
Where did the ananas come from?
Remember that the search doesn’t look only at the name, but also at the description, tags, comments, and content.
pineapple.txt has the word ananas in the description and content.
Specifying where to search
Let's modify the search method to accept a parameter that allows the developer to specify in which attributes the search should be performed.
def simple_search(query: str, content_types: Iterable[str] = None) -> Iterable["Item"]:
    """Search by query in any Box content"""
    return client.search().query(query=query, content_types=content_types)
Now try searching for ananas again, but only in the name:
# Search only in name
search_results = simple_search(
    "ananas",
    content_types=[
        "name",
    ],
)
print_search_results(search_results)
Note: In Python a string is an Iterable of characters. Make sure you pass the content_types as a list.
You get an empty result, because ananas does not exist in the name of any file.
--- Search Results ---
--- End Search Results ---
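The note about passing content_types as a list is easy to overlook, because a bare string is itself iterable and will not raise an error on its own. One way to guard against it is a small normalizing helper, sketched below. The name as_content_types is hypothetical and not part of the Box SDK.

```python
from typing import Iterable, List, Optional, Union


def as_content_types(value: Union[str, Iterable[str], None]) -> Optional[List[str]]:
    """Return content types as a list, wrapping a bare string in a list."""
    if value is None:
        return None
    if isinstance(value, str):
        # A bare string would otherwise be iterated character by character.
        return [value]
    return list(value)


print(list("name"))                               # ['n', 'a', 'm', 'e']
print(as_content_types("name"))                   # ['name']
print(as_content_types(("name", "description")))  # ['name', 'description']
```

Calling as_content_types on the argument inside simple_search would let callers pass either a single string or a list.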
Let's get the ananas back by including the description in the search:
# Search in name and description
search_results = simple_search(
    "ananas",
    content_types=[
        "name",
        "description",
    ],
)
print_search_results(search_results)
Resulting in:
--- Search Results ---
Type: file ID: 1337965525302 Name: pineapple.txt
Type: file ID: 1337971411200 Name: pineapple.txt
--- End Search Results ---
Specifying what to return
So far we haven't specified the type of item to return, so the search may return files or folders. However, we can restrict the results to only files or only folders.
Let's modify the search method to accept a result_type parameter:
def simple_search(
    query: str, content_types: Iterable[str] = None, result_type: str = None
) -> Iterable["Item"]:
    """Search by query in any Box content"""
    return client.search().query(
        query=query, content_types=content_types, result_type=result_type
    )
And search for apple, but only have folders returned:
# Search for folders only
search_results = simple_search("apple", result_type="folder")
print_search_results(search_results)
Resulting in:
--- Search Results ---
Type: folder ID: 231320711952 Name: apple banana
Type: folder ID: 231318889313 Name: apple
Type: folder ID: 231320108594 Name: banana apple
Type: folder ID: 231318527838 Name: apple pineapple banana
--- End Search Results ---
Specifying the search location
We can also specify the search location. Until now we have searched our entire content. Let's modify the search method to accept this new parameter:
def simple_search(
    query: str,
    content_types: Iterable[str] = None,
    result_type: str = None,
    ancestor_folders: Iterable["Folder"] = None,
) -> Iterable["Item"]:
    """Search by query in any Box content"""
    return client.search().query(
        query=query,
        content_types=content_types,
        result_type=result_type,
        ancestor_folders=ancestor_folders,
    )
In the sample content, there is a banana.txt file in every folder that has banana in its name.
Let’s search for banana, but print the parent folder name:
# Search banana
search_results = simple_search("banana")
print("--- Search Results ---")
for item in search_results:
    print(
        f"Type: {item.type} ID: {item.id} Name: {item.name} Folder: {item.parent.name}"
    )
print("--- End Search Results ---")
Resulting in:
--- Search Results ---
Type: folder ID: 231319410565 Name: banana Folder: search
Type: folder ID: 231320711952 Name: apple banana Folder: search
Type: folder ID: 231320108594 Name: banana apple Folder: search
Type: folder ID: 231318527838 Name: apple pineapple banana Folder: search
Type: file ID: 1337959496665 Name: banana.txt Folder: apple pineapple banana
Type: file ID: 1337971324252 Name: banana.txt Folder: banana
Type: file ID: 1337956847194 Name: banana.txt Folder: banana apple
Type: file ID: 1337960845864 Name: banana.txt Folder: apple banana
--- End Search Results ---
Let's modify the search to look for banana only in the banana apple and apple banana folders, returning only files.
The folder IDs are specific to your Box account, so make sure you use the correct IDs:
# Ancestor Search
folder_apple_banana = client.folder("231320711952")
folder_banana_apple = client.folder("231320108594")
search_results = simple_search(
    "banana",
    ancestor_folders=[folder_apple_banana, folder_banana_apple],
    result_type="file",
)
print("--- Search Results ---")
for item in search_results:
    print(f"Type: {item.type} ID: {item.id} Name: {item.name} Folder: {item.parent.name}")
print("--- End Search Results ---")
Resulting in files found only in the specified folders:
--- Search Results ---
Type: file ID: 1337960845864 Name: banana.txt Folder: apple banana
Type: file ID: 1337956847194 Name: banana.txt Folder: banana apple
--- End Search Results ---
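When several files share a name, as banana.txt does here, it can help to group the results by parent folder instead of printing them one by one. The sketch below relies only on the name and parent.name attributes used in the snippets above. The function name group_by_parent is hypothetical, and the sample items are stand-in objects, not real Box items.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List


def group_by_parent(items: Iterable) -> Dict[str, List[str]]:
    """Map each parent folder name to the names of the items found in it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        parent = getattr(item, "parent", None)
        # Items without a parent are filed under a placeholder key.
        folder_name = parent.name if parent is not None else "(no parent)"
        grouped[folder_name].append(item.name)
    return dict(grouped)


# Stand-in objects that mimic the attributes used above
sample = [
    SimpleNamespace(name="banana.txt", parent=SimpleNamespace(name="apple banana")),
    SimpleNamespace(name="banana.txt", parent=SimpleNamespace(name="banana apple")),
]
print(group_by_parent(sample))
# {'apple banana': ['banana.txt'], 'banana apple': ['banana.txt']}
```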
There are many more parameters you can use to refine your search.
Try them out and see what you can find:
- file_extensions
- created_at_range
- updated_at_range
- size_range
- trash_content
- sort
- direction
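As a starting point for the date parameters in this list, here is a minimal sketch that builds a value for created_at_range or updated_at_range. It assumes the SDK accepts a (lower_bound, upper_bound) tuple of RFC 3339 strings in which None leaves that side open, so check the SDK reference before relying on it. The helper names rfc3339 and date_range are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 string in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range(
    start: Optional[datetime] = None, end: Optional[datetime] = None
) -> Tuple[Optional[str], Optional[str]]:
    """Build a (lower_bound, upper_bound) tuple; None leaves a side open."""
    return (
        rfc3339(start) if start is not None else None,
        rfc3339(end) if end is not None else None,
    )


start = datetime(2023, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
print(date_range(start, start + timedelta(days=7)))
# ('2023-05-01T12:00:00Z', '2023-05-08T12:00:00Z')

# Assumed usage (not executed here):
# client.search().query(query="apple", created_at_range=date_range(start))
```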
Final thoughts
Although powerful, the search API was primarily designed to help users find content in Box, and may not be suited for all use cases:
- Box is not a file system, so it doesn’t have paths.
- It is an indexed search, so it may take a few minutes for the content to be indexed.
- It indexes names, description, tags, comments, and content, often giving unexpected results to developers.
Thoughts? Comments? Feedback?
Drop us a line on our community forum.