Build a Box classification service using FastAPI

Rui Barbosa
Published in Box Developer Blog
4 min read · Jan 24, 2023

In the article Getting started with Box Classifications, we learned how to create classification labels and policies that are automatically applied to your content inside Box.

These policies allow for quite a few customizations, however…

What if we need some really complex business logic for content classification?

Well, there is an API for that. This time, we are going to explore an example that uses Python and FastAPI to build a custom classification service.

Overview

The use case is derived from the previous sample app, where diver certification and insurance cards are uploaded to the app by the users, and we want to "stamp" this content with a PII label.

For this, I'm using webhooks to pick up on events related to the content. The webhook will ping our classification service, which in turn will "stamp" the content with the appropriate label.

Between the event and the classification call, you can insert any business rules you desire.

If you want to take a step back and read on how we got here, I recommend taking a look at these previous articles:

Setting up

We need to set up a couple of things on our Box app.

First, let's go to the administration console and, under classification, create a new label.

creating a new classification label

Next, we need to create a webhook in our app, via the app console.

creating a webhook

To simplify the example, I've set up the webhook to monitor events on any file inside the ClassificationService folder.

Show me the code

The main entry point is really quite straightforward.

@app.post("/box/classify")
async def classify(
    request: Request,
    settings: config.Settings = Depends(get_settings),
    db: Session = Depends(get_db),
):
    """Classify endpoint"""

    body_json = await request.json()
    body = await request.body()

    # the webhook id is assumed here to come from the event payload
    webhook_id = body_json["webhook"]["id"]

    # check for valid signatures
    is_valid = box_webhooks.webhook_signature_check(
        webhook_id, body, request.headers, db, settings
    )

    if not is_valid:
        raise HTTPException(status_code=404, detail="Invalid signature")

    # the event source is the file that triggered the webhook
    box_webhooks.classify_file(body_json["source"]["id"], db, settings)

    return {"ok": True}
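The endpoint above relies on two fields of the event payload. As a small sketch of how those lookups could be made more defensive, the helper below pulls out both ids and rejects anything that is not a file event. The function name `extract_ids` and the sample values are hypothetical, and the payload is abbreviated to the fields used here, assuming the event carries a `webhook` object alongside `source`.

```python
from typing import Any, Dict, Tuple


def extract_ids(payload: Dict[str, Any]) -> Tuple[str, str]:
    """Return (webhook_id, file_id), or raise ValueError for unexpected payloads."""
    try:
        webhook_id = payload["webhook"]["id"]
        source = payload["source"]
        source_type = source["type"]
        source_id = source["id"]
    except (KeyError, TypeError) as exc:
        raise ValueError("unexpected webhook payload") from exc

    # this service only classifies files, so ignore folder events
    if source_type != "file":
        raise ValueError(f"unsupported source type: {source_type}")

    return str(webhook_id), str(source_id)


# abbreviated, made-up payload for illustration only
sample_payload = {
    "type": "webhook_event",
    "trigger": "FILE.UPLOADED",
    "webhook": {"id": "1234567890", "type": "webhook"},
    "source": {"id": "987654321", "type": "file"},
}
```

In the endpoint, a `ValueError` raised here could be turned into an `HTTPException` before any Box API call is made.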

Box signs the request and we first need to verify that signature by using the validate_message method.

def webhook_signature_check(
    webhook_id: str,
    body: bytes,
    header: dict,
    db: Session,
    settings: Settings,
) -> bool:
    """check the signature of the webhook request"""
    # get a client object
    client = jwt_check_client(db, settings)
    # get a webhook object
    webhook = client.webhook(webhook_id)

    # get the signature keys
    key_a = settings.WH_KEY_A
    key_b = settings.WH_KEY_B

    # validate message body
    return webhook.validate_message(body, header, key_a, key_b)
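To give an idea of what a check like validate_message involves, here is a minimal standard-library sketch. It assumes the signature is a base64-encoded HMAC-SHA256 over the raw body followed by the delivery timestamp, and that the headers are available under lowercase names. The function names are hypothetical, and the sketch skips the signature version, algorithm, and timestamp age checks, so it is an illustration and not a replacement for the SDK call.

```python
import base64
import hashlib
import hmac
from typing import Mapping, Optional


def compute_signature(body: bytes, timestamp: str, key: str) -> str:
    """Base64-encoded HMAC-SHA256 over the raw body followed by the timestamp."""
    message = body + timestamp.encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def is_signature_valid(
    body: bytes,
    headers: Mapping[str, str],
    key_a: str,
    key_b: Optional[str] = None,
) -> bool:
    """Accept the message if either key reproduces its matching signature header."""
    timestamp = headers.get("box-delivery-timestamp", "")
    candidates = [
        (key_a, headers.get("box-signature-primary")),
        (key_b, headers.get("box-signature-secondary")),
    ]
    for key, received in candidates:
        if not key or not received:
            continue
        expected = compute_signature(body, timestamp, key)
        # constant-time comparison avoids leaking how many characters matched
        if hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8")):
            return True
    return False
```

Having two keys means one of them can be rotated while the other still validates incoming messages.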

We should also check to see if the webhook_id is expected by our app (just in case we mess up some configuration), and also keep track of the request id to prevent any possible replay attacks.
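A rough sketch of those two extra checks could look like the following. It assumes the delivery id and timestamp arrive in `box-delivery-id` and `box-delivery-timestamp` headers (lowercase here), that the timestamp is an ISO 8601 string with a UTC offset, and that a ten minute window is acceptable. The webhook id, the in-memory set, and the function name `should_process` are all hypothetical; a real service would probably persist the seen ids in its database.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional, Set

# hypothetical values, for illustration only
EXPECTED_WEBHOOK_IDS: Set[str] = {"1234567890"}
MAX_AGE = timedelta(minutes=10)

# in-memory store of delivery ids that were already processed
_seen_delivery_ids: Set[str] = set()


def should_process(
    webhook_id: str,
    headers: Mapping[str, str],
    now: Optional[datetime] = None,
) -> bool:
    """Reject unknown webhooks, stale messages, and repeated deliveries."""
    if webhook_id not in EXPECTED_WEBHOOK_IDS:
        return False

    delivery_id = headers.get("box-delivery-id")
    timestamp = headers.get("box-delivery-timestamp")
    if not delivery_id or not timestamp:
        return False

    try:
        sent_at = datetime.fromisoformat(timestamp)
    except ValueError:
        return False
    if sent_at.tzinfo is None:
        # without an offset the age of the message cannot be computed reliably
        return False

    now = now or datetime.now(timezone.utc)
    if abs(now - sent_at) > MAX_AGE:
        return False

    if delivery_id in _seen_delivery_ids:
        return False
    _seen_delivery_ids.add(delivery_id)
    return True
```

This check only makes sense after the signature has been verified, since the headers could otherwise be forged.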

If all checks out, we can proceed to actually classify the file.

def classify_file(file_id: str, db: Session, settings: Settings):
    """classify a file"""

    classification = settings.CLASSIFICATION
    client = jwt_check_client(db, settings)

    file = client.file(file_id)
    # file.get()
    # file.get_all_metadata()

    # the super complex file classification rules go in here

    file_class = file.get_classification()

    if file_class is None or file_class != classification:
        file_class = file.set_classification(classification)

Note that client.file(file_id) doesn't actually download the file or even fetch the complete file object. There are plenty of methods to get information on a file, for example file.get() or file.get_all_metadata().

This is where you can plug in all the business rules necessary to classify the file. You could send the information or even the file to an AI model for classification, correlate it with other data, etc.
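As a toy illustration of such a rule, the sketch below picks a label from the file name and description. Everything in it is hypothetical: the `FileInfo` container, the keyword list, and the function name are made up for this example, and a real rule would work from whatever file.get() or the metadata calls return.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# hypothetical keywords that hint at personal information
PII_KEYWORDS: Tuple[str, ...] = ("certification", "insurance", "passport")


@dataclass
class FileInfo:
    """Minimal stand-in for the file details a rule might inspect."""

    name: str
    description: str = ""


def pick_classification(info: FileInfo, default: Optional[str] = None) -> Optional[str]:
    """Return "PII" when the name or description mentions a keyword, else the default."""
    text = f"{info.name} {info.description}".lower()
    if any(keyword in text for keyword in PII_KEYWORDS):
        return "PII"
    return default
```

The returned label could then be compared with the current classification before calling set_classification, and a result of None would leave the file untouched.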

This example is simple and will just blindly stamp all files in this folder with the PII label. That could be achieved with a simple existing policy; I'm just trying to illustrate the use case.

See it in action

As usual, this fully working example can be cloned from this GitHub repo.

Let's see what happens when I upload a file to the specified folder:

uploading a file

This triggers the webhook to ping our classification service:

INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [3292] using WatchFiles
INFO:     Started server process [3311]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     10.1.1.70:39278 - "POST /box/classify HTTP/1.0" 200 OK

And the file gets classified as PII.

file has been classified

Although automatic classifications within Box are quite powerful, the Box Platform allows you to implement any business logic you want to classify your content.

Check out the other articles on this Box Shield and Classification series:
