Fetching watermarked PDFs from Box using Representations, Python, and Flask

As many of you may have already seen, we recently launched the new Representations service, which lets you pull different representations of your files, such as personalized watermarked PDFs (for secure sharing), thumbnails of documents and images, or alternate text metadata.

Those interesting transformations are what we’re going to look into today. You can already apply a watermark layer over documents on Box. That watermark is applied when viewing the file in the mobile app or on the website, but not when downloading the file.

Using Python, Flask, and Representations, we’re going to create a service to fetch a confidential watermarked PDF for a given app user, then download that PDF locally while maintaining the watermark.

Example confidential downloaded PDF, with watermark and highly secure data

For those of you who want to skip to the code, everything is available on my GitHub Box samples repo, including setup instructions.

For everyone still here, let’s jump right into it by starting with our config data.

# Default search values
user_name = 'John Snow'
file_name = 'confidential.pdf'

# Auth config
client_id = 'YOUR CLIENT ID'
client_secret = 'YOUR CLIENT SECRET'
enterprise_id = 'YOUR ENTERPRISE ID'
jwt_key_id = 'YOUR JWT PRIVATE KEY ID'
rsa_private_key_file_sys_path = 'YOUR PRIVATE KEY LOCATION'
rsa_private_key_passphrase = 'YOUR PRIVATE KEY PASSWORD'

For the sake of this example, we have an app user that we’ll be using, named John Snow (no relation), and a file that we’ll be searching for to download, confidential.pdf. The auth config data is all of our configuration information needed for the JWT / OAuth 2 setup with the Python SDK.
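Hardcoded secrets are fine for a demo, but you may prefer to keep them out of source control. The following is a minimal sketch of one way to do that, not part of the original sample: the environment variable names (BOX_CLIENT_ID and so on) and the load_box_config helper are made up for illustration. The dictionary keys deliberately match the keyword arguments used in the JWT setup in this post.

```python
import os

# Hypothetical environment variable names, chosen only for this illustration.
ENV_NAMES = {
    'client_id': 'BOX_CLIENT_ID',
    'client_secret': 'BOX_CLIENT_SECRET',
    'enterprise_id': 'BOX_ENTERPRISE_ID',
    'jwt_key_id': 'BOX_JWT_KEY_ID',
    'rsa_private_key_file_sys_path': 'BOX_PRIVATE_KEY_PATH',
    'rsa_private_key_passphrase': 'BOX_PRIVATE_KEY_PASSPHRASE',
}


def load_box_config(environ=None):
    """Return the auth settings as a dict, raising KeyError if any are unset."""
    environ = os.environ if environ is None else environ
    missing = sorted(name for name in ENV_NAMES.values() if not environ.get(name))
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {key: environ[name] for key, name in ENV_NAMES.items()}
```

With a helper like this, the values could be handed to the auth object as keyword arguments, for example JWTAuth(**load_box_config()).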

Next up, let’s use some of that configuration data to set up the Box Python SDK and authenticate our app.

from boxsdk import Client
from boxsdk import JWTAuth
from flask import Flask
import config
import requests

app = Flask(__name__)

# Configure JWT auth using the values defined in config.py
auth = JWTAuth(
    client_id=config.client_id,
    client_secret=config.client_secret,
    enterprise_id=config.enterprise_id,
    jwt_key_id=config.jwt_key_id,
    rsa_private_key_file_sys_path=config.rsa_private_key_file_sys_path,
    rsa_private_key_passphrase=config.rsa_private_key_passphrase
)

# Authenticate as the application and create the SDK client
access_token = auth.authenticate_instance()
client = Client(auth)

After importing the Box SDK, Flask, our config data, and requests, we configure a new JWT auth object with our application data and authenticate as the application itself. The resulting client is what we’ll use to do things on behalf of users, search for content, etc.

Next up, we want to set up our Flask application so that we can visit a web endpoint (representing our web app), such as http://127.0.0.1:5000/, to begin triggering our download.

@app.route('/')
def index():
    """Flask route for file download.

    Search for a user by name and auth as that user.
    Search for a watermarked PDF file by name and capture file ID.
    Make request for representational data for that file.
    Make request to download file.

    Returns:
        A string containing the message to be displayed to the user
    """
    # Fetch user by name (creating the user if needed) and auth as that app user
    users = client.users(filter_term=config.user_name)
    box_user = users[0] if users else client.create_user(config.user_name)
    user_at = auth.authenticate_app_user(box_user)

    # Search for file by name and get first result returned
    file_search = client.search(config.file_name, limit=1, offset=0)
    fid = file_search[0].get(fields=['name']).id

    # Get download URI template for watermarked PDF
    uri = "https://api.box.com/2.0/files/%s?fields=representations" % fid
    headers = {'Authorization': 'Bearer ' + user_at, 'x-rep-hints': '[pdf]'}
    response = requests.get(uri, headers=headers)
    file_info = response.json()
    file_uri = file_info['representations']['entries'][0]['content']['url_template']
    file_uri_dl = file_uri.replace('{+asset_path}', '')

    # Download watermarked PDF to the working directory
    download_file(file_uri_dl, "./%s.pdf" % fid, user_at)
    return file_uri_dl

As soon as we hit that endpoint, we start by finding the user that we set up in our configuration file. In a real application context, this would be the current user of your web app who needs the data. We find that user (or create a new user if they don’t exist) and get an access token scoped to that user so we can start doing things on their behalf.
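The find-or-create step relies on the user lookup returning a list that can be indexed and tested for emptiness. If the lookup returns a one-shot iterator instead (an assumption worth checking against the SDK version you have installed), indexing will fail. A small sketch of a version-agnostic alternative, using a hypothetical helper named first_or_create:

```python
def first_or_create(items, create):
    """Return the first element of items, or the result of create() if empty.

    Works for lists as well as one-shot iterators and generators.
    """
    sentinel = object()  # distinguishes "no items" from a first item of None
    first = next(iter(items), sentinel)
    return create() if first is sentinel else first
```

In the route above, this could be used as first_or_create(client.users(filter_term=config.user_name), lambda: client.create_user(config.user_name)).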

Next, we search for our highly confidential file, by name, and then extract the data on the first file that’s returned. In a real app context, we could either supply a direct file ID, or search for a number of confidential documents and display them to the user to download in our web app.

We then build the Box Representations API request URI from the file ID and pass a header stating that we want the PDF representation of the file ('x-rep-hints': '[pdf]'). When the request is made to Box, the response contains a url_template string, which is the location from which we can download the PDF. The string contains a template variable, {+asset_path}, which in the case of this call we replace with a blank string. In other Representations API request types this would be the item (such as a download size) that you specifically want to download.
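Taking the first entry works when only one representation is requested, but it is a little fragile. Here is a hedged sketch of picking the entry by type and expanding the template. Both helper names are hypothetical, the sample response is hand-written with a placeholder URL, and the 'representation' key on each entry is an assumption about the response shape that you should confirm against a real response.

```python
def pick_representation(file_info, rep_type='pdf'):
    """Return the first representation entry of the given type, or None."""
    entries = file_info.get('representations', {}).get('entries', [])
    for entry in entries:
        # Assumes each entry names its type under the 'representation' key.
        if entry.get('representation') == rep_type:
            return entry
    return None


def expand_asset_path(url_template, asset_path=''):
    """Fill in the {+asset_path} variable of a url_template string."""
    return url_template.replace('{+asset_path}', asset_path)


# Hand-written sample for illustration only; not a real Box response or URL.
sample_info = {
    'representations': {
        'entries': [
            {'representation': 'jpg',
             'content': {'url_template': 'https://example.invalid/jpg/{+asset_path}'}},
            {'representation': 'pdf',
             'content': {'url_template': 'https://example.invalid/pdf/{+asset_path}'}},
        ]
    }
}

entry = pick_representation(sample_info, 'pdf')
download_url = expand_asset_path(entry['content']['url_template'])
```

Returning None when nothing matches lets the caller decide how to report a missing representation instead of hitting an IndexError.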

We then make a request to download the file, and for the sake of displaying something to the screen, return the file URI to be displayed.

def download_file(url, local_path, at):
    """Downloads a file from Box.

    Makes an authenticated request to the Box API to download a given
    watermarked PDF, and saves it to a local file.

    Args:
        url: The download URI for the file
        local_path: Where to write the file on disk
        at: The user access token used to authorize the request
    """
    # Fetch Box file and write locally
    req = requests.get(url, headers={'Authorization': 'Bearer ' + at}, stream=True)
    with open(local_path, 'wb') as local_file:
        for chunk in req.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive chunks
                local_file.write(chunk)

The download_file function downloads a file from a given URI, saves it to the provided local path, and uses the user access token that we pass in to verify permission to download the file.
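The write loop can be separated from the network call, which makes it easy to try out without hitting Box. The sketch below assumes a hypothetical helper named write_chunks; the byte strings stand in for what a streamed response would yield.

```python
import os
import tempfile


def write_chunks(chunks, local_path):
    """Write an iterable of byte chunks to local_path; return bytes written."""
    written = 0
    with open(local_path, 'wb') as local_file:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                local_file.write(chunk)
                written += len(chunk)
    return written


# Stand-in data in place of a streamed HTTP response body.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.pdf')
demo_bytes = write_chunks([b'%PDF', b'', b'-1.4'], demo_path)
```

In the download function this would be called with req.iter_content(chunk_size=1024). Calling req.raise_for_status() first is worth considering, so that an error response is not saved to disk as if it were the PDF.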

Finally, we start up the server by running Flask:

FLASK_APP=representations_pdf_jwt.py flask run

Then we head to the endpoint provided (default is http://127.0.0.1:5000) to start downloading the file.

That’s all there is to it. The Representations API introduces many other great new features, but being able to share confidential documents securely while maintaining personalized watermarks lets you build on the Box platform with all of the data that you already trust to the service, and extend it to your own sites and services.