Box Developer Blog

News and stories for working with the Box APIs

Weaviate + Box RAG recipe with Weaviate Query Agent


Today, I’m thrilled to share a new demo hitting the Weaviate GitHub recipe repository, straight from the BoxDev team! This end-to-end example walks you through building a Retrieval-Augmented Generation (RAG) workflow by embedding Box content into a Weaviate vector database and leveraging Weaviate’s shiny new Query Agent (just launched in March 2025) to answer questions about your data. Whether you’re a developer, data enthusiast, or just curious about AI-powered search, this recipe has something for you.

Let’s dive in! ✅

Overview

Before we jump into the nitty-gritty, let’s set the stage with some key concepts and introductions.

What is Box?

Box is the leading Intelligent Content Management platform for securely storing, sharing, and collaborating on files. Think of it as your digital filing cabinet — except it’s way smarter, with APIs that let developers create custom workflows around their content. In this recipe, we’re using Box to store documents that we’ll later search and query, making it the foundation of our RAG pipeline.

What is Weaviate?

Weaviate is an open-source vector database that’s built for speed, scale, and AI-driven search. It stores data as objects and vectors, letting you combine semantic search (via embeddings) with structured filtering. It’s cloud native and fault tolerant, and integrates directly with large language models (LLMs). Here, Weaviate acts as our storage and search engine, powering the magic of agentic RAG with its new Query Agent.
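To make “semantic search combined with structured filtering” concrete, here is a toy, in-memory sketch of the idea. It is not Weaviate’s API or its indexing algorithm (Weaviate uses approximate nearest-neighbor indexes such as HNSW). The three-dimensional vectors, the `company` property, and the simplified `where` dict are all made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredObject:
    """An object as a vector database holds it: properties plus a vector."""
    properties: dict
    vector: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_semantic_search(objects, query_vector, where, limit=2):
    """Keep only objects matching the structured filter, then rank by similarity.

    `where` is a plain dict of property -> required value, a hypothetical and
    much simpler stand-in for a real filter language.
    """
    candidates = [
        obj for obj in objects
        if all(obj.properties.get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda obj: cosine_similarity(obj.vector, query_vector), reverse=True)
    return candidates[:limit]


# Made-up objects with tiny 3-dimensional "embeddings".
store = [
    StoredObject({"company": "Acme", "text": "Revenue grew 12%"}, [0.9, 0.1, 0.0]),
    StoredObject({"company": "Acme", "text": "New office opened"}, [0.1, 0.9, 0.0]),
    StoredObject({"company": "Globex", "text": "Revenue fell 3%"}, [0.8, 0.2, 0.1]),
]

results = filtered_semantic_search(store, query_vector=[1.0, 0.0, 0.0], where={"company": "Acme"})
for obj in results:
    print(obj.properties["text"])
```

The Globex object is semantically close to the query but is excluded by the filter, which is the point of combining the two.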

How do Box and Weaviate combine into an end-to-end RAG solution?

RAG is a technique that pairs a vector search (retrieval) with a language model (generation) to answer questions using your own data. Here’s the high-level flow.

  1. Content storage: Box holds your files (e.g., PDFs, docs, or text reports).
  2. Embedding creation: Text is extracted from these files and split into chunks, Weaviate Embeddings generates an embedding (a numerical vector representation) for each chunk, and the chunks and their vectors are stored in a Weaviate collection.
  3. Querying: Weaviate’s Query Agent takes a natural language question, generates whatever search and aggregation queries it needs, and combines the results into a single answer for the user, all in one go using agentic RAG.
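Step 2 can be sketched as follows. The chunk size, overlap, collection name `BoxDocuments`, and property names are assumptions for illustration, not necessarily what the notebook uses. `chunk_text` is plain Python; `store_chunks` assumes the Weaviate Python client v4 and a Weaviate Cloud cluster with Weaviate Embeddings (`text2vec_weaviate`), so Weaviate generates each chunk’s vector when the object is inserted. Depending on your client version, the vectorizer argument may be spelled differently, so check the client docs.

```python
from typing import Any


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly,
    so text near a boundary appears in both neighboring chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk's overlap.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def store_chunks(client: Any, file_name: str, text: str,
                 collection_name: str = "BoxDocuments") -> int:
    """Create the collection if needed and insert one object per chunk.

    Vectors are computed server-side by Weaviate Embeddings.
    """
    # Imported here so chunk_text works even without the client installed.
    from weaviate.classes.config import Configure, DataType, Property

    if not client.collections.exists(collection_name):
        client.collections.create(
            collection_name,
            vectorizer_config=Configure.Vectorizer.text2vec_weaviate(),
            properties=[
                Property(name="file_name", data_type=DataType.TEXT),
                Property(name="chunk_index", data_type=DataType.INT),
                Property(name="text", data_type=DataType.TEXT),
            ],
        )
    collection = client.collections.get(collection_name)
    chunks = chunk_text(text)
    with collection.batch.dynamic() as batch:
        for i, chunk in enumerate(chunks):
            batch.add_object(properties={"file_name": file_name, "chunk_index": i, "text": chunk})
    # Surface insert errors instead of silently losing chunks.
    failed = collection.batch.failed_objects
    if failed:
        raise RuntimeError(f"{len(failed)} chunks failed to insert, e.g. {failed[0].message}")
    return len(chunks)
```

Fixed-size character windows are the simplest chunking strategy; splitting on paragraphs or sentences usually gives cleaner chunks for long reports like 10-Ks.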

This combo lets you ask specific questions about your content and get smart, context-aware responses grounded in your own documents, which cuts down on hallucinated nonsense.

Pretty cool, right?

Recipe walkthrough

Let’s walk through the steps to get this RAG workflow up and running, based on the demo we released today. You’ll need an environment that can run Jupyter Notebooks, such as Visual Studio Code with the Jupyter extension or a local Jupyter installation.

Get a Box developer token

If you don’t already have a Box account, sign up for a free developer account.

For the recipe, you’ll need a Box developer token:

  • Log into the Box Developer Console.
  • Create a new app (choose “Custom App” with “Standard OAuth 2.0” authentication) and give it any name you like. For this example, enable the scopes to read and write all files and folders stored in Box. Click “Save changes” in the top right.
  • Generate a developer token under the “Configuration” tab. This token lasts 60 minutes, so you’ll need to renew it if you take longer than that to complete the demo.
  • Save this token; you’ll plug it into the notebook later.
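Once you have the token, authenticating from Python takes only a few lines. This sketch assumes the `box-sdk-gen` package (the generated Box Python SDK) and its `BoxDeveloperTokenAuth` class; the notebook may use a different SDK or version, and the helper names `make_box_client` and `whoami` are hypothetical.

```python
from typing import Any


def make_box_client(token: str) -> Any:
    """Build a Box client authenticated with a short-lived developer token."""
    if not token:
        raise ValueError("Box developer token is empty; generate one in the Developer Console")
    # Imported lazily so this module loads even if box-sdk-gen isn't installed.
    from box_sdk_gen import BoxClient, BoxDeveloperTokenAuth

    return BoxClient(auth=BoxDeveloperTokenAuth(token=token))


def whoami(client: Any) -> str:
    """Sanity check: return the login of the user the token belongs to."""
    return client.users.get_user_me().login
```

Calling `whoami(make_box_client(token))` right after generating the token is a quick way to confirm it works; a 401 error later in the notebook usually means the 60-minute token has expired.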

Create a Weaviate account

You will also need a Weaviate Cloud cluster. It is super simple to get one.

  • Sign up on Weaviate Cloud for a free sandbox tier.
  • Create a new cluster in the Weaviate Cloud dashboard. You can name it whatever you like.
  • Grab your cluster URL and API key from the “Details” tab; you’ll plug them into the notebook later.
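With the cluster URL and API key in hand, a connection can be opened roughly like this. The sketch assumes the Weaviate Python client v4 (`weaviate.connect_to_weaviate_cloud` and `Auth.api_key`); the helper name `connect_weaviate` is hypothetical and the notebook may connect differently.

```python
from typing import Any


def connect_weaviate(cluster_url: str, api_key: str) -> Any:
    """Open a connection to a Weaviate Cloud cluster and check that it is ready."""
    if not cluster_url or not api_key:
        raise ValueError("Both the cluster URL and the API key are required")
    # Imported lazily so this module loads even if weaviate-client isn't installed.
    import weaviate
    from weaviate.classes.init import Auth

    client = weaviate.connect_to_weaviate_cloud(
        cluster_url=cluster_url,
        auth_credentials=Auth.api_key(api_key),
    )
    if not client.is_ready():
        client.close()
        raise RuntimeError("Weaviate cluster is not ready")
    return client
```

Call `client.close()` when you are done so the connection is released.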

Download the recipe and run it step by step

Clone or download the repository from GitHub.

  • Visit the Weaviate recipe GitHub repository.
  • You can either download the code as a ZIP file or clone the repository.
  • Inside the repository, open the integrations folder and then the data-platforms folder within it. Find the Box folder there and open the Jupyter Notebook in the development environment of your choice.

Find the weaviate_box Jupyter Notebook

Inside the Box folder, there’s another folder called demo_files that’s preloaded with four 10-K financial reports to use for the demo. If you’d like to use another set of files, you’re welcome to delete those files and replace them with your own.

Weaviate Box recipe repository
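If you swap in your own documents, it helps to see what getting local files into Box involves. This sketch assumes `box-sdk-gen`’s `client.uploads.upload_file` with `UploadFileAttributes` and uploads to the root folder (ID `"0"`); the helper names, the skip-hidden-files rule, and the target folder are assumptions, and the notebook’s own upload logic may differ.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def find_local_files(folder: str | Path) -> list[Path]:
    """Return the regular, non-hidden files in `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and not p.name.startswith(".")
    )


def upload_files(client: Any, paths: list[Path], parent_folder_id: str = "0") -> list[str]:
    """Upload each file to a Box folder and return the new Box file IDs."""
    # Imported lazily so find_local_files works without box-sdk-gen installed.
    from box_sdk_gen import UploadFileAttributes, UploadFileAttributesParentField

    file_ids = []
    for path in paths:
        with path.open("rb") as stream:
            uploaded = client.uploads.upload_file(
                UploadFileAttributes(
                    name=path.name,
                    parent=UploadFileAttributesParentField(id=parent_folder_id),
                ),
                stream,
            )
        file_ids.append(uploaded.entries[0].id)
    return file_ids
```

Box rejects an upload whose name already exists in the target folder, so rerunning this against the same folder will fail unless you delete or rename the earlier copies.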

In step 3’s code block, update the Box and Weaviate authentication variables with the values you noted down during setup.

Add in authentication variables in step 3
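Rather than pasting secrets directly into the step 3 cell, you might read them from environment variables so they don’t get saved or committed with the notebook. The variable names below are hypothetical; the notebook itself simply expects you to fill in its own variables.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable names; adjust to match your setup.
REQUIRED_VARS = ("BOX_DEVELOPER_TOKEN", "WEAVIATE_URL", "WEAVIATE_API_KEY")


def load_credentials(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the required credentials, failing fast if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing with a list of every missing name up front is friendlier than a connection error several cells later.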

To run the notebook, execute each cell in order. If you have the Jupyter extension for VS Code installed, you can press the “Play” button on each code block.

Hit the play button if using VS Code

Once the notebook finishes, you should see a final response along the lines of the example below. You’re welcome to change the query in step 7 to another question about the included 10-Ks, or to ask about any custom content you added.

Final answer returned based on query and content
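For reference, a step 7-style question to the Query Agent looks roughly like this. It assumes the `weaviate-agents` package (installed with `pip install "weaviate-client[agents]"`), a connected v4 client, and a collection named `BoxDocuments` (a hypothetical name). The method that runs a query has changed across releases (`run` in early versions, `ask` in later ones), so check the Weaviate Agents docs for your installed version.

```python
from typing import Any


def ask_box_documents(client: Any, question: str,
                      collection_name: str = "BoxDocuments") -> str:
    """Send a natural-language question to the Query Agent and return its final answer."""
    from weaviate.agents.query import QueryAgent

    agent = QueryAgent(client=client, collections=[collection_name])
    # Early weaviate-agents releases expose `run`; later ones expose `ask`.
    run_query = getattr(agent, "ask", None) or agent.run
    response = run_query(question)
    return response.final_answer
```

The agent decides on its own which searches or aggregations to run against the collection; you only supply the question.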

Next steps

You’ve got a working RAG demo — now what?

  • Expand your data: Upload more files to Box (e.g., annual reports, articles) and rerun the notebook.
  • Tweak the agent: Adjust the system_prompt to make answers more detailed or formal (e.g., “Provide a detailed analysis…”).
  • Explore other agents: Weaviate’s Transformation Agent could preprocess your data, or the Personalization Agent could tailor responses (both in preview — check weaviate.io/developers/agents).
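To try the “Tweak the agent” idea, you can pass a custom `system_prompt` when constructing the Query Agent. The prompt wording, collection name, and helper name below are illustrative assumptions; the same `weaviate-agents` package as in step 7 is assumed.

```python
from typing import Any

# Illustrative prompt; tune the tone and level of detail to your audience.
ANALYST_PROMPT = (
    "You are a financial analyst. Provide a detailed analysis grounded only in the "
    "retrieved 10-K excerpts, name the company each figure comes from, and say so "
    "plainly when the documents do not contain the answer."
)


def make_analyst_agent(client: Any, collection_name: str = "BoxDocuments") -> Any:
    """Build a Query Agent whose answers follow a custom system prompt."""
    from weaviate.agents.query import QueryAgent

    return QueryAgent(client=client, collections=[collection_name], system_prompt=ANALYST_PROMPT)
```

Asking the same step 7 question with and without the custom prompt is an easy way to see how much the prompt changes tone and detail.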

This recipe blends Box’s Intelligent Content Management with Weaviate’s vector search and Query Agent to create a powerful, AI-driven search tool. It’s a glimpse into how modern platforms can team up to make your data smarter and more accessible.

As always, we’re excited to hear your feedback and stories on the Box Developer Community. 🦄


Published in Box Developer Blog



Written by Alex Novotny

I’m a Box Developer Advocate, helping others learn how to maximize their investment through Box Platform.
