Jina AI + Weaviate = Efficient Data Storage in the Cloud

Now you can leverage Weaviate as the document store for Jina’s DocArray for faster processing and retrieval of Documents in the cloud!

Shubham Saboo
Jina AI

--

Introduction

DocArray is a first-of-its-kind data structure for unstructured data and part of the larger Jina AI ecosystem. It can accommodate all kinds of data, including text, images, audio, and video, and is designed to be intuitive to use with Python, so you can get started right away without any prerequisites.

Weaviate is an open-source vector search engine that stores both objects and vectors, combining vector search with structured filtering so you can build robust, fault-tolerant search engines. It also provides cloud storage infrastructure out of the box in the form of the Weaviate Cluster Service (WCS).

Set Up a Weaviate Instance

There are two ways to set up a Weaviate instance: run one locally with Docker, or create one in the cloud with WCS. Let’s look at them one by one.

Starting a Weaviate Instance Locally

To use Weaviate as the storage backend, you first need to start a Weaviate instance. You can do that by creating a docker-compose.yml file that defines the Weaviate service.
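A minimal docker-compose.yml for this could look like the sketch below. It assumes a single-node setup with anonymous access and no vectorizer module, since the embeddings come from DocArray. The image tag shown is an assumption, so pin it to whichever Weaviate release you have tested.

```yaml
---
version: '3.4'
services:
  weaviate:
    # Assumed image tag; replace with the Weaviate version you want to run.
    image: semitechnologies/weaviate:1.11.0
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    ports:
      # Expose Weaviate on localhost:8080
      - 8080:8080
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      # Vectors are supplied by the client, so no vectorizer module is enabled.
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: ''
      CLUSTER_HOSTNAME: 'node1'
...
```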

Once you have created the Compose file, you can run docker compose up to start the instance.
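The container can take a few seconds to accept requests after it starts. The sketch below polls Weaviate's readiness endpoint (/v1/.well-known/ready) using only the Python standard library. The helper names, the default host and port, and the retry settings are illustrative assumptions, not part of DocArray or Weaviate.

```python
import time
import urllib.request


def ready_url(host="localhost", port=8080, scheme="http"):
    """Build the URL of Weaviate's readiness probe (host/port are assumed defaults)."""
    return f"{scheme}://{host}:{port}/v1/.well-known/ready"


def is_ready(url, timeout=2.0):
    """Return True if the endpoint answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers connection refused, timeouts, and HTTP errors
        return False


def wait_until_ready(probe, attempts=30, delay=1.0):
    """Call probe() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical call would be wait_until_ready(lambda: is_ready(ready_url())), which returns False if the instance never comes up within the allotted attempts.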

Create a Weaviate Cloud Service Instance

You can create a Weaviate instance for free with the Weaviate Cloud Service (WCS). Just sign up and follow the instructions in the UI to set up a new instance. You can check out the following video for a step-by-step walkthrough of creating a Weaviate instance.

Minimum Working Example

In this example, we will use a local Weaviate instance to store Documents and build a simple text search.

First, start the Weaviate service and create a DocumentArray instance that uses Weaviate as its storage backend.
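As a rough sketch of this step, the snippet below builds the storage config and opens a Weaviate-backed DocumentArray. It assumes a docarray release earlier than 0.30 (where DocumentArray is available) installed with Weaviate support, and an instance listening on localhost:8080. The class name "Persisted" and both helper functions are hypothetical names chosen for illustration.

```python
def weaviate_config(name="Persisted", host="localhost", port=8080):
    """Build the config dict for a Weaviate-backed DocumentArray.

    `name` becomes the Weaviate class name, which must start with a capital letter.
    """
    if not name[:1].isupper():
        raise ValueError("Weaviate class names must start with a capital letter")
    return {"name": name, "host": host, "port": port}


def open_document_array(config):
    """Open a DocumentArray stored in Weaviate (needs docarray<0.30 and a running instance)."""
    # Imported lazily so the config helper works without docarray installed.
    from docarray import DocumentArray

    return DocumentArray(storage="weaviate", config=config)
```

With the local instance running, da = open_document_array(weaviate_config()) would give you a DocumentArray whose contents live in Weaviate rather than in memory.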

Next, index the Documents by adding them to the DocumentArray, for example with its extend method.

Then generate embeddings for the indexed Documents using a BERT model.

Finally, we can query the indexed Documents and get the results:

Output: Persist Documents with Weaviate.
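To see why that Document comes back as the top match, here is a library-free sketch of the idea behind embedding-based search: turn each text into a vector, then rank the stored texts by cosine similarity to the query. It swaps BERT for a toy word-count embedding and does not use DocArray or Weaviate at all. The query and the two extra sentences are hypothetical examples.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for BERT vectors)."""
    return Counter(text.lower().replace(".", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find(query, texts, limit=1):
    """Return the `limit` texts most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vector, embed(t)), reverse=True)
    return ranked[:limit]


documents = [
    "Persist Documents with Weaviate.",
    "And enjoy fast nearest neighbor search.",  # hypothetical extra Document
    "All while using DocArray API.",  # hypothetical extra Document
]

print(find("How to persist Documents", documents))
# → ['Persist Documents with Weaviate.']
```

In the real setup, the vectors come from the BERT model and the nearest-neighbor lookup is carried out by Weaviate, which is what keeps retrieval fast as the collection grows.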

DocArray + Weaviate in action!

To give you a glimpse of what Weaviate’s integration with Jina AI can do, we have created a Colab notebook in which you build a basic fashion search engine (image-to-image search) using just DocArray and Weaviate.

Follow along with the interactive video tutorial 👉

🔗 Check out the GitHub repository for the source code and the notebook.

Learning Resources

This is part 1 of a three-part series exploring different cloud document stores that can be used with Jina’s DocArray.

Stay tuned for our next blog on using Jina’s DocArray with the vector database Qdrant!
