ChromaDB in Java (langchain4j 🦜)

3 min read · Jan 13, 2024

In this article, we’ll look at how to integrate the ChromaDB embedding database into a Java application. ChromaDB is a vector database that lets you build semantic search into your AI app.

Why Java: Python is far more common for building AI programs, but Java's role on the server, and especially in enterprise environments, should not be underestimated. Some companies want to build their server applications in Java, and with the ChromaDB integration in langchain4j this is now even easier.

Figure: How ChromaDB works (source: https://docs.trychroma.com/)
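To make the idea of semantic search a bit more concrete: an embedding is a vector of numbers, and texts with a similar meaning tend to get vectors that point in a similar direction. The following is a minimal sketch in plain Java, without ChromaDB or OpenAI. The three-dimensional vectors are made-up values for illustration only (real text-embedding-ada-002 vectors have 1536 dimensions), and the class name is hypothetical. It computes the cosine similarity, a common way to compare two embeddings.

```java
class CosineSimilaritySketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|). A value near 1.0 means "same direction".
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up toy vectors, not real embeddings.
        float[] football = {0.9f, 0.1f, 0.0f};
        float[] sport = {0.8f, 0.2f, 0.1f};
        float[] weather = {0.0f, 0.2f, 0.9f};

        System.out.println("sport vs. football: " + cosineSimilarity(sport, football));
        System.out.println("sport vs. weather:  " + cosineSimilarity(sport, weather));
    }
}
```

With these toy values, the first similarity comes out clearly higher than the second, which is the effect a vector database relies on when it ranks stored texts against a query.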

Preparation:

An OpenAI API Key to calculate the embeddings: https://platform.openai.com/docs/guides/embeddings/what-are-embeddings
A Maven or Gradle Project
Docker installed (https://docs.docker.com/engine/install/)

Step 1: Start the DB

Run the chromadb/chroma Docker image with the following command:
docker run -p 8000:8000 chromadb/chroma
Check the Docker log to confirm that the server started without errors. Downloading and starting the image can take up to 3 minutes (longer on a slow connection), so be patient.
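If you prefer to check from Java that the server is reachable, a small sketch with the JDK's built-in HTTP client could look like this. It assumes the server answers on the REST path api/v1/heartbeat; that path is an assumption and may differ for your Chroma version, so check the Chroma docs. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class ChromaHeartbeatSketch {

    // Assumption: the heartbeat endpoint lives under this path (verify for your Chroma version).
    static final String HEARTBEAT_PATH = "api/v1/heartbeat";

    // Builds the heartbeat URI from a base URL, with or without a trailing slash.
    static URI heartbeatUri(String baseUrl) {
        String normalized = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return URI.create(normalized + HEARTBEAT_PATH);
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(heartbeatUri("http://localhost:8000/"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Status " + response.statusCode() + ": " + response.body());
        } catch (IOException e) {
            // Typically means the container is not running yet or the port mapping is wrong.
            System.out.println("ChromaDB not reachable: " + e);
        }
    }
}
```

A 200 status suggests the container is up; a connection error usually means the image is still starting or the port mapping differs from the command above.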

Step 2: Install Dependencies

You need to install the dependencies listed below. They give you access to the OpenAI and ChromaDB APIs. Please check for up-to-date versions to avoid security problems.

If you use Gradle, add the following to build.gradle:

dependencies {
 implementation 'dev.langchain4j:langchain4j:0.25.0'
 implementation 'dev.langchain4j:langchain4j-open-ai:0.25.0'
 implementation 'com.google.code.gson:gson:2.10.1'
 implementation 'dev.langchain4j:langchain4j-chroma:0.25.0'
}

If you use Maven, add the following to pom.xml:

<dependencies>
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j</artifactId>
        <version>0.25.0</version>
    </dependency>
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-open-ai</artifactId>
        <version>0.25.0</version>
    </dependency>
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.10.1</version>
    </dependency>
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-chroma</artifactId>
        <version>0.25.0</version>
    </dependency>

</dependencies>

Step 3: Create Models

Now you need to create the Java objects that you will interact with. The langchain4j library gives you access to builders. For this tutorial, we need an EmbeddingStore and an EmbeddingModel.
The EmbeddingStore will use the ChromaDB we created in the first step.
The EmbeddingModel will use the OpenAI model “text-embedding-ada-002”.

⚠️ YOU NEED TO CHANGE THE API KEY

Set the base URL (or leave it as it is if you used my Docker command) and come up with a collection name. I used a generic name, but you can choose any name you like.
I use static variables so I don’t create more instances than I need, but you can create as many as you want.

package com.example;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.chroma.ChromaEmbeddingStore;

public class Chroma {

    // Connects to the ChromaDB instance started in step 1.
    public static final EmbeddingStore<TextSegment> embeddingStore =
            ChromaEmbeddingStore.builder()
                    .baseUrl("http://localhost:8000/")
                    .collectionName("my-collection")
                    .build();

    // Calls the OpenAI API to turn text into embeddings.
    public static final EmbeddingModel embeddingModel =
            OpenAiEmbeddingModel.builder()
                    .apiKey("API_KEY") // replace with your own key
                    .modelName("text-embedding-ada-002")
                    .build();
}
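Instead of writing the API key into the source code, you can read it from an environment variable. The sketch below is only a suggestion: the variable name OPENAI_API_KEY and the helper requireSetting are assumptions, not part of langchain4j. It fails fast with a clear message if the variable is missing.

```java
import java.util.Map;

class ApiKeySketch {

    // Looks up a required setting and fails fast if it is missing or blank.
    static String requireSetting(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            // OPENAI_API_KEY is an assumed variable name; use whatever your deployment provides.
            String apiKey = requireSetting(System.getenv(), "OPENAI_API_KEY");
            System.out.println("API key loaded, length: " + apiKey.length());
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

In the Chroma class you could then pass requireSetting(System.getenv(), "OPENAI_API_KEY") to the apiKey(...) call of the builder, so the key never ends up in version control.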

Step 4: Add documents

Now you can use the model and store you created to add text, with or without metadata. The metadata is returned automatically together with the text when you search. In production, I use the metadata to record the source and the type of source.

package com.example;

import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;

import static com.example.Chroma.embeddingModel;
import static com.example.Chroma.embeddingStore;

public class ChromaInserter {

    /**
     * Add text without metadata.
     */
    public static void addDocuments(String text) {
        addDocuments(text, new Metadata());
    }

    /**
     * Add text with metadata.
     */
    public static void addDocuments(String text, Metadata metadata) {
        TextSegment segment = TextSegment.from(text, metadata);
        // Calculate the embedding via the OpenAI API, then store it together with the text.
        Embedding embedding = embeddingModel.embed(segment).content();
        embeddingStore.add(embedding, segment);
    }
}

Step 5: Search for documents

Let's implement a quick search for documents.

package com.example;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingMatch;

import java.util.List;

import static com.example.Chroma.embeddingModel;
import static com.example.Chroma.embeddingStore;

public class ChromaSearcher {

    public static List<EmbeddingMatch<TextSegment>> search(String query,
                                                           int maxResults) {
        // Embed the query with the same model that was used for the documents.
        Embedding queryEmbedding = embeddingModel.embed(query).content();
        return embeddingStore.findRelevant(queryEmbedding, maxResults);
    }
}
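The search above asks only for the best maxResults matches, so a result can still be a poor fit for the query. One option is to drop matches below a minimum score. The sketch below shows the idea in plain Java; the Match record is a hypothetical stand-in for EmbeddingMatch<TextSegment>, and the scores and the threshold are made-up values. As far as I know, the EmbeddingStore interface in langchain4j 0.25.0 also offers a findRelevant overload with a minScore parameter, which would do the filtering for you; please verify that against the version you use.

```java
import java.util.List;
import java.util.stream.Collectors;

class MinScoreFilterSketch {

    // Hypothetical stand-in for EmbeddingMatch<TextSegment>: a score and the matched text.
    record Match(double score, String text) {}

    // Keeps only matches whose score is at least minScore, preserving the original order.
    static List<Match> filterByMinScore(List<Match> matches, double minScore) {
        return matches.stream()
                .filter(match -> match.score() >= minScore)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up scores for illustration.
        List<Match> matches = List.of(
                new Match(0.93, "I like football."),
                new Match(0.78, "The weather is good today."));

        for (Match match : filterByMinScore(matches, 0.85)) {
            System.out.println(match.text());
        }
    }
}
```

A sensible threshold depends on the embedding model and your data, so it is worth trying a few values against real queries.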

Step 6: Entrypoint

With all the utility methods in place, we can now interact with ChromaDB.

package com.example;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingMatch;

import java.util.List;

public class Main {
    public static void main(String[] args) {
        ChromaInserter.addDocuments("I like football.");
        ChromaInserter.addDocuments("The weather is good today.");

        List<EmbeddingMatch<TextSegment>> search = ChromaSearcher.search("What is your favorite sport?", 1);
        // Prints (the decimal separator depends on your default locale):
        // Score: 0,926483
        // Result: I like football.
        // Note: List.getFirst() requires Java 21; use search.get(0) on older versions.
        System.out.printf("Score: %f%nResult: %s%n", search.getFirst().score(), search.getFirst().embedded().text());
    }
}
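The score in the comment above shows a decimal comma because printf formats numbers with the JVM's default locale. If you want the same output on every machine, you can pass a locale explicitly. A small sketch (the class and method names are hypothetical):

```java
import java.util.Locale;

class ScoreFormatSketch {

    // Formats a score with an explicit locale so the decimal separator is predictable.
    static String formatScore(Locale locale, double score) {
        return String.format(locale, "Score: %f", score);
    }

    public static void main(String[] args) {
        System.out.println(formatScore(Locale.ROOT, 0.926483));    // Score: 0.926483
        System.out.println(formatScore(Locale.GERMANY, 0.926483)); // Score: 0,926483
    }
}
```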

Source:

medium/ChromaDBinJava at main · TimJ0212/medium

github.com

langchain4j:

langchain4j - Repositories
