LLM Pipelines with Pinecone and HuggingFace using Python and Apache NiFi

Tim Spann
Cloudera
Jan 25, 2024

This is part two of the Vector Databases, LLM, and Apache NiFi series.

The NiFi Python processor for Pinecone requires an OpenAI API key to transform the data into embeddings for vector storage.
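Under the hood, that transformation step sends each chunk of text to OpenAI's embeddings endpoint and keeps the resulting vector. Here is a minimal sketch, assuming the `openai` v1 Python package; the model name and chunk size are my own illustrative choices, not what the processor necessarily uses.

```python
def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Naive fixed-size chunking before embedding (illustrative only)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def embed_text(api_key: str, text: str) -> list[float]:
    """Turn one chunk of text into an embedding vector via OpenAI."""
    from openai import OpenAI  # imported here so chunk_text has no SDK dependency
    client = OpenAI(api_key=api_key)
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return response.data[0].embedding
```

Each resulting vector (1,536 floats for `text-embedding-ada-002`) is what actually gets written to the vector store.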

Pinecone Vector Database Free Tier

Chroma was easy to use, but since Pinecone is also an option, I tried it as well. The flow is the same, and everything is equally fast. As with the Chroma processor, I wanted to upgrade this one, so I did and submitted a pull request; it should land upstream soon.

We sign up for a free Pinecone account and create an index named nifi. Now we can easily send data there. What is nice about Pinecone is that I can watch records as they are stored, which is helpful for verifying that everything is working.
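What the put side of the flow does can be sketched in a few lines, assuming the `pinecone-client` v3 package; the record id and metadata shape here are illustrative, not the processor's exact internals.

```python
def build_vector(doc_id: str, embedding: list[float], text: str) -> dict:
    """Package one record the way Pinecone's upsert expects it."""
    return {"id": doc_id, "values": embedding, "metadata": {"text": text}}


def upsert_document(api_key: str, doc_id: str,
                    embedding: list[float], text: str) -> None:
    """Write one embedded record into the free-tier index."""
    from pinecone import Pinecone  # imported here so build_vector has no SDK dependency
    pc = Pinecone(api_key=api_key)
    index = pc.Index("nifi")  # the index created in the Pinecone console
    index.upsert(vectors=[build_vector(doc_id, embedding, text)])
```

Storing the original text as metadata alongside the vector is what lets you see readable records in the Pinecone console.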

We will need our Pinecone API key for both the query and put processors. We will also need our OpenAI key, which is needed for encoding and tokenizing the data.

Put processor: Pinecone API Key, OpenAI API Key, Pinecone Environment, Index Name, Text Key field, and some optional fields
Query processor: Pinecone API Key, OpenAI Key, Pinecone Environment, Index Name, Query, # of Results, Text Key, and Output Strategy
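The query side combines both services: embed the question with OpenAI, then search the index with Pinecone. A sketch under the same assumptions as above (`openai` v1 and `pinecone-client` v3; the default text key and top-k value are illustrative):

```python
def extract_texts(matches, text_key: str = "text") -> list[str]:
    """Pull the configured text key out of each match's metadata."""
    return [m["metadata"].get(text_key, "") for m in matches]


def query_index(openai_key: str, pinecone_key: str,
                question: str, top_k: int = 3) -> list[str]:
    """Embed the question, then return the stored text of the top matches."""
    from openai import OpenAI      # lazy imports keep extract_texts SDK-free
    from pinecone import Pinecone
    client = OpenAI(api_key=openai_key)
    embedding = client.embeddings.create(
        model="text-embedding-ada-002", input=question
    ).data[0].embedding
    index = Pinecone(api_key=pinecone_key).Index("nifi")
    results = index.query(vector=embedding, top_k=top_k, include_metadata=True)
    return extract_texts(results["matches"])
```

The Text Key and # of Results processor properties map directly onto `text_key` and `top_k` here.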

