LLM Pipelines with Pinecone and HuggingFace using Python and Apache NiFi

Tim Spann
Cloudera
Jan 25, 2024

This is part two of the Vector Databases, LLM, and Apache NiFi series.

The NiFi Python processor for Pinecone requires an OpenAI API key to transform the data into embeddings for vector storage.
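Under the hood, that transformation step sends each chunk of text to OpenAI's embeddings endpoint and keeps the resulting vector. Here is a minimal sketch, assuming the `openai` v1 Python package; the model name and chunk size are my own illustrative choices, not what the processor necessarily uses.

```python
def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Naive fixed-size chunking before embedding (illustrative only)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def embed_text(api_key: str, text: str) -> list[float]:
    """Turn one chunk of text into an embedding vector via OpenAI."""
    from openai import OpenAI  # imported here so chunk_text has no SDK dependency
    client = OpenAI(api_key=api_key)
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return response.data[0].embedding
```

Each resulting vector (1,536 floats for `text-embedding-ada-002`) is what actually gets written to the vector store.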

Pinecone Vector Database Free Tier

Chroma was easy to use, but since Pinecone is also an option, I tried it as well. The flow is the same, and everything is equally fast. As with the Chroma processor, I wanted to upgrade this one, so I did and submitted a pull request; it should land upstream soon.

We sign up for a free Pinecone account and create an index named nifi. Now we can easily send data there. What is nice about Pinecone is that I can watch records as they are stored, which is helpful for verifying that everything is working.
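What the put side of the flow does can be sketched in a few lines, assuming the `pinecone-client` v3 package; the record id and metadata shape here are illustrative, not the processor's exact internals.

```python
def build_vector(doc_id: str, embedding: list[float], text: str) -> dict:
    """Package one record the way Pinecone's upsert expects it."""
    return {"id": doc_id, "values": embedding, "metadata": {"text": text}}


def upsert_document(api_key: str, doc_id: str,
                    embedding: list[float], text: str) -> None:
    """Write one embedded record into the free-tier index."""
    from pinecone import Pinecone  # imported here so build_vector has no SDK dependency
    pc = Pinecone(api_key=api_key)
    index = pc.Index("nifi")  # the index created in the Pinecone console
    index.upsert(vectors=[build_vector(doc_id, embedding, text)])
```

Storing the original text as metadata alongside the vector is what lets you see readable records in the Pinecone console.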

We will need our Pinecone API key for both the query and put processors. We will also need our OpenAI key, which is needed for encoding and tokenizing the data.

Put processor: Pinecone API Key, OpenAI API Key, Pinecone Environment, Index Name, Text Key field, and some optional fields
Query processor: Pinecone API Key, OpenAI Key, Pinecone Environment, Index Name, Query, # of Results, Text Key, and Output Strategy
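The query side combines both services: embed the question with OpenAI, then search the index with Pinecone. A sketch under the same assumptions as above (`openai` v1 and `pinecone-client` v3; the default text key and top-k value are illustrative):

```python
def extract_texts(matches, text_key: str = "text") -> list[str]:
    """Pull the configured text key out of each match's metadata."""
    return [m["metadata"].get(text_key, "") for m in matches]


def query_index(openai_key: str, pinecone_key: str,
                question: str, top_k: int = 3) -> list[str]:
    """Embed the question, then return the stored text of the top matches."""
    from openai import OpenAI      # lazy imports keep extract_texts SDK-free
    from pinecone import Pinecone
    client = OpenAI(api_key=openai_key)
    embedding = client.embeddings.create(
        model="text-embedding-ada-002", input=question
    ).data[0].embedding
    index = Pinecone(api_key=pinecone_key).Index("nifi")
    results = index.query(vector=embedding, top_k=top_k, include_metadata=True)
    return extract_texts(results["matches"])
```

The Text Key and # of Results processor properties map directly onto `text_key` and `top_k` here.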

