Augmenting and Enriching LLMs with Real-Time Context

Tim Spann
Cloudera
6 min read · Jan 8, 2024

Adding Real-time streaming data to Generative AI workflows at any scale, anytime, anywhere

Augmented Generate Artificial Intelligent Data Reality

Refactoring Real-Time Code For Price, Speed, Functionality and Performance

I switched from Alpha Vantage to Finnhub for my company name lookup because Finnhub has a better free tier. Its search response requires more complex processing, but it’s worth it since I may need to make a lot of calls to answer people’s requests, such as:

Q: What is the outlook for IBM this year?
https://finnhub.io/api/v1/search?q=${companyName:urlEncode()}&token=GetYourFreeCode
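As a rough illustration of what that URL template expands to, here is a minimal Python sketch. The function name `build_search_url` is hypothetical, and it assumes that NiFi's `urlEncode()` form-encodes the value (spaces become `+`), which is how `quote_plus` behaves. It only builds the URL and makes no network call.

```python
from urllib.parse import quote_plus

FINNHUB_SEARCH = "https://finnhub.io/api/v1/search"


def build_search_url(company_name: str, token: str) -> str:
    """Build the symbol-search URL for a company name (hypothetical helper)."""
    # quote_plus form-encodes the value; this is assumed to match urlEncode()
    return f"{FINNHUB_SEARCH}?q={quote_plus(company_name)}&token={quote_plus(token)}"


print(build_search_url("American Water", "GetYourFreeCode"))
```

Encoding the company name matters because names such as "AT&T" would otherwise break the query string.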

Jolt Transform

[
  {
    "operation": "shift",
    "spec": {
      "result": {
        "*": {
          "description": "[&1].description",
          "displaySymbol": "[&1].displaySymbol",
          "symbol": "[&1].symbol",
          "type": "[&1].type"
        }
      }
    }
  }
]
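To make the effect of the shift spec easier to follow, here is a small Python sketch of the same reshaping: it lifts the entries of the `result` array into a top-level array and keeps only the four named fields. The function name `shift_results` and the sample payload are made up for illustration; the real response may carry other fields.

```python
from typing import Any, Dict, List

FIELDS = ("description", "displaySymbol", "symbol", "type")


def shift_results(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Mimic the shift spec: one output object per entry in 'result'."""
    return [
        # Fields not named in the spec are dropped; missing fields are skipped
        {name: match[name] for name in FIELDS if name in match}
        for match in response.get("result", [])
    ]


# Hypothetical payload, shaped like a search response with a 'result' array
sample = {
    "count": 2,
    "result": [
        {"description": "EXAMPLE CORP", "displaySymbol": "EXM",
         "symbol": "EXM", "type": "Common Stock", "extra": "ignored"},
        {"description": "EXAMPLE CORP PFD", "displaySymbol": "EXM.P",
         "symbol": "EXM.P", "type": "Preferred Stock"},
    ],
}

for row in shift_results(sample):
    print(row)
```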

Append Stock Information

Stock Value for ${companyName} [${nlp_org_1}/${stockSymbol}] on ${date} 
is ${closeStockValue}. stock date ${stockdateTime}.
stock exchange ${exchange}
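The template above is plain attribute substitution, which Python's `string.Template` happens to express with the same `${name}` syntax. The sketch below is only an illustration: `render_stock_line` is a hypothetical name, the line breaks are collapsed to spaces, and missing attributes are assumed to render as empty strings. The sample values are taken from the example response later in this article.

```python
from collections import defaultdict
from string import Template

STOCK_LINE = Template(
    "Stock Value for ${companyName} [${nlp_org_1}/${stockSymbol}] on ${date} "
    "is ${closeStockValue}. stock date ${stockdateTime}. "
    "stock exchange ${exchange}"
)


def render_stock_line(attributes: dict) -> str:
    """Fill the template; attributes that are absent become empty strings."""
    return STOCK_LINE.substitute(defaultdict(str, attributes))


attrs = {
    "companyName": "AMERICAN STATES WATER CO",
    "nlp_org_1": "",  # empty in the example response, hence "[/AWR]"
    "stockSymbol": "AWR",
    "date": "Sat, 06 Jan 2024 03:26:05 GMT",
    "closeStockValue": "77.54000",
    "stockdateTime": "2024/01/05 15:59:00",
    "exchange": "NYSE",
}
print(render_stock_line(attrs))
```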

Use Python (in the next update we will create a custom Python processor)

/opt/demo/runcompany.sh

So we are using a couple of libraries, some for NLP and some general Python. This code is based on https://stackoverflow.com/questions/72986264/company-name-extraction-with-bert-base-ner-easy-way-to-know-which-words-relate

We are using spaCy and Hugging Face Transformers with a solid pre-built model to extract the company name from the Slack request. This has proven much stronger than the model I am using with Apache OpenNLP.

Model: xlm-roberta-large-finetuned-conll03-english

I am going to look at using a NiFi Python Processor and also at calling HuggingFace via REST to see which is better and to give everyone options.

from transformers import pipeline
import spacy
import time
import argparse

parser = argparse.ArgumentParser(description='CompanyNameParser')

# Text to parse, for example a question posted in Slack
parser.add_argument('--input', type=str, default='Question: How is Walmart doing?', help='string to parse')

args = parser.parse_args()

start = time.time()
# The spaCy model is loaded here; the extraction below uses the Transformers pipeline
nlp = spacy.load('en_core_web_sm')
model_checkpoint = "xlm-roberta-large-finetuned-conll03-english"
token_classifier = pipeline(
    "token-classification", model=model_checkpoint, aggregation_strategy="simple"
)

# Organisation names extraction
def org_name(extracted_text):
    classifier = token_classifier(extracted_text)
    # Keep only the entities tagged with entity_group 'ORG'
    values = [item for item in classifier if item["entity_group"] == "ORG"]
    # Collect the matched text of each ORG entity
    res = [sub['word'] for sub in values]
    final1 = list(dict.fromkeys(res))  # Remove duplicates, keeping first-seen order
    final = list(filter(None, final1))  # Remove empty strings
    # Print the first organisation found; print nothing if there is none
    if final:
        print(final[0])


#org_name("Q: What is the outlook for Fedex this year?")
org_name(args.input)
end = time.time()

#print("The time of execution of above program is :", round((end - start), 2))
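The filtering step in the script can be tried without downloading the model. The sketch below runs the same logic on a hand-written list shaped like the output of a token-classification pipeline with aggregation_strategy="simple" (dictionaries with `entity_group`, `score`, `word`, `start`, `end`). The function name `first_org` and the sample entities, including their scores, are invented for illustration.

```python
from typing import Dict, List, Optional


def first_org(entities: List[Dict]) -> Optional[str]:
    """Return the first non-empty ORG entity text, or None if there is none."""
    seen: List[str] = []
    for ent in entities:
        word = str(ent.get("word", "")).strip()
        # Keep ORG entities only, skipping blanks and duplicates
        if ent.get("entity_group") == "ORG" and word and word not in seen:
            seen.append(word)
    return seen[0] if seen else None


# Hand-written stand-in for the classifier output (not real model output)
fake_entities = [
    {"entity_group": "PER", "score": 0.91, "word": "Tim", "start": 0, "end": 3},
    {"entity_group": "ORG", "score": 0.99, "word": "Walmart", "start": 17, "end": 24},
    {"entity_group": "ORG", "score": 0.98, "word": "Walmart", "start": 40, "end": 47},
]

print(first_org(fake_entities))
print(first_org([]))
```

Returning None instead of indexing into a possibly empty list lets the caller decide what to do when a question names no company.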

Clean and Enrich Company Information

${companyName:trim():ifElse(${companyName},${companyName2:trim()})}
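One reading of this step is a simple fallback: prefer the first extracted name and fall back to the second when the first is blank. The Python sketch below illustrates that reading only; it is an assumption about the intent, not a translation of how the expression is evaluated, and `pick_company_name` is a hypothetical name.

```python
from typing import Optional


def pick_company_name(company_name: Optional[str], company_name2: Optional[str]) -> str:
    """Prefer the trimmed primary name; otherwise use the trimmed fallback."""
    primary = (company_name or "").strip()
    if primary:
        return primary
    return (company_name2 or "").strip()


print(pick_company_name("  Walmart ", "Wal-Mart Stores"))
print(pick_company_name("   ", " IBM "))
```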

Add Company Information

${generated_text:append(
${companyInfo:replaceAll('\{\"displayStock\"\:\"',' '):
replaceAll('\"\}',' ')})}
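For readers less familiar with the expression syntax, here is a rough Python equivalent of the two replaceAll calls followed by the append. It assumes `companyInfo` arrives as a one-field JSON string of the form `{"displayStock":"..."}`; the function name and sample strings are hypothetical.

```python
import re


def append_company_info(generated_text: str, company_info: str) -> str:
    """Strip the JSON wrapper from company_info and append it to the text."""
    # Replace the opening {"displayStock":" with a single space
    cleaned = re.sub(r'\{"displayStock":"', " ", company_info)
    # Replace the closing "} with a single space
    cleaned = re.sub(r'"\}', " ", cleaned)
    return generated_text + cleaned


print(append_company_info(
    "Model answer text.",
    '{"displayStock":"Stock Value for EXAMPLE CORP is 1.00."}',
))
```

Parsing the string with a JSON library would be a sturdier alternative if the payload ever gains more fields.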

Cache That Company Information

Retrieve Company Information

Query Record SQL

SELECT description AS companyName, symbol
FROM FLOWFILE
WHERE type LIKE '%Common%Stock%'
AND symbol NOT LIKE '\d'
AND symbol NOT LIKE '\.'
LIMIT 1
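The selection this query is after can be sketched in Python as follows. The sketch reads the two symbol filters as "skip symbols that contain a digit or a dot", which is an assumption about the intent: standard SQL LIKE only treats `%` and `_` as wildcards, so the behavior of the patterns as written may differ. The function name and sample records are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional


def pick_common_stock(records: Iterable[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first common-stock record whose symbol has no digits or dots."""
    for rec in records:
        symbol = rec.get("symbol", "")
        is_common = re.search(r"Common.*Stock", rec.get("type", "")) is not None
        has_digit_or_dot = re.search(r"[\d.]", symbol) is not None
        if is_common and not has_digit_or_dot:
            # Same column renaming as the SELECT list
            return {"companyName": rec.get("description", ""), "symbol": symbol}
    return None  # corresponds to the query returning no rows


# Hypothetical records in the shape produced by the Jolt step
records = [
    {"description": "EXAMPLE CORP", "symbol": "EXM.MX", "type": "Common Stock"},
    {"description": "EXAMPLE CORP PFD", "symbol": "EXMP", "type": "Preferred Stock"},
    {"description": "EXAMPLE CORP", "symbol": "EXM", "type": "Common Stock"},
]
print(pick_common_stock(records))
```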

Send Enriched AI + Real-Time Stock Information

${generated_text:substringAfter('ANSWER:'):replaceAll('\)','')}
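A small Python sketch of this clean-up step: keep what follows the `ANSWER:` marker and strip closing parentheses. It assumes, as I understand the Expression Language documentation, that substringAfter returns the whole text unchanged when the marker is absent. The function name `extract_answer` is hypothetical.

```python
def extract_answer(generated_text: str) -> str:
    """Keep the text after 'ANSWER:' (if present) and drop ')' characters."""
    marker = "ANSWER:"
    idx = generated_text.find(marker)
    # Assumed behavior: with no marker, the whole text is kept
    tail = generated_text[idx + len(marker):] if idx >= 0 else generated_text
    return tail.replace(")", "")


print(extract_answer("ANSWER: Shares rose (slightly) today."))
print(extract_answer("American Water Works Company, Inc. (AWK) Q1 2023"))
```

This also explains why the company ticker appears as "(AWK" without its closing parenthesis in the example response.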

Example Question & Response


Timothy J Spann
Q: What is the outlook for American Water this year?

timchat
American Water Works Company, Inc. (AWK Q1 2023 Earnings Call Transcript
American Water Works Company, Inc. (NYSE:AWK Q1 2023 Earnings Call dated May. 04, 2023. Corporate Participants: Brian Chin — Senior Vice President and Chief Financial Officer.
https://www.fool.com/earnings/call-transcripts/2023/05/04/american-water-works-company-inc-awk-q1-2023-earn/
American Water Works Company, Inc. (AWK CEO Susan Story on Q1 2023 Results - Earnings Call Transcript
American Water Works Company, Inc. (NYSE:AWK Q1 2023 Earnings Conference Call May
Stock Value for AMERICAN STATES WATER CO [/AWR] on Sat, 06 Jan 2024 03:26:05 GMT is 77.54000.
stock date 2024/01/05 15:59:00. stock exchange NYSE

Output JSON

{
"date" : "Mon, 08 Jan 2024 22:15:25 GMT",
"x-global-transaction-id" : "c44ad66a56c4e2cb853480d91116bae1",
"x-request-id" : "9742864c-1dd6-4c20-8e01-3082aad3177e",
"cf-ray" : "8427cc5dfa1e42e0-EWR",
"inputs" : "Q: What is the outlook for American Water this year?",
"created_at" : "2024-01-08T22:15:25.433Z",
"stop_reason" : "max_tokens",
"x-correlation-id" : "bXZnanQ-2cb9169d98fd4a37b29c030696e8f7ea",
"x-proxy-upstream-service-time" : "2100",
"message_id" : "disclaimer_warning",
"model_id" : "meta-llama/llama-2-70b-chat",
"invokehttp.request.duration" : "8137",
"message" : "This model is a Non-IBM Product governed by a third-party license that may impose use restrictions and other obligations. By using this model you agree to its terms as identified in the following URL. URL: https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-models.html?context=wx",
"uuid" : "dfedbad8-df0c-4572-b47c-9eb07cf099a1",
"generated_text" : "American Water Works Company, Inc. (AWK) Q1 2023 Earnings Call Transcript\nAmerican Water Works Company, Inc. (NYSE:AWK) Q1 2023 Earnings Call dated May. 04, 2023. Corporate Participants: Brian Chin — Senior Vice President and Chief Financial Officer.\nhttps://www.fool.com/earnings/call-transcripts/2023/05/04/american-water-works-company-inc-awk-q1-2023-earn/\nAmerican Water Works Company, Inc. (AWK) CEO Susan Story on Q1 2023 Results - Earnings Call Transcript\nAmerican Water Works Company, Inc. (NYSE:AWK) Q1 2023 Earnings Conference Call May",
"transaction-id" : "bXZnanQ-2cb9169d98fd4a37b29c030696e8f7ea",
"tokencount" : "28",
"generated_token" : "200",
"ts" : "1704752125522",
"advisoryId" : ""
}

Let’s also store the company information in Postgres. We could have used Kudu, Iceberg, HBase, MongoDB, MariaDB, Oracle, SQL Server, S3 or others. We could also store it in all of those simultaneously if needed.

Output to Slack

RESOURCES


Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/