Augmenting and Enriching LLM with Real-Time context

Tim Spann

Published in

Cloudera

6 min read · Jan 8, 2024

Adding Real-time streaming data to Generative AI workflows at any scale, anytime, anywhere

Augmented Generate Artificial Intelligent Data Reality

Refactoring Real-Time Code For Price, Speed, Functionality and Performance

I switched from Alpha Vantage to Finnhub for my company name lookup because Finnhub has a better free plan. It requires somewhat more complex processing, but that is worth it since I may need to make a lot of calls to answer people’s requests, such as:

Q: What is the outlook for IBM this year?

Alpha Vantage API Documentation

API Documentation for Alpha Vantage. Alpha Vantage offers free JSON APIs for realtime and historical stock market data…

www.alphavantage.co

Finnhub - Free realtime APIs for stock, forex and cryptocurrency.

Finnhub - Free APIs for realtime stock, forex, and cryptocurrency. Company fundamentals, Economic data, and Alternative…

finnhub.io

https://finnhub.io/api/v1/search?q=${companyName:urlEncode()}&token=GetYourFreeCode
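The URL above relies on NiFi Expression Language to URL-encode the company name before calling Finnhub. Outside NiFi, the same request could be sketched in Python with only the standard library. The helper names `build_search_url` and `search_company` are hypothetical, and the token is a placeholder you would replace with your own key.

```python
import json
import urllib.parse
import urllib.request

FINNHUB_SEARCH = "https://finnhub.io/api/v1/search"


def build_search_url(company_name, token):
    """Build the symbol-search URL, URL-encoding the company name."""
    query = urllib.parse.urlencode({"q": company_name, "token": token})
    return f"{FINNHUB_SEARCH}?{query}"


def search_company(company_name, token):
    """Call the search endpoint and return the decoded JSON body (needs network access)."""
    url = build_search_url(company_name, token)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Keeping the URL construction separate from the network call makes the encoding step easy to check on its own.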

Jolt Transform

[
  {
    "operation": "shift",
    "spec": {
      "result": {
        "*": {
          "description": "[&1].description",
          "displaySymbol": "[&1].displaySymbol",
          "symbol": "[&1].symbol",
          "type": "[&1].type"
        }
      }
    }
  }
]
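To make the shift spec easier to read: it lifts every element of the `result` array into a top-level array and keeps only the four named fields. A rough Python equivalent is sketched below. The function name is hypothetical, and the sample record in the usage note is illustrative rather than a captured API response.

```python
FIELDS = ("description", "displaySymbol", "symbol", "type")


def flatten_search_result(response):
    """Approximate the Jolt shift: one output object per element of "result",
    keeping only the fields listed in the spec."""
    return [
        {field: item[field] for field in FIELDS if field in item}
        for item in response.get("result", [])
    ]
```

For example, passing `{"count": 1, "result": [{"description": "AMERICAN STATES WATER CO", "displaySymbol": "AWR", "symbol": "AWR", "type": "Common Stock"}]}` returns a one-element list holding just that record, which is the shape QueryRecord works on later in the flow.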

Append Stock Information

Stock Value for ${companyName} [${nlp_org_1}/${stockSymbol}] on ${date} 
is ${closeStockValue}.  stock date ${stockdateTime}.  
stock exchange ${exchange}

Use Python (In Next Update We will create a custom Python Processor)

/opt/demo/runcompany.sh

FLaNK-EdgeAI/runcompany.sh at main · tspannhw/FLaNK-EdgeAI

FLaNK-EdgeAI. Contribute to tspannhw/FLaNK-EdgeAI development by creating an account on GitHub.

github.com

We are using a couple of libraries here, some for NLP and some general Python. This code is based on https://stackoverflow.com/questions/72986264/company-name-extraction-with-bert-base-ner-easy-way-to-know-which-words-relate

We are using spaCy with a solid pre-built model to extract the company name from the Slack request. This has proven much stronger than the Apache OpenNLP model I was using before.

Model: xlm-roberta-large-finetuned-conll03-english

xlm-roberta-large-finetuned-conll03-english · Hugging Face

We're on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

I am going to look at using a NiFi Python Processor and also at calling HuggingFace via REST to see which is better and to give everyone options.

from transformers import pipeline
import spacy
import time
import argparse

parser = argparse.ArgumentParser(description='CompanyNameParser')

# Text to parse, for example the question posted in Slack
parser.add_argument('--input', type=str, default='Question: How is Walmart doing?', help='string to parse')

args = parser.parse_args()

start = time.time()
# Load the small English spaCy model (the ORG extraction below uses the Hugging Face pipeline)
nlp = spacy.load('en_core_web_sm')
model_checkpoint = "xlm-roberta-large-finetuned-conll03-english"
token_classifier = pipeline(
    "token-classification", model=model_checkpoint, aggregation_strategy="simple"
)

# Organisation names extraction
def org_name(extracted_text):
    entities = token_classifier(extracted_text)
    # Keep only the entities tagged as organisations ("entity_group": "ORG")
    orgs = [item["word"] for item in entities if item["entity_group"] == "ORG"]
    # Remove empty strings and duplicates while preserving the order of appearance
    final = list(dict.fromkeys(word for word in orgs if word))
    # Print the first organisation found, or nothing when the text contains none
    if final:
        print(final[0])


#org_name("Q: What is the outlook for Fedex this year?")
org_name(args.input)
end = time.time()

#print("The time of execution of above program is :", round((end - start), 2))

Clean and Enrich Company Information

${companyName:isEmpty():ifElse(${companyName2:trim()},${companyName:trim()})}
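The expression above is meant to pick a cleaned company name, falling back to a second attribute when the first one is empty. A minimal Python sketch of that fallback follows, assuming `companyName` is the preferred value and `companyName2` is the backup. The function name is hypothetical.

```python
def pick_company_name(company_name, company_name2):
    """Return the trimmed primary name, or the trimmed fallback when the primary is blank."""
    primary = (company_name or "").strip()
    if primary:
        return primary
    return (company_name2 or "").strip()
```

Treating `None` and whitespace-only values the same way keeps the downstream lookup from being called with an empty query.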

Add Company Information

${generated_text:append(
    ${companyInfo:replaceAll('\{\"displayStock\"\:\"',' '):
          replaceAll('\"\}',' ')})}

Cache That Company Information

Retrieve Company Information

Query Record SQL

SELECT description AS companyName, symbol
FROM FLOWFILE
WHERE type LIKE '%Common%Stock%'
  AND symbol NOT LIKE '\d'
  AND symbol NOT LIKE '\.'
LIMIT 1
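The query keeps the first common-stock record and renames `description` to `companyName`. The two conditions on `symbol` look intended to skip symbols that contain digits or dots. Since SQL `LIKE` treats only `%` and `_` as wildcards, the sketch below assumes that intent rather than reproducing those patterns literally. The function name is hypothetical.

```python
import re


def pick_common_stock(records):
    """Return the first common stock whose symbol has no digits or dots, or None."""
    for rec in records:
        symbol = rec.get("symbol") or ""
        # Same idea as: type LIKE '%Common%Stock%'
        is_common = re.search(r"Common.*Stock", rec.get("type") or "") is not None
        # Assumed intent of the symbol filters: no digits, no dots
        if is_common and not re.search(r"[\d.]", symbol):
            return {"companyName": rec.get("description"), "symbol": symbol}
    return None
```

Returning `None` when nothing matches mirrors a query that yields no rows, so the caller can decide whether to skip the stock enrichment.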

Send Enriched AI + Real-Time Stock Information

${generated_text:substringAfter('ANSWER:'):replaceAll('\)','')}
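This last expression keeps only the text after the `ANSWER:` marker and strips closing parentheses, which is why the Slack response below shows fragments such as `(AWK Q1 2023` without the closing `)`. A small Python sketch of the same clean-up is shown here. The function name is hypothetical, and like NiFi's `substringAfter` it returns the whole text when the marker is absent.

```python
def clean_answer(generated_text, marker="ANSWER:"):
    """Keep the text after the first marker (or all of it if absent) and drop ')' characters."""
    idx = generated_text.find(marker)
    tail = generated_text[idx + len(marker):] if idx != -1 else generated_text
    return tail.replace(")", "")
```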

Example Question & Response


Timothy J Spann
Q: What is the outlook for American Water this year?

timchat
American Water Works Company, Inc. (AWK Q1 2023 Earnings Call Transcript
American Water Works Company, Inc. (NYSE:AWK Q1 2023 Earnings Call dated May. 04, 2023. Corporate Participants: Brian Chin — Senior Vice President and Chief Financial Officer.
https://www.fool.com/earnings/call-transcripts/2023/05/04/american-water-works-company-inc-awk-q1-2023-earn/
American Water Works Company, Inc. (AWK CEO Susan Story on Q1 2023 Results - Earnings Call Transcript
American Water Works Company, Inc. (NYSE:AWK Q1 2023 Earnings Conference Call May  
Stock Value for AMERICAN STATES WATER CO [/AWR] on Sat, 06 Jan 2024 03:26:05 GMT is 77.54000.  
stock date 2024/01/05 15:59:00.  stock exchange NYSE

Output JSON

{
  "date" : "Mon, 08 Jan 2024 22:15:25 GMT",
  "x-global-transaction-id" : "c44ad66a56c4e2cb853480d91116bae1",
  "x-request-id" : "9742864c-1dd6-4c20-8e01-3082aad3177e",
  "cf-ray" : "8427cc5dfa1e42e0-EWR",
  "inputs" : "Q: What is the outlook for American Water this year?",
  "created_at" : "2024-01-08T22:15:25.433Z",
  "stop_reason" : "max_tokens",
  "x-correlation-id" : "bXZnanQ-2cb9169d98fd4a37b29c030696e8f7ea",
  "x-proxy-upstream-service-time" : "2100",
  "message_id" : "disclaimer_warning",
  "model_id" : "meta-llama/llama-2-70b-chat",
  "invokehttp.request.duration" : "8137",
  "message" : "This model is a Non-IBM Product governed by a third-party license that may impose use restrictions and other obligations. By using this model you agree to its terms as identified in the following URL. URL: https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-models.html?context=wx",
  "uuid" : "dfedbad8-df0c-4572-b47c-9eb07cf099a1",
  "generated_text" : "American Water Works Company, Inc. (AWK) Q1 2023 Earnings Call Transcript\nAmerican Water Works Company, Inc. (NYSE:AWK) Q1 2023 Earnings Call dated May. 04, 2023. Corporate Participants: Brian Chin — Senior Vice President and Chief Financial Officer.\nhttps://www.fool.com/earnings/call-transcripts/2023/05/04/american-water-works-company-inc-awk-q1-2023-earn/\nAmerican Water Works Company, Inc. (AWK) CEO Susan Story on Q1 2023 Results - Earnings Call Transcript\nAmerican Water Works Company, Inc. (NYSE:AWK) Q1 2023 Earnings Conference Call May",
  "transaction-id" : "bXZnanQ-2cb9169d98fd4a37b29c030696e8f7ea",
  "tokencount" : "28",
  "generated_token" : "200",
  "ts" : "1704752125522",
  "advisoryId" : ""
}

Let’s also store the company information in Postgres. We could have used Kudu, Iceberg, HBase, MongoDB, MariaDB, Oracle, SQL Server, S3 or others. We could also have stored it in all of them simultaneously if needed.

Output to Slack

RESOURCES

NiFi 1.7+ - XML Reader/Writer and ForkRecord processor

Starting with NiFi 1.7.0 and thanks to the work done by Johannes Peter on NIFI-4185 and NIFI-5113, it's now possible to…

pierrevillard.com

GitHub - willie-engelbrecht/ParseMultiLevelJSON-NiFiRecordProcessors: How to parse multi level JSON…

How to parse multi level JSON with NiFI and Avro using Record Processors - GitHub …

github.com

Building an Effective NiFi Flow — QueryRecord

Of the 400+ Processors that are now available in Apache NiFi, QueryRecord is perhaps my favorite. In most cases, it is…

medium.com

NiFi - Split a record using a non-root JSON attribute

I have JSON input of the following format: { "Id": 1000000, "ReportName": TestReport, "Results": [{ "Id": 1…

community.cloudera.com

GitHub - tspannhw/FLaNK-EdgeAI: FLaNK-EdgeAI

FLaNK-EdgeAI. Contribute to tspannhw/FLaNK-EdgeAI development by creating an account on GitHub.

github.com
