Searching Slack from Apache NiFi

Tim Spann
Cloudera
Apr 26, 2024 (4 min read)

Open Source Slack Search Engine Integration

I was thinking about how to get information out of my Slack channels. My first idea was to load the messages as they are into a vector store. Then someone at work mentioned that Slack has a search feature. I figured they probably had an API for it. They do, and it's available if you build an app. Awesome, let's try it.

First, if you don't already have one, you will need to create a Slack app.

Quick Walkthrough

For Slack Details — Related Articles

NiFi Flow

This flow is another consumer of the Kafka topic that holds our Slack messages. We call the Slack API via InvokeHTTP, then split and parse the results out of the complex JSON response.

SEARCH: Lafayette City Center

Search Results from Timothy Spann — Lafayette City Center, from askflankbot with Search Score 26.653801
==> HuggingFace Mixtral 8x7B Results on Thu, 18 Apr 2024 18:26:13 GMT:
<s>[INST]Write a detailed complete response that appropriately answers the request.[/INST][INST]Use this information to enhance your answer: [/INST] User: What buses are near Lafayette City Center, Boston, MA</s>
at

Parse, split, and limit the results, then send them to Slack and on to Kafka

To start our search, InvokeHTTP calls:

https://slack.com/api/search.messages?query=${searchterm:trim():urlEncode()}&count=9000&pretty=1

With the request we must send an Authorization header whose value is Bearer followed by the User Token from our app (Bearer <User Token>).
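If you want to try the same call outside NiFi, here is a minimal Python sketch that builds (but does not send) the request InvokeHTTP issues. It mirrors ${searchterm:trim():urlEncode()} by stripping whitespace and form-encoding the term; treating NiFi's urlEncode() as equivalent to Python's quote_plus (spaces become +) is an assumption of this sketch. The function name and the token value are hypothetical, and count=9000 is copied from the flow. Slack pages search results, so check the search.messages reference for the per-page limit.

```python
from urllib.parse import quote_plus
from urllib.request import Request

SEARCH_ENDPOINT = "https://slack.com/api/search.messages"


def build_search_request(search_term: str, user_token: str, count: int = 9000) -> Request:
    """Build the GET request for Slack's search.messages (hypothetical helper)."""
    # Like ${searchterm:trim():urlEncode()}: trim, then form-encode the term.
    query = quote_plus(search_term.strip())
    url = f"{SEARCH_ENDPOINT}?query={query}&count={count}&pretty=1"
    # search.messages is called with the app's user token as a Bearer credential.
    return Request(url, headers={"Authorization": f"Bearer {user_token}"}, method="GET")


req = build_search_request("  Lafayette City Center ", "xoxp-your-user-token")
print(req.full_url)
```

Passing the returned Request to urllib.request.urlopen would actually send it; keeping construction separate makes the URL and header easy to inspect first.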

We then split the JSON arrays into individual records with these JSONPath expressions:

$.messages.matches

$.blocks

$.elements.*.elements
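To make the three split levels concrete, here is a Python sketch that walks a search.messages response the same way. It assumes the usual Block Kit nesting (each match carries rich_text blocks whose elements are sections with their own elements list); the function name and sample payload are made up for illustration. Unlike SplitJson, which emits one array per section at the last step, this flattens all the way down to single elements.

```python
from typing import Any, Dict, Iterator


def split_search_response(response: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield individual rich-text elements (hypothetical helper)."""
    # $.messages.matches: one record per search hit
    for match in response.get("messages", {}).get("matches", []):
        # $.blocks: one record per block on the hit (may be absent)
        for block in match.get("blocks", []):
            # $.elements.*.elements: inner elements of every section
            for section in block.get("elements", []):
                for element in section.get("elements", []):
                    yield element


sample = {"ok": True, "messages": {"matches": [{"blocks": [{"type": "rich_text", "elements": [
    {"type": "rich_text_section", "elements": [
        {"type": "text", "text": "Buses near Lafayette City Center"},
        {"type": "link", "url": "https://example.com/route"}]}]}]}]}}
print([e["type"] for e in split_search_response(sample)])
```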

I also extract the high-level fields of each match (user, channel, score, permalink and so on) into FlowFile attributes.
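Here is a sketch of that extraction in Python, turning one match into a flat dictionary of attributes. The attribute names on the left come from the Slack template used later in the flow; the match keys on the right (username, channel.name, score, and so on) are my assumption about the search.messages match object, so verify them against a real response. In this sketch, missing keys simply become empty strings.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping: flow attribute name -> key path inside a match.
FIELD_MAP: Dict[str, Tuple[str, ...]] = {
    "searchusername": ("username",),
    "searchchannelname": ("channel", "name"),
    "searchscore": ("score",),
    "searchpermalink": ("permalink",),
    "searchresulttext": ("text",),
    "searchts": ("ts",),
    "searchteam": ("team",),
    "searchiid": ("iid",),
}


def extract_fields(match: Dict[str, Any]) -> Dict[str, str]:
    """Flatten selected fields of one match into string attributes."""
    attributes: Dict[str, str] = {}
    for attr, path in FIELD_MAP.items():
        value: Any = match
        for key in path:
            # Walk nested dicts; anything missing ends up as None.
            value = value.get(key) if isinstance(value, dict) else None
        attributes[attr] = "" if value is None else str(value)
    return attributes
```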

A final QueryRecord processor limits how many links and text elements we keep:

SELECT * FROM FLOWFILE
WHERE type = 'link'
LIMIT 20

SELECT * FROM FLOWFILE
WHERE type = 'text'
AND CHAR_LENGTH(text) > 150
LIMIT 3
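For anyone prototyping outside NiFi, these two queries translate to simple list filters in Python. The function names are hypothetical. Note that SQL LIMIT without ORDER BY does not formally promise which rows survive; this sketch keeps the first matches in input order.

```python
from typing import Any, Dict, Iterable, List


def limit_links(elements: Iterable[Dict[str, Any]], limit: int = 20) -> List[Dict[str, Any]]:
    # SELECT * FROM FLOWFILE WHERE type = 'link' LIMIT 20
    return [e for e in elements if e.get("type") == "link"][:limit]


def limit_text(elements: Iterable[Dict[str, Any]], min_chars: int = 150,
               limit: int = 3) -> List[Dict[str, Any]]:
    # SELECT * FROM FLOWFILE WHERE type = 'text' AND CHAR_LENGTH(text) > 150 LIMIT 3
    return [e for e in elements
            if e.get("type") == "text" and len(e.get("text", "")) > min_chars][:limit]
```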

Kafka Results

SLACK OUTPUT

SOURCE CODE

Slack Template

Search Results from ${messagerealname} - ${searchterm}  from ${searchchannelname} with Search Score ${searchscore}
==> ${searchresulttext}
No Reactions: ${searchnoreactions}
Permalink to Result: ${searchpermalink}
Search Team: ${searchteam} Posted by ${searchusername}
========= Dates: ${date} TS: ${searchts} KT: ${kafka.timestamp} ID: ${searchiid}
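To preview what this template produces before wiring it into the Slack processor, here is a small Python sketch that fills plain ${attribute} references from a dictionary. It handles bare and dotted names such as kafka.timestamp, but not Expression Language functions like ${searchterm:trim()}. Rendering unknown names as empty strings is a choice made for this sketch, and the attribute values below are made up.

```python
import re
from typing import Dict

# Matches ${name} or ${dotted.name}; no EL function calls.
_ATTR_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def render_template(template: str, attributes: Dict[str, str]) -> str:
    """Replace each ${name} with its attribute value ('' if absent)."""
    return _ATTR_REF.sub(lambda m: attributes.get(m.group(1), ""), template)


print(render_template(
    "Search Results from ${messagerealname} - ${searchterm} KT: ${kafka.timestamp}",
    {"messagerealname": "Timothy Spann", "searchterm": "Lafayette City Center",
     "kafka.timestamp": "1713464773000"},
))
```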

RESOURCES

Join me at upcoming meetups and events:

May 8, 2024: Boston:


Tim Spann
Cloudera

Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/