Gemini Pro 1.5 multi-modal Slack bot

Taneli Leppä
Google Cloud - Community
2 min read · Apr 10, 2024

In my previous article, I showed how to create a Slack bot that uses the Gemini 1.0 Pro Vision LLM to give users interactive multi-modal discussions on Slack.

Google Cloud Next 2024 has just kicked off, and it brought a nice surprise: public access to the Gemini 1.5 Pro model. It has a bunch of really cool new capabilities (such as audio input) and improved performance. I wanted to update the bot to the new version, which turned out to be relatively straightforward.

And as always, you can deploy the bot into your own Slack workspace (see the instructions in the previous article and make sure you have the latest code from the repository).

The bot can also be integrated with Vertex AI Search (and other REST APIs) to give the model direct access to topics specific to your business.

So let’s explore some of the new capabilities that Gemini 1.5 can give to your Slack bot:

Basic context-aware answers

The questions that bother us.

Integration with Vertex Search

This question triggers a function call that queries Vertex Search (see the next image).
The results are returned to the Gemini API.
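The round trip in these two screenshots (the model asks for a function, the bot runs it, and the result goes back to the model) can be sketched roughly like this. This is not the code from my repository: the `search_company_docs` function, its canned response and the `HANDLERS` registry are hypothetical names made up for illustration, and the real bot would call Vertex AI Search where the stub returns fixed data.

```python
import json
from typing import Any, Callable, Dict


def search_company_docs(query: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a real Vertex AI Search request (returns canned data)."""
    return {
        "query": query,
        "results": [{"title": "Example document", "snippet": "Placeholder text"}],
    }


# Assumed registry: function names the model may request, mapped to local handlers.
HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "search_company_docs": search_company_docs,
}


def dispatch_function_call(name: str, args: Dict[str, Any]) -> str:
    """Run the requested handler and return a JSON string to send back to the model."""
    handler = HANDLERS.get(name)
    if handler is None:
        # Report unknown functions as data so the model can recover gracefully.
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        return json.dumps(handler(**args))
    except TypeError as exc:
        # Raised, for example, when the model supplies arguments the handler does not accept.
        return json.dumps({"error": str(exc)})


if __name__ == "__main__":
    print(dispatch_function_call("search_company_docs", {"query": "vacation policy"}))
```

Returning errors as JSON instead of raising keeps the conversation going: the model sees what went wrong and can either retry or tell the user.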

Working with PDFs

Providing a PDF (in this case, a monitor manual) and asking a question about it.

Answers using images

Had to use this image.

Answers using voice

Last but not least (and the coolest Gemini 1.5 capability!), we can also ask Gemini a question directly with a voice message on Slack:

Nice!
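The PDF, image and voice examples all boil down to handing the model a file together with its MIME type. Here is a minimal sketch of how a bot might decide which Slack attachments to forward. It is an illustration, not the code from my repository: the allow-list is an assumption (check the Gemini documentation for the formats actually supported), and `gemini_mime_type` is a hypothetical helper name. The only thing it relies on is the `mimetype` field of a Slack file object.

```python
from typing import Optional

# Assumed allow-list, for illustration only; the real list of supported
# formats is in the Gemini documentation and may differ.
SUPPORTED_PREFIXES = ("image/", "audio/")
SUPPORTED_EXACT = {"application/pdf"}


def gemini_mime_type(slack_file: dict) -> Optional[str]:
    """Return the MIME type to send along with the file, or None to skip the file."""
    mime = (slack_file.get("mimetype") or "").lower()
    if mime in SUPPORTED_EXACT or mime.startswith(SUPPORTED_PREFIXES):
        return mime
    return None


if __name__ == "__main__":
    for f in ({"mimetype": "application/pdf"}, {"mimetype": "text/plain"}):
        print(f["mimetype"], "->", gemini_mime_type(f))
```

A file that passes the check could then be downloaded and attached to the prompt, for example with `Part.from_data(data, mime_type)` in the Vertex AI Python SDK.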

That’s it. Let me know if you deploy a Slack bot using my code or if you take inspiration from it. You can reach me, for example, on X (@rosmo), or just drop a comment.
