Azure OpenAI in an IoT device (your personalised chatbot)

Prabal Deb
7 min read · Aug 30, 2023
Created by Microsoft Bing Image Creator. Prompt: Create a colourful image of Personal AI Assistant


What?

Do you like the image? I am still trying to 3D-print a Raspberry Pi case that looks like it, because I am so excited to build my own personal AI assistant for my home: one I can speak to the way I want, and that responds based on the context I have configured. My kid can ask it to tell a story, and it should tell one appropriate to their age, or even one where the kid's name is a character, and so on. All of this became much easier with the recent launch of OpenAI ChatGPT.

Why?

Well, the ChatGPT mobile app is now available for Android phones in some countries, so you can type any prompt and it will answer. But it cannot tell you the latest weather alerts or rain probabilities, fetch the latest news headlines, or find a product in your favourite e-commerce stores and compare prices across them. There are many things you would imagine your personal AI assistant should do. You may also not want your kids spending a lot of time on a mobile phone, even if they are perfectly capable of using an app.

Basically, I am talking about three potential problems:

  1. Bringing the latest context to OpenAI: that could be news on a topic, the latest weather for a city, and so on.
  2. The need for an IoT device on which the solution can run (like Amazon Alexa, Google Home, etc.) and which is accessible to kids.
  3. Can a single OpenAI prompt handle all of the skills (news updates, storytelling, weather checks, etc.)? Perhaps not, so you might have to create multiple prompts, each handling a different task. Additionally, proper experiments and evaluations are needed to make sure the solution performs well across multiple scenarios.

Demo

Before going deeper into the above-mentioned problems and how they can be solved, it's time for a quick demo.

How?

Can OpenAI solve all the problems? Maybe, but perhaps not with GPT3.5 or GPT4. Maybe it will happen with GPT5+; until then, let's see how we can bring additional services and software together with OpenAI to build an end-to-end solution.

Let's take the problems mentioned earlier and look at how I have tried to solve each of them.

Problem 1: Bringing the latest context to OpenAI

This is a common problem, and it is not going away soon. LLMs are trained on data available up to a certain cut-off point. For example, suppose an OpenAI GPT3.5 model version is trained on data available up to 1st June 2023, and a country holds an election on 20th June 2023 in which a new prime minister, X, replaces the previous prime minister, Y. If you ask OpenAI after 20th June who the prime minister is, it will still answer Y.

In this solution I have used the following services to address the latest-context problem:

  1. Latest news: I have used the Microsoft Bing Search Services News API, from which the latest news can be fetched easily with several customisations such as search market, language, filters, etc. — reference code
  2. Latest weather with forecast: I have used OpenWeather, from which not only the current weather for a given place but also hourly and daily forecasts can be fetched. — reference code (a sketch of both of these calls follows the example below)
  3. Latest products from my favourite e-commerce sites: getting direct access to sites such as Amazon or Flipkart might be a little difficult, but the Microsoft Bing Search Services Web API can address that problem as well. You can filter web search results simply by adding "site:<your favourite provider>" to the query parameter of the REST API. Example code follows; for the full implementation refer to — reference-code
import requests

# Bing Web Search v7: restrict results to a single site with the "site:" operator
headers = { 'Ocp-Apim-Subscription-Key': subscription_key }
endpoint = endpoint + "/v7.0/search"
q_entities = entities + " site:amazon.in"  # note the space before "site:"
params = { 'q': q_entities, 'mkt': mkt }
response = requests.get(endpoint, headers=headers, params=params)
response.raise_for_status()  # fail fast on HTTP errors
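
For the news and weather lookups (items 1 and 2), the calls are similarly plain REST requests. Below is a minimal sketch of both; the endpoints are the public Bing News Search v7 and OpenWeather current-weather APIs, but the function names and key variables are illustrative placeholders of mine, not taken from the reference code.

import requests

def fetch_latest_news(query, news_key, mkt="en-IN"):
    # Bing News Search v7: recent articles for a topic; "freshness" limits age
    resp = requests.get(
        "https://api.bing.microsoft.com/v7.0/news/search",
        headers={"Ocp-Apim-Subscription-Key": news_key},
        params={"q": query, "mkt": mkt, "freshness": "Day"},
    )
    resp.raise_for_status()
    return [article["name"] for article in resp.json().get("value", [])]

def fetch_current_weather(city, owm_key):
    # OpenWeather current weather; units=metric returns degrees Celsius
    resp = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"q": city, "appid": owm_key, "units": "metric"},
    )
    resp.raise_for_status()
    data = resp.json()
    return data["main"]["temp"], data["weather"][0]["description"]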

Problem 2: Creating the solution for an IoT device

It’s not a problem and nothing to do with OpenAI, but very much needed so that a solution can run in an IoT Device. Again we need several services or software to enable the solution in IoT. Let’s look at them step by step:

Hardware: The first thing we need is the compute and its supporting accessories. I have used the following hardware for building the prototype:

  1. A Raspberry Pi board with USB support.
  2. A push or touch button that can be connected to the GPIO (General-Purpose Input/Output) pins on the Raspberry Pi. Why do we need this? It is optional, but I wanted my bot to listen only when I ask it to (by pressing the button), and to stop talking when I don't want to listen to it any more (a minimal sketch follows the photo below).
  3. A microphone connected to the Raspberry Pi. Even a USB PnP sound device with a built-in microphone will work; I have used a WaveShare USB sound card for this.
  4. A speaker connected to the Raspberry Pi.
Sample hardware setup for this prototype
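
To make the push-to-talk idea concrete, here is a minimal sketch using the gpiozero library; the pin number (GPIO 17) and the handler body are assumptions for illustration, so wire your button to whichever pin you prefer.

from signal import pause

from gpiozero import Button  # assumption: button wired between GPIO 17 and GND

button = Button(17)

def start_listening():
    # In the real bot this would trigger speech capture or interrupt playback
    print("Button pressed: start listening")

button.when_pressed = start_listening
pause()  # keep the script alive, waiting for button events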

Software or Services: Once the hardware setup is ready, we need to take care of the following aspects before and after calling OpenAI:

  1. Speech to Text: the first task is converting speech to text so that it can be passed to OpenAI. There are several options. I have tried whisper.cpp (the tiny model) and Azure Cognitive Services Speech to Text; both performed well with English as the input language, but on my device Whisper's latency was a little higher than Azure's (see the sketch after this list).
  2. Once I have the text of the user query, it is time for the Azure OpenAI GPT calls, which I will cover in detail in the next section. Essentially, they produce an answer for the given query.
  3. Text to Speech: once I have the answer as text, the next task is converting it to speech. Azure Cognitive Services Text to Speech does an amazing job here: it not only converts text to realistic speech, but also lets you modify the voice, speaking style, emotional tone, etc.
  4. Deployment: we are talking about IoT, so we need to think about deployment and updates at scale. Azure IoT Hub makes this happen, with seamless deployment, updates, and monitoring across a large fleet of devices.
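
Here is a minimal sketch of the speech round-trip using the Azure Speech SDK for Python (azure-cognitiveservices-speech); the key, region, and voice name are placeholders you would replace with your own resource values.

import azure.cognitiveservices.speech as speechsdk

# Placeholders: your Azure Speech resource key and region
speech_config = speechsdk.SpeechConfig(subscription="<speech-key>", region="<region>")

# Speech to Text: listen once on the default microphone
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Heard:", result.text)  # this text is what goes to the GPT calls

# Text to Speech: speak the bot's answer on the default speaker
speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async("Here is your answer.").get()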

Problem 3: The flow of OpenAI GPT calls

We are now at the climax of the article. I have already covered why it is important to bring the latest context to OpenAI. The next problem is related: can OpenAI perform well without the right prompt for a specific scenario? Let me elaborate with an example. Say you want your chatbot to give you the latest weather update with the temperature in degrees Celsius, or to tell a story suitable for your four-year-old kid. Can you achieve both with a single prompt?

To address this problem, I have split the prompts into the following:

  1. Intent detection prompt: this is the first prompt called; it identifies which GPT skill (one of the task-specific prompts below) needs to be invoked (see the sketch after the note below).
  2. General greetings prompt: this is called to greet the user.
  3. Weather check prompt: to get the latest weather information for the city mentioned in the query.
  4. News check prompt: to get the latest news based on the context given in the query.
  5. Product search prompt: to find the best product from the favourite sites by comparing price, offers, features, etc.
  6. General knowledge prompt: this is where I leverage the built-in knowledge of GPT3.5, which is as impressive as what you see in ChatGPT.
  7. Storyteller prompt: this prompt impresses my kid by telling the story they want to hear (they have even tried stories with their own name and a lion as characters).

Note: LangChain or Azure OpenAI Functions could also solve this problem; I just wanted to try a more controlled, basic approach.
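
As a minimal sketch of the intent-detection step, assuming an Azure OpenAI chat deployment named gpt-35-turbo and the pre-1.0 openai Python package that was current when this was written; the skill labels and the fallback are my own illustrative choices, not the exact prompts from the project:

import openai

# Azure OpenAI setup (all values are placeholders)
openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "<your-key>"

SKILLS = ["greeting", "weather_check", "news_check",
          "product_search", "general_knowledge", "story_teller"]

def detect_intent(user_query):
    # Ask the model to classify the query into exactly one skill label
    response = openai.ChatCompletion.create(
        engine="gpt-35-turbo",  # Azure deployment name (assumption)
        temperature=0,          # deterministic classification
        messages=[
            {"role": "system",
             "content": "Classify the user's request into exactly one of these "
                        "skills: " + ", ".join(SKILLS) + ". Reply with the label only."},
            {"role": "user", "content": user_query},
        ],
    )
    label = response["choices"][0]["message"]["content"].strip()
    return label if label in SKILLS else "general_knowledge"  # safe fallback

The returned label then decides which task-specific prompt is sent to the model next.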

Complete architecture

What I have talked about so far is presented in the following architecture diagram:

The prototype architecture

Part 2: Experiments and Evaluations for OpenAI-based applications

How do you think I came to the conclusion that different skills need different prompts? How did I make sure that the intent detection really works? And where does GPT hallucinate?

Data -> more data -> even more data -> experiments -> many more experiments -> evaluate and compare

It’s not different with normal data science projects flow isn’t it! In the part 2 I will talk about the tool that I have used to experiment and evaluate the OpenAI prompts.
