AI APIs: What are they and how to use them
A set of better practices for Artificial Intelligence APIs
Are you curious about speech-to-text, language translation, or image recognition APIs? You’re in luck! There are plenty of services that offer them, and plenty of reasons to use Artificial Intelligence (AI) APIs in your applications. In this article, we’ll describe some of the use cases for AI APIs, and then talk about the better practices to adopt when using them.
If you’re in a rush, rush to the TL;DR at the bottom.
The first question you need to ask yourself before implementing an external service is “do I need this?”
When it comes to Artificial Intelligence-based APIs the answer can get complicated! To help you decide, let’s have a look at two of the most commonly-used services, and when you should (or shouldn’t) use them.
Use case #1: Speech-to-Text
Many applications today leverage Speech-to-Text capabilities, and you’ve probably already used one that does. Siri, Google Assistant, Bixby and Alexa all use it, but that’s not all. It is also used by messaging apps (e.g. WhatsApp) and search engines (e.g. Google’s search bar).
You may wonder where the AI is in all this. Isn’t that just speech transcription?
The base of any speech-to-text API is indeed to take speech audio as an input and transcribe it to text, but it doesn’t stop there, and this is where the magic of artificial intelligence happens. Most of the APIs available out there will also:
- alter the previous transcription based on context
- identify different speakers
- be able to look for specific keywords
- allow for model personalization (useful for regional accents)
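To make the speaker and keyword features a little more concrete, here is a minimal sketch of how an application might consume that kind of output. The `Segment` shape, the `speaker_0`-style labels and the `keyword_hits` helper are all hypothetical and not taken from any particular provider; real responses will differ, so treat this as an illustration only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of transcribed speech (hypothetical shape, not a real API)."""
    speaker: str  # label assigned by the service, e.g. "speaker_0"
    text: str     # words transcribed for this stretch


def keyword_hits(segments: list[Segment], keywords: set[str]) -> dict[str, list[str]]:
    """Map each speaker to the (lowercase) keywords they said, in order of first mention."""
    hits: dict[str, list[str]] = {}
    for segment in segments:
        for raw in segment.text.lower().split():
            word = raw.strip(".,!?;:")  # drop trailing punctuation
            if word in keywords and word not in hits.setdefault(segment.speaker, []):
                hits[segment.speaker].append(word)
    return hits


# Made-up transcript, for illustration only.
segments = [
    Segment("speaker_0", "I object, your honour."),
    Segment("speaker_1", "Objection overruled."),
    Segment("speaker_0", "No further questions."),
]
print(keyword_hits(segments, {"object", "overruled"}))
```

Speakers who never say a keyword are simply absent from the result, which keeps the output small for long recordings.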
Use this: for specific use-cases, such as transcribing speech during a trial, where the context, the words used and the current speaker matter a lot.
Don’t use this: if you’re only expecting a single speaker, or for simple text messaging. In those cases, most smartphone keyboards already include speech recognition, and for web applications you can use the SpeechRecognition interface of the Web Speech API (browser support varies).
Use case #2: Image Recognition
Compared to Speech-to-Text, only a few applications use Image Recognition nowadays, and most of them are quite specific (e.g. PlantNet, which identifies the plants you’re taking a picture of). That said, we are starting to see some more general uses of it, specifically with:
- Google Lens: analyses what is on your screen (picture or not) and tries to identify addresses, places and things, then recommends results based on them
- Seeing AI: application for visually impaired people that describes what’s around them (based on where the phone’s camera is pointing)
As you’ll have understood by now, the magic here is to take an image as an input and try to identify what is in it. For most providers, there are two ways to do so:
- Using pre-trained models: these include common classes (food, places, people, color…)
- Using custom classifiers: these allow for the users to train their own classes (as seen with PlantNet above)
Although some Android and iOS devices have image recognition capabilities built in, as a developer you can’t always query them from your app. So as long as your use-case fits, an Image Recognition API is worth using.
Use case #3, #4…
There are plenty of other AI APIs out there. We’re not going to go through them all in this post, but if you’re thinking of using one of them, and are not sure whether your use-case fits the need or not, leave a comment or contact me directly. I’ll be happy to help!
You have now decided to use one, two, or even more (in that case, here’s an article for you) AI APIs in your application. In this section we’ll see a few better practices to adopt before thinking of using them in production.
Try before you buy
As with every third-party API, there are parameters to take into account: price, usability, availability… But when it comes to Artificial Intelligence, a reliability factor also comes into play. Reliability is not binary: it won’t rate as 0 or 1, but anywhere between the two, as we’ll see in the next section!
All major cloud providers offer their own set of AI APIs, and they all have either a demonstration page or a free trial. Take advantage of them, try for yourself, and choose the one that gets the best results!
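One way to make “the best results” measurable is to run the same small labelled sample through each provider and compare how often each one gets it right. The sketch below assumes each provider is wrapped in a function that takes an image path and returns its top label; `provider_a`, `provider_b` and the file names are hypothetical stand-ins, not real SDK calls.

```python
from typing import Callable

# A classifier takes an image path and returns its single best label.
Classifier = Callable[[str], str]


def accuracy(classify: Classifier, labelled: dict[str, str]) -> float:
    """Fraction of images whose top label matches the expected one."""
    if not labelled:
        return 0.0
    hits = sum(1 for image, expected in labelled.items()
               if classify(image) == expected)
    return hits / len(labelled)


def best_provider(providers: dict[str, Classifier],
                  labelled: dict[str, str]) -> tuple[str, float]:
    """Return the (name, accuracy) of the highest-scoring provider."""
    scores = {name: accuracy(fn, labelled) for name, fn in providers.items()}
    name = max(scores, key=lambda n: scores[n])
    return name, scores[name]


# Stand-ins for real API calls: each would wrap one provider's SDK or HTTP endpoint.
def provider_a(image: str) -> str:
    return {"rose.jpg": "rose", "oak.jpg": "oak"}.get(image, "unknown")


def provider_b(image: str) -> str:
    return {"rose.jpg": "rose", "oak.jpg": "maple"}.get(image, "unknown")


# Made-up test set: image path -> label you expect.
test_set = {"rose.jpg": "rose", "oak.jpg": "oak"}
print(best_provider({"A": provider_a, "B": provider_b}, test_set))
```

A handful of images that are representative of your own use-case will usually tell you more than a provider’s demo page.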
Artificial intelligence rhymes with confidence
As mentioned in the previous section, every response you get from an AI API also contains a confidence level ranging from 0 to 1, which expresses how certain the model is that it has recognised something.
Let’s take an example with a Visual Recognition API, when sending the following image:
Here’s the JSON response. I’ve removed some classes for readability.
"class": "mechanical device",
"class": "Indian red color",
"display": "General Model",
"description": "Quickly understand objects, actions, scenes, and colors within an image."
In this case, each identified class comes with a confidence score. Taking that score into account is key: I usually recommend only acting on results with a confidence score of at least 95% (0.95). The scores you get should go up over time as you or your API provider add more training data to the models.
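Applying such a threshold can be as simple as filtering the response before acting on it. The sketch below assumes each entry carries a `class` label and a numeric `score` field between 0 and 1; the `score` name, the helper and the sample values are assumptions for illustration, so check your provider’s actual response format.

```python
MIN_CONFIDENCE = 0.95  # the 95% minimum recommended above


def confident_classes(classes: list[dict], threshold: float = MIN_CONFIDENCE) -> list[str]:
    """Return the class names whose score meets the threshold, most confident first."""
    kept = [c for c in classes if c["score"] >= threshold]
    kept.sort(key=lambda c: c["score"], reverse=True)
    return [c["class"] for c in kept]


# Made-up entries, for illustration only.
sample = [
    {"class": "cat", "score": 0.98},
    {"class": "dog", "score": 0.61},
]
print(confident_classes(sample))
```

Keeping the threshold in one named constant makes it easy to tune later as the models improve.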
Speaking of training data, another thing you will want to consider is whether or not you want to allow the data sent from your application to be used to improve the general models of your provider. If you do want to allow it, you need to let your users know explicitly.
TL;DR
- Before using an AI API, decide if you really need it
- If you do, try different providers and select the one that fits your use-case best
- Set a confidence-score threshold high enough to avoid false positives
Have fun infusing AI into your apps!