Reimagining Shopping Cart Experience with iOS 10 Speech API

Prianka Liz Kariat
Published in YML Innovation Lab
4 min read · Aug 24, 2016

iOS 10 has introduced the new Speech framework, which allows app developers to incorporate speech recognition into their apps. The exciting fact about this API is that it can perform real-time speech recognition.

We at the Y Media Labs Innovation Lab decided to leverage the power of this framework to simplify a service people use on a daily basis.

After pondering the problem statement, we zeroed in on the concept of letting users filter the products they want to purchase with their own spoken words.

We are going to name our demo EchoPick. It will let users filter jeans based on different criteria recognized from their spoken words. The best part about the app is that the user does not have to work through the cumbersome process of changing filters manually.

First off, we create an Xcode project and name it EchoPick.

We also link Speech.framework to our demo project.

In order to perform speech recognition using the Speech framework, it is mandatory that you add the NSSpeechRecognitionUsageDescription key to Info.plist, describing the purpose of recognizing the words spoken by the user. You should also add the key named Privacy - Microphone Usage Description (NSMicrophoneUsageDescription), which describes how you intend to use the microphone for speech recognition.
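In source form, the two Info.plist entries might look like the following. The usage strings below are placeholders of our own; write whatever best describes your app.

<key>NSSpeechRecognitionUsageDescription</key>
<string>Your spoken words are converted into product filters.</string>
<key>NSMicrophoneUsageDescription</key>
<string>The microphone is used to listen to the filters you speak.</string>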

App Logic

Our app will have one screen that presents a list of jeans on sale in a UICollectionView. The navigation bar will have a button; tapping it makes the app start listening to the user’s voice. Once the user has finished uttering the filters, they can tap another button to initiate filtering.

Project Overview

  1. ProductListVC - the view controller that displays the jeans on sale and handles all the voice processing and filtering.

Permissions

The user’s speech can only be recognized if they explicitly permit it. We request the user’s permission as illustrated in the code snippet below.

private var authorized: Bool = false

This property is used to enable or disable the voice filter UI based on whether the user has granted permission.
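Here is a minimal sketch of the authorization request, assuming it is called from ProductListVC. SFSpeechRecognizer.requestAuthorization is the iOS 10 API; the requestSpeechAuthorization name is our own.

private func requestSpeechAuthorization() {
    SFSpeechRecognizer.requestAuthorization { [weak self] authStatus in
        // The callback may arrive on a background queue; hop to the
        // main queue before touching any UI-related state.
        OperationQueue.main.addOperation {
            // Enable the voice filter UI only when the user has explicitly authorized us.
            self?.authorized = (authStatus == .authorized)
        }
    }
}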

Data Source

We set up a data source with some products to be filtered.

private var products: [[String : String]]

Each pair of jeans is represented by a dictionary with keys like “fit”, “color”, “size”, etc.
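For illustration, the data source could be seeded with entries like these (the values are made up for this sketch; a real app would load them from a catalog):

private var products: [[String : String]] = [
    ["name": "Classic Denim", "fit": "slim",    "color": "blue",  "size": "32"],
    ["name": "Weekend Jean",  "fit": "regular", "color": "black", "size": "34"],
    ["name": "Stretch Flex",  "fit": "skinny",  "color": "grey",  "size": "30"]
]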

Speech Recognition

We have to declare three variables that are central to performing this task.

We create a recognition request, which is used to initialize a recognition task. The recognition task is carried out by an SFSpeechRecognizer.

We will also create an AVAudioEngine object. If you are familiar with AVFoundation, you will recall that an audio engine is required to process the input audio signals.
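Put together, the declarations in ProductListVC might look like this (the property names are our own):

private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
private var recognitionTask: SFSpeechRecognitionTask?
private let audioEngine = AVAudioEngine()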

When a recognition task is initiated by the speech recognizer, Apple’s servers are contacted to convert the words spoken by the user into text. You immediately get callbacks with the best possible interpretation of the words spoken so far, but it takes some time (approximately one minute) for the server to finalize the result of recognition. If you want intermediate results delivered as and when words are recognized, you have to set the shouldReportPartialResults property of SFSpeechAudioBufferRecognitionRequest.
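A sketch of creating the request and task with partial results enabled follows; recognizedText is a hypothetical property of ours that holds the latest transcription.

let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true   // deliver intermediate transcriptions
recognitionRequest = request

recognitionTask = speechRecognizer?.recognitionTask(with: request) { [weak self] result, error in
    if let result = result {
        // bestTranscription is refined on every callback until result.isFinal turns true.
        // recognizedText is a hypothetical property backing our filter step.
        self?.recognizedText = result.bestTranscription.formattedString
    }
}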

We start the recording process once the user taps a button in the interface. Since it takes time to obtain the final results, we have provided another button in the interface so that the user can tap it and see the filtered results. We stop the speech-to-text conversion once this button is tapped, as illustrated below.
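Here is a sketch of the start and stop plumbing, using iOS 10-era (Swift 3) APIs; the method and action names are our own.

private func startRecording() throws {
    // Configure the shared audio session for recording.
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(AVAudioSessionCategoryRecord)
    try audioSession.setActive(true)

    guard let inputNode = audioEngine.inputNode, let request = recognitionRequest else { return }
    let format = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        // Feed each microphone buffer to the recognition request.
        request.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
}

@IBAction func filterButtonTapped(_ sender: UIButton) {
    // Stop capturing audio and tell the recognizer no more audio is coming,
    // which prompts it to finalize the transcription.
    audioEngine.stop()
    audioEngine.inputNode?.removeTap(onBus: 0)
    recognitionRequest?.endAudio()
}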

Once we obtain the final text, we make an api.ai call to convert it into the intelligible filters we have preset (like color, size, etc.).

For those unfamiliar with api.ai, it is a conversational platform that lets you extract meaning from text based on intents you have created. It processes natural language to derive parameters meaningful to our context.
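As a rough sketch, one way to query api.ai is its v1 REST endpoint. The URL, version parameter, token placeholder, and response parsing below are assumptions based on api.ai’s query API of the time, not code from this project.

private func queryApiAI(text: String, completion: @escaping ([String : String]) -> Void) {
    var components = URLComponents(string: "https://api.api.ai/v1/query")!
    components.queryItems = [
        URLQueryItem(name: "v", value: "20150910"),
        URLQueryItem(name: "query", value: text),
        URLQueryItem(name: "lang", value: "en"),
        URLQueryItem(name: "sessionId", value: UUID().uuidString)
    ]
    var request = URLRequest(url: components.url!)
    // CLIENT_ACCESS_TOKEN is a placeholder for your agent's client access token.
    request.setValue("Bearer CLIENT_ACCESS_TOKEN", forHTTPHeaderField: "Authorization")

    URLSession.shared.dataTask(with: request) { data, _, _ in
        // Pull the recognized parameters (our preset filters) out of the response.
        guard let data = data,
              let json = (try? JSONSerialization.jsonObject(with: data, options: [])) as? [String : Any],
              let result = json["result"] as? [String : Any],
              let parameters = result["parameters"] as? [String : String] else { return }
        completion(parameters)
    }.resume()
}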

After we get the filters spoken by the user in the response of the api.ai call, we use them to filter our product data source and display the results in the collection view.
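Putting it together, a hypothetical filtering step could look like this; filteredProducts and collectionView are assumed properties of ProductListVC.

private func applyFilters(_ filters: [String : String]) {
    filteredProducts = products.filter { product in
        // A product matches only if every spoken filter agrees with its attributes.
        for (key, value) in filters {
            guard product[key]?.caseInsensitiveCompare(value) == .orderedSame else { return false }
        }
        return true
    }
    collectionView.reloadData()
}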

Thus, using the Speech framework, we are able to solve the problem of manually changing multiple filters every time you want to find the product matching your requirements.

Let us see our final product in action.

You can check out the full source code for the project here.

Developed at Innovation Labs @ Y Media Labs
