Prototyping Speech Recognition in Framer.js

Brian Bailey · Published in Framer · Aug 3, 2016

Project available on GitHub.

What you’ll learn

  • How to connect to the Web Speech API
  • How to access your device’s audio input
  • How to access your device’s voice synthesizer

What you’ll need

  • The sample code
  • Framer Studio (or Framer.js and a text editor)
  • Basic knowledge of HTML, CSS, and JavaScript (CoffeeScript)
  • Chrome Stable 33 or greater

Overview

The Web Speech API gives web apps the ability to recognize speech, transform audio input into strings, and control the synthesis voices available on the device.

Its two parts, SpeechRecognition (Asynchronous Speech Recognition) and SpeechSynthesis (Text-to-Speech), allow designers to prototype speech-based conversational UIs like Google Now, Apple’s Siri, and Amazon’s Alexa.

The Web Speech API is flagged as an experimental feature in Chrome and Firefox, and is supported in Chrome Stable 33 and greater.
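
Before wiring anything up, it’s worth checking which parts of the API the current browser exposes. A minimal sketch in Framer’s CoffeeScript (the webkit prefix is what Chrome currently uses):

# SpeechRecognition is still prefixed in Chrome
SpeechRecognition = window.SpeechRecognition or window.webkitSpeechRecognition

if SpeechRecognition?
    print "SpeechRecognition is available"
else
    print "SpeechRecognition is not supported in this browser"

# SpeechSynthesis is unprefixed and more widely supported
if window.speechSynthesis?
    print "SpeechSynthesis is available"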

SpeechRecognition Prototype

tl;dr This prototype will not function in Framer Studio.

Using Chrome, you can interact with the sample prototype of Google’s iOS app, or clone this repo. Your browser may request permission to use the microphone.

Framer Studio, the official coding environment for Framer.js, is a Safari-based application, and Safari doesn’t support the SpeechRecognition interface of this experimental API. (Safari does support the SpeechSynthesis interface, however.) Framer Studio will likely throw the error below, and you may not be able to interact with your prototype’s preview in the IDE.

TypeError: undefined is not a constructor (evaluating 'new SpeechRecognition')

To get around this, we’ll run python -m SimpleHTTPServer [port] in the directory of the prototype's index.html file and interact with the prototype in Chrome 33 or greater. (SpeechRecognition doesn't trigger the microphone on the server Framer Studio generates.)

  1. Open Terminal
  2. cd into speech-recognition.framer
  3. Type: python -m SimpleHTTPServer 8090
  4. In Chrome, navigate to http://127.0.0.1:8090/

This will serve the prototype from the current working directory.
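
Note that SimpleHTTPServer is the Python 2 module name; if your machine runs Python 3, the equivalent command is:

python3 -m http.server 8090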

Create a server and open the prototype in Chrome for full functionality.

SpeechRecognition Interface

The SpeechRecognition interface allows us to recognize speech and respond accordingly. PromptWorks’ piece on Speech Recognition in the Browser provided the snippet below as JavaScript, which I converted to CoffeeScript (and then Framer.js) with js2coffee.

You can paste this into Framer Studio and open the preview in Chrome.

Your browser may request permission to use the microphone.

# This API is currently prefixed in Chrome
SpeechRecognition = window.SpeechRecognition or window.webkitSpeechRecognition

# Create a new recognizer
recognizer = new SpeechRecognition

# Start producing results before the person has finished speaking
recognizer.interimResults = true

# Set the language of the recognizer
recognizer.lang = 'en-US'

# Define a callback to process results
recognizer.onresult = (event) ->
    result = event.results[event.resultIndex]
    # Interim and final results are handled identically here;
    # result.isFinal lets you treat them differently if needed
    if result.isFinal
        print result[0].transcript
    else
        print result[0].transcript
    return

# Start listening...
recognizer.start()
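
Chrome stops the recognizer after a stretch of silence. If you want the prototype to keep listening, one approach (a sketch, not part of the original project) is to enable continuous results and restart the recognizer whenever it ends:

# Keep capturing results instead of stopping after the first phrase
recognizer.continuous = true

# Chrome ends recognition after silence; restart to keep listening
recognizer.onend = ->
    recognizer.start()
    return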

Now we can do any number of things with the transcript, which is now a string. For example, you can pass the output as HTML to a layer.

textBox = new Layer
    backgroundColor: "none"
    color: "#969696"
    html: "Speak now"

textBox.style =
    "fontSize": "50px"
    "fontWeight": "300"
    "textAlign": "left"
    "fontFamily": "Arial"

recognizer.onresult = (event) ->
    result = event.results[event.resultIndex]
    if result.isFinal
        textBox.html = result[0].transcript
    else
        textBox.html = result[0].transcript
    return
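
The if/else branches above are identical, so finality is currently ignored. One way to make the distinction visible, purely as a sketch, is to dim interim text and darken it once the result is final:

recognizer.onresult = (event) ->
    result = event.results[event.resultIndex]
    textBox.html = result[0].transcript
    if result.isFinal
        # Final result: render in a solid color
        textBox.color = "#333333"
    else
        # Interim result: keep it dimmed while recognition continues
        textBox.color = "#969696"
    return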

SpeechSynthesis Interface

The SpeechSynthesis interface provides controls and methods for the synthesis voices available on the device. Browser compatibility is better for this interface, with support in Safari and on several mobile browsers.

Snippets from PromptWorks.

speechSynthesis.speak new SpeechSynthesisUtterance('Hello world.')

Incrementing the index in voices[1] should let you cycle through your device's synthesis voices.

voices = speechSynthesis.getVoices()
utterance = new SpeechSynthesisUtterance('Hello world.')
utterance.voice = voices[1]
speechSynthesis.speak utterance
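
One caveat: in Chrome, getVoices() returns an empty array until the browser has loaded its voice list and fired the voiceschanged event. A defensive sketch (speakWithVoice is a helper invented here, not part of the original project):

speakWithVoice = (text, index) ->
    voices = speechSynthesis.getVoices()
    utterance = new SpeechSynthesisUtterance(text)
    # Fall back to the default voice if the index is out of range
    utterance.voice = voices[index] if voices[index]?
    speechSynthesis.speak utterance
    return

# Chrome populates the voice list asynchronously
if speechSynthesis.getVoices().length > 0
    speakWithVoice "Hello world.", 1
else
    speechSynthesis.onvoiceschanged = -> speakWithVoice "Hello world.", 1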

