Thought Experiment:

Cam Cook
Experiments in Thinking
5 min read · Sep 19, 2014

Imagine Apple’s Siri with a Speech Impediment

Having a speech impediment, like a stutter, stammer, or lisp, is no walk in the park. As a software engineer, I find that the ever-more-ubiquitous phone interview seems to draw a giant arrow over my head. It can make me feel like I’m not the engineer that I know I am. Sometimes I can even imagine the interviewer’s notes: Can’t communicate. Don’t hire.

Let’s dispel some misunderstandings with a thought experiment. How would you create a Text-to-Speech (TTS) program, like Apple’s Siri, with speech difficulties?

As an Engineer, I want to be able to understand how a speech impediment happens.

Let’s first check Wikipedia.

Speech disorders or speech impediments are a type of communication disorder where ‘normal’ speech is disrupted.

These impediments can be physical, like a malformed larynx, or a failure in the mental process of creating speech. I want to focus on the latter. It’s the kind I have, and it’s the most common type.

I’ve undergone years of speech therapy to get to where I am today. I was born with a congenital condition called ankyloglossia, which is caused by an unusually short, thick lingual frenulum (the band of tissue underneath the tongue). You may have already heard of it by another name: ‘tongue-tied.’ People with the condition have severely decreased tongue mobility, which can make it very difficult to speak.

Luckily, I had the situation surgically remedied as a toddler, but not before I started speaking. We build the foundations for speech early on. Babies learn to babble until neurons create the connections to make progressively more complex speech. Unfortunately, I babbled without moving my tongue.

There are three types of impediments I want our experiment to cover: the stutter, the stammer and the mumble.

Stutterers get ‘stuck’ on a syllable. They usually have to keep repeating it until it forces itself out. Trust me, it’s extremely embarrassing because it makes you sound unsure or nervous.

And don’t get me started on words that start with ‘wah’. Words like Webster, web, and willing tormented me in elementary school. I would get stuck and have to keep repeating ‘wah’ until I gave up in total frustration. Even more hurtful, I knew that I knew how to pronounce the word. The little voice everyone has in their head says it perfectly fine, but the mouth refuses to cooperate.

Stammering is a lot like stuttering. Stammerers get stuck on words or even entire clauses, but instead of repeating, they simply freeze. They don’t speak with a natural flow. When I stammer, a comma can become a five- or six-second delay. While I admit it’s not as jarring to the listener as a stutter, everyone will still notice.

Mumbling isn’t about getting stuck. Mumblers simply lack elocution. Syllables run into each other completely undifferentiated. Mumbling can make it seem that the mumbler is speaking too fast, but that’s not it. Since the listener has difficulty parsing each word, their mind assumes speed is at fault.

It’s no coincidence that these impediments are ordered this way. Those who coach people with speech issues, Speech/Language Pathologists (SLPs), don’t necessarily try to ‘cure’ the patient of the impediment; they simply turn a noticeable impediment into a less noticeable one. Stutterers become Stammerers who become Mumblers.

Now that we have a basic understanding of speech impediments, let’s discuss our TTS system.

As an Engineer, I want algorithms that define disordered speech chains

Our thought experiment will need algorithms to follow. We want a TTS system that has a dictionary of sorts. Each written word is mapped to a chain of sounds. But since we know English isn’t phonetic, we have to break each word up into syllables.

To begin, let’s get some boilerplate out of the way.

https://gist.github.com/Ccook/c9fcbf348d43a9f5b888

We’ll name our TTS system Vak, after the Sanskrit word for ‘speech’ and the Hindu goddess who personifies it.

Vak loads our dictionary of words/syllables, and uses a Speech Strategy to ‘speak’ the syllables. We’ve preloaded our dictionary with my personal favorite, Webster.

As a control, let’s think how a ‘normal’ Speech Strategy would operate. Unaffected speech would ‘speak’ each syllable without hesitation.

https://gist.github.com/Ccook/fe63ad49e7a5c9ae606e

Our Unaffected Speech Strategy simply takes the list of syllables and prints them as a collection. ‘Webster’ becomes ‘wuh-eb-stur.’
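
For readers who can’t open the gists, here is a minimal Python sketch of the pieces described so far. It is my own illustration under stated assumptions, not the code from the gists: the names DICTIONARY, SpeechStrategy, unaffected, and speak are hypothetical, and I assume a strategy is just a function that turns a list of syllables into a string.

```python
from typing import Callable, Dict, List

# Hypothetical dictionary: each written word maps to its chain of syllables.
DICTIONARY: Dict[str, List[str]] = {
    "webster": ["wuh", "eb", "stur"],
}

# Assumption: a speech strategy turns a list of syllables into "spoken" text.
SpeechStrategy = Callable[[List[str]], str]


def unaffected(syllables: List[str]) -> str:
    """Control strategy: speak every syllable once, in order, without hesitation."""
    return "-".join(syllables)


class Vak:
    """Looks a word up in the dictionary and hands its syllables to a strategy."""

    def __init__(self, dictionary: Dict[str, List[str]], strategy: SpeechStrategy) -> None:
        self.dictionary = dictionary
        self.strategy = strategy

    def speak(self, word: str) -> str:
        # Assumption: lookups are case-insensitive; unknown words raise KeyError.
        return self.strategy(self.dictionary[word.lower()])


if __name__ == "__main__":
    print(Vak(DICTIONARY, unaffected).speak("Webster"))  # wuh-eb-stur
```

Passing the strategy in as a plain function keeps Vak itself unchanged when the way it ‘speaks’ changes.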

Now, let’s add some frustration!

https://gist.github.com/Ccook/3523c0de85440557b01d

Our Stuttering Speech Strategy ‘speaks’ each syllable just like the previous one, except this time it chooses a random syllable and repeats it a random number of times. ‘Webster’ could become ‘wuh-wuh-wuh-wuh-eb-stur.’
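
A rough Python sketch of that behaviour, written for illustration rather than copied from the gist. The max_repeats cap and the optional rng parameter are my assumptions; the article only says the number of repeats is random.

```python
import random
from typing import List, Optional


def stutter(
    syllables: List[str],
    rng: Optional[random.Random] = None,
    max_repeats: int = 4,  # assumption: the article gives no upper bound
) -> str:
    """Speak the syllables, getting stuck on one randomly chosen syllable."""
    if not syllables:
        return ""
    rng = rng or random.Random()

    stuck_at = rng.randrange(len(syllables))  # the syllable we get stuck on
    repeats = rng.randint(1, max_repeats)     # failed attempts before it comes out

    spoken: List[str] = []
    for index, syllable in enumerate(syllables):
        if index == stuck_at:
            # The extra work: every failed attempt is voiced out loud.
            spoken.extend([syllable] * repeats)
        spoken.append(syllable)
    return "-".join(spoken)


if __name__ == "__main__":
    # Output varies from run to run, e.g. wuh-wuh-wuh-eb-stur
    print(stutter(["wuh", "eb", "stur"]))
```

Passing a seeded random.Random makes a particular stutter reproducible, which is handy for testing even if real speech offers no such luxury.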

I’d like to imagine my brain having an erroneous function like stutter(). All the additional code symbolizes the additional work I have to do each time I want to speak.

Refactoring our stutter to a stammer is easy. Instead of choosing a random syllable, we’ll find a random time to (awkwardly) pause.

https://gist.github.com/Ccook/1d6eeb9f419a20298d2e

Instead of deciding if we should stutter a particular syllable, we leave it to chance with the function shouldStammer(). If true, Vak will be forced to pause between 1 and 10 seconds. ‘Webster’ might become ‘wuh-… … …-eb-stur.’
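
Here is one way the stammer could look in Python. It is a sketch under assumptions, not the gist itself: should_stammer mirrors the article’s shouldStammer(), the 30% default chance is invented for illustration, and instead of actually sleeping, each second of pause is rendered as one ‘…’.

```python
import random
from typing import List, Optional


def should_stammer(rng: random.Random, chance: float) -> bool:
    """Leave it to chance whether we freeze after the current syllable."""
    return rng.random() < chance


def stammer(
    syllables: List[str],
    rng: Optional[random.Random] = None,
    chance: float = 0.3,  # assumption: the article does not give a probability
) -> str:
    """Speak the syllables, sometimes freezing for 1 to 10 seconds between them."""
    rng = rng or random.Random()
    spoken: List[str] = []
    for index, syllable in enumerate(syllables):
        spoken.append(syllable)
        is_last = index == len(syllables) - 1
        if not is_last and should_stammer(rng, chance):
            seconds = rng.randint(1, 10)
            # Assumption: one ellipsis per second of (awkward) silence.
            spoken.append(" ".join(["…"] * seconds))
    return "-".join(spoken)


if __name__ == "__main__":
    # Output varies from run to run, e.g. wuh-… … …-eb-stur
    print(stammer(["wuh", "eb", "stur"]))
```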

Being a proficient mumbler is a mark of considerable achievement. It’s sometimes hard to believe that I’m no longer asked to repeat myself because nobody understood me. Instead, I’m asked to ‘Speak up!’ Unfortunately, asking me to speak up is the best way to make me stutter.

As previously mentioned, mumblers cluster syllables together to minimize stutters and stammers.

https://gist.github.com/Ccook/eda68ddb39012c9b183a

Our Mumble Speech Strategy is as close to our control as we can get. It ‘speaks’ each syllable at the same pace without trouble, but it seldom enunciates (using a ‘-’).
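
As a hedged Python sketch (again my own illustration, not the gist): the mumble keeps every syllable and only occasionally inserts the ‘-’ separator. The enunciate_chance parameter and its 20% default are assumptions standing in for ‘seldom’.

```python
import random
from typing import List, Optional


def mumble(
    syllables: List[str],
    rng: Optional[random.Random] = None,
    enunciate_chance: float = 0.2,  # assumption: how often "seldom" is
) -> str:
    """Speak every syllable at a steady pace, but rarely separate them."""
    if not syllables:
        return ""
    rng = rng or random.Random()

    spoken = syllables[0]
    for syllable in syllables[1:]:
        # A '-' marks an enunciated boundary; most boundaries are skipped.
        separator = "-" if rng.random() < enunciate_chance else ""
        spoken += separator + syllable
    return spoken


if __name__ == "__main__":
    # Output varies from run to run, e.g. wuhebstur or wuheb-stur
    print(mumble(["wuh", "eb", "stur"]))
```

Setting enunciate_chance to 1.0 reproduces the unaffected ‘wuh-eb-stur’, which is why this strategy sits closest to the control.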

As a friend, how can I help?

I hope this thought experiment has shown you the difficulties others go through on a daily basis. While sympathy and patience are all anyone could ever ask, here are a few pointers for communicating with people who have verbal impediments.

  • Understand that it can be just as frustrating for them to speak as it is for you to listen.
  • If you must draw attention to it, choose kind words. NEVER say ‘Speak up!’
  • If she is someone you’ll interact with often, consider having non-verbal cues. For example, when she isn’t speaking clearly, everyone on the team should look down at their feet. It will let her know she may be mumbling without making her nervous.
  • Avoid telephone calls when possible, especially for an interview. I rely heavily on hand motions, facial expressions, and sometimes whiteboards to get a point across.

Musings of a Millennial Software Engineer. (github://ccook)