How to Teach Your Virtual Assistant English Spelling

Building a converter to the International Phonetic Alphabet with Python

Natalia Kuzminykh
Geek Culture
4 min read · Aug 16, 2022


Photo by Edho Pratama on Unsplash

The spelling of languages such as English is well known for its inconsistency. Thinking for a moment about word pairs such as “do” and “no”, “though” and “thought”, “plough” and “rough”, or “cough” and “trough” should quickly convince most speakers that there are no simple, consistent rules for getting from a word's spelling to its pronunciation. Not only that, but there are anomalies such as “colonel” and “boatswain”, pronounced [kɝnl̩] and [boʊsn̩], respectively. However, even though the pronunciations of some highly frequent words may be quite irregular, most people tend to agree on how to pronounce new or unseen words.

Why does a voice user interface (VUI) need to know English spelling?

Imagine the following scenario: you are building a system that directly interacts with humans, such as a virtual assistant (like Alexa or Siri).

How would you teach it to recognize and, more importantly, to produce the differences between sounds, accents, or other language varieties? It is a challenging task, right?

IPA can help here. If each word is mapped to a standardized phonetic transcription, your system no longer has to guess how a spelling is pronounced: the transcription makes the difference between “though” and “thought”, “plough” and “rough”, or “cough” and “trough” explicit, and a speech synthesizer can use it to pronounce these words correctly and sound more natural.

Wait a minute, but what exactly is IPA?

Well, I am glad you asked! If English is your second language, you have probably come across IPA notation in your English pronunciation classes.

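The International Phonetic Alphabet (IPA) is a standardized alphabetic system for transcribing speech sounds, based primarily on the Latin script. Each symbol is meant to stand for one distinctive sound, so a word's pronunciation can be written down unambiguously, whatever its spelling.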
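In code, IPA transcriptions are ordinary Unicode strings, but some symbols are built from more than one code point. The sketch below uses only Python's standard `unicodedata` module to list the code points in the transcription [kɝnl̩] from the introduction. The syllabic mark under "l" is a separate combining character, so `len()` reports five characters rather than four visible symbols. The helper name `describe_ipa` is hypothetical, chosen for illustration.

```python
from __future__ import annotations

import unicodedata


# Illustrative helper (hypothetical name), not part of any IPA library
def describe_ipa(transcription: str) -> list[tuple[str, str, bool]]:
    """Return (code point, Unicode name, is_combining) for every character in an IPA string."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch) != 0)
        for ch in transcription
    ]


# [kɝnl̩] written with explicit escapes: the last character, U+0329, is the syllabic mark under "l"
transcription = "k\u025dnl\u0329"
print(len(transcription))  # prints 5: len() counts code points, not visible symbols
for code, name, is_combining in describe_ipa(transcription):
    print(code, name, "(combining)" if is_combining else "")
```

This matters when you store, compare, or count IPA strings: two transcriptions that look identical on screen can differ at the code-point level, so it is worth checking them this way before using them as dictionary keys.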

However, the use of the phonetic alphabet beyond specialist circles is a relatively recent development. Until the end of the 20th century, hardly anybody apart from linguists and language teachers knew what it was.

As the phonetic alphabet developed and took shape, it began to attract more and more attention: from actors working on accents for their roles, opera singers, foreign-language students and, of course, conversation designers building virtual assistants.

The Coding Part

Photo by Artem Sapegin on Unsplash

Now that we have defined the main concepts, we are ready to move on to the code itself. Low-code platforms for building virtual assistants and chatbots simplify an otherwise exhausting development process and provide a user-friendly interface that you can enrich with your own data.

This tutorial could be handy if you want to quickly convert a set of words to IPA notation.

Let’s start by assuming you have a file with a bunch of words you need to convert:

0 abominable
1 abutting
2 accede
3 accentuate
4 accumulated
… …
1104 wrangler
1105 wreaking
1106 wringing
1107 xenophobia
1108 zoology

Note that for convenience, only a short excerpt is shown here; you can find the complete list of 1,109 words here.

The Python package English-to-IPA (eng_to_ipa) is a handy tool that converts English text into IPA notation by looking up each word in the Carnegie Mellon University Pronouncing Dictionary (CMUdict). Execute the following command in your terminal to install it:

pip install eng-to-ipa

Afterwards, you can import the package and run the script below to rewrite your list of English words in IPA. Note that your file should be plain text (for example, .txt or .csv); the script splits each line on whitespace.

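# Don't forget to import the package
import eng_to_ipa as ipa

# Change the names and extensions to match your files (both names are placeholders)
with open('YourDataset.txt', 'r', encoding='utf-8') as infile, \
        open('YourDataset_IPA.txt', 'w', encoding='utf-8') as outfile:
    # read the input file line by line
    for line in infile:
        # convert each word on the line; index numbers are copied unchanged
        converted = [word if word.isdigit() else ipa.convert(word)
                     for word in line.split()]
        # write the converted line to the output file
        outfile.write(' '.join(converted) + '\n')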
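The word list above puts an index number in front of each word. If you want to keep that index, store each word next to its transcription, and find out which words the dictionary does not cover, a slightly more structured version may help. This is a minimal sketch, not part of eng_to_ipa itself: the function names `convert_indexed_lines` and `to_tsv` and the tab-separated output are hypothetical choices for illustration, and the check for unknown words assumes that `eng_to_ipa.convert` returns words missing from CMUdict with a trailing asterisk, as its README describes. The converter is passed in as a parameter, so the logic can be tried without the package installed.

```python
from __future__ import annotations

from typing import Callable, Iterable


# Illustrative helpers (hypothetical names), not part of eng_to_ipa
def convert_indexed_lines(
    lines: Iterable[str], convert: Callable[[str], str]
) -> tuple[list[tuple[str, str, str]], list[str]]:
    """Turn '<index> <word>' lines into (index, word, ipa) rows.

    `convert` maps one English word to IPA, for example eng_to_ipa.convert.
    Words whose transcription ends in '*' are also collected separately,
    on the assumption that the converter marks unknown words that way.
    """
    rows: list[tuple[str, str, str]] = []
    unknown: list[str] = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and anything not shaped like '<index> <word>'
        index, word = parts
        transcription = convert(word)
        if transcription.endswith("*"):
            unknown.append(word)
        rows.append((index, word, transcription))
    return rows, unknown


def to_tsv(rows: Iterable[tuple[str, str, str]]) -> str:
    """Format rows as tab-separated text: index, word, IPA."""
    return "".join(f"{index}\t{word}\t{ipa}\n" for index, word, ipa in rows)


# Usage with the real package (requires: pip install eng-to-ipa).
# The file names are placeholders:
#
#   import eng_to_ipa as ipa
#   with open("YourDataset.txt", encoding="utf-8") as infile:
#       rows, unknown = convert_indexed_lines(infile, ipa.convert)
#   with open("YourDataset_IPA.tsv", "w", encoding="utf-8") as outfile:
#       outfile.write(to_tsv(rows))
#   print(len(unknown), "words not found:", unknown)
```

Passing the converter in as an argument also makes it easy to swap in another transcription source later, for example a hand-corrected dictionary for the words flagged as unknown.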

Once the script has run, you will have a list of IPA transcriptions that can be used to teach your VUI how English words are pronounced. Below are the IPA transcriptions of the words from our example; you can find the full list here.

0 əˈbɑmənəbəl
1 əˈbətɪŋ
2 ækˈsid
3 ækˈsɛnʧueɪt
4 əˈkjumjəˌleɪtɪd
… …
1104 ˈræŋgələr
1105 ˈrikɪŋ
1106 ˈrɪŋɪŋ
1107 ˌzɛnəˈfoʊbiə
1108 zoʊˈɑləʤi



NLP Developer & Conversational AI | A linguist from Italy who is learning to navigate passion to technologies