Syllabification is a Fun Word: Does English Use More One-Syllable Words than Spanish?

Robin Pollak
Aug 27, 2016


Last week I spoke Spanish for the first time in a while. I couldn’t help but notice that the rhythm of my speech felt markedly different from what I’m used to when speaking English. After mulling it over, I began to wonder: Does English have more one-syllable words than Spanish?

Gather Some Evidence

I had initially planned to compare English, Spanish, French, German, and Chinese, but decided to include only the first two because I couldn’t find open-source syllabification algorithms for the other three. My next step was to find some linguistic rules for splitting words into syllables, and I eventually settled on these rules for English and these rules for Spanish. Next, I wrote code that follows these rules to syllabify input words. Neither algorithm is perfect, but their flaws come mostly from syllabification being an inexact science rather than from a failure to follow the rules I had set out for myself.

Finally, I needed a reference source for the words I was going to test. I decided to test the 20,000 most frequently used words in each language. An average adult’s English vocabulary is commonly estimated at 20,000 to 35,000 words, but since Spanish is thought to have a smaller vocabulary, I settled on the low end of that range: 20,000. Luckily, I was able to find good lists for both English and Spanish.
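The linked rule sets aren’t reproduced in this post, so the following is only a hypothetical sketch of what syllable counting can look like, not the code from the GitHub repository. It assumes a simple vowel-group heuristic instead of the full rules: for English, count runs of vowels (treating y as a vowel) and drop a silent final “e” unless the word ends in consonant + “le”; for Spanish, count one syllable per strong vowel (a, e, o, or any accented vowel) in each run of vowels, so that unaccented i, u and ü merge into diphthongs. Both functions only count syllables rather than splitting words, which is all the comparison needs. The function names are made up for illustration.

```python
import re

def count_syllables_en(word: str) -> int:
    """Rough English estimate (assumed heuristic): vowel groups, minus a silent final 'e'."""
    w = re.sub(r"[^a-z]", "", word.lower())
    count = len(re.findall(r"[aeiouy]+", w))
    # A final 'e' is usually silent ("make", "whole"), but consonant + "le"
    # forms its own syllable ("table", "little").
    if w.endswith("e") and count > 1 and not re.search(r"[^aeiouy]le$", w):
        count -= 1
    return count

# Accented i/u count as strong: they break a would-be diphthong ("día", "país").
STRONG_ES = set("aeoáéíóú")

def count_syllables_es(word: str) -> int:
    """Rough Spanish estimate (assumed heuristic): per vowel run, one syllable per strong vowel, min 1.

    Expects precomposed (NFC) accented characters.
    """
    w = word.lower()
    if w == "y":  # the conjunction "y" is a syllable on its own
        return 1
    count = 0
    for run in re.findall(r"[aeiouáéíóúü]+", w):
        strong = sum(1 for ch in run if ch in STRONG_ES)
        # Unaccented i/u/ü join a neighbouring vowel (diphthong or triphthong);
        # two strong vowels are in hiatus and belong to separate syllables.
        count += max(1, strong)
    return count

if __name__ == "__main__":
    for w in ["rgb", "make", "table", "whole"]:
        print(w, count_syllables_en(w))
    for w in ["día", "guitarra", "ciudad", "silabificación"]:
        print(w, count_syllables_es(w))
```

Note that “rgb” has no vowels and comes out as zero syllables under this heuristic, the same kind of result described below for non-words in the English list.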

Get Results

My raw results can be found here, and the data I collected is shown in the chart below:

As you can see, my hypothesis is confirmed. Nice! Perhaps even more interesting is that the data in the first four columns is almost exactly mirrored around the midpoint between columns 2 and 3: English uses one- and two-syllable words (24.84% and 38.98%) at nearly the same rates that Spanish uses four- and three-syllable words (24.43% and 41.76%). Overall, the Spanish words tested average almost a whole syllable (0.86) more than the English words tested.
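Each column in the chart is the share of words with a given syllable count, and the 0.86 figure is the difference between the two languages’ mean syllable counts. A minimal sketch of both calculations, assuming each language’s results are available as a plain list of per-word syllable counts (the function names are hypothetical):

```python
from collections import Counter

def syllable_distribution(counts: list[int]) -> dict[int, float]:
    """Percentage of words with each syllable count, keyed by syllable count."""
    total = len(counts)
    tally = Counter(counts)
    return {n: 100.0 * k / total for n, k in sorted(tally.items())}

def mean_syllables(counts: list[int]) -> float:
    """Average number of syllables per word."""
    return sum(counts) / len(counts)

def mean_gap(es_counts: list[int], en_counts: list[int]) -> float:
    """How many more syllables the average Spanish word has than the average English word."""
    return mean_syllables(es_counts) - mean_syllables(en_counts)

if __name__ == "__main__":
    # Toy input for illustration only, not the study's data.
    toy = [1, 1, 2, 3]
    print(syllable_distribution(toy))  # {1: 50.0, 2: 25.0, 3: 25.0}
    print(mean_syllables(toy))         # 1.75
```

Both functions assume a non-empty list; with the real data, each list would hold one entry per word kept after filtering.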

Identify Possible Errors

There are several flaws in the data I’ve collected here. First, as mentioned above, my syllabification algorithms are imperfect: there may be bugs in my code, as well as words that the rules I used don’t split correctly. Second, I can’t verify that the lists I used actually contain the 20,000 most commonly used words in these languages. The methodology behind the lists could be flawed, or the lists could simply be outdated. Finally, the English list contained a number of “non-words,” such as individual letters of the alphabet and acronyms like “rgb,” that my algorithm scores as having zero syllables. To handle them, I simply excluded “zero-syllable” words from my calculations. This brought the number of English words counted down to 19,184, so I capped the number of Spanish words processed at 19,184 as well, keeping the two samples the same size.
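The filtering-and-capping step can be sketched as follows. This assumes each list is ordered from most to least frequent and already paired with syllable counts; it drops zero-syllable entries from both lists (an assumption, since only the English list is said to contain them) and then trims both to the shorter length, which in this study was English’s 19,184. The function name is hypothetical.

```python
def drop_zero_and_match(
    en: list[tuple[str, int]], es: list[tuple[str, int]]
) -> tuple[list[tuple[str, int]], list[tuple[str, int]]]:
    """Drop words scored as zero syllables, then trim both frequency-ordered
    lists to the same length so the two samples are equally sized."""
    en_kept = [(w, n) for w, n in en if n > 0]
    es_kept = [(w, n) for w, n in es if n > 0]
    size = min(len(en_kept), len(es_kept))
    # Taking the first `size` entries keeps the most frequent words in each list.
    return en_kept[:size], es_kept[:size]

if __name__ == "__main__":
    # Toy lists for illustration only.
    en = [("the", 1), ("rgb", 0), ("table", 2)]
    es = [("de", 1), ("la", 1), ("que", 1), ("casa", 2)]
    print(drop_zero_and_match(en, es))
    # ([('the', 1), ('table', 2)], [('de', 1), ('la', 1)])
```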

Conclusion

Something in my daily life prompted me to wonder about a difference between two languages I speak. I was then able to harness the power of programming to explore the question and reach a conclusion I’m satisfied with. Along the way, I not only answered my question but also had fun and expanded my programming skills! All my code can be found on my GitHub.
