I loved the article, but you oversimplify our inner thought process, and even reading, by treating it as word flow. We do not just translate to sounds; we translate to the set of connections represented by words, pictures, or groups of words.
In fact, this difference is one of the reasons that audio (one-way or two-way) communication interfaces are sometimes a very bad idea (and why wordless audio can often be a brilliant one). Essentially, they are slow and force you to wait through meaningless extra stuff, or to produce lots of meaningless extra stuff. They slow us down and narrow our ability to consume what we want in a way that is directed by us.
When I read the article I flitted around; it was not a straight linear process. And that is the power of visual over audio as a medium. Similarly, tactile is a powerful expression medium, and we are so good at doing things with our hands that using our voice is often a poor substitute. Want to change the channel? Press a button. Want to pick a channel from a list? Again, a button. Saying “TV, next channel” is just insane. Sure, saying “TV, mute” is great when the controller is not in hand, but blindly pursuing conversational models is a bit silly (even if it is the in thing).
As I said, great article!!!!