Actually, unless your phone is from 2008 or earlier, it has plenty of processing power to convert your speech to text and then do something intelligent with that text, like fetch data or pass a command to an app (as Siri and Alexa do). Pentium 4 PCs from the early 2000s were capable of doing this (using software sold by IBM or Dragon Systems [now Nuance]), and modern smartphones are much more capable than those machines. The CMU Sphinx speech recognition engine now runs on Android smartwatches (http://cmusphinx.sourceforge.net).
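To make the point concrete, here is a minimal sketch of the "do something intelligent with that text" step, entirely on-device. The speech-to-text stage (which an engine like PocketSphinx would handle) is assumed to have already produced a transcript; the utterances and intent names below are hypothetical illustrations, not any real assistant's API.

```python
# Sketch: once an on-device engine (e.g. CMU Sphinx) has turned speech into
# text, dispatching that text locally needs no cloud round trip at all.
# The transcripts and intent names here are made-up examples.

def route_command(transcript: str) -> str:
    """Map a recognized utterance to a local action identifier."""
    text = transcript.lower()
    if "weather" in text:
        return "intent:fetch_weather"   # e.g. read a cached forecast
    if text.startswith("call "):
        return "intent:dial:" + text[len("call "):]
    if "timer" in text:
        return "intent:start_timer"
    return "intent:unknown"

print(route_command("What's the weather like?"))  # intent:fetch_weather
print(route_command("Call Alice"))                # intent:dial:alice
```

Everything here runs in microseconds on any phone made in the last decade; nothing in this loop requires a server.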
Alexa, Siri, and the like use our phones as glorified tape recorders and do processing in the cloud primarily for business reasons: that approach gives them access to all our utterances (the better to profile us with) and, more important, it lets them retain control over how requests are fulfilled, which is the key to monetization.