I think what he wanted to express is that the traditional interfaces of all the apps, websites and so on will become dispensable, because you can get the same or even better service in apps people have already learned, like Messenger.
Your argument regarding images is totally true, but so is the fact that a huge amount of text is visually scanned rather than read. I think a good conclusion would be short sentences and phrases, plus visual modules (like the cards) for interactions and “results”.
Later this week I tried the first version of the eBay shopping bot. The exact opposite happened. I really liked the experience of getting results without having to click through the website and scroll/scan all the content.
And if bots get more intelligent, the results and answers to your concerns will provide even more value than a “real” salesperson.