Hi there, nice article indeed.
Matt Szaszko

Yes, currently it seems like the common denominator is simple text and images. This is quite a good place to start, because you can test and train the NLP model and make sure you have a conversation that can run even over SMS :-). We always ask our customers to do this first.

Then, as you pointed out, different messaging platforms have rich media capabilities. At Converse.ai we use the chatflow feature (which lets you model the conversation with no coding involved) to overcome this issue. Our angle on this is that after you build the generic text-based conversation, you can use platform-specific modules to overwrite specific parts of the bot conversation so that they contain rich media. E.g. if the text-based conversation asks the user to enter “yes” or “no”, in Facebook you can overwrite that specific answer to be 2 Facebook buttons. This module will only get triggered when the user is communicating via Messenger.

Think of it as an MVC (Model View Controller) approach applied to conversation,
where the View is the different messaging platforms.
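
To make the idea a bit more concrete, here is a rough Python sketch of a generic text step with a per-platform override. It is only an illustration of the pattern, not the Converse.ai API: the names `Step`, `render`, `facebook_yes_no` and the message dictionaries are all made up for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


# Hypothetical types for illustration only; not the Converse.ai API.
@dataclass
class Step:
    # Generic text prompt: the "common denominator" that works anywhere, even SMS.
    prompt: str
    # Optional per-platform renderers that overwrite the plain-text version.
    overrides: Dict[str, Callable[[str], dict]] = field(default_factory=dict)

    def render(self, platform: str) -> dict:
        """Return the platform-specific message, or plain text as the fallback."""
        renderer = self.overrides.get(platform)
        if renderer is not None:
            return renderer(self.prompt)
        return {"type": "text", "text": self.prompt}


def facebook_yes_no(prompt: str) -> dict:
    """Assumed message shape: replace the typed yes/no answer with two buttons."""
    return {
        "type": "buttons",
        "text": prompt,
        "buttons": [
            {"title": "Yes", "payload": "yes"},
            {"title": "No", "payload": "no"},
        ],
    }


confirm = Step(
    prompt='Do you want to continue? Please reply "yes" or "no".',
    overrides={"facebook": facebook_yes_no},
)

sms_message = confirm.render("sms")            # falls back to plain text
facebook_message = confirm.render("facebook")  # uses the button override
```

The conversation logic stays in one place (the step and its prompt), and only the "view" changes per platform, so a channel without an override still gets a working text conversation.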

At Converse.AI we currently have rich media support for Smooch and Facebook, and Slack is coming very soon.

