Your first speculation seems highly unlikely. Continuously transcribing audio to text on the device is computationally expensive and would be noticeable in both battery and CPU usage, unless the app is only listening for a very particular phrase or pattern. The second scenario is more probable: the phone records audio samples and uploads them when it is on WiFi and/or charging.
However, you have not proven that Instagram is at fault here. Ads are generally served by ad providers such as Facebook, Google, etc. The app simply requests an ad, and the provider chooses one based on its own data collection (third-party cookies, your linked social media accounts, etc.). So it is likely that the ad partner that Instagram uses has somehow gotten access to audio of your conversations.
If you have in fact confirmed that this is happening on a particular device, you should inspect which apps have permission to use the microphone. On iOS 10, you can find this under Settings > Privacy > Microphone.
I am interested in which ad providers may be using this data and which apps are collecting it. If it truly is Instagram, is it their doing or the ad provider’s SDK? Or is the operating system itself doing this (which would make it Apple’s doing)?
Lesson of the day: do not blindly accept permission requests, and revoke any permissions that an app ought not to need.