Even then, you have local voice recognition. You don’t need to stream all microphone recordings to some central server for processing. You just do voice recognition on the device and keep a log of, say, the last 100 nouns, plus a high-priority log of the last twenty nouns used near verbs like purchase, buy or get. Then you send those lists to the ad provider as context. All the hard work is done on the client device, and the same backend used for ad context on web pages can be used for this as well.
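To make the two-log idea concrete, here is a rough sketch of the bookkeeping only. It assumes the speech has already been transcribed and part-of-speech tagged by something else; the `NounLog` class, the `(word, tag)` input format, the tag names and the three-token "near" window are all made up for illustration and not taken from any real product or library.

```python
from collections import deque

# Verbs named in the comment above; anything else is ignored.
PURCHASE_VERBS = {"purchase", "buy", "get"}


class NounLog:
    """Hypothetical bounded logs: recent nouns, plus nouns seen near a purchase verb."""

    def __init__(self, recent_size=100, priority_size=20, window=3):
        # deque(maxlen=N) silently drops the oldest entry once N is reached.
        self.recent = deque(maxlen=recent_size)
        self.priority = deque(maxlen=priority_size)
        # Assumption: "near" means at most `window` tokens after the verb.
        self.window = window

    def ingest(self, tagged_tokens):
        """tagged_tokens: list of (word, tag) pairs, e.g. ("kettle", "NOUN")."""
        last_verb_index = None
        for i, (word, tag) in enumerate(tagged_tokens):
            w = word.lower()
            if tag == "VERB" and w in PURCHASE_VERBS:
                last_verb_index = i
            elif tag == "NOUN":
                self.recent.append(w)
                if last_verb_index is not None and i - last_verb_index <= self.window:
                    self.priority.append(w)

    def context(self):
        """The two lists that would be handed over as ad context."""
        return {"recent": list(self.recent), "priority": list(self.priority)}
```

Feeding it the tagged tokens for "I want to buy a kettle for the kitchen" would put both nouns in the recent log but only "kettle" in the priority log, since "kitchen" falls outside the window. The point is just that the state involved is tiny: two short lists, no audio.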
Then hide it encrypted in an image upload or some other packet. Listen for ‘buy a <something>’, encrypt the text version, and wait for some other data transmission to piggyback it on, so people looking at data transmissions aren’t any the wiser. Hide it in some obscure way that would otherwise look normal; it’s intercepted and sent off to advertisers. Adtech is cyber terrorism.