Recognition is very time-consuming,
so why not have a recognizer that doesn't recognize at all
but fakes its results from a given transcription?
It just consumes frames from the frontend
and calls event listeners and monitors as needed.
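
The idea above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: ResultListener, TimedWord and FakeRecognizer are hypothetical names, and the transcription is assumed to give an end frame for each word.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener interface (an assumption, not from the original code). */
interface ResultListener {
    void newResult(List<String> wordsSoFar);
}

/** Hypothetical transcription entry: a word and the frame count at which it ends. */
final class TimedWord {
    final String word;
    final int endFrame;

    TimedWord(String word, int endFrame) {
        this.word = word;
        this.endFrame = endFrame;
    }
}

/** Consumes frames and replays a known transcription instead of decoding. */
final class FakeRecognizer {
    private final List<TimedWord> transcription;
    private final List<ResultListener> listeners = new ArrayList<>();
    private final List<String> hypothesis = new ArrayList<>();
    private int framesConsumed = 0;
    private int nextWord = 0;

    FakeRecognizer(List<TimedWord> transcription) {
        this.transcription = transcription;
    }

    void addListener(ResultListener listener) {
        listeners.add(listener);
    }

    /** Consume one frame; its content is ignored on purpose. */
    void consumeFrame(float[] frame) {
        framesConsumed++;
        boolean changed = false;
        // Release every word whose end frame has been reached.
        while (nextWord < transcription.size()
                && transcription.get(nextWord).endFrame <= framesConsumed) {
            hypothesis.add(transcription.get(nextWord).word);
            nextWord++;
            changed = true;
        }
        // Notify listeners only when the faked hypothesis actually grew.
        if (changed) {
            for (ResultListener listener : listeners) {
                listener.newResult(new ArrayList<>(hypothesis));
            }
        }
    }
}
```

Listeners receive a copy of the hypothesis, so they can keep it without being affected by later frames.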
When the floor changes, react:
If the floor is taken, be quiet.
If the floor is available, speak the queued IUs' utterance (from file if one is available, via TTS otherwise).
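
A rough sketch of this reaction, under stated assumptions: Floor, QueuedUtterance, SpeechOutput and FloorReactor are hypothetical names, "be quiet" is read as stopping any ongoing output, and all queued utterances are spoken in order once the floor is available.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical floor states; the original names are not given in the text. */
enum Floor { TAKEN, AVAILABLE }

/** Hypothetical queued unit: text plus an optional pre-recorded audio file. */
final class QueuedUtterance {
    final String text;
    final String audioFile; // null when no recording is available

    QueuedUtterance(String text, String audioFile) {
        this.text = text;
        this.audioFile = audioFile;
    }
}

/** Hypothetical output sink, keeping the sketch independent of any audio API. */
interface SpeechOutput {
    void playFile(String path);
    void synthesize(String text);
    void stop();
}

final class FloorReactor {
    private final Deque<QueuedUtterance> queue = new ArrayDeque<>();
    private final SpeechOutput out;

    FloorReactor(SpeechOutput out) {
        this.out = out;
    }

    void enqueue(QueuedUtterance utterance) {
        queue.add(utterance);
    }

    void floorChanged(Floor floor) {
        if (floor == Floor.TAKEN) {
            // Assumption: being quiet means stopping output; the queue is kept.
            out.stop();
            return;
        }
        // Floor is available: speak what is queued, preferring recordings over TTS.
        while (!queue.isEmpty()) {
            QueuedUtterance utterance = queue.poll();
            if (utterance.audioFile != null) {
                out.playFile(utterance.audioFile);
            } else {
                out.synthesize(utterance.text);
            }
        }
    }
}
```

Keeping the queue intact while the floor is taken means nothing is lost; the utterances are simply delivered at the next opportunity.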