Notes on a proposal for how to use a MUC along with an SFU for a group media experience with XMPP.
When a client sends presence to join the MUC, the MUC will (after the normal join procedure, provided the user is not banned, and once the client has received its self-presence) check whether the joining full JID supports Jingle calls. If it does not, the MUC may kick the user or admit them as text-only, depending on room settings. The MUC bare JID will then initiate a Jingle call offer to the joining full JID. The client needs to be able to handle calls with multiple audio and/or video tracks (e.g. one for each call participant). It also needs to handle Jingle content-add and content-remove messages as tracks are added and removed.
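The join-time check described above could be sketched roughly as follows. This is only an illustration, not part of the proposal: it assumes the MUC has already learned the client's service discovery features somehow (for example via disco#info or entity capabilities), and the names `JoinDecision`, `decide_on_join`, and `allow_text_only` are hypothetical. The only value taken from a spec is the Jingle feature namespace from XEP-0166.

```python
from enum import Enum
from typing import Iterable

# Service discovery feature advertised by Jingle-capable clients (XEP-0166).
JINGLE_FEATURE = "urn:xmpp:jingle:1"


class JoinDecision(Enum):
    """Hypothetical outcomes of the MUC's post-join capability check."""
    OFFER_CALL = "offer-call"   # MUC bare JID initiates a Jingle session
    TEXT_ONLY = "text-only"     # occupant stays in the room without media
    KICK = "kick"               # occupant is removed from the room


def decide_on_join(features: Iterable[str], allow_text_only: bool) -> JoinDecision:
    """Decide what the MUC does once the occupant has its self-presence.

    `features` is the set of disco features of the joining full JID;
    `allow_text_only` stands in for a hypothetical room setting.
    """
    if JINGLE_FEATURE in set(features):
        return JoinDecision.OFFER_CALL
    return JoinDecision.TEXT_ONLY if allow_text_only else JoinDecision.KICK
```

A real implementation would probably also check for the Jingle RTP features before offering audio or video, but that is outside what these notes specify.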
The client should attempt to send any media it wishes to publish into the call as a track in this same Jingle session. If it gets back an error of a certain type, it should instead initiate a second Jingle session with the MUC bare JID for publication purposes.
The MUC can enforce media publication based on voice in the room, role, or other settings.
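As a rough sketch of what such enforcement might look like, the function below gates publication on the occupant's MUC role. Everything here is an assumption for illustration: the function name `may_publish`, the `require_voice` and `allowed_roles` settings, and the simplification that moderators and participants have voice while visitors do not (room configuration can change this in practice).

```python
from typing import Collection, Optional

# Assumption for this sketch: these XEP-0045 roles count as having voice.
ROLES_WITH_VOICE = frozenset({"moderator", "participant"})


def may_publish(role: str,
                require_voice: bool = True,
                allowed_roles: Optional[Collection[str]] = None) -> bool:
    """Return True if an occupant with `role` may publish media.

    `require_voice` and `allowed_roles` are hypothetical room settings.
    """
    # An explicit role allow-list, if configured, is checked first.
    if allowed_roles is not None and role not in allowed_roles:
        return False
    # Otherwise fall back to the voice-based rule.
    if require_voice and role not in ROLES_WITH_VOICE:
        return False
    return True
```

With these defaults a visitor cannot publish, while a room that sets `require_voice=False` would let anyone in the room publish.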
If the SFU does speaker detection, the MUC can indicate the active speaker by modifying that occupant's presence.
All other advanced SFU features (recording, streaming, etc.) can be mapped to ad hoc commands.
The track/content `name` shall be set to a string of the form "audio/occupant-id". The part before the / may not itself contain a / but may otherwise be anything. The part after the / should be the occupant id if the MUC uses occupant ids, and otherwise the MUC nick of the user in that track.
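A minimal sketch of building and parsing such names is shown below. The helper names `build_content_name` and `parse_content_name` are hypothetical. The parser splits on the first / only, which is why the prefix must not contain one: a MUC nick used as the suffix may itself contain a /.

```python
from typing import Tuple


def build_content_name(label: str, occupant: str) -> str:
    """Join a free-form label (e.g. "audio") and an occupant id or nick."""
    if "/" in label:
        raise ValueError("the part before the / must not contain a /")
    return f"{label}/{occupant}"


def parse_content_name(name: str) -> Tuple[str, str]:
    """Split a content name into (label, occupant id or nick).

    Only the first / is treated as the separator, so anything after it,
    including further slashes, belongs to the occupant part.
    """
    label, sep, occupant = name.partition("/")
    if not sep:
        raise ValueError("content name has no / separator")
    return label, occupant
```

For example, `parse_content_name("video/nick/with/slash")` yields the label `video` and the occupant part `nick/with/slash`.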