To listen to 5 people in a room, you have two options. The first is a mesh network inside the room, where every peer holds a direct peer connection to each of the other peers (5 connections per peer when each one listens to 5 others). This becomes unmanageable as the room grows, because every peer has to send its own media over each of those connections, and you have to mux the audio and video tracks to provide a smoother experience. The other option is to use a media server and create a star topology, where each peer connects only to the media server and the media server connects to all the peers. There are many open source media servers available, and integrating one should be fairly straightforward.
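
To make the difference between the two topologies concrete, here is a rough sketch that only counts connections. It is plain arithmetic, not WebRTC code, and the function names `mesh_connections` and `star_connections` are made up for this illustration. It also assumes one connection per pair of peers in the mesh and one connection per peer to a single media server in the star.

```python
from typing import Tuple


def mesh_connections(peers: int) -> Tuple[int, int]:
    """Return (connections per peer, total connections) for a full mesh.

    Assumes exactly one peer connection per pair of peers.
    """
    if peers < 1:
        raise ValueError("peers must be at least 1")
    per_peer = peers - 1                 # every peer links to all the others
    total = peers * (peers - 1) // 2     # each pair is counted once
    return per_peer, total


def star_connections(peers: int) -> Tuple[int, int]:
    """Return (connections per peer, total connections) for a star topology.

    Assumes a single media server that every peer connects to once.
    """
    if peers < 1:
        raise ValueError("peers must be at least 1")
    return 1, peers                      # the server holds one link per peer


if __name__ == "__main__":
    # Compare both topologies for a few hypothetical room sizes.
    for n in (2, 6, 12):
        print(n, "peers: mesh", mesh_connections(n), "star", star_connections(n))
```

With 6 peers in the room (you plus the 5 people you listen to), the mesh needs 5 connections per peer and 15 in total, while the star needs 1 connection per peer and 6 in total, all terminating at the media server. The mesh total grows quadratically with the room size, whereas the star total grows linearly.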

