Just throwing it out there, but to avoid bandwidth spikes that might lead to latency (depending on the setup), could you inject some kind of easily identifiable "is muted" signal along with white noise in place of silences? Or would that sort of pre-mixing be too slow to do in real time on the client side?
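To make the idea concrete, here is a rough sketch of what I mean, not a tested design. Everything in it is an assumption for illustration: the 48 kHz sample rate, the 20 ms frame size, the 18 kHz marker tone, the amplitudes, and the helper names `muted_frame`, `tone_power` and `is_muted`. The sender swaps each muted frame for low-level white noise plus a fixed pilot tone, and the receiver checks for that tone with a Goertzel filter.

```python
import math
import random

SAMPLE_RATE = 48_000   # assumed sample rate in Hz
FRAME_SIZE = 960       # assumed 20 ms frame at 48 kHz
MARKER_HZ = 18_000     # hypothetical "is muted" pilot tone frequency
NOISE_AMP = 0.01       # assumed white-noise amplitude (full scale = 1.0)
MARKER_AMP = 0.05      # assumed pilot tone amplitude


def muted_frame(rng):
    """Build one replacement frame: white noise plus the marker tone."""
    return [
        NOISE_AMP * rng.uniform(-1.0, 1.0)
        + MARKER_AMP * math.sin(2 * math.pi * MARKER_HZ * n / SAMPLE_RATE)
        for n in range(FRAME_SIZE)
    ]


def tone_power(frame, freq):
    """Goertzel filter: squared amplitude of `freq` in the frame (approximate)."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    # A sine of amplitude A on an exact bin gives |X| = A * N / 2.
    return power / (len(frame) / 2) ** 2


def is_muted(frame):
    """Treat the frame as muted if the marker is at least half its nominal amplitude."""
    return tone_power(frame, MARKER_HZ) > (MARKER_AMP / 2) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    print(is_muted(muted_frame(rng)))
```

Both the mixing and the detection are a single pass over the samples in each frame. Whether a marker like this would survive a lossy codec, and whether the noise really evens out the bitrate, depends on the setup and is not something this sketch shows.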