If you want an HD stream for whoever is talking, and an SD stream for each person in grid view and for users on slow connections, you either need access to the unencrypted HD streams so you can downsample them, or you need all platforms to support Scalable Video Coding [1] so you can downsample by dropping otherwise-opaque encrypted packets.
And the former is easier than the latter.
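
To make the SVC option concrete, here is a rough sketch of what the server side could look like. It assumes each packet carries its layer id in cleartext outside the encrypted payload; the names Packet, layer and forward are made up for illustration and are not any real library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    layer: int      # assumed cleartext SVC layer id: 0 = base (SD), higher = enhancement
    payload: bytes  # end-to-end encrypted; the server never decrypts this


def forward(packets, max_layer):
    """Keep only packets at or below max_layer, leaving payloads untouched."""
    return [p for p in packets if p.layer <= max_layer]


# Hypothetical stream with a base layer and two enhancement layers.
stream = [
    Packet(0, b"\x8a\x01"),
    Packet(1, b"\x3f\x02"),
    Packet(0, b"\x9c\x03"),
    Packet(2, b"\x11\x04"),
]

sd_for_grid = forward(stream, max_layer=0)     # base layer only
hd_for_speaker = forward(stream, max_layer=2)  # every layer
```

The server only ever reads the layer field, so it can serve SD without seeing the video. The catch is that every sender has to encode in layers to begin with.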
[1] https://en.wikipedia.org/wiki/Scalable_Video_Coding