We developed this for Roam (https://ro.am/ [1]). Roam is a virtual office environment for real-time collaboration - audio, video, whiteboards, personal offices, team rooms, theaters and group chat with ML-enabled tools layered on top. It's based on Chromium & WebRTC so that we can ship a cross-platform app (Electron) as well as a nearly-parity Web client with a tiny team.
We've had a good experience with this approach, although it is far from plug-and-play. We do have to identify and patch items in Chromium/WebRTC & the server (Pion) interaction to get our video quality up to compete with Zoom/Teams/Meet. We are able to effectively compare video quality across these providers and expect to reach their video quality with this stack.
Disclaimer: I work on Roam's Chat, AI, and API.
[1] We are currently in closed beta so there's not much there at the moment.
Why are you using Pion instead of Coturn? Is there a lot of innovation left on the table for TURN relays?
> ML-enabled tools layered on top.
In your opinion, have Encoded Transforms definitively settled the approach to this matter?
If I want to use free PyTorch garbage X on my camera stream Y, I can take a big hit on occupancy and latency, sometimes as high as 8 frames, by reading back the decoded video to CPU and then immediately copying back to the GPU there. Have you done any work to access the decoded frame in libwebrtc while it is still in GPU memory?
How are you guys going to force Mobile Safari to enable HEVC and VP9 by default in WebRTC? I am of course joking, you cannot do that, but if Apple doesn't want to support X in browser, and Zoom's janky MPEG-over-datachannel does, and it's on a tiny screen so video quality barely matters, do you feel like also building all of Zoom's approach is on your roadmap?
TURN is only one piece of WebRTC, and for the product we're building we want to be in control of our destiny on the AV side. Building on a fairly mature open source implementation of WebRTC was an early choice we made. We've already used this to build an international mesh architecture and a few-to-many theater experience over WebRTC that can support thousands of viewers with very low latency.
The ML work we're doing is outside of video processing for other features of our platform, although we have incorporated some tools for features like background blur.
On mobile we've built native iOS and Android apps. Our goal in general is not to chase Zoom, but to make a very different experience for distributed work. That said, the AV has to work.
> Building on a fairly mature open source implementation of WebRTC was an early choice we made.
Hmm, you would know this best, having dealt with libwebrtc, that there's libwebrtc Chrome, libwebrtc Mobile Safari, and stuff that doesn't work. I'd say more, but if you say Pion three times, the Pion maintainer comes out in the comments and casts a spell on you and your jitter buffers become 10,000ms long.
> used this to build an international mesh architecture
AFAIK, Twilio, Amazon, Azure and Steam operate the only at-scale private-routing-over-TURN aka network-traversal service for others to use, and Twilio and Amazon are the same network. Twilio doesn't even bother with Global Accelerators (aka anycast IPs); they route you via DNS responses, and I doubt they've updated the code for years. Do you guys have your own private network? Surely it's "Amazon."
I suppose if you did that work, well you probably don't need 90% of the WebRTC featureset anyway, you might as well centralize it, which is what all the big chat vendors end up doing.
Anyway, it is extremely hard to innovate in this space, it's a lot of mashing together open source libraries and doing IT drudgery. Sometimes that IT drudgery is gluing C++ code together; sometimes it's deploying onto 15 AWS regions. I really appreciate the complexity of what you're doing and it all looks very cool.
Good on you for releasing a dependency to it so early. I like the approach. And kudos for bringing webrtc-node into the present (which admittedly on the web, moves pretty fast! haha, the "past" is not that long ago in real time).
Don't know why you got the downvotes--you just closed a 40M series A, right? Congrats!