Except doing so is probably much more complicated than actually dealing with the CSS and HTML. Hell, it would probably take twice as much manpower to make this remote browser thing accessible than it took to make it work in the first place.
I doubt that. Chromium's internal accessibility tree is already serializable; it has to be, so it can be sent from the renderer process to the main process. So Cloudflare's modified Chromium could send that tree down to their JS-based client, which could then construct a DOM with the appropriate HTML tags and ARIA attributes. This DOM wouldn't have any JavaScript or any references to remote resources, so it wouldn't pose the same security risks as the original web page.
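To make that idea a bit more concrete, here's a rough TypeScript sketch of the client side. The node shape (the hypothetical AXNodeSketch, with made-up role names like "staticText") is invented for illustration and is not Chromium's actual serialization format. What it tries to show is that the client would only ever emit inert markup: escaped text, semantic tags or an explicit ARIA role, and no scripts or remote URLs. A real client would more likely build nodes with DOM APIs; producing a string just keeps the sketch easy to test.

```typescript
// Hypothetical, simplified accessibility node, for illustration only.
// This is NOT Chromium's real serialization format.
interface AXNodeSketch {
  id: number;
  role: string;   // e.g. "heading", "button", "paragraph", "staticText"
  name?: string;  // accessible name (the text itself, for staticText)
  level?: number; // heading level, if any
  children: AXNodeSketch[];
}

// Escape text so page content can never be interpreted as markup.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Map each node to an inert HTML element. Unknown roles become a div with an
// explicit role attribute. Nothing here emits scripts, event handlers, or URLs.
function renderNode(node: AXNodeSketch): string {
  const id = `ax-${node.id}`;
  // Containers render their children; leaves contribute their escaped name.
  const inner = node.children.length > 0
    ? node.children.map(renderNode).join("")
    : escapeHtml(node.name ?? "");
  switch (node.role) {
    case "staticText":
      return escapeHtml(node.name ?? "");
    case "heading": {
      // Clamp to a valid h1..h6 level.
      const level = Math.min(Math.max(Math.round(node.level ?? 2), 1), 6);
      return `<h${level} id="${id}">${inner}</h${level}>`;
    }
    case "button":
      return `<button id="${id}" type="button">${inner}</button>`;
    case "paragraph":
      return `<p id="${id}">${inner}</p>`;
    default:
      return `<div id="${id}" role="${escapeHtml(node.role)}">${inner}</div>`;
  }
}
```

For example, a heading whose text contains "<b>" comes out as an h1 containing the escaped text, not as live markup.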
There are several problems with that approach. First, there's not enough information in the serialized accessibility tree to reconstruct the DOM.[1]
Second, the serialization format is an internal API, so there are no constraints on backwards compatibility. It can change in any version of Chromium. In fact, the interface is updated all the time.[2] Cloudflare would have to constantly update their JS client to handle those changes. It's not an abstraction that can be relied upon.
Third, the bandwidth and latency requirements for inter-process communication are far higher than what is available for most client-server communication. Even if the API were stable, I doubt it would be feasible to use on typical Internet connections. If you don't believe me, go to chrome://accessibility/ and click "Start recording" on a tab. I did this for an IRCCloud tab and got 4500 events in approximately 2 seconds.
> First, there's not enough information in the serialized accessibility tree to reconstruct the DOM.
There doesn't have to be enough in there to reconstruct the original DOM, just enough to expose all of the information that screen readers and other accessibility tools need. The fact that that information would be exposed through an HTML DOM in this case is irrelevant; we know the Chromium accessibility tree has all the necessary information.
> Second, the serialization format is an internal API, so there are no constraints on backwards compatibility.
OK, you got me there. Maybe the server side has to go all the way and construct the HTML.
> Third, the bandwidth and latency requirements for inter-process communication are far higher than what is available for most client-server communication.
OK, again, maybe the server side has to digest the data some more before sending it. But at least Chromium is already pushing serialized tree updates. I'll withhold a rant on how it could be much worse.
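One way the server could "digest" the data, sketched under loud assumptions: collect tree updates over a short window and send only the latest state of each node touched in that window. With something like the 4500 events in about 2 seconds mentioned above (roughly 2250 per second), a progress bar that updates the same node many times per window would collapse to one update. The NodeUpdate shape and the coalesce function are hypothetical and not part of Chromium.

```typescript
// Hypothetical per-node update, for illustration only (not Chromium's format).
interface NodeUpdate {
  nodeId: number;
  removed?: boolean;
  name?: string;
  role?: string;
}

// Merge all updates in one batching window so each node appears at most once.
function coalesce(updates: NodeUpdate[]): NodeUpdate[] {
  const latest = new Map<number, NodeUpdate>();
  for (const u of updates) {
    const prev = latest.get(u.nodeId);
    // A removal discards earlier changes to the node, and an update after a
    // removal starts fresh; otherwise later fields override earlier ones.
    const merged = u.removed || prev === undefined || prev.removed ? u : { ...prev, ...u };
    // Map.set on an existing key keeps its original position, so the output
    // order follows the order in which nodes were first touched.
    latest.set(u.nodeId, merged);
  }
  return Array.from(latest.values());
}
```

The trade-off is added latency equal to the batching window, which matters for the cursor-movement concern raised further down.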
Does this handle (lots of) (sometimes large) page updates, particularly across a semi-slow, semi-reliable network? Think lazy loading, SPA-style diff-based page transitions, or realtime progress bars. What about element positions (e.g. for switch control overlays that visually mark specific elements on the page)? Assuming this just sends keys directly to the remote browser, what about cursor-related events in editing fields? If latencies are over a few ms with those, some screen readers get confused.
Good questions. You have an especially good point about the latency of responses to cursor movement commands; the developers of NVDA and JAWS might have to rethink their approach to that.
But as far as I know, Cloudflare hasn't even tried yet.
Since this DOM would be invisible, hidden behind the canvas, I'd say you'd need just enough CSS to make each element have the same bounding box as the original. Bonus points if you can safely do enough CSS to make the font size and colors match; screen readers do have commands for querying those things.
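A minimal sketch of the "same bounding box" part, assuming the server sends each element's box in the canvas's CSS-pixel coordinates and the client puts these elements in a positioned container stacked underneath the canvas. Rect, px and overlayStyle are hypothetical names. It deliberately avoids display:none and visibility:hidden, since those would remove the element from the accessibility tree, and it only accepts a strict #rrggbb color so a remote value can't sneak extra CSS declarations in.

```typescript
// Hypothetical bounding box in canvas CSS-pixel coordinates.
interface Rect { x: number; y: number; width: number; height: number; }

// Format a length, refusing NaN/Infinity so bad input can't produce broken CSS.
function px(n: number): string {
  if (!Number.isFinite(n)) throw new RangeError(`not a finite length: ${n}`);
  return `${n}px`;
}

// Inline style that places an element over the same region the remote browser
// painted on the canvas. Hiding it is left to stacking order (canvas on top).
function overlayStyle(r: Rect, fontSizePx?: number, color?: string): string {
  const parts = [
    "position:absolute",
    `left:${px(r.x)}`,
    `top:${px(r.y)}`,
    `width:${px(r.width)}`,
    `height:${px(r.height)}`,
    "margin:0",
    "padding:0",
    "box-sizing:border-box",
    "overflow:hidden",
  ];
  // Optional extras so screen reader font and color queries give sensible answers.
  if (fontSizePx !== undefined) parts.push(`font-size:${px(fontSizePx)}`);
  if (color !== undefined && /^#[0-9a-fA-F]{6}$/.test(color)) parts.push(`color:${color}`);
  return parts.join(";");
}
```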
> And mutations to this DOM would need to be tightly synced to image updates to not confuse the hell out of NVDA?
Chromium has already taken pains to make sure this works, because its whole accessibility implementation is dependent on pushing tree updates from the renderer process to the main process.
Because the screen reader is set up and runs on the user's local PC, unless you expect users to use whatever unknown-to-them screen reader setup CF happens to run remotely.
That's one option, but I think there is more to explore than that.
E.g., looking at the canvas implementation, it appears CF delivers a Chromium render to a canvas.
Maybe what helps here is standardisation.
A single render like that wouldn't have been sufficient during the browser wars of old, but it isn't a point of contention now because of standardisation.