The well-developed, well-security-researched, well-deployed application platform you're looking for is the web. You get exactly this sort of setup if you use WebGL: you interact with an API that expects to be called by unprivileged hostile applications, instead of with a library that helps you directly access the graphics card driver. Every individual application lives in a separate protection domain (an HTTP origin), and communication between them is limited to message passing with the consent of both sites. The language itself avoids all assumptions of direct access to system resources.
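To make "message passing with the consent of both sites" concrete, here is a minimal sketch written as a plain TypeScript simulation, not real browser code. `Frame`, `makeFrame`, and `send` are hypothetical names. The two checks are modeled on the `targetOrigin` argument of `window.postMessage` on the sending side and on a `message` handler checking `event.origin` on the receiving side (in a real browser that second check is the handler's responsibility).

```typescript
// Simplified model of cross-origin messaging. `Frame`, `makeFrame`, and `send`
// are hypothetical stand-ins, not browser APIs.
type Message = { origin: string; data: string };

interface Frame {
  origin: string;
  trustedOrigins: Set<string>; // senders this frame has agreed to hear from
  received: Message[];
}

function makeFrame(origin: string, trusted: string[]): Frame {
  return { origin, trustedOrigins: new Set(trusted), received: [] };
}

// Returns true only if the message was actually delivered.
function send(from: Frame, to: Frame, data: string, targetOrigin: string): boolean {
  // Sender's consent: drop the message unless the target really has the
  // origin the sender named.
  if (to.origin !== targetOrigin) return false;
  // Receiver's consent: ignore senders the receiver has not opted into.
  if (!to.trustedOrigins.has(from.origin)) return false;
  to.received.push({ origin: from.origin, data });
  return true;
}

const bank = makeFrame("https://bank.example", ["https://widget.example"]);
const widget = makeFrame("https://widget.example", ["https://bank.example"]);
const evil = makeFrame("https://evil.example", []);

send(widget, bank, "balance?", "https://bank.example"); // delivered
send(evil, bank, "balance?", "https://bank.example");   // dropped by the receiver's check
send(widget, evil, "secret", "https://bank.example");   // dropped by the sender's check
```

Either side can veto the exchange on its own, which is the property the paragraph above relies on.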
Running everything in a web app is, admittedly, a fundamental change in the stack. But it's fortunately one where a lot of people have independently put work into making this happen. I do my most security-sensitive work on a Chromebook (using the SSH and mosh apps from the Chrome app store) for precisely this reason: it's the right security model, and it's available in my local computer store and works.
> You get exactly this sort of setup if you use WebGL
> I do my most security-sensitive work on a Chromebook
I would highly recommend you use a WebGL whitelist then. WebGL might have been designed with security in mind, but it is nevertheless a very thin wrapper around OpenGL drivers, which, I can assure you, were not written with security in mind. WebGL allows some surprisingly direct ways of manipulating hardware, and there are a million attack vectors lurking in every WebGL implementation/OpenGL driver combination.
That's a good point. What else should I whitelist other than WebGL? (Is there a general hardening guide for an off-the-shelf, un-jailbroken Chromebook?)
You are right that this is currently the most secure platform for distributing graphical user interface programs, but I think it should go further.
For example, I would go so far as to say that, by default, it shouldn't be possible for the server to send me a huge HTML/CSS/JS blob that does all kinds of weird stuff (e.g. reporting to the host, mouse movement analysis, etc.).
I am probably in a minority with the following opinion, but I think a page shouldn't even have the ability to enforce a layout which in the end draws pixels on your screen. The web is a step forward and HTML is a good idea, but it is not used anymore in its intended form - it works very well for text distribution, but richer applications have to resort to JS.
Now if you disable JS you could in theory render it as you like, but this is far from trivial.
//edit:
Let's consider a bus company offering a search for connections that get you from A to B (i.e. a route planner, trip finder, ...).
This app shouldn't ship you random HTML/JS, but just the information you need to query its database, which is simply some GETting and POSTing of specified requests. When connecting to the app (going to https://trip-search.example.com), the host could disclose itself as an application having type `(From, Date, To, Date) -> Maybe TripList` or something like that (I think one gets the idea).
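A rough sketch of what such a declared interface could look like, in TypeScript. Everything here is an assumption for illustration: the `Trip` fields, the reading of the four arguments as (origin, earliest departure, destination, latest arrival), and the in-memory `timetable` standing in for the server's database. The point is only that the host supplies data of a known type and the client decides how to render it.

```typescript
// Hypothetical shapes for the declared type
// (From, Date, To, Date) -> Maybe TripList.
type Trip = { from: string; to: string; departs: string; arrives: string };
type TripList = Trip[];

// `null` plays the role of `Nothing` in the Maybe type.
type TripSearch = (
  from: string,
  departOn: string, // ISO date, earliest departure (assumed meaning)
  to: string,
  arriveBy: string  // ISO date, latest arrival (assumed meaning)
) => TripList | null;

// Stand-in for the host's database; a real client would issue a GET or POST.
const timetable: TripList = [
  { from: "A", to: "B", departs: "2024-05-01", arrives: "2024-05-01" },
  { from: "A", to: "C", departs: "2024-05-02", arrives: "2024-05-03" },
];

const searchTrips: TripSearch = (from, departOn, to, arriveBy) => {
  // ISO dates (YYYY-MM-DD) compare correctly as plain strings.
  const hits = timetable.filter(
    (t) => t.from === from && t.to === to && t.departs >= departOn && t.arrives <= arriveBy
  );
  return hits.length > 0 ? hits : null;
};

// Rendering lives entirely on the client; the host never sends layout.
function renderTrips(result: TripList | null): string {
  if (result === null) return "No trips found.";
  return result.map((t) => `${t.from} -> ${t.to}, ${t.departs} to ${t.arrives}`).join("\n");
}

console.log(renderTrips(searchTrips("A", "2024-05-01", "B", "2024-05-01")));
```

A different client could render the same `TripList` as a table, read it aloud, or feed it to another program, without the host having any say.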
The web is great, but I think security should and must go further: I do not want to run random Turing machines.
I'm not sure I get why enforcing a layout is a problem from the point of view of application distribution - if nothing else, an app should be able to embed a text renderer and draw onto a <canvas> itself. (It's probably a terrible idea, but it should be able to, because a text renderer is just a program that takes in data and outputs some pixels, and that class of programs is useful.)
I do certainly agree that we need a way of distributing hyper-text content efficiently and in a standard way. Unfortunately the web seems to be moving away from that goal, and AMP isn't quite right and has its own problems.
I'm not sure how I feel about permissions by default. I think permission fatigue is definitely a thing, and for most apps I don't actually care about them exfiltrating mouse movements to the host, as long as they can only exfiltrate it to the one host. On the other hand, I'm a little weirded out that if I plug my piano into my Chromebook, JavaScript can receive and send MIDI events without any permission prompt.
EDIT to your edit: I'm totally okay with running random Turing machines, if their execution environment is constrained (which it is). The only resources that an arbitrary Turing-complete programming language can access are any external resources that it's specifically given an interface to, and time/memory/power consumption. The web platform is pretty good (though, yes, not perfect) at locking down the interfaces given to JS. So it's just a matter of resource limits, which is fairly easy; I'm not always thrilled with how much CPU and battery life Twitter takes, for instance, but it's always killable. (Again, in theory.)
You can construct something that's capable of using plenty of memory or power out of any sufficiently powerful Turing-incomplete language. See, for instance, CSS. (I bet with the mechanism you're proposing, you can end up chaining server-side APIs in ways that let you DoS the client, because the server is always more powerful.) And given how easy it is to achieve Turing-completeness by mistake, it doesn't seem like a productive constraint.
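As a small illustration of that first point, here is a toy macro language with no loops, recursion, or conditionals, where expansion always terminates and yet the output size doubles with every definition. It is the same idea as the "billion laughs" XML entity expansion. The `Definition` format and both function names are made up for this sketch; the code computes the expanded length instead of building the string.

```typescript
// A toy macro language: each definition is a list of tokens, and a token
// either names an EARLIER definition or is a literal. No loops, no recursion,
// so the language is not Turing-complete. The format is hypothetical.
type Definition = { name: string; body: string[] };

// Length of the fully expanded text of `root`, computed without building it.
function expandedLength(defs: Definition[], root: string): number {
  const lengths = new Map<string, number>();
  for (const def of defs) {
    let total = 0;
    for (const token of def.body) {
      // A token naming an earlier definition contributes its expanded length;
      // anything else is treated as a literal.
      total += lengths.get(token) ?? token.length;
    }
    lengths.set(def.name, total);
  }
  const result = lengths.get(root);
  if (result === undefined) throw new Error(`unknown definition: ${root}`);
  return result;
}

// d0 = "x", and every later definition is two copies of the previous one.
function doublingChain(levels: number): Definition[] {
  const defs: Definition[] = [{ name: "d0", body: ["x"] }];
  for (let i = 1; i <= levels; i++) {
    defs.push({ name: `d${i}`, body: [`d${i - 1}`, `d${i - 1}`] });
  }
  return defs;
}

// 31 short definitions expand to 2^30 characters.
console.log(expandedLength(doublingChain(30), "d30")); // 1073741824
```

So ruling out Turing-completeness does not by itself bound memory or CPU; you still need explicit resource limits.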
> I'm not sure I get why enforcing a layout is a problem from the point of view of application distribution - if nothing else, an app should be able to embed a text renderer and draw onto a <canvas> itself.
Yeah, but in my opinion that is already a specific type of application, like a computer game, PDF viewer, or plotting application.
It is totally different from an application like Wikipedia or a news page, which provides mostly text and images.
In the end, more of the functionality should simply live on the client side (rules for how to render news pages, how to render Wikipedia, etc.).
Serious question - what's the difference between that, and running all apps in their own chroot jails?
It seems like the goal of this app is to isolate things from the network, and from each other. A web app or Chromebook method isolates from other apps, ok, fine, but not from the web. Seems more like a jail in that sense.
That's a good question! The simple answer is that the web is about whitelisting, whereas a chroot jail is about blacklisting, and blacklisting never works. (Whitelisting, to be clear, also has no guarantee of working, but at least it's possible for it to work.)
When you jail a UNIX process, you start from a model that gives you full access to everything, and gradually revoke access until you're convinced it's secure. There are all sorts of things you might overlook. For instance, if it's just a chroot, there's no network isolation; an app can connect to a server listening on localhost, and it looks like it's coming from localhost. It can connect to a server on the local network, and it looks like it's coming from the host (which is bad if you have, e.g., a corporate network that lets you access interesting data without logging in, or a home router with a default admin password, or many similar cases).
And if you introduce a new mechanism, the chroot probably gives you access to it. For instance, if the chrooted app is able to access my X11 session, it has a ton of powers; it can keylog everything I do, for instance. Even if I mark it "untrusted" a la ssh -X, it has complete powers over everything else that's "untrusted". You could imagine an X11 designed differently, but X11 was designed for trusted apps. Another important case is system calls; a chrooted process has access to every system call, including every vulnerability that might be present. (On some OSes you can restrict what system calls the process can run, but it's still pretty coarse-grained.)
The web starts from the ability to render formatted text with links, which is very close to zero. Everything else is—at least in theory—added from there when safe. Images are safe. Playing audio is pretty safe. Recording audio is probably not safe without permission. (A typical desktop API won't have an easy way to allow one but not the other, and certainly won't have a permission prompt.) Rendering graphics is fine. Rendering 3D graphics is potentially fine, hence WebGL. Rendering graphics on top of someone else's tab is a definite no. Moving your window around or removing its borders is also a definite no. Becoming full-screen requires notifying the user of what just happened. (Again, a typical desktop API won't distinguish these cases and won't give you an easy way to exit full-screen.)
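A minimal sketch of the difference, assuming made-up capability names and a made-up four-way decision type; neither table reflects any real browser's or OS's policy. The thing to notice is what each model answers for a capability nobody thought about.

```typescript
// Hypothetical capability names and decisions, for illustration only.
type Decision = "allow" | "notify" | "prompt" | "deny";

// Whitelist model: only capabilities someone explicitly reviewed are listed.
const webPolicy = new Map<string, Decision>([
  ["render-text", "allow"],
  ["show-image", "allow"],
  ["play-audio", "allow"],
  ["record-audio", "prompt"],
  ["render-3d", "allow"],
  ["enter-fullscreen", "notify"],
  ["draw-over-other-tab", "deny"],
  ["move-window", "deny"],
]);

// Blacklist model: only the things someone remembered to revoke are listed.
const jailRevoked = new Set<string>(["read-files-outside-root"]);

function whitelistDecision(capability: string): Decision {
  // Anything not on the list is denied by default.
  return webPolicy.get(capability) ?? "deny";
}

function blacklistDecision(capability: string): Decision {
  // Anything not on the list is allowed by default.
  return jailRevoked.has(capability) ? "deny" : "allow";
}

// A capability neither table anticipated:
console.log(whitelistDecision("keylog-x11-session")); // "deny"
console.log(blacklistDecision("keylog-x11-session")); // "allow"
```

The whitelist can still contain a wrong entry, but a forgotten entry fails closed instead of open.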
In particular, the web does restrict an app's ability to access the web. An app can freely access its own origin, but it cannot freely access other sites. If http://wiki.internal/ has sensitive data that doesn't require login, a site on the public web cannot retrieve data from there, without the consent of that site. (And the web has already implemented a pretty robust and involved way of handling cross-origin resource sharing.)
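The rule can be sketched as a small decision function. This is a deliberately simplified model, not what any browser actually implements: it ignores credentials, preflight requests, and non-script loads such as images, and it only answers whether a script may read the response (the request itself may still be sent). `canReadResponse` and its parameters are hypothetical names; `allowOriginHeader` stands for the value of the target's `Access-Control-Allow-Origin` response header, or `null` if it sent none.

```typescript
// Two URLs share an origin when scheme, host, and port all match.
function sameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

// Simplified model: may a script on `pageUrl` read a response from `targetUrl`?
function canReadResponse(
  pageUrl: string,
  targetUrl: string,
  allowOriginHeader: string | null
): boolean {
  // An app can always read from its own origin.
  if (sameOrigin(pageUrl, targetUrl)) return true;
  // Cross-origin: the target site must opt in explicitly.
  if (allowOriginHeader === null) return false;
  return allowOriginHeader === "*" || allowOriginHeader === new URL(pageUrl).origin;
}

// A public page trying to read the internal wiki, which sends no opt-in header:
console.log(canReadResponse("https://public.example/page", "http://wiki.internal/secret", null)); // false
// The wiki reading its own pages:
console.log(canReadResponse("http://wiki.internal/a", "http://wiki.internal/b", null)); // true
```

The default is again "no": the internal site has to take a positive step before anyone else's script can see its data.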
If you stick all these things into a desktop API, fantastic! But the web platform is already there, with a number of competing implementations that are all pretty good.
Running everything in a web app is, admittedly, a fundamental change in the stack. But it's fortunately one where a lot of people have independently put work into making this happen. I do my most security-sensitive work on a Chromebook (using the SSH and mosh apps from the Chrome app store) for precisely this reason: it's the right security model, and it's available in my local computer store and works.