It's not the web, it's computing in general that has not evolved in the right direction. A grid is not a grid is not a grid. An image is not an image is not an image. Etc.
So rather than fixing the web, fix computing. That will fix the web too. If it doesn't, the web is in even more trouble.
I suppose what I mean is that there are many idioms we have learned and gotten used to. Actions, for example: copy and paste, or click and drag (and their touch-gesture equivalents). Formats: a table, grid, matrix, cells. Text is universal too: in editors we have a cursor, backspace, select, and cut, and when we render we also have pagination, margins, fonts, boxes, etc.
Drawings are represented in various ways. We have paths, but then there is a multitude of serialization formats.
None of these are really first-class citizens in computing. They are all application-level concepts and implementations. Some are hardware-dependent (margins, for example, depend on which printer we choose and its settings).
My point is that we would benefit if these "human needs" were met at a lower level: if a grid were a grid, an image an image, a drawing a drawing, etc.
At the application level there could be enrichment, but the essence of a thing could be universal, like a protocol. Once these things were protocols, everything would start to behave more alike and interoperate better, including the web.
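To make the "like a protocol" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the `Grid` contract, the two implementations (`ListGrid`, `CsvGrid`), and the `copy_region` helper. It is not a proposal for what a real low-level grid would look like, only a picture of "universal essence, application-level enrichment".

```python
import csv
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class Grid(Protocol):
    """Hypothetical universal contract: the 'essence' of a grid."""

    def shape(self) -> tuple[int, int]:
        """Return (number of rows, number of columns)."""
        ...

    def cell(self, row: int, col: int) -> str:
        """Return the content of one cell as text."""
        ...


class ListGrid:
    """One application's grid, backed by nested lists."""

    def __init__(self, rows: list[list[str]]) -> None:
        self._rows = rows

    def shape(self) -> tuple[int, int]:
        return (len(self._rows), len(self._rows[0]) if self._rows else 0)

    def cell(self, row: int, col: int) -> str:
        return self._rows[row][col]


class CsvGrid:
    """Another application's grid, backed by CSV text."""

    def __init__(self, text: str) -> None:
        self._rows = list(csv.reader(io.StringIO(text)))

    def shape(self) -> tuple[int, int]:
        return (len(self._rows), len(self._rows[0]) if self._rows else 0)

    def cell(self, row: int, col: int) -> str:
        return self._rows[row][col]


def copy_region(src: Grid, top: int, left: int,
                height: int, width: int) -> list[list[str]]:
    """A 'copy' action written once, against the contract only.

    It works for any object that satisfies Grid, regardless of
    which application produced it.
    """
    return [
        [src.cell(r, c) for c in range(left, left + width)]
        for r in range(top, top + height)
    ]


if __name__ == "__main__":
    sheet = CsvGrid("name,qty\napple,3\npear,5\n")
    table = ListGrid([["a", "b"], ["c", "d"]])
    print(copy_region(sheet, 1, 0, 2, 1))
    print(copy_region(table, 0, 1, 2, 1))
```

The point of the sketch is that `copy_region` never asks which application owns the grid. Each application is free to add its own enrichment (formulas, styling, whatever) on top, as long as the shared contract stays the same.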
Some of these things (idioms, actions) appear to work across the board, but they are fragile, unpredictable, and often fail just when you need them most. That's because every application solves these problems for itself. (Please don't say Windows Clipboard.)