It's because of backwards compatibility. The C standard requires struct members to appear in memory in their declaration order (C11 §6.7.2.1: member addresses increase in the order in which they are declared), and a lot of network protocol code would stop working if that assumption failed. Compilers may insert padding between fields for alignment, but they are not free to reorder them.
The packing algorithm in Cap'n Proto [1] is cache-aware, while still accommodating network optimizations, backwards compatibility, etc. So yes, newer systems do perform this optimization.