I found the same thing when doing video transcoding. The VPSs were all woefully underpowered. Netcup bare metal (root servers) ended up getting pretty close and were by far the best bang for the buck of anything I found.
Curious what the VPS setup was, and why you would expect it to do better than real hardware. Video transcoding is quite a beast from what I remember, and I just can't imagine there's a VPS offering that is expected to keep up.
The Intel Xeon processors that cloud providers typically use don't include Intel Quick Sync Video, the dedicated hardware A/V encode/decode block that comes with the integrated GPU on typical desktop/laptop CPU SKUs. So the software has to fall back to CPU-based codecs, which are much slower.
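If you want to see which path a given box will take, here is a rough sketch, assuming ffmpeg is the transcoder and H.264 is the target. It parses the output of `ffmpeg -hide_banner -encoders` and picks the first encoder from a preference list. The helper names and the preference order are my own for illustration. Note that an encoder being listed only means the ffmpeg build supports it; it does not prove the hardware is present, so a short test encode is still needed to confirm.

```python
import shutil
import subprocess

# Hypothetical preference order: hardware H.264 encoders first,
# software libx264 as the CPU fallback.
PREFERRED = ["h264_qsv", "h264_nvenc", "h264_vaapi", "libx264"]


def parse_encoders(listing):
    """Extract encoder names from the text printed by `ffmpeg -encoders`."""
    names = set()
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[1] == "=":
            continue  # blank lines, headings, and legend rows like "V..... = Video"
        flags = parts[0]
        # Encoder rows start with a flag column such as "V....D" or "A.....".
        if flags[0] in "VAS" and set(flags) <= set("VASFXBD."):
            names.add(parts[1])
    return names


def pick_encoder(available):
    """Return the first preferred encoder that is available, or None."""
    for name in PREFERRED:
        if name in available:
            return name
    return None


def list_ffmpeg_encoders():
    """Ask the local ffmpeg binary which encoders it was built with."""
    if shutil.which("ffmpeg") is None:
        return set()
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_encoders(result.stdout)
```

On a typical cloud Xeon VPS you would expect `pick_encoder(list_ffmpeg_encoders())` to land on `libx264`, or on an entry that is compiled in but fails at runtime because no device is exposed to the guest.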
AWS EC2 has a VT1 instance family that enables high-speed A/V encoding via a Xilinx media accelerator card.