Bit late, and the other comments are right, but it's worth noting that pushes typically aren't that expensive. 64-bit ARM has a 4-byte STP (store pair) instruction that can push two registers at a time, so a push usually costs only two bytes per register on ARM. If you're translating instruction-for-instruction, though, each push costs the full four bytes.
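To make the STP arithmetic concrete, here is a rough sketch that builds the encoding of a typical prologue push, `stp x29, x30, [sp, #-16]!`. The helper name `encode_stp_push` is made up for illustration, and it only handles the 64-bit pre-indexed form with a fixed 16-byte adjustment.

```python
def encode_stp_push(rt, rt2):
    """Encode `stp x<rt>, x<rt2>, [sp, #-16]!` as one 32-bit A64 word.

    Hypothetical helper: covers only the 64-bit, pre-indexed form.
    """
    base = 0xA9800000          # STP, 64-bit registers, pre-indexed addressing
    imm7 = (-16 // 8) & 0x7F   # signed offset, stored in units of 8 bytes
    sp = 31                    # register number 31 means SP in the base field
    return base | (imm7 << 15) | (rt2 << 10) | (sp << 5) | rt


word = encode_stp_push(29, 30)
size = len(word.to_bytes(4, "little"))
print(hex(word))                      # one 4-byte instruction
print(size / 2, "bytes per register") # two registers share those 4 bytes
```

One instruction word stores both registers, which is where the two-bytes-per-push figure comes from.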
(I also forgot while writing the post that 2-byte push instructions are common in 64-bit x86 as well. Half the registers can be pushed with a single byte, and the other half require a REX prefix, giving a two-byte push instruction. So even though the quoted statement is true, the difference isn't that bad in general.)
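For anyone who wants to check the x86 byte counts, here is a small sketch of how `push r64` is encoded. The `encode_push` helper and the `REGS` table are my own names for illustration; the encoding itself is opcode `0x50` plus the low three bits of the register number, with a `0x41` REX prefix for r8 through r15.

```python
# Register numbering used by the x86-64 encoding (index = register number).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def encode_push(reg):
    """Return the machine-code bytes for `push <reg>` (hypothetical helper)."""
    n = REGS.index(reg)
    opcode = 0x50 + (n & 7)            # low 3 bits of the register go in the opcode
    if n >= 8:
        return bytes([0x41, opcode])   # REX.B supplies the 4th register bit
    return bytes([opcode])


for reg in ("rbx", "r12"):
    code = encode_push(reg)
    print(reg, code.hex(), len(code), "byte(s)")

one_byte = [r for r in REGS if len(encode_push(r)) == 1]
two_byte = [r for r in REGS if len(encode_push(r)) == 2]
print(len(one_byte), "registers take 1 byte,", len(two_byte), "take 2")
```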