perf and Intel VTune are the two standard choices AFAIK. Both have a learning curve, and both are extremely capable in the right hands. (Well, on macOS you're pretty much locked into Instruments; I don't know whether Callgrind works there, but I'd suspect it's an uphill battle.)
Callgrind is a CPU simulator that can output a profile of that simulation. I guess it's semantics whether you want to call that a profiler or not, but my point is that you don't need a simulator+profiler combo when you can just use a profiler on its own.
(There are exceptions where the determinism of Callgrind can be useful, like if you're trying to benchmark a really tiny change and are fine with the bias from the simulation diverging from reality, or if you explicitly care about call count instead of time spent.)
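To make the "call count vs. time spent" distinction concrete, here's a rough Python analogy. It is not how Callgrind works internally, and `count_calls`, `time_once`, and `fib` are names I made up for illustration. The point is only that a count-based measurement comes out identical on every run, while a wall-clock measurement jitters with whatever else the machine is doing.

```python
import sys
import time
from collections import Counter


def count_calls(func, *args):
    """Run func and count Python-level function calls by name (deterministic)."""
    counts = Counter()

    def tracer(frame, event, arg):
        # "call" fires once per Python function entry; C calls show up as "c_call".
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(tracer)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return counts


def time_once(func, *args):
    """Run func once and return elapsed wall-clock seconds (noisy)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def fib(n):
    # Deliberately naive recursion so there are plenty of calls to count.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    # The call count is the same on every run: 177 for fib(10).
    print([count_calls(fib, 10)["fib"] for _ in range(3)])
    # The timings will typically differ from run to run.
    print([time_once(fib, 10) for _ in range(3)])
```

If you're comparing two versions of a tiny function, the count tells you unambiguously whether the amount of work changed, but it says nothing about how long that work takes on real hardware, which is the same trade-off as with the simulation.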