perf does not give me the complete call stack; it's a sampling profiler.
In effectively all latency-sensitive contexts, sampling is worthless. 99.999999% of the time the program is waiting for IO, and then for a handful of microseconds there's a flurry of activity. That activity is the only part I care about and perf will effectively always miss it and never record it to completion.
I need to know the exact chain of events that leads to an object cache miss causing an allocation to occur, or exactly the conditions which led to a slow path branch, or which request handler is consistently forcing buffer resizes, etc.
I never need a profiler to tell me "memory allocation is slow" (which is what perf will give me). I know memory allocation is slow; I need to know why we're allocating memory.
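To make the distinction concrete, here is a minimal sketch of the kind of event tracing described above, as opposed to sampling: every event is recorded along with the handler that was active, so the chain leading up to an allocation can be read back afterwards. All names here (Tracer, ObjectCache, handle_get, handle_put) are hypothetical and only for illustration; a real latency-sensitive system would do this in its own language with a preallocated buffer.

```python
from collections import deque
from dataclasses import dataclass
import time


@dataclass(frozen=True)
class Event:
    t_ns: int      # monotonic timestamp of the event
    handler: str   # request handler active when the event fired
    kind: str      # e.g. "cache_miss", "alloc"
    detail: str


class Tracer:
    """Records every event (no sampling) in a bounded ring buffer."""

    def __init__(self, capacity=1024):
        self.events = deque(maxlen=capacity)  # oldest events are dropped first
        self.handler = "<none>"

    def enter(self, handler):
        self.handler = handler

    def record(self, kind, detail=""):
        self.events.append(
            Event(time.monotonic_ns(), self.handler, kind, detail))

    def chain_before(self, kind, depth=3):
        """Return up to `depth` events ending at the most recent `kind` event."""
        evs = list(self.events)
        for i in range(len(evs) - 1, -1, -1):
            if evs[i].kind == kind:
                return evs[max(0, i - depth + 1): i + 1]
        return []


class ObjectCache:
    """Toy free-list cache: a miss forces an allocation, and both are traced."""

    def __init__(self, tracer, prefill=1):
        self.tracer = tracer
        self.free = [bytearray(64) for _ in range(prefill)]

    def get(self):
        if self.free:
            return self.free.pop()  # fast path: nothing recorded
        self.tracer.record("cache_miss", "free list empty")
        self.tracer.record("alloc", "64 bytes")
        return bytearray(64)


tracer = Tracer()
cache = ObjectCache(tracer, prefill=1)

tracer.enter("handle_get")
a = cache.get()   # served from the free list
tracer.enter("handle_put")
b = cache.get()   # free list is empty: cache miss, then allocation

chain = tracer.chain_before("alloc")
for ev in chain:
    print(ev.handler, ev.kind, ev.detail)
```

The trace answers "why are we allocating" directly: the allocation happened in handle_put because the free list was empty. A sampler would only see this if a sample happened to land inside that short burst.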
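perf is of course a sampling profiler, but perf record -g most definitely does give you a complete call stack for each sample it takes, provided the stack can be unwound: either the binaries were built with frame pointers, or you have debug info in place and use --call-graph dwarf.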