And there are a lot of cases where it doesn't work, specifically with elaborate ...

bsprings · on July 13, 2015

Hi varelse, can you tell me more about your profiling use case? nvprof should support MPI profiling scenarios, but perhaps yours is different. I'd love to know the details so I can help improve the product. Feel free to contact me at my first initial plus last name at nvidia.com (my name is Mark Harris).

bsprings · on July 21, 2015

FYI, nvprof works quite well with MPI, as described in this blog post by Jiri Kraus: http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-profi...

To use nvprof with MPI, you just need to ensure nvprof is available on the cluster nodes and run it as your mpirun target, e.g. "mpirun ... nvprof ./my_mpi_program".

You can have it dump its output to files that the NVIDIA Visual Profiler (NVVP) is able to load. You can even load the output from multiple MPI ranks into NVVP to visualize them on the same timeline, making it easier to spot issues.
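Here is a minimal sketch of that workflow, under a few assumptions that go beyond the post itself: Open MPI's mpirun (which sets OMPI_COMM_WORLD_RANK in each rank's environment) and nvprof on every node's PATH. nvprof's -o option writes the profile to a file and expands %q{ENV_VAR} to the value of that environment variable, so each rank ends up with its own output file. The function name profile_mpi is hypothetical, just for illustration.

```shell
# Hypothetical helper: launch an MPI program with every rank running under
# nvprof, writing one profile file per rank that NVVP can import.
# Assumes Open MPI (sets OMPI_COMM_WORLD_RANK per rank) and nvprof on PATH.
# nvprof itself expands %q{OMPI_COMM_WORLD_RANK} inside each rank's process.
profile_mpi() {
    nranks="$1"   # number of MPI ranks to launch
    prog="$2"     # path to the MPI executable, e.g. ./my_mpi_program
    shift 2       # any remaining arguments are passed through to the program
    mpirun -np "$nranks" \
        nvprof -o "$(basename "$prog").%q{OMPI_COMM_WORLD_RANK}.nvprof" \
        "$prog" "$@"
}
```

For example, "profile_mpi 2 ./my_mpi_program" should leave my_mpi_program.0.nvprof and my_mpi_program.1.nvprof in the working directory, and you can then load both into NVVP to see the ranks on one timeline. With a different MPI implementation, swap OMPI_COMM_WORLD_RANK for whatever rank variable it sets.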