We had 40+ mph winds at our home all night. Based on previous experience, the power would have gone out anyhow.
In the spring we had a tree take down the power line in front of the house and keep arcing. Thank goodness it was a day after rain, with calm wind. On a night like this one, the mountain would have been ash.
For those who live in rural, fire-prone areas, power outages are fairly normal and the big danger is fire. If someone hears 4th of July fireworks being set off where I live, there's a real risk of a vigilante mob. The tradeoff of power cuts seems reasonable.
There has been reporting that preventive power cuts are generally a good idea but highly unpopular unless they are tightly targeted. This is what San Diego Gas & Electric learned: they put in equipment to enable more targeted cuts and better monitoring. I expect this is where PG&E will need to invest next.
There are many good reasons to embrace better integration with open source. I think benefits like helping users with custom or broad distro needs and increasing the velocity of collaboration with the ecosystem and partners outweigh factors like 'competition.'
The cogs do turn, albeit slower than people prefer.
The first part of your statement is generally accurate. However, the article is discussing tariffs the US has chosen to impose; that is not something China is doing.
There is massive cognitive dissonance in trying to explain a lot of current events, because they are so far beyond what you expect that they don't seem to make sense.
Yesterday I was trying to explain to a friend about the children separated at the border, and how the parents are then deported alone and have to apply afterwards to try to find their children, and it took about seven attempts. She didn't understand what I was saying at first because she assumed I must have misspoken; doing that simply didn't compute.
The US courts have ruled that children are not to be incarcerated with their parents. It seems reasonable and moral that children are not punished for the crimes their parents commit. If the parents break the law and go to jail, the children go somewhere else, like foster care. When the parents get out of jail, they have to find their children to bring the family back together. This situation is not really hard to understand if you think that the government should enforce the current immigration laws.
The US immigration system has been so screwed up for so long that everyone just keeps kicking the problems down the road. It probably needs to be deleted and rewritten from scratch, but that almost never works out like one hopes.
Given the description of your workaround, you may be running into long load times for the NVIDIA GPU driver. If that is the root cause, the recommended solution is nvidia-persistenced, though it sounds like you may have rolled your own equivalent.
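If you did go the roll-your-own route, the core of it is tiny: a background process that initializes CUDA and then just sits there holding the context so the driver state stays resident. This is only a sketch of the idea, not how nvidia-persistenced itself is implemented:

    // keep_ctx.cu -- hypothetical stand-in for nvidia-persistenced: initialize
    // the CUDA driver, then block forever so its state stays resident between
    // application runs. Build with: nvcc keep_ctx.cu -o keep_ctx
    #include <cuda_runtime.h>
    #include <unistd.h>

    int main(void) {
        cudaFree(0);   // no-op call that forces driver and context initialization
        pause();       // block indefinitely, keeping the context alive
        return 0;
    }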
"With independent, parallel integer and floating point datapaths, the Volta SM is also much more efficient on workloads with a mix of computation and addressing calculations"
And there are a lot of cases where it doesn't work, particularly with elaborate MPI scenarios and over a network/VPN. In particular, I do not wish to jump through hoops to enable remote profiling over heavily IT-restricted networks.
For simple apps, nvprof is great. For real low-level, blood-and-guts CUDA optimization, the command-line profiler is still indispensable. Killing it is enough reason for me to go code FPGAs in OpenCL instead of GPUs in CUDA.
Hi varelse, can you tell me more about your profiling use case? nvprof should support MPI profiling scenarios, but perhaps yours is different. I'd love to know details so I can help improve the product. Feel free to contact me at first initial last name at nvidia.com (name is Mark Harris).
To use nvprof with MPI, you just need to ensure nvprof is available on the cluster nodes and run it as your mpirun target, e.g. "mpirun ... nvprof ./my_mpi_program".
You can have it dump its output to files that the NVIDIA Visual Profiler (NVVP) is able to load. You can even load the output from multiple MPI ranks into NVVP to visualize them on the same timeline, making it easier to spot issues.
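For example, with Open MPI you can use nvprof's %q{...} substitution to pull the rank from the environment, so each rank writes its own file (the env var name here is Open MPI specific; other MPI implementations use a different one):

    mpirun -np 4 nvprof -o profile.%q{OMPI_COMM_WORLD_RANK}.nvprof ./my_mpi_program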
If you're interested in the architecture of a GPU, this Berkeley ParLab presentation by Andy Glew from 2009 covers the basics of how the compute cores in modern GPUs handle threading. It's a subtle, but powerful, difference from SIMD or vector machines.
If you want to get into the details of how a GPU interfaces with the system and OS software, which is almost an entirely different animal, you may want to look at the Nouveau project to get oriented.
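To give a flavor of the threading difference: in the SIMT model you write branchy, scalar-looking code per thread, and the hardware handles divergence within a warp by masking lanes, rather than you hand-vectorizing. A toy example of my own:

    // Threads in the same warp take different branches; the hardware
    // serializes the two paths with lane masks, so no explicit vector
    // operations are needed in the source.
    __global__ void classify(const int *in, int *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            if (in[i] < 0)
                out[i] = -in[i];    // some lanes take this path...
            else
                out[i] = 2 * in[i]; // ...others take this one, same warp
        }
    }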
This is a developing area, with a number of ongoing research and development efforts looking at basic packet filtering flows, for example: https://ieeexplore.ieee.org/document/7019193 https://arxiv.org/pdf/1312.4188.pdf
and AI-driven approaches like CyBERT showing promise: https://medium.com/rapids-ai/cybert-28b35a4c81c4
One of the main system challenges has been efficient, high-bandwidth, low-latency delivery of packet flows between NICs and GPUs. NVIDIA has its cuVNF library (part of the Aerial SDK), which works with GPUDirect-capable NICs and extends the DPDK toolkit to accomplish this (see the sketch below for the general shape of the problem): https://developer.nvidia.com/aerial-sdk
However, as for integrated devices, I'm familiar with at least one that claims to be on the market: http://www.h3c.com/en/About_Us/News___Events/News/201907/121...
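For a feel of what cuVNF is optimizing away, here is a rough sketch of the baseline path using plain DPDK + CUDA, i.e. receive on the host and stage a copy to the GPU. This is my own illustration, not the cuVNF API; GPUDirect-style delivery removes exactly the staging copy marked below. Assumes the port/queue is already configured and d_buf is large enough for a burst:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <cuda_runtime.h>

    #define BURST 32

    // Baseline NIC -> host -> GPU path (no GPUDirect).
    void rx_loop(uint16_t port, uint8_t *d_buf, cudaStream_t stream) {
        struct rte_mbuf *pkts[BURST];
        for (;;) {
            uint16_t n = rte_eth_rx_burst(port, 0 /* queue */, pkts, BURST);
            size_t off = 0;
            for (uint16_t i = 0; i < n; i++) {
                // The staging copy a GPUDirect-capable NIC would eliminate:
                cudaMemcpyAsync(d_buf + off,
                                rte_pktmbuf_mtod(pkts[i], void *),
                                rte_pktmbuf_pkt_len(pkts[i]),
                                cudaMemcpyHostToDevice, stream);
                off += rte_pktmbuf_pkt_len(pkts[i]);
            }
            cudaStreamSynchronize(stream);          // copies done; safe to free
            for (uint16_t i = 0; i < n; i++)
                rte_pktmbuf_free(pkts[i]);
            // ...launch packet-processing kernels on d_buf[0, off) here...
        }
    }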