We had 40+ mph winds at our home all night. Based on previous experience, the power would have gone out anyhow.
In the spring we had a tree take down the power line in front of the house and keep arcing. Thank goodness it was a day after rain, with calm wind. On a night like this one, the mountain would have been ash.
For those who live in rural, fire-prone areas, power outages are fairly normal and the big danger is fire. If someone hears 4th of July fireworks being set off where I live, there's a real risk of a vigilante mob. The tradeoff of power cuts seems reasonable.
There has been reporting that preventive power cuts are generally a good idea but highly unpopular unless they are tightly targeted. This is what San Diego Gas & Electric learned: they put in equipment to enable more targeted cuts and better monitoring. I expect this is where PG&E will need to invest next.
There are many good reasons to embrace better integration with open source. I think benefits like helping users with custom or broad distro needs and increasing the velocity of collaboration with the ecosystem and partners outweigh factors like 'competition.'
The cogs do turn, albeit slower than people prefer.
The first part of your statement is generally accurate. However, the article is discussing tariffs the US has chosen to impose; that is not something China is doing.
There is massive cognitive dissonance in trying to explain a lot of current events, because they are so far beyond what you expect that they don't seem to make sense.
Yesterday I was trying to explain to a friend about the children separated at the border, and how the parents are then deported alone and have to apply afterwards to try to find their children, and it took about seven attempts. She didn't understand what I was saying at first because she assumed I must have misspoken; doing that simply didn't compute.
The US courts have ruled that children are not to be incarcerated with their parents. It seems reasonable and moral that children are not punished for the crimes their parents commit. If the parents break the law and go to jail, the children go somewhere else, like foster care. When the parents get out of jail, they have to find their children to bring the family back together. This situation is not really hard to understand if you think that the government should enforce the current immigration laws.
The US immigration system has been so screwed up for so long that everyone just keeps kicking the problems down the road. It probably needs to be deleted and rewritten from scratch, but that almost never works out like one hopes.
Given the description of your workaround, you may be running into long load times for the NVIDIA GPU driver. If that is the root cause, the recommended solution is nvidia-persistenced, though it sounds like you may have rolled your own equivalent.
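If you did go the roll-your-own route, the core of it is tiny: a background process that initializes CUDA and then just sits there holding the context so the driver state stays resident. This is only a sketch of the idea, not how nvidia-persistenced itself is implemented:

    // keep_ctx.cu -- hypothetical stand-in for nvidia-persistenced: initialize
    // the CUDA driver, then block forever so its state stays resident between
    // application runs. Build with: nvcc keep_ctx.cu -o keep_ctx
    #include <cuda_runtime.h>
    #include <unistd.h>

    int main(void) {
        cudaFree(0);   // no-op call that forces driver and context initialization
        pause();       // block indefinitely, keeping the context alive
        return 0;
    }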
"With independent, parallel integer and floating point datapaths, the Volta SM is also much more efficient on workloads with a mix of computation and addressing calculations"
And there are a lot of cases where it doesn't work, particularly with elaborate MPI scenarios and over a network/VPN. In particular, I do not wish to jump through hoops to enable remote profiling over heavily IT-restricted networks.
For simple apps, nvprof is great. For real low-level, blood-and-guts CUDA optimization, the command-line profiler is still indispensable. Killing it is enough reason for me to go code FPGAs in OpenCL instead of GPUs in CUDA.
Hi varelse, can you tell me more about your profiling use case? nvprof should support MPI profiling scenarios, but perhaps yours is different. I'd love to know details so I can help improve the product. Feel free to contact me at first initial last name at nvidia.com (name is Mark Harris).
To use nvprof with MPI, you just need to ensure nvprof is available on the cluster nodes and run it as your mpirun target, e.g. "mpirun ... nvprof ./my_mpi_program".
You can have it dump its output to files that the NVIDIA Visual Profiler (NVVP) is able to load. You can even load the output from multiple MPI ranks into NVVP to visualize them on the same timeline, making it easier to spot issues.
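For example, with Open MPI you can use nvprof's %q{...} substitution to pull the rank from the environment, so each rank writes its own file (the env var name here is Open MPI specific; other MPI implementations use a different one):

    mpirun -np 4 nvprof -o profile.%q{OMPI_COMM_WORLD_RANK}.nvprof ./my_mpi_program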
If you're interested in the architecture of a GPU, this Berkeley ParLab presentation by Andy Glew from 2009 covers the basics of how the compute cores in modern GPUs handle threading. It's a subtle, but powerful, difference from SIMD or vector machines.
If you want to get into the details of how a GPU interfaces with the system and OS software, which is almost an entirely different animal, you may want to look at the Nouveau project to get oriented.
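To give a flavor of the threading difference: in the SIMT model you write branchy, scalar-looking code per thread, and the hardware handles divergence within a warp by masking lanes, rather than you hand-vectorizing. A toy example of my own:

    // Threads in the same warp take different branches; the hardware
    // serializes the two paths with lane masks, so no explicit vector
    // operations are needed in the source.
    __global__ void classify(const int *in, int *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            if (in[i] < 0)
                out[i] = -in[i];    // some lanes take this path...
            else
                out[i] = 2 * in[i]; // ...others take this one, same warp
        }
    }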
This is a developing area, with a number of ongoing research and development efforts looking at basic packet filtering flows, for example: https://ieeexplore.ieee.org/document/7019193 https://arxiv.org/pdf/1312.4188.pdf
and AI-driven approaches like CyBERT showing promise: https://medium.com/rapids-ai/cybert-28b35a4c81c4
One of the main system challenges has been efficient, high-bandwidth, low-latency delivery of packet flows between NICs and GPUs. NVIDIA has its cuVNF library (part of the Aerial SDK), which works with GPUDirect-capable NICs and extends the DPDK toolkit to accomplish this (see the sketch below for the general shape of the problem): https://developer.nvidia.com/aerial-sdk
However, as for integrated devices, I'm familiar with at least one that claims to be on the market: http://www.h3c.com/en/About_Us/News___Events/News/201907/121...
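For a feel of what cuVNF is optimizing away, here is a rough sketch of the baseline path using plain DPDK + CUDA, i.e. receive on the host and stage a copy to the GPU. This is my own illustration, not the cuVNF API; GPUDirect-style delivery removes exactly the staging copy marked below. Assumes the port/queue is already configured and d_buf is large enough for a burst:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <cuda_runtime.h>

    #define BURST 32

    // Baseline NIC -> host -> GPU path (no GPUDirect).
    void rx_loop(uint16_t port, uint8_t *d_buf, cudaStream_t stream) {
        struct rte_mbuf *pkts[BURST];
        for (;;) {
            uint16_t n = rte_eth_rx_burst(port, 0 /* queue */, pkts, BURST);
            size_t off = 0;
            for (uint16_t i = 0; i < n; i++) {
                // The staging copy a GPUDirect-capable NIC would eliminate:
                cudaMemcpyAsync(d_buf + off,
                                rte_pktmbuf_mtod(pkts[i], void *),
                                rte_pktmbuf_pkt_len(pkts[i]),
                                cudaMemcpyHostToDevice, stream);
                off += rte_pktmbuf_pkt_len(pkts[i]);
            }
            cudaStreamSynchronize(stream);          // copies done; safe to free
            for (uint16_t i = 0; i < n; i++)
                rte_pktmbuf_free(pkts[i]);
            // ...launch packet-processing kernels on d_buf[0, off) here...
        }
    }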