Tensor Compilers: Comparing PlaidML, Tensor Comprehensions, and TVM (vertex.ai)
36 points by hedgehog on May 20, 2018 | 8 comments



These guys need to be clearer that they're the developers of PlaidML - I don't think it's made very obvious.

Worth pointing out for anyone else that PlaidML appears to be AGPL-licensed - so maybe not worth getting too excited about if you have any commercial applications in mind.


Thanks, I updated the wording to call out that we developed it. We dual-license the software, so there's a commercial option similar to MySQL; just get in touch. The open source project is mostly a way to support research/education.


AGPL would be a restriction if you need to deploy a model on top of PlaidML in production. It is still very useful at training time, after which the trained network can be exported to a production framework such as TensorFlow.
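For anyone curious what that workflow looks like, here's a minimal sketch of training through the PlaidML Keras backend and saving backend-agnostic weights for deployment elsewhere (the model choice and file name are just illustrative):

    import os

    # Route Keras through PlaidML; must be set before importing keras.
    os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

    from keras.applications import MobileNet

    model = MobileNet(weights=None)  # untrained; train it via PlaidML
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    # ... model.fit(x_train, y_train) on your data ...

    # Keras saves weights as backend-agnostic HDF5, so they can later
    # be loaded into a TensorFlow-backed Keras install for serving.
    model.save_weights("mobilenet_weights.h5")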


Heh, I think part of the point of PlaidML is to avoid the speed/efficiency limitations of TensorFlow in deployment/production.


The TVM results on ResNet50 and MobileNet seem a bit off. On a GTX 1070 Ti, with an input of size (1, 3, 224, 224):

TVM results:

    Resnet50:  100 inferences/sec (0.009983 sec per run)
    Mobilenet: 450 inferences/sec (0.002220 sec per run)

PlaidML results:

    Resnet50:  107 inferences/sec (0.009302 sec per run)
    Mobilenet: 473 inferences/sec (0.002112 sec per run)

My benchmark script for TVM is here: https://gist.github.com/masahi/a386c2ce5b5f8c2d9f7af5e09a8d8...


Thank you so much for pointing this out. We'll get updated numbers out soon. How did you benchmark PlaidML, out of curiosity?

The error I correct here (https://github.com/brianretford/nnvm-rocm/blob/master/mxnet_...) came from trying to roughly approximate how Keras does things; plaidbench with Keras is the easiest way for us to evaluate things, though it definitely adds a lot of overhead.

My script roughly matches the numbers I get out of your script, though I think the TVM time_evaluator should call sync inside its loop, to be fair (I patched it to do so to compare against your methodology). It doesn't make a huge difference, but the difference is real.
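For anyone following along, here's a rough sketch of the sync-inside-the-loop timing being discussed, against the 2018-era NNVM graph runtime (graph, lib, and params are assumed to come from an earlier nnvm.compiler.build call):

    import time
    import numpy as np
    import tvm
    from tvm.contrib import graph_runtime

    # graph, lib, params: outputs of nnvm.compiler.build (assumed)
    ctx = tvm.gpu(0)
    module = graph_runtime.create(graph, lib, ctx)
    module.set_input(**params)
    data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")
    module.set_input("data", tvm.nd.array(data, ctx))

    times = []
    for _ in range(100):
        start = time.time()
        module.run()
        ctx.sync()  # block until queued GPU work finishes, so the
                    # timestamp covers the whole inference, not just launches
        times.append(time.time() - start)

    print("%f sec per run" % np.mean(times))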

If I just pull the overall kernel runtime from our logs, I get ~525 inferences/sec.


For PlaidML, I used:

    plaidbench keras mobilenet
    plaidbench keras resnet50

time_evaluator is what the TVM/NNVM folks use for benchmarking. See their benchmark script here: https://github.com/dmlc/nnvm/blob/master/examples/benchmark/...
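For reference, the time_evaluator pattern in that script looks roughly like this (a sketch reusing the graph runtime module and ctx from the loop above; number=100 is an arbitrary averaging count):

    # "run" is the graph runtime's inference function; each call to
    # ftimer averages `number` back-to-back executions.
    ftimer = module.module.time_evaluator("run", ctx, number=100)
    mean_sec = ftimer().mean
    print("%.6f sec per run, %.1f inferences/sec" % (mean_sec, 1.0 / mean_sec))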


Thanks, I've shared this with our team to have a look. There was some subtlety to timing the right thing and it's possible we missed something.



