Optimization and embedded devices. In theory, knowing the full graph upfront allows the framework to optimize it, for example by fusing adjacent operations into a single kernel.
For embedded devices, Python may not be available at all; in such cases you can precompile the graph for the target device, as in the sketch below.
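A minimal sketch of such precompilation in PyTorch, using TorchScript tracing to record the graph so it can later be loaded without a Python runtime (the model and input shapes here are hypothetical placeholders):

```python
import torch
import torch.nn as nn

# A toy model standing in for whatever you want to deploy.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

# Tracing runs one forward pass and records the resulting graph.
example_input = torch.randn(1, 4)
traced = torch.jit.trace(model, example_input)

# The saved archive can be loaded from C++ (libtorch) via torch::jit::load,
# with no Python interpreter on the target device.
traced.save("model_traced.pt")
```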
Note that in practice, PyTorch is as fast as or faster than TensorFlow, and newer versions allow you to "post compute" (trace) the graph and export it to ONNX, enabling embedded inference with Caffe2.
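A minimal sketch of that export path, assuming the same toy model as above; the file name and tensor names are illustrative, and the resulting `.onnx` file is what a downstream runtime (Caffe2 historically, ONNX Runtime today) consumes:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

# Export traces the model with a dummy input, fixing the graph,
# then serializes the graph plus weights into one ONNX file.
dummy_input = torch.randn(1, 4)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["features"],
    output_names=["logits"],
)
```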