Inference times for Inception v3 on a Raspberry Pi have been benchmarked recently [1]; they weigh in at around 2s. Removing the Python layer gets it down to about 500ms, though it's not yet clear whether the Python overhead is the only reason for the difference.
Nvidia's Tegra X1 is supposedly capable of <10ms for ImageNet-grade models [2]. It's fair to assume, though, that this is for trimmed-down and/or 16-bit models rather than full Inception models.
And finally, Sam, who also facilitated building TF on the Pi, is about to host a six-week, half-theory, half-practice course on TF and deep learning [3] (methinks he deserves the plug).
The training/testing data for ImageNet are not all tiny thumbnails. It's just that the models work better/are more cleanly defined with uniform inputs, so images are scaled/cropped down to 224x224 or 299x299 or whatever dimensions the model expects. Inception-ResNet works fine on huge images; it just doesn't use all of the pixel data.
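For the curious, the usual preprocessing is just a center crop to a square followed by a resize to the model's input size (299x299 for Inception v3). A minimal sketch with Pillow, assuming nothing beyond that input size:

    from PIL import Image

    def load_for_inception(path, size=299):
        # Center-crop to a square, then resize to the network's input size.
        img = Image.open(path).convert('RGB')
        w, h = img.size
        side = min(w, h)
        left, top = (w - side) // 2, (h - side) // 2
        img = img.crop((left, top, left + side, top + side))
        return img.resize((size, size), Image.BILINEAR)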
Very cool! It's good to see an example showcasing the importance of keeping a Session alive when using TensorFlow with Python on the RPi. Glad that the tensorflow-on-raspberry-pi repo was useful; let me know if you (or anyone) run into any hitches or have any suggestions for improvement.
Hey Sam, this is Matt. Thanks for your comment and your help a few months back! And for anyone else reading this, Sam is great at responding to issues filed about installing TensorFlow on a Pi: https://github.com/samjabrahams/tensorflow-on-raspberry-pi/i...
Yeah, I'm using Sam's TF wheel on RPi3 and it works great.
> it was not feasible to analyze every image captured from the PiCamera using TensorFlow, due to overheating of the Raspberry Pi when 100% of the CPU was being utilized
Just put a heatsink on the CPU. It's like $1.50 ... $1.95 on Adafruit. I glue a heatsink to every RPi3 unit I build.
> it was taking too long to load the 85 MB model into memory, therefore I needed to load the classifier graph to memory
Yeah, one of the first things you learn with TF on the RPi is to daemonize it, load everything you can initially, and then just process everything in a loop. That initialization is super-slow, but after that it's fast enough. YMMV
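For anyone who hasn't set this up before, the pattern looks roughly like this. The graph path is a made-up example, and the tensor names are the ones used by the stock Inception v3 classify_image graph; a retrained graph will likely use different names:

    import tensorflow as tf

    GRAPH_PATH = '/home/pi/model/classify_image_graph_def.pb'  # hypothetical path

    # Load the frozen graph and create the Session exactly once (the slow part).
    with tf.gfile.GFile(GRAPH_PATH, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

    sess = tf.Session()
    softmax = sess.graph.get_tensor_by_name('softmax:0')

    def classify(jpeg_path):
        # Feed raw JPEG bytes; this graph decodes and resizes internally.
        with open(jpeg_path, 'rb') as f:
            image_data = f.read()
        return sess.run(softmax, {'DecodeJpeg/contents:0': image_data})[0]

    # ...then just call classify() from the capture loop; every call after the
    # first reuses the already-loaded graph and the open Session.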
Even with the heatsink (which we install on all of the Pis), we were still having overheating issues. We tried a few other things to mitigate the problem, too:
1. Reducing the sampling rate for image recognition (but if we sampled less often than every few seconds we could miss the express trains)
2. Using a cooling fan (https://www.amazon.com/gp/product/B013E1OW4G/ref=oh_aui_sear...), which still didn't prevent overheating when the CPU was continuously loaded at 100%.
3. Only sampling images where we detected motion (https://svds.com/streaming-video-analysis-python/)
We decided to use the third option: our motion detection algorithm, while sensitive to false positives, narrows down which frames need to be analyzed, and the deep learning image recognition then eliminates those false positives.
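In case it's useful to anyone, the gating loop is roughly this shape. This is a simplified sketch using OpenCV frame differencing (not necessarily our exact algorithm); the pixel-count threshold is arbitrary, and the print is a stand-in for handing the frame to the TensorFlow classifier:

    import cv2

    # On a Pi camera you may need the V4L2 driver for VideoCapture(0) to work.
    cap = cv2.VideoCapture(0)
    _, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev)
        _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        prev = gray
        # Only wake the (expensive) classifier for frames with enough changed pixels.
        if cv2.countNonZero(mask) > 5000:
            print('motion detected; this frame would go to the classifier')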
Happy to chat more about your experiences daemonizing TF applications!
When you say "overheating issues", what do you mean exactly? IME, at 100% CPU usage with the heatsink on, either it doesn't throttle the clock down at all anymore, or it does so only after a much longer time and with a much smaller clock reduction.
Are you seeing anything happen, other than some slight throttling?
The chip cannot fry itself. It's designed to slow down so as to stay below the dangerous temperature range.
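If you want to see what the firmware is actually doing, vcgencmd measure_temp and vcgencmd get_throttled will tell you, or you can poll sysfs while the load runs. Rough sketch:

    import time

    TEMP = '/sys/class/thermal/thermal_zone0/temp'                   # millidegrees C
    FREQ = '/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq'   # kHz

    # Print the SoC temperature and current CPU clock every couple of seconds;
    # a dropping clock at high temperature is the throttling in action.
    while True:
        with open(TEMP) as f:
            temp_c = int(f.read()) / 1000.0
        with open(FREQ) as f:
            mhz = int(f.read()) / 1000.0
        print('%.1f C  %.0f MHz' % (temp_c, mhz))
        time.sleep(2)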
> Happy to chat more about your experiences daemonizing TF applications!
Eh, that was just a fancy way of saying I do what you do. Launch the program once, and let it run forever. It performs initialization (which takes a long time), then it drops into a processing loop: wait for input / read / process / do something / repeat. Pretty basic stuff really.
Right, that chip could never run at 100% CPU load for more than a fraction of a minute; after that it starts slowing the clock. Seems to me like it was meant to run with a heatsink on.
Either that or it was meant for outdoors operation in arctic regions.
Hi annnd! We tried a few times to train a scaled down model on the Pi3, but got nowhere. We've found that the best strategy is to train on the beefiest hardware you have, then transfer the model and run it on the Pi3 for streaming applications.
I'm under the impression that when the Pi 4 is released it will have a GPU powerful enough to run neural nets. Now that TF is getting optimised for low-end devices and models are getting open-sourced, it should be possible to run live offline image recognition and speech recognition on the Pi.
I imagine a proliferation of robots, security cameras, and smart open-source Siri/Alexa alternatives.
A bit off topic, but related: can anyone point me to a recipe for "low-latency" video with the RPi? I don't even really mean "low latency": I tried a couple of different setups/tools a few months ago, and the best I could get was a half- to full-second delay on the video.
[1] https://github.com/samjabrahams/tensorflow-on-raspberry-pi/t...
[2] https://youtu.be/_4tzlXPQWb8?t=53m35s
[3] https://www.thisismetis.com/deep-learning-with-tensorflow