The code is C++. It could be ported to an MCU if a fairly recent C++ compiler (C++11 or later) is available for the target. But the algorithms are quite expensive: there are multiple matrix inversions, and the matrices are as large as the data set you put into the tool, typically between 500 and 10,000 data points.
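To put rough numbers on that, here is a minimal sketch of the memory a single matrix would need. It assumes dense storage with 8-byte double-precision elements and an n x n matrix for n data points; the function names are made up for illustration and are not taken from the tool. The operation count assumes a textbook dense inversion that scales roughly as n^3, which may differ from what the tool actually does.

```cpp
#include <cstdint>

// Assumption: 8-byte elements (IEEE double). Hard-coded because
// sizeof(double) is only 4 on some MCU toolchains.
constexpr std::uint64_t kBytesPerElement = 8;

// Bytes needed to hold one dense n x n matrix (hypothetical helper).
constexpr std::uint64_t dense_matrix_bytes(std::uint64_t n) {
    return n * n * kBytesPerElement;
}

// Order-of-magnitude operation count for one dense inversion,
// assuming roughly n^3 scaling (hypothetical helper).
constexpr std::uint64_t inversion_ops_order(std::uint64_t n) {
    return n * n * n;
}

// Compile-time checks for the two ends of the typical range.
static_assert(dense_matrix_bytes(500) == 2000000ULL,
              "500 points: 2 MB per matrix");
static_assert(dense_matrix_bytes(10000) == 800000000ULL,
              "10,000 points: 800 MB per matrix");
static_assert(inversion_ops_order(500) == 125000000ULL,
              "500 points: about 1.25e8 operations per inversion");
```

Under these assumptions a single matrix takes about 2 MB at 500 data points and about 800 MB at 10,000, while many microcontrollers have only tens to hundreds of kilobytes of RAM. So the matrix size is likely to be the limit well before the compiler is.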
I wonder if it could be ported to a microcontroller. Are the BLAS routines expensive, or are the matrices large?