Since TFA talks about deep learning so much, I wonder how many of the applications running on these machines are actually deep learning, or can make use of the tensor cores in some other way.
A lot of people use GPUs for things other than ML. The big advantage is the sheer number of cores, and people who run on supercomputers write highly parallelized algorithms (otherwise what's the point). GPUs are now fast enough that the core count gives them an edge. Also, the memory on them is MUCH faster than CPU memory, but the trade-off is that you have far less of it (on the order of 20GB compared to 256GB).
As for the TPUs, one big advantage for ML is that they work in float16/float32 (rather than the usual float32/float64), and in ML you care very little about precision. They're also optimized for tensor calculations. For anything where you don't need that precision and you're doing tensor stuff (lots of math/physics does tensor stuff), these will give you an advantage. (I'm not aware of anyone using them for things other than ML, but I wouldn't be surprised if people did.) But other workloads need more precision, and those won't use the TPUs (AFAIK).
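As a toy illustration of what that precision trade-off looks like (plain NumPy, nothing TPU-specific; the sizes and matrices are made up), here's the same matmul in float16 versus float64:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((256, 256))
    B = rng.standard_normal((256, 256))

    # Reference result in full double precision
    exact = A @ B

    # Same product carried out in half precision
    half = (A.astype(np.float16) @ B.astype(np.float16)).astype(np.float64)

    rel_err = np.linalg.norm(half - exact) / np.linalg.norm(exact)
    print(f"relative error of float16 matmul: {rel_err:.2e}")

The relative error comes out around 1e-3, which is noise for ML training but a dealbreaker for, say, a long-running physics simulation.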
Given that the top one is at Oak Ridge National Lab, my guess would be that they're not exploring deep learning. They've got other applications in mind.
ORNL, like everyone else, is studying ML. They are a research lab. But they're interested in a lot of other applications too, and these GPUs do help with the traditional research they perform.
One other point not mentioned in other comments: some work presented at GTC used tensor cores for a low-precision solve followed by iterative refinement to an fp64-equivalent solution. IIRC, a 2-4x speedup for fp64 dense system solvers.
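For anyone curious, here's a minimal NumPy sketch of the idea (my own toy example, not the GTC code; a real implementation would factor A once and reuse the factorization, with the low-precision solve running on tensor cores):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned system
    b = rng.standard_normal(n)

    # Cheap low-precision solve (stand-in for the tensor-core part)
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)

    # Iterative refinement: fp64 residuals drive low-precision corrections
    for _ in range(5):
        r = b - A @ x  # residual computed in float64
        d = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += d

    print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))  # ~fp64-level residual

The expensive O(n^3) work happens at low precision; only the O(n^2) residual needs fp64, which is where the speedup comes from.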
My guess would be the vast majority. In addition to being the area everyone is interested in right now, the hardware is getting more and more specialized, so it just doesn't benefit general-purpose computing. Just as FPU enhancements target a fraction of computing tasks, GPUs target an even smaller fraction, and tensor cores / 16-bit FP a smaller fraction still.