I figured it might, but I think this is a top-of-mind question for people, and it would be nice to make it clear in the comments of the post too. So often there’s some theoretical improvement on multiplication that isn’t actually practical. Regardless, they don’t seem to have posted results for CUDA, which is arguably more important than CPU multiplication, which is what they tried.