There are signs in the data that the downsampling is not gamma-correct, i.e. that pixel values were averaged in gamma-encoded space rather than in linear light. That would somewhat undermine the results (and also the NNs, if they were trained on similarly broken inputs). Can one of the authors clarify whether gamma-correct downsampling/blurring/convolution was used?
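
To make the distinction concrete, here is a minimal sketch of gamma-correct versus naive downsampling. It assumes sRGB-encoded inputs scaled to [0, 1] and a plain 2x box filter. Both are assumptions for illustration, since the transfer function and filter actually used are not stated, and the function names are hypothetical.

```python
import numpy as np

def srgb_to_linear(s):
    # Standard sRGB decoding (assumed transfer function); input in [0, 1].
    s = np.asarray(s, dtype=np.float64)
    return np.where(s <= 0.04045, s / 12.92, ((s + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(l):
    # Standard sRGB encoding; input is linear light in [0, 1].
    l = np.asarray(l, dtype=np.float64)
    return np.where(l <= 0.0031308, l * 12.92, 1.055 * np.power(l, 1 / 2.4) - 0.055)

def box_downsample_2x(img):
    # Average each 2x2 block; odd trailing rows/columns are cropped.
    h, w = img.shape[:2]
    img = img[: h - h % 2, : w - w % 2]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def gamma_correct_downsample_2x(srgb):
    # Decode to linear light, filter there, then re-encode.
    return linear_to_srgb(box_downsample_2x(srgb_to_linear(srgb)))

# Black/white checkerboard: every 2x2 block holds two 0s and two 1s.
checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(np.float64)

naive = box_downsample_2x(checker)              # averages encoded values: 0.5
correct = gamma_correct_downsample_2x(checker)  # averages linear light: about 0.735
```

On high-contrast fine detail such as this checkerboard, the naive result comes out visibly too dark (0.5 instead of roughly 0.735 in sRGB). That kind of darkening in downsampled detail is the sort of sign that suggests a pipeline is not gamma-correct. The same decode, filter, re-encode pattern applies to blurring or any other convolution.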